## System Solution

The approximate solution to the original PDE is obtained by solving a linear system of equations

A*x = b

x = A

^{-1}*b

However, this approach is not practical. The matrix A is sparse, i.e., the fraction of non-zero entries is low. On the other hand, the inverse matrix A

^{-1}is a dense matrix even for the simplest tri-diagonal system of equations. Hence, the large memory requirements severely limit the systems that can be solved in practice. A system of 1 million degrees of freedom (unknowns) would require around 8 terabytes of RAM to store the inverse matrix (1e6^2*8).

There are numerous efficient algorithms that limit the number of arithmetic instructions and bytes of memory required to solve sparse systems of linear equations on modern computers . Possibly the most basic algorithm is the LU factorization – a variant of the Guassian elimination. The system matrix A is represented as a product of two matrices L and U

A = L*U

(L*U)*x = b

x = U

^{-1}*(L

^{-1}*b)

x = U\(L\b)

Compared to the naive approach of computing A^{-1} sparse factorization can only perform better if that LU factors are sparse. MILAMIN renumbers the unknowns and the matrix to minimize the number of non-zero entries in the factor using various fill-reducing reordering techniques. Essentially, the unknowns are renumbered and the matrix is permuted using a permutation vector

A = L*L’

In short, the algorithm for the solution of the global system of linear equations used in MILAMIN consists of four major steps:

- Application of the boundary conditions to A
- Reordering of the unknowns to reduce the number of non-zero entries in the factor (A=A(perm,perm))
- Factorization of the reordered system matrix (A=L*L’)
- Solution of the factorized system using backward-forward substitution (x = L\(L’\b))

We present several approaches to perform the above steps using MATLAB and the MILAMIN software package. Our major interest is the efficiency of the implementation, and the memory consumption. We show how to solve large systems in a reasonable time on modern multi-core shared memory CPU architectures.