4.6 Making ASPECT run faster

When developing ASPECT, we are guided by the principle that the default for all settings should be safe. In particular, this means that you should get errors when something goes wrong, that the program should not let you choose input file parameters in a way that does not make sense, and that we should solve the equations to the best of our ability without cutting corners. The goal is that when you start working with ASPECT, we give you the best answer we can. The downside is that this also makes ASPECT run slower than may be possible. This section describes ways of making ASPECT run faster – assuming that you know what you are doing and are making conscious decisions.

4.6.1 Debug vs. optimized mode

Both deal.II and ASPECT by default perform a great deal of internal checking to make sure that the code's state is valid. For example, if you write a new postprocessing plugin (see Section 6.1) in which you need to access the solution vector, then deal.II's Vector class will make sure that you are only accessing elements of the vector that actually exist and, if this is a parallel computation, are available on the current machine. We do so because it turns out that by far the most common bugs one introduces into programs are of the kind where one tries to do something that obviously doesn't make sense (such as accessing vector element 101 when it only has 100 elements). These kinds of bugs are more frequent than implementing a wrong algorithm, but they are fortunately easy to find if you have a sufficient number of assertions in your code. The downside is that assertions cost run time.

As mentioned above, the default is to have all of these assertions in the code to catch those places where we may otherwise silently access invalid memory locations. However, once you have a plugin running and have verified that your input file runs without problems, you can switch off all of these checks by switching from debug to optimized mode. This means re-compiling ASPECT and linking against a version of the deal.II library without all of these internal checks. Because this is the first thing you will likely want to do, we have already discussed how to do all of this in Section 4.3.
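As a reminder of what this looks like in practice, the following is a minimal sketch of switching an existing build to optimized mode using plain CMake; it assumes a CMake-based build directory set up as described in Section 4.3 and a deal.II installation that provides optimized libraries (your ASPECT version may also offer convenience make targets for this):

    # run inside the ASPECT build directory set up as in Section 4.3
    cmake -DCMAKE_BUILD_TYPE=Release .   # switch from Debug to Release (optimized) mode
    make                                 # re-compile ASPECT without the internal checks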

4.6.2 Adjusting solver tolerances

At the heart of every time step lies the solution of linear systems for the Stokes equations, the temperature field, and possibly for compositional fields. In essence, each of these steps requires us to solve a linear system of the form $Ax = b$ which we do through iterative solvers, i.e., we compute a sequence of approximations $x^{(k)}$ with $x^{(k)} \rightarrow x = A^{-1}b$. This iteration is terminated at iteration $k$ if the approximation is "close enough" to the exact solution. The solvers we use determine this by testing after every iteration whether the residual, $r^{(k)} = A(x - x^{(k)}) = b - Ax^{(k)}$, satisfies $\|r^{(k)}\| \le \varepsilon \|r^{(0)}\|$ where $\varepsilon$ is called the (relative) tolerance.

Obviously, the smaller we choose $\varepsilon$, the more accurate the approximation $x^{(k)}$ will be. On the other hand, it will also take more iterations and, consequently, more CPU time to reach the stopping criterion with a smaller tolerance. The default values of these tolerances are chosen so that the approximation is typically sufficient. You can make ASPECT run faster by choosing these tolerances larger. The parameters you can adjust are all listed in Section ?? and are located at the top level of the input file. In particular, the parameters you want to look at are Linear solver tolerance, Temperature solver tolerance and Composition solver tolerance.
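As a sketch, loosening these tolerances in an input file might look as follows; the parameter names are the ones listed above, but the numerical values are illustrative choices for a quick exploratory run, not the defaults:

    set Linear solver tolerance      = 1e-5   # illustrative, looser value; fewer Stokes iterations
    set Temperature solver tolerance = 1e-9   # illustrative value for the temperature system
    set Composition solver tolerance = 1e-9   # illustrative value for the compositional systems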

All this said, it is important to understand the consequences of choosing larger tolerances. In particular, if you choose tolerances too large, then the difference between the exact solution of a linear system, $x$, and the approximation $x^{(k)}$ may become so large that you no longer get accurate output from your model. A rule of thumb in choosing tolerances is to start with a small value and then increase the tolerance until you come to a point where the output quantities start to change significantly. This is the point where you will want to stop.

4.6.3 Adjusting solver preconditioner tolerances

To solve the Stokes equations it is necessary to lower the condition number of the Stokes matrix by preconditioning it. In ASPECT a right preconditioner $Y^{-1} = \begin{pmatrix} \tilde{A}^{-1} & \tilde{A}^{-1} B^T \tilde{S}^{-1} \\ 0 & \tilde{S}^{-1} \end{pmatrix}$ is used to precondition the system, where $\tilde{A}^{-1}$ is the approximate inverse of the $A$ block and $\tilde{S}^{-1}$ is the approximate inverse of the Schur complement matrix. The matrices $\tilde{A}^{-1}$ and $\tilde{S}^{-1}$ are computed through CG solves, each of which requires a tolerance to be set. In comparison with the solver tolerances of the previous section, these parameters are relatively safe to adjust, since they only change the preconditioner, but they can speed up or slow down solving the Stokes system considerably.

In practice $\tilde{A}^{-1}$ takes by far the most time to compute, but it is also very important in conditioning the system. The accuracy of the computation of $\tilde{A}^{-1}$ is controlled by the parameter Linear solver A block tolerance, which has a default value of 1e-2. Setting this tolerance to a less strict value will result in more outer iterations, since the preconditioner is not as good, but the amount of time to compute $\tilde{A}^{-1}$ can drop significantly, resulting in a reduced total solve time. The crustal deformation cookbook (Section 5.3.8), for example, can be computed much faster by setting the Linear solver A block tolerance to 5e-1. The calculation of $\tilde{S}^{-1}$ is usually much faster and the conditioning of the system is less sensitive to the parameter Linear solver S block tolerance, but for some problems it might be worth investigating.
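As a sketch, relaxing the A block tolerance in an input file might look like the following; the 5e-1 value is the one mentioned above for the crustal deformation cookbook and will not be appropriate for every model, and the S block value shown is purely illustrative:

    set Linear solver A block tolerance = 5e-1   # default is 1e-2; looser value trades more outer iterations for cheaper A block solves
    set Linear solver S block tolerance = 1e-6   # illustrative value; the system is usually less sensitive to this parameter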

4.6.4 Using lower order elements for the temperature/compositional discretization

The default settings of ASPECT use quadratic finite elements for the velocity. Given that the temperature and compositional fields essentially (up to material parameters) satisfy advection equations of the kind $\partial_t T + \mathbf{u} \cdot \nabla T = \cdots$, it seems appropriate to also use quadratic finite element shape functions for the temperature and compositional fields.

However, this is not mandatory. If you do not care about high accuracy in these fields and are mostly interested in the velocity or pressure field, you can select lower-order finite elements in the input file. The polynomial degrees are controlled with the parameters in the discretization section of the input file, see Section ??, in particular by Temperature polynomial degree and Composition polynomial degree.
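A sketch of selecting linear instead of quadratic elements for these fields might look like this; the subsection name Discretization corresponds to the section of the parameter file referred to above:

    subsection Discretization
      set Temperature polynomial degree = 1   # linear elements instead of the default quadratic ones
      set Composition polynomial degree = 1
    end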

As with the other parameters discussed above and below, it is worthwhile comparing the results you get with different values of these parameters when making a decision whether you want to save on accuracy in order to reduce compute time. An example of how this choice affects the accuracy you get is discussed in Section 5.2.1.

4.6.5 Limiting postprocessing

ASPECT has a lot of postprocessing capabilities, from generating graphical output to computing average temperatures or temperature fluxes. To get a sense of what is possible, take a look at the List of postprocessors parameter that can be set in the input file, see Section ??.

Many of these postprocessors take a non-negligible amount of time. How much time they collectively use can be inferred from the timing report ASPECT prints periodically among its output; see for example the output shown in Section 5.2.1. So, if your computations take too long, consider limiting the postprocessors you run to those you really need. Some postprocessors – for example those that generate graphical output, see Section ?? – also allow you to run them only every once in a while, rather than at every time step.
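For instance, a trimmed-down postprocessing setup might look like the following sketch; the Postprocess and Visualization subsections and the Time between graphical output parameter are the usual ones in ASPECT input files, but the specific postprocessor list and output interval shown here are illustrative choices:

    subsection Postprocess
      set List of postprocessors = velocity statistics, temperature statistics, visualization
      subsection Visualization
        set Time between graphical output = 1e6   # write graphical output only every 1e6 time units, not every time step
      end
    end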

4.6.6 Switching off pressure normalization

In most practically relevant cases, the Stokes equations (1)–(2) only determine the pressure up to a constant because only the pressure gradient appears in the equations, not its actual value. However, unlike this "mathematical" pressure, we have a very specific notion of the "physical" pressure: namely a well-defined quantity that at the surface of Earth equals the air pressure which, compared to the hydrostatic pressure inside Earth, is essentially zero.

As a consequence, the default in ASPECT is to normalize the computed "mathematical" pressure in such a way that either the mean pressure at the surface is zero (where the geometry model describes where the "surface" is, see Section 6.3.3), or that the mean pressure in the domain is zero. This normalization is important if your model describes densities, viscosities and other quantities as functions of the pressure – because you almost certainly had the "physical" pressure in mind, not some unspecified "mathematical" one. On the other hand, if you have a material model in which the pressure does not enter, then you don't need to normalize the pressure at all – simply go with whatever the solver provides. In that case, you can switch off pressure normalization by looking at the Pressure normalization parameter at the top level of the input file, see Section ??.
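Switching the normalization off is a one-line change at the top level of the input file; as a sketch, assuming the usual values of this parameter ("surface", "volume", and "no"):

    set Pressure normalization = no   # skip normalization; "surface" and "volume" select the two schemes described above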

4.6.7 Regularizing models with large coefficient variation

Models with large jumps in viscosity and other coefficients present significant challenges to both discretizations and solvers. In particular, they can lead to very long solver times. Section 5.2.8 presents parameters that can help regularize such models, and this regularization typically also brings significant improvements in run time.

4.6.8 Using multithreading

In most cases using as many MPI processes as possible is the optimal parallelization strategy for ASPECT models, but if you are limited by the amount of MPI communication, it can be beneficial to use multiple threads per MPI process. While not utilized by our linear solvers, this parallelization can speed up the assembly of the system matrices, e.g., by around 10–15% if you utilize unused logical cores, or nearly linearly if you use otherwise unused physical cores. It can also reduce the performance cost if you are memory limited and need to run your model on fewer than the available number of cores per node on a cluster to increase the available memory per core. Running with, for example, two threads per process will offset some of the performance loss you will see in these situations.

Multithreading is controlled by the command line parameter -j or --threads. If the parameter is not given, ASPECT will create exactly one thread per MPI process, i.e., multithreading is disabled. Passing the parameter allows ASPECT to spawn several threads per MPI process. Note that the internally used TBB library will determine the number of threads based on the number of available cores, i.e., if you start 2 MPI processes on a quad-core machine with hyperthreading (8 logical cores), ASPECT will spawn 4 threads for each MPI process. Also note that there is no guarantee that the final number of threads will exactly match the number of available logical cores if you start with a number of processes that is not a divisor of your logical cores (e.g., 3 MPI processes for 8 logical cores).
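As a sketch, a typical invocation might look like the following; the executable path, process count, and input file name are of course specific to your setup:

    # 2 MPI processes, each allowed to spawn additional worker threads via TBB
    mpirun -np 2 ./aspect --threads model.prm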