SHTns 3.7
Optimizations implemented in SHTns

SHTns is an implementation of the Spherical Harmonic Transform that aims to be accurate and fast, with direct numerical simulations in mind. To that end, the following optimizations are implemented:

Use the Fast-Fourier Transform

Any serious SHT should use the FFT, as it improves accuracy and speed. SHTns uses the FFTW library for the fast Fourier transform, a portable, flexible and blazingly fast FFT implementation.

Take advantage of mirror symmetry

This is another classical optimization. Because each spherical harmonic is either symmetric or antisymmetric with respect to a reflection about the equator, the operation count of both the direct and the reverse transforms can be reduced by a factor of 2.
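The idea can be illustrated on the simplest case, the Legendre polynomials (order m = 0), which satisfy P_l(-x) = (-1)^l P_l(x). The sketch below is not SHTns code; the function name legendre_synth_pair is hypothetical. It evaluates a Legendre series at x and at -x with a single pass over the recurrence, by accumulating the even-degree and odd-degree terms separately.

```c
#include <assert.h>

/* Evaluate f(x) = sum_{l=0..lmax} a[l] * P_l(x) at x (north) and -x (south).
 * Since P_l(-x) = (-1)^l P_l(x), the even-l and odd-l partial sums computed
 * at x give both values: f(x) = even + odd, f(-x) = even - odd.
 * Illustrative sketch only (hypothetical name, m = 0 case). */
void legendre_synth_pair(const double *a, int lmax, double x,
                         double *f_north, double *f_south)
{
    assert(lmax >= 0);
    double p_prev = 1.0;                        /* P_0(x) */
    double p_curr = x;                          /* P_1(x) */
    double even = a[0];
    double odd = (lmax >= 1) ? a[1] * x : 0.0;

    for (int l = 2; l <= lmax; l++) {
        /* Bonnet recurrence: l P_l = (2l-1) x P_{l-1} - (l-1) P_{l-2} */
        double p_next = ((2 * l - 1) * x * p_curr - (l - 1) * p_prev) / l;
        if (l & 1)
            odd += a[l] * p_next;
        else
            even += a[l] * p_next;
        p_prev = p_curr;
        p_curr = p_next;
    }
    *f_north = even + odd;   /* value at  x */
    *f_south = even - odd;   /* value at -x, obtained without a second recurrence */
}
```

The recurrence, which dominates the cost, runs once per pair of mirrored latitudes instead of once per latitude.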

Polar optimization

A less common, but still classic optimization: the magnitude of spherical harmonics of high order m decreases exponentially when approaching the poles. SHTns uses a threshold below which the spherical harmonic is taken to be zero. The default value for this threshold is defined by SHT_DEFAULT_POLAR_OPT, but you can also choose it with the eps parameter of the shtns_set_grid function, trading some accuracy for more speed. A speed increase of around 5% to 15% is typical for an SHT with a large MMAX.
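As a rough illustration of such a threshold, the sketch below uses the factor sin(theta)^m, which controls how fast a harmonic of order m vanishes near the poles, as a stand-in for the magnitude of the harmonic. This is an assumption made for the example: the exact criterion used by SHTns is not described here, and the names ipow and polar_skip_count are hypothetical.

```c
#include <assert.h>

/* x^n for n >= 0 by repeated squaring. Underflow to 0 is harmless here:
 * it simply means "far below any positive threshold". */
static double ipow(double x, int n)
{
    double r = 1.0;
    while (n > 0) {
        if (n & 1) r *= x;
        x *= x;
        n >>= 1;
    }
    return r;
}

/* Number of latitudes, counted from the north pole, that can be skipped
 * for order m because sin(theta)^m < eps there.
 * sin_theta[i] holds sin(theta_i), with the grid ordered from pole to pole.
 * By mirror symmetry the same count applies near the south pole.
 * Illustrative proxy only, not the criterion used by SHTns. */
int polar_skip_count(const double *sin_theta, int nlat, int m, double eps)
{
    assert(nlat > 0 && m >= 0 && eps >= 0.0);
    int i = 0;
    while (i < nlat / 2 && ipow(sin_theta[i], m) < eps)
        i++;
    return i;
}
```

With eps = 0 nothing is skipped, and the number of skipped latitudes grows with m, which is why the gain is largest for transforms with a large MMAX.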

Cache optimizations

Cache optimizations have been carried out throughout the code. This means ordering coefficients and stripping out systematic zeros.

Spatial Size optimization

When using shtns_set_grid_auto, you can let SHTns choose the optimal spatial sizes for the spherical harmonic truncation you specified with shtns_create. The spatial sizes are chosen so that no aliasing occurs and so that FFTW performs as fast as possible.
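A minimal sketch of how such a choice could be made for the number of longitudes is given below. It rests on two assumptions that are not taken from SHTns itself: that at least 2*mmax + 1 points are needed to resolve orders up to mmax without aliasing in a linear transform, and that sizes whose only prime factors are 2, 3, 5 and 7 are the fast ones for the FFT. The function names are hypothetical.

```c
#include <assert.h>

/* Returns 1 if n has no prime factor other than 2, 3, 5 and 7. */
static int is_fft_friendly(int n)
{
    static const int primes[4] = {2, 3, 5, 7};
    for (int k = 0; k < 4; k++)
        while (n % primes[k] == 0)
            n /= primes[k];
    return n == 1;
}

/* Smallest FFT-friendly size that is >= n_min. */
int next_fft_friendly(int n_min)
{
    int n = (n_min < 1) ? 1 : n_min;
    while (!is_fft_friendly(n))
        n++;
    return n;
}

/* Hypothetical helper: number of longitudes for a given mmax,
 * assuming the 2*mmax + 1 sampling bound of a linear transform. */
int choose_nphi(int mmax)
{
    assert(mmax >= 0);
    return next_fft_friendly(2 * mmax + 1);
}
```

Computing nonlinear terms on the grid requires more points to stay free of aliasing, so the lower bound passed to next_fft_friendly would be raised accordingly.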

Explicit Vectorization

Most x86 CPUs support Single-Instruction-Multiple-Data (SIMD) operations in double precision, which perform the same arithmetic operation on a vector of 2 (SSE2) or 4 (AVX) double precision numbers at once, potentially multiplying the computing power by 2 or 4. Modern compilers attempt to vectorize the code, but they cannot work miracles. SHTns uses explicit vectorization extensively (thanks to the GCC vector extensions) to use the full computing power of your CPU.
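For readers unfamiliar with the GCC vector extensions, here is a small example of what explicit vectorization looks like. It is a generic kernel written for this page, not an excerpt from SHTns, and the names v2d and axpy_v2d are hypothetical. It requires GCC or a compatible compiler such as Clang.

```c
#include <assert.h>
#include <string.h>

/* A vector of 2 doubles (16 bytes), matching the SSE2 register width. */
typedef double v2d __attribute__((vector_size(16)));

/* y[i] += a * x[i], processing two doubles per iteration.
 * Illustrative kernel only (hypothetical name). */
void axpy_v2d(double a, const double *x, double *y, int n)
{
    assert(n >= 0);
    v2d va = {a, a};                      /* broadcast the scalar to both lanes */
    int i = 0;
    for (; i + 2 <= n; i += 2) {
        v2d vx, vy;
        memcpy(&vx, x + i, sizeof vx);    /* load that is safe for unaligned data */
        memcpy(&vy, y + i, sizeof vy);
        vy += va * vx;                    /* one operation acts on both lanes */
        memcpy(y + i, &vy, sizeof vy);
    }
    for (; i < n; i++)                    /* scalar remainder when n is odd */
        y[i] += a * x[i];
}
```

Widening the type to 32 bytes gives a vector of 4 doubles for AVX; the arithmetic in the loop body stays the same.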

Runtime tuning of algorithm

By default, SHTns measures the performance of the different algorithms and chooses the one that performs best (it also checks that the accuracy is good enough). However, there are situations where either the Gauss-Legendre algorithm or a regular grid is required; you can request one by passing the appropriate shtns_type when calling shtns_init or shtns_set_grid.
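The general principle of such a runtime selection can be sketched as follows. This is a generic illustration with two toy summation kernels, not the selection code of SHTns; all names in it are hypothetical. Each candidate is first checked against a reference value, and the fastest of the accurate candidates is kept.

```c
#include <assert.h>
#include <time.h>

typedef double (*kernel_fn)(const double *x, int n);

/* Candidate 1: plain forward summation. */
double sum_forward(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i];
    return s;
}

/* Candidate 2: two independent accumulators. */
double sum_two_lanes(const double *x, int n)
{
    double s0 = 0.0, s1 = 0.0;
    int i = 0;
    for (; i + 2 <= n; i += 2) { s0 += x[i]; s1 += x[i + 1]; }
    if (i < n) s0 += x[i];
    return s0 + s1;
}

/* Returns the index of the fastest candidate whose result is within tol
 * of expected, or -1 if no candidate is accurate enough. */
int pick_kernel(const kernel_fn *cand, int ncand, const double *x, int n,
                double expected, double tol, int reps)
{
    assert(ncand > 0 && reps > 0);
    int best = -1;
    clock_t best_time = 0;
    for (int k = 0; k < ncand; k++) {
        double err = cand[k](x, n) - expected;
        if (err < 0.0) err = -err;
        if (err > tol) continue;              /* accuracy check comes first */

        volatile double sink = 0.0;           /* keeps the timed calls alive */
        clock_t t0 = clock();
        for (int r = 0; r < reps; r++) sink += cand[k](x, n);
        clock_t elapsed = clock() - t0;
        (void)sink;

        if (best < 0 || elapsed < best_time) {
            best = k;
            best_time = elapsed;
        }
    }
    return best;
}
```

Measuring on the target machine costs some time at initialization, which is one reason a fixed algorithm choice can be preferable when initialization time matters.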