SHTns 3.7
SHTns is an implementation of the Spherical Harmonic Transform (SHT) that aims to be accurate and fast, with direct numerical simulations in mind. To that end, the following optimizations are implemented:
Any serious SHT should use the FFT, as it improves accuracy and speed. SHTns uses the FFTW library for the fast Fourier transform, a portable, flexible and blazingly fast FFT implementation.
Exploiting mirror symmetry is another classical optimization. Because each spherical harmonic has a definite symmetry (even or odd) with respect to reflection about the equator, the operation count of both direct and reverse transforms can be reduced by a factor of 2.
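The factor of 2 can be illustrated on the zonal (m = 0) part of a synthesis. The following is only a minimal sketch, not the SHTns implementation: the function names `legendre_synth_pair` and `legendre_synth` are hypothetical, and it relies on the parity of Legendre polynomials, P_l(-x) = (-1)^l P_l(x). The recurrence is run once, and the even and odd partial sums are combined to give the values at two mirrored latitudes.

```c
/* Evaluate f(x) = sum_{l=0}^{lmax} c[l] * P_l(x) at both x and -x
 * while running the Legendre recurrence only once.
 * Hypothetical helper, not part of the SHTns API. */
void legendre_synth_pair(const double *c, int lmax, double x,
                         double *f_north, double *f_south)
{
    double even = 0.0, odd = 0.0;
    double p_prev = 0.0;   /* P_{l-1}(x), multiplied by 0 when l = 0 */
    double p = 1.0;        /* P_0(x) */
    for (int l = 0; l <= lmax; ++l) {
        if (l & 1) odd += c[l] * p;
        else       even += c[l] * p;
        /* Bonnet recurrence: (l+1) P_{l+1} = (2l+1) x P_l - l P_{l-1} */
        double p_next = ((2 * l + 1) * x * p - l * p_prev) / (l + 1);
        p_prev = p;
        p = p_next;
    }
    *f_north = even + odd;   /* value at  x */
    *f_south = even - odd;   /* value at -x, since odd terms change sign */
}

/* Reference version: one point per call, used here only for comparison. */
double legendre_synth(const double *c, int lmax, double x)
{
    double sum = 0.0, p_prev = 0.0, p = 1.0;
    for (int l = 0; l <= lmax; ++l) {
        sum += c[l] * p;
        double p_next = ((2 * l + 1) * x * p - l * p_prev) / (l + 1);
        p_prev = p;
        p = p_next;
    }
    return sum;
}
```

Calling `legendre_synth_pair` once replaces two calls to `legendre_synth`, which is where the halving of the operation count comes from.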
A less common, but still classic, optimization: spherical harmonics of high order m decrease exponentially in magnitude when approaching the poles. SHTns uses a threshold below which the spherical harmonic is taken to be zero. The default value of this threshold is defined by SHT_DEFAULT_POLAR_OPT, but you can also set it with the eps parameter of the shtns_set_grid function, trading some accuracy for more speed. A speed increase of around 5% to 15% is typical for an SHT with a large MMAX.
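To make the idea concrete, here is a minimal sketch that finds, for a given m, the first latitude (counting from the pole) that still needs to be computed. It is an illustration under assumptions, not the criterion SHTns actually applies: it uses only the sin(theta)^m factor of the associated Legendre functions as a proxy for their magnitude, and the names `below_threshold` and `polar_start_index` are hypothetical.

```c
/* Returns 1 if st^m < eps, for 0 <= st <= 1.
 * Repeated multiplication with early exit avoids underflow issues:
 * powers of a number <= 1 never increase, so we can stop as soon as
 * the running product drops below eps. */
static int below_threshold(double st, int m, double eps)
{
    double p = 1.0;
    for (int k = 0; k < m; ++k) {
        p *= st;
        if (p < eps) return 1;
    }
    return 0;
}

/* st[i] = sin(theta_i), sorted in increasing order from pole to equator.
 * Returns the first index i at which sin(theta_i)^m >= eps; latitudes
 * before that index are treated as contributing zero for this m.
 * Hypothetical helper, not part of the SHTns API. */
int polar_start_index(const double *st, int nlat_half, int m, double eps)
{
    int i = 0;
    while (i < nlat_half && below_threshold(st[i], m, eps)) ++i;
    return i;
}
```

For m = 0 the start index is always 0, and it grows with m, so the savings are largest for transforms with a large MMAX.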
Cache optimizations have been carried out throughout the code. This means ordering coefficients and stripping out systematic zeros.
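One way to picture "ordering coefficients and stripping out systematic zeros" is a packed layout that stores only the coefficients with l >= m, contiguously in l for each m. The sketch below assumes that layout and a truncation with every m from 0 to mmax; the function names `packed_count` and `packed_index` are hypothetical, and the layout actually used by SHTns should be taken from its own documentation.

```c
/* Number of stored coefficients for 0 <= m <= mmax and m <= l <= lmax.
 * Coefficients with l < m are identically zero and are not stored. */
int packed_count(int lmax, int mmax)
{
    return (mmax + 1) * (lmax + 1) - mmax * (mmax + 1) / 2;
}

/* Position of coefficient (l, m) in an array ordered by m first, then l.
 * The offset of block m is sum_{k<m} (lmax + 1 - k); adding (l - m)
 * simplifies to the closed form below. m * (2*lmax + 1 - m) is always
 * even, so the integer division is exact. */
int packed_index(int lmax, int l, int m)
{
    return m * (2 * lmax + 1 - m) / 2 + l;
}
```

With this ordering, the inner loop over l for a fixed m walks through memory with unit stride, which is what makes it cache friendly.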
When using shtns_set_grid_auto, you can let SHTns choose the optimal spatial sizes for the spherical harmonic truncation you specified with shtns_create. The spatial sizes are chosen so that no aliasing occurs and so that FFTW performs as fast as possible.
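The kind of choice involved can be sketched as follows. This is not the selection rule of shtns_set_grid_auto: it assumes that more than 2*mmax longitudes are enough to avoid aliasing (nonlinear products would need more points), and that a size whose only prime factors are 2, 3, 5 and 7 is "fast" for the FFT. All three function names are hypothetical.

```c
/* Returns 1 if n factors entirely into the primes 2, 3, 5 and 7. */
int is_fft_friendly(int n)
{
    static const int primes[] = {2, 3, 5, 7};
    if (n < 1) return 0;
    for (int i = 0; i < 4; ++i)
        while (n % primes[i] == 0) n /= primes[i];
    return n == 1;
}

/* Smallest FFT-friendly size that is >= nmin. */
int next_fft_friendly(int nmin)
{
    int n = nmin < 1 ? 1 : nmin;
    while (!is_fft_friendly(n)) ++n;
    return n;
}

/* Number of longitudes for a given mmax, under the assumption
 * nphi >= 2*mmax + 1. Hypothetical helper, not part of the SHTns API. */
int choose_nphi(int mmax)
{
    return next_fft_friendly(2 * mmax + 1);
}
```

For example, mmax = 32 requires at least 65 points; the sizes 65 to 69 each contain a prime factor larger than 7, so this sketch settles on 70 = 2 * 5 * 7.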
Most x86 CPUs support Single-Instruction-Multiple-Data (SIMD) operations in double precision, which perform the same arithmetic operation on a vector of 2 (SSE2) or 4 (AVX) double precision numbers at once, effectively multiplying the computing power by 2 or 4. Modern compilers attempt to vectorize code automatically, but they cannot work miracles. SHTns uses explicit vectorization extensively (thanks to the GCC vector extensions) to use the full computing power of your CPU.
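For readers unfamiliar with the GCC vector extensions, the following minimal sketch shows what explicit vectorization looks like. It is a generic example, not code taken from SHTns: the type name `v2d` and the function `axpy_v2d` are chosen for illustration, and the `vector_size` attribute requires GCC or a compatible compiler such as Clang.

```c
#include <string.h>

/* Two doubles packed in one 16-byte vector (an SSE2 register on x86). */
typedef double v2d __attribute__((vector_size(16)));

/* y[i] += a * x[i] for i in [0, n), two elements per iteration. */
void axpy_v2d(double a, const double *x, double *y, int n)
{
    const v2d va = {a, a};
    int i = 0;
    for (; i + 2 <= n; i += 2) {
        v2d vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* load, safe for unaligned data */
        memcpy(&vy, y + i, sizeof vy);
        vy += va * vx;                   /* element-wise on both lanes */
        memcpy(y + i, &vy, sizeof vy);   /* store the two results */
    }
    for (; i < n; ++i)                   /* scalar remainder when n is odd */
        y[i] += a * x[i];
}
```

Changing `vector_size(16)` to `vector_size(32)` and the step from 2 to 4 gives the 4-wide variant that maps to AVX registers when the compiler targets them.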
By default, SHTns measures the performance of the different algorithms and chooses the one that performs best (it also checks that the accuracy is good enough). However, there are situations where either the Gauss-Legendre algorithm or a regular grid is required, and you can request one by passing the appropriate shtns_type when calling shtns_init or shtns_set_grid.