SHTns 3.7
GPU transforms.

CUDA transforms working on GPU memory, without transfers, but NOT thread-safe.

Functions

void cushtns_profiling (shtns_cfg, int on)
 
double cushtns_profiling_read_time (shtns_cfg, double *time_1, double *time_2)
 
const char * cushtns_get_cfg_info (shtns_cfg)
 

Transforms

void cu_spat_to_SH (shtns_cfg shtns, double *Vr, cplx *Qlm, int ltr)
 Same as spat_to_SH, but working on data residing on the GPU.
 
void cu_spat_to_SH_float (shtns_cfg shtns, float *Vr, cplx_f *Qlm, int ltr)
 Same as cu_spat_to_SH, but working on single precision data.
 
void cu_SH_to_spat (shtns_cfg shtns, cplx *Qlm, double *Vr, int ltr)
 Same as SH_to_spat, but working on data residing on the GPU.
 
void cu_SH_to_spat_float (shtns_cfg shtns, cplx_f *Qlm, float *Vr, int ltr)
 Same as cu_SH_to_spat, but working on single precision data.
 
void cu_spat_to_SHsphtor (shtns_cfg, double *Vt, double *Vp, cplx *Slm, cplx *Tlm, int ltr)
 Same as spat_to_SHsphtor, but working on data residing on the GPU.
 
void cu_spat_to_SHsphtor_float (shtns_cfg, float *Vt, float *Vp, cplx_f *Slm, cplx_f *Tlm, int ltr)
 
void cu_SHsphtor_to_spat (shtns_cfg, cplx *Slm, cplx *Tlm, double *Vt, double *Vp, int ltr)
 Same as SHsphtor_to_spat, but working on data residing on the GPU.
 
void cu_SHsphtor_to_spat_float (shtns_cfg, cplx_f *Slm, cplx_f *Tlm, float *Vt, float *Vp, int ltr)
 
void cu_SHsph_to_spat (shtns_cfg, cplx *Slm, double *Vt, double *Vp, int ltr)
 Same as SHsph_to_spat, but working on data residing on the GPU.
 
void cu_SHsph_to_spat_float (shtns_cfg, cplx_f *Slm, float *Vt, float *Vp, int ltr)
 
void cu_SHtor_to_spat (shtns_cfg, cplx *Tlm, double *Vt, double *Vp, int ltr)
 Same as SHtor_to_spat, but working on data residing on the GPU.
 
void cu_SHtor_to_spat_float (shtns_cfg, cplx_f *Tlm, float *Vt, float *Vp, int ltr)
 
void cu_spat_to_SHqst (shtns_cfg, double *Vr, double *Vt, double *Vp, cplx *Qlm, cplx *Slm, cplx *Tlm, int ltr)
 Same as spat_to_SHqst, but working on data residing on the GPU.
 
void cu_spat_to_SHqst_float (shtns_cfg, float *Vr, float *Vt, float *Vp, cplx_f *Qlm, cplx_f *Slm, cplx_f *Tlm, int ltr)
 
void cu_SHqst_to_spat (shtns_cfg, cplx *Qlm, cplx *Slm, cplx *Tlm, double *Vr, double *Vt, double *Vp, int ltr)
 Same as SHqst_to_spat, but working on data residing on the GPU.
 
void cu_SHqst_to_spat_float (shtns_cfg, cplx_f *Qlm, cplx_f *Slm, cplx_f *Tlm, float *Vr, float *Vt, float *Vp, int ltr)
 

Initialization

int cushtns_init_gpu (shtns_cfg shtns)
 Initialize the given config to work on the current (or default) GPU, so that the GPU transforms cu_* above can be called on data residing in the memory of this GPU.
 
shtns_cfg cushtns_clone (shtns_cfg shtns, shtns_gpu_stream_t compute_stream, shtns_gpu_stream_t transfer_stream)
 Clone a GPU-enabled shtns config and assign it to different streams (to allow compute overlap and/or usage from multiple threads).
 
void cushtns_set_streams (shtns_cfg shtns, shtns_gpu_stream_t compute_stream, shtns_gpu_stream_t transfer_stream)
 Set user-specified streams for compute (including fft) and transfer.
 
void cushtns_release_gpu (shtns_cfg)
 Release resources needed for GPU transforms, which won't work after this call.
 

Detailed Description

CUDA transforms working on GPU memory, without transfers, but NOT thread-safe.

Warning
These transforms are NOT thread-safe: use one distinct shtns_cfg per thread, obtained with cushtns_clone. The transforms are non-blocking and work with their own streams; each clone has distinct streams.
See also
Using SHTns with GPU (NVIDIA and AMD)

Function Documentation

◆ cushtns_clone()

shtns_cfg cushtns_clone ( shtns_cfg shtns,
shtns_gpu_stream_t compute_stream,
shtns_gpu_stream_t transfer_stream )

Clone a GPU-enabled shtns config and assign it to different streams (to allow compute overlap and/or usage from multiple threads).

This implies allocation of memory on the GPU and other limited resources, and thus may fail.

Parameters
[in] shtns: a valid shtns configuration created with shtns_create and with an associated grid (see shtns_set_grid_auto).
[in] compute_stream: a CUDA stream that will be used for transforms. If 0, the default (0) stream will be used.
[in] transfer_stream: a CUDA stream that will be used for data transfers between host and device in auto-offload mode. If 0, a new stream will be created and used.
Returns
a new shtns_cfg that can safely be used concurrently with the original one.

◆ cushtns_init_gpu()

int cushtns_init_gpu ( shtns_cfg shtns)

Initialize the given config to work on the current (or default) GPU, so that the GPU transforms cu_* above can be called on data residing in the memory of this GPU.

This does not enable auto-offload. Use cudaSetDevice() or hipSetDevice() to set the target GPU before calling this function. Note that it is the user's responsibility to ensure the current device will be the same for subsequent calls to transform functions with this configuration.

Parameters
[in] shtns: a valid shtns configuration created with shtns_create and with an associated grid (see shtns_set_grid_auto).
Returns
device_id on success, or -1 on failure.