CUDA transforms working on GPU memory, without transfers, but NOT thread-safe.

Functions
void	cushtns_profiling (shtns_cfg, int on)

double	cushtns_profiling_read_time (shtns_cfg, double *time_1, double *time_2)

const char *	cushtns_get_cfg_info (shtns_cfg)

Transforms
void	cu_spat_to_SH (shtns_cfg shtns, double *Vr, cplx *Qlm, int ltr)
	Same as spat_to_SH, but working on data residing on the GPU.

void	cu_spat_to_SH_float (shtns_cfg shtns, float *Vr, cplx_f *Qlm, int ltr)
	Same as cu_spat_to_SH, but working on single precision data.

void	cu_SH_to_spat (shtns_cfg shtns, cplx *Qlm, double *Vr, int ltr)
	Same as SH_to_spat, but working on data residing on the GPU.

void	cu_SH_to_spat_float (shtns_cfg shtns, cplx_f *Qlm, float *Vr, int ltr)
	Same as cu_SH_to_spat, but working on single precision data.

void	cu_spat_to_SHsphtor (shtns_cfg, double *Vt, double *Vp, cplx *Slm, cplx *Tlm, int ltr)
	Same as spat_to_SHsphtor, but working on data residing on the GPU.

void	cu_spat_to_SHsphtor_float (shtns_cfg, float *Vt, float *Vp, cplx_f *Slm, cplx_f *Tlm, int ltr)

void	cu_SHsphtor_to_spat (shtns_cfg, cplx *Slm, cplx *Tlm, double *Vt, double *Vp, int ltr)
	Same as SHsphtor_to_spat, but working on data residing on the GPU.

void	cu_SHsphtor_to_spat_float (shtns_cfg, cplx_f *Slm, cplx_f *Tlm, float *Vt, float *Vp, int ltr)

void	cu_SHsph_to_spat (shtns_cfg, cplx *Slm, double *Vt, double *Vp, int ltr)
	Same as SHsph_to_spat, but working on data residing on the GPU.

void	cu_SHsph_to_spat_float (shtns_cfg, cplx_f *Slm, float *Vt, float *Vp, int ltr)

void	cu_SHtor_to_spat (shtns_cfg, cplx *Tlm, double *Vt, double *Vp, int ltr)
	Same as SHtor_to_spat, but working on data residing on the GPU.

void	cu_SHtor_to_spat_float (shtns_cfg, cplx_f *Tlm, float *Vt, float *Vp, int ltr)

void	cu_spat_to_SHqst (shtns_cfg, double *Vr, double *Vt, double *Vp, cplx *Qlm, cplx *Slm, cplx *Tlm, int ltr)
	Same as spat_to_SHqst, but working on data residing on the GPU.

void	cu_spat_to_SHqst_float (shtns_cfg, float *Vr, float *Vt, float *Vp, cplx_f *Qlm, cplx_f *Slm, cplx_f *Tlm, int ltr)

void	cu_SHqst_to_spat (shtns_cfg, cplx *Qlm, cplx *Slm, cplx *Tlm, double *Vr, double *Vt, double *Vp, int ltr)
	Same as SHqst_to_spat, but working on data residing on the GPU.

void	cu_SHqst_to_spat_float (shtns_cfg, cplx_f *Qlm, cplx_f *Slm, cplx_f *Tlm, float *Vr, float *Vt, float *Vp, int ltr)

Initialization
int	cushtns_init_gpu (shtns_cfg shtns)
	Initialize the given config to work on the current (or default) GPU, so that the GPU transforms cu_* above can be called on data residing in the memory of this GPU.

shtns_cfg	cushtns_clone (shtns_cfg shtns, shtns_gpu_stream_t compute_stream, shtns_gpu_stream_t transfer_stream)
	Clone a GPU-enabled shtns config and assign it to different streams (to allow compute overlap and/or usage from multiple threads).

void	cushtns_set_streams (shtns_cfg shtns, shtns_gpu_stream_t compute_stream, shtns_gpu_stream_t transfer_stream)
	Set user-specified streams for compute (including FFT) and transfer.

void	cushtns_release_gpu (shtns_cfg)
	Release the resources needed for GPU transforms. The GPU transforms no longer work on this config after this call.

Detailed Description

CUDA transforms working on GPU memory, without transfers, but NOT thread-safe.

Warning: These transforms are NOT thread-safe. Use one distinct shtns_cfg per thread, and create the additional configs with cushtns_clone. The transforms are non-blocking and run on their own streams; each clone has distinct streams.

See also: Using SHTns with GPU (nvidia and AMD)

Function Documentation

◆ cushtns_clone()

shtns_cfg cushtns_clone	(	shtns_cfg	shtns,
		shtns_gpu_stream_t	compute_stream,
		shtns_gpu_stream_t	transfer_stream )

Clone a GPU-enabled shtns config and assign it to different streams (to allow compute overlap and/or usage from multiple threads).

This implies allocation of memory on the GPU and other limited resources, and thus may fail.

Parameters

[in]	shtns	is a valid shtns configuration created with shtns_create and with an associated grid (see shtns_set_grid_auto).
[in]	compute_stream	is a CUDA stream that will be used for transforms. If 0, the default (0) stream will be used.
[in]	transfer_stream	is a CUDA stream that will be used for data transfers between host and device in auto-offload mode. If 0, a new stream will be created and used.

Returns: a new shtns_cfg that can safely be used concurrently with the original one.
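
As an illustration of the one-config-per-thread rule, the following C sketch fills an array with one config per worker thread: the original config for thread 0 and a clone, bound to its own compute stream, for each other thread. It is not part of SHTns. The names make_per_thread_cfgs, cfg_t, stream_t, clone_fn and fake_clone are hypothetical, the typedefs are stand-ins so that the sketch compiles without SHTns or CUDA, and the sketch assumes that a failed clone returns NULL, which this page does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles on its own. In real code,
   use shtns_cfg for cfg_t and shtns_gpu_stream_t for stream_t. */
struct cfg_stub { void *compute; };
typedef struct cfg_stub *cfg_t;
typedef void *stream_t;
typedef cfg_t (*clone_fn)(cfg_t base, stream_t compute, stream_t transfer);

/* Fill cfgs[0..n-1] with one config per worker thread.
   cfgs[0] is the original config; cfgs[i] for i >= 1 is a clone that uses
   streams[i] (created by the caller) as its compute stream. Passing 0 as
   the transfer stream asks for a new one to be created, as documented.
   ASSUMPTION: a failed clone returns NULL.
   Returns the number of usable configs; releasing clones that were already
   created when a later clone fails is left to the caller. */
size_t make_per_thread_cfgs(cfg_t base, clone_fn clone,
                            const stream_t *streams, cfg_t *cfgs, size_t n)
{
    assert(cfgs != NULL);
    if (n == 0) return 0;
    cfgs[0] = base;
    for (size_t i = 1; i < n; i++) {
        cfgs[i] = clone(base, streams[i], (stream_t)0);
        if (cfgs[i] == NULL) return i;   /* only cfgs[0..i-1] are usable */
    }
    return n;
}

/* Dry-run stand-in for the real clone function, used only to exercise the
   helper without a GPU. It hands out at most 4 clones, then fails. */
static struct cfg_stub fake_pool[4];
static size_t fake_used = 0;

static cfg_t fake_clone(cfg_t base, stream_t compute, stream_t transfer)
{
    (void)base;
    (void)transfer;
    if (fake_used == 4) return NULL;     /* simulate exhausted resources */
    fake_pool[fake_used].compute = compute;
    return &fake_pool[fake_used++];
}
```

With the real library, the clone argument would be cushtns_clone and each entry of streams would come from the CUDA or HIP runtime (for example cudaStreamCreate). Thread i then calls the cu_* transforms only with cfgs[i].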

◆ cushtns_init_gpu()

int cushtns_init_gpu ( shtns_cfg shtns )

Initialize the given config to work on the current (or default) GPU, so that the GPU transforms cu_* above can be called on data residing in the memory of this GPU.

This does not enable auto-offload. Use cudaSetDevice() or hipSetDevice() to set the target GPU before calling this function. Note that it is the user's responsibility to ensure the current device will be the same for subsequent calls to transform functions with this configuration.

Parameters

[in]	shtns	is a valid shtns configuration created with shtns_create and with an associated grid (see shtns_set_grid_auto).

Returns: device_id on success, or -1 on failure.
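
Because the caller must keep the current device unchanged between initialization and later transforms, it can help to record the value returned by cushtns_init_gpu and compare it with the current device before each transform. The C sketch below shows one possible way to do that bookkeeping. The type gpu_binding and both functions are hypothetical helpers, not part of SHTns, and they contain no GPU calls themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping type (not part of SHTns): remembers the device
   on which a config was initialized. */
typedef struct {
    int device_id;   /* value returned by cushtns_init_gpu, or -1 */
} gpu_binding;

/* Record the return value of cushtns_init_gpu.
   Returns 1 if initialization succeeded, 0 if it failed (init_ret == -1). */
int gpu_binding_set(gpu_binding *b, int init_ret)
{
    assert(b != NULL);
    b->device_id = (init_ret >= 0) ? init_ret : -1;
    return b->device_id >= 0;
}

/* Returns 1 if current_device is the device the config was initialized on,
   0 otherwise (including when initialization had failed). */
int gpu_binding_matches(const gpu_binding *b, int current_device)
{
    assert(b != NULL);
    return b->device_id >= 0 && b->device_id == current_device;
}
```

In real code the first helper would typically be called as gpu_binding_set(&b, cushtns_init_gpu(shtns)), and current_device would be obtained from cudaGetDevice() or hipGetDevice() just before a cu_* transform.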
