New changes in this release:
- Adding gpu_bfloat16 type. Available with CUDA backend if the C++ compiler supports std::bfloat16_t.
- Many performance optimisations, especially for gpu_half and gpu_bfloat16 types.
- Adding gpu_return() and allow_preload() functions.