New changes in this release:
- Improving vectorisation of the 16-bit floating-point types gpu_half and gpu_bfloat16.
- Improving performance of CPU debug mode.
- Improving performance of kernel creation.
- Setting thread names to make debugging more comfortable (not available on Windows).
- Adding an optional width parameter to the work_group_any and work_group_all functions.
- The goopax_future callback function can now access the return value of the kernel.
- Adding scripts to build external libraries for the example programs.