Language-embedded programming directly in C++

Our language-embedded programming technique (patent pending) gives you a completely new programming experience. Host and device code are combined in a single program and compiled by a standard C++ compiler. This makes GOOPAX easier to integrate into existing projects, improves portability, and simplifies programming.


Code reusability

Functions can be written once and run on both the CPU and the GPU. This makes programs more reliable and reduces code size. It is even possible to use some existing libraries and types that were never designed to run on the GPU (e.g. std::array, std::complex, Eigen::Matrix, …)!
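As an illustration of the write-once idea, here is a minimal sketch in plain standard C++. The function template and the helper name horner_host_example are hypothetical and not part of GOOPAX. The sketch only shows that one generic function can serve float and std::complex<float> on the host; that the same template could also be instantiated with a GPU value type such as gpu_float is an assumption based on the description above.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

// Evaluates c[0] + c[1]*x + ... + c[N-1]*x^(N-1) using Horner's scheme.
// T only needs to support * and +, so the same source serves every value type.
template <typename T, std::size_t N>
T horner(const std::array<T, N>& c, const T& x)
{
    static_assert(N > 0, "at least one coefficient is required");
    T result = c[N - 1];
    for (std::size_t i = N - 1; i-- > 0;) {
        result = result * x + c[i];
    }
    return result;
}

// Host-side usage with two unrelated value types (hypothetical helper).
inline void horner_host_example()
{
    const std::array<float, 3> real_coeffs{{1.0f, 2.0f, 3.0f}};
    assert(horner(real_coeffs, 2.0f) == 17.0f);  // 1 + 2*2 + 3*4

    typedef std::complex<float> cfloat;
    const std::array<cfloat, 3> complex_coeffs{{cfloat(1.0f), cfloat(0.0f), cfloat(1.0f)}};
    assert(horner(complex_coeffs, cfloat(0.0f, 1.0f)) == cfloat(0.0f));  // 1 + i*i
}
```

The template makes no reference to where the arithmetic runs, which is what allows a single implementation to be shared.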



The boundaries between CPU code and GPU code become blurred. GPU and CPU instructions can be mixed. Intermediate calculation steps can be optimized away or computed entirely in registers, which can bring large performance benefits. GOOPAX enables entirely new programming methods. Try it, you will be amazed!
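One common way for an embedded language to mix host and device code is operator overloading that records device operations while the host program runs. The toy sketch below shows that general technique only. Whether GOOPAX works this way internally is an assumption, and the names DeviceValue, Recorder, power_plus_offset and record_example are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Recorder;

// Toy stand-in for a device value: it carries no number, only the name of
// the virtual register that will hold the number when the kernel runs.
struct DeviceValue {
    Recorder* rec;
    std::string reg;
};

// Collects the device instructions produced while the host code executes.
struct Recorder {
    std::vector<std::string> code;
    int next_reg;

    Recorder() : next_reg(0) {}

    DeviceValue input(const std::string& name) { return DeviceValue{this, name}; }

    DeviceValue emit(const std::string& op, const std::string& a, const std::string& b)
    {
        DeviceValue out{this, "r" + std::to_string(next_reg++)};
        code.push_back(out.reg + " = " + a + " " + op + " " + b);
        return out;
    }
};

// Device * device: recorded as one instruction.
inline DeviceValue operator*(const DeviceValue& a, const DeviceValue& b)
{
    return a.rec->emit("*", a.reg, b.reg);
}

// Device + host integer: the host value is baked into the instruction.
inline DeviceValue operator+(const DeviceValue& a, int host_constant)
{
    return a.rec->emit("+", a.reg, std::to_string(host_constant));
}

// Ordinary host control flow and host arithmetic run during recording;
// only operations on DeviceValue appear in the recorded program.
inline DeviceValue power_plus_offset(const DeviceValue& x, int exponent, int a, int b)
{
    DeviceValue result = x;
    for (int i = 1; i < exponent; ++i) {  // host loop, unrolled in the recording
        result = result * x;
    }
    return result + (a * b);              // a * b is computed on the host
}

// Records x*x*x + 2*5 and returns the three recorded instructions.
inline std::vector<std::string> record_example()
{
    Recorder rec;
    const DeviceValue y = power_plus_offset(rec.input("x"), 3, 2, 5);
    assert(y.reg == "r2");
    return rec.code;  // {"r0 = x * x", "r1 = r0 * x", "r2 = r1 + 10"}
}
```

In this model the host loop and the product a * b leave no trace in the recorded program, which is one way host-side calculations can disappear from the device code.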


Hardware features

Most common hardware features are supported: local memory, shuffle/ballot/reduce functions, atomic memory access, memory pointers across devices, direct memory access on other GPUs (if supported by the hardware), hardware-accelerated math functions, special integer functions, etc.


Automatic error detection mechanisms

Do you spend a lot of time hunting for errors in your programs? You are not alone: programmers routinely spend hours looking for bugs. Much of this work can be automated. GOOPAX offers sophisticated automatic error detection that can find programming errors for you. This improves the reliability of your results and significantly speeds up development.

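local_mem<float> A(local_size());

// Each thread writes its own element,
A[local_id()] = x;

// then reads its neighbor's element with no barrier in between.
gpu_float y = A[local_id() ^ 1];

"Race condition! Thread 1 is writing a value that was read by thread 0!"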

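To illustrate the idea behind such a check, here is a minimal CPU-side sketch. It is not GOOPAX's actual mechanism, which is not described here. The class CheckedLocalMem and the function run_exchange are hypothetical, and the threads are emulated one after another on the host. Each memory cell remembers who touched it since the last barrier, and an access by a different thread is reported.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// CPU-side model of a checked local-memory array (hypothetical class).
// Each cell remembers which thread last wrote it and which thread last read
// it since the most recent barrier. Only the latest reader is tracked, so
// this simplified model can miss some races.
class CheckedLocalMem {
public:
    explicit CheckedLocalMem(std::size_t size)
        : data_(size, 0.0f), writer_(size, kNobody), reader_(size, kNobody) {}

    void write(int thread, std::size_t i, float value)
    {
        if (reader_[i] != kNobody && reader_[i] != thread)
            report(thread, "writing a value that was read by", reader_[i]);
        if (writer_[i] != kNobody && writer_[i] != thread)
            report(thread, "writing a value that was written by", writer_[i]);
        writer_[i] = thread;
        data_[i] = value;
    }

    float read(int thread, std::size_t i)
    {
        if (writer_[i] != kNobody && writer_[i] != thread)
            report(thread, "reading a value that was written by", writer_[i]);
        reader_[i] = thread;
        return data_[i];
    }

    // A barrier orders all earlier accesses before all later ones,
    // so the access history can be forgotten.
    void barrier()
    {
        writer_.assign(writer_.size(), kNobody);
        reader_.assign(reader_.size(), kNobody);
    }

    const std::vector<std::string>& errors() const { return errors_; }

private:
    enum { kNobody = -1 };

    void report(int thread, const std::string& what, int other)
    {
        errors_.push_back("Race condition! Thread " + std::to_string(thread) + " is " +
                          what + " thread " + std::to_string(other) + "!");
    }

    std::vector<float> data_;
    std::vector<int> writer_;
    std::vector<int> reader_;
    std::vector<std::string> errors_;
};

// Emulates the exchange for local_size threads, one thread after another.
// Without the barrier, writes and neighbor reads are unordered across threads.
inline std::vector<std::string> run_exchange(int local_size, bool use_barrier)
{
    assert(local_size > 0 && local_size % 2 == 0);  // keeps id ^ 1 in range
    CheckedLocalMem A(static_cast<std::size_t>(local_size));
    if (use_barrier) {
        for (int id = 0; id < local_size; ++id)
            A.write(id, static_cast<std::size_t>(id), 1.0f);
        A.barrier();
        for (int id = 0; id < local_size; ++id)
            A.read(id, static_cast<std::size_t>(id ^ 1));
    } else {
        for (int id = 0; id < local_size; ++id) {
            A.write(id, static_cast<std::size_t>(id), 1.0f);
            A.read(id, static_cast<std::size_t>(id ^ 1));
        }
    }
    return A.errors();
}
```

In this model, run_exchange(2, false) reports the write by thread 1 to an element that thread 0 has already read, while run_exchange(2, true) reports nothing because the barrier separates the writes from the reads.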



Static type checking

Many common programming mistakes are detected at compile time. This makes it even easier for you to locate the error.
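As one example of how a type system can catch a mistake before the program runs, the sketch below uses a stand-in class with no conversions to host types. The class DeviceFloat, its member host_debug_value and the helper device_float_example are hypothetical, and which mistakes GOOPAX actually rejects at compile time is an assumption not confirmed here.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a device-side float. The stored number exists
// only so that this sketch can run on the host.
class DeviceFloat {
public:
    explicit DeviceFloat(float v) : value_(v) {}

    friend DeviceFloat operator+(const DeviceFloat& a, const DeviceFloat& b)
    {
        return DeviceFloat(a.value_ + b.value_);
    }

    // Deliberately no conversion operators to float or bool.
    float host_debug_value() const { return value_; }

private:
    float value_;
};

// A device value cannot silently become a host value, so a line such as
//     if (device_value) { ... }     or     float f = device_value;
// is rejected by the compiler instead of misbehaving at run time.
static_assert(!std::is_convertible<DeviceFloat, bool>::value,
              "no implicit conversion to host bool");
static_assert(!std::is_convertible<DeviceFloat, float>::value,
              "no implicit conversion to host float");
static_assert(!std::is_constructible<bool, DeviceFloat>::value,
              "no explicit conversion to host bool either");

// Arithmetic between device values still works as usual (hypothetical helper).
inline void device_float_example()
{
    const DeviceFloat sum = DeviceFloat(1.5f) + DeviceFloat(2.0f);
    assert(sum.host_debug_value() == 3.5f);
}
```

The static_assert lines document the intent and fail the build if someone later adds a conversion that would reopen the door to this class of mistake.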



GOOPAX is designed to deliver maximum performance. It provides all the features needed to write high-performance programs, and sophisticated optimizations help your program run at the best possible speed.


OpenGL interoperability

GOOPAX offers interoperability with OpenGL. Data can be passed directly to OpenGL via shared memory resources. This opens up possibilities in graphics and computer games, bringing high-level programming to graphics applications. Direct3D and Vulkan interoperability are planned for future releases.


Hardware independence, performance portability

GOOPAX supports a wide range of hardware devices. It follows a plug-and-play strategy: the hardware is detected at run time, and GPU kernels are generated specifically for that hardware. This allows performance-portable applications that run at high performance on all supported hardware.


System requirements

    •    OS: Linux, Windows, macOS, iOS, Android

    •    C++ compiler supporting C++11 or later

    •    CPU: x86, ARM, PowerPC

    •    GPU: Most modern GPUs from AMD, NVIDIA, Intel, ARM, Vivante, Qualcomm, etc. are supported

    •    OpenCL >= 1.1, CUDA >= 8.0, or Metal

Support for other systems on request.