Our language-embedded programming technique (patent pending) gives you a completely new programming experience. Host and device code are combined in a single program and compiled by a standard C++ compiler. This makes it easier to integrate GOOPAX into existing projects, improves portability, and simplifies programming.


You can write functions that run both on the CPU and on the GPU, and you only have to program them once. This makes programming more reliable and reduces code size. It is even possible to use some existing libraries and functions that were never designed to run on the GPU (e.g. std::array, std::complex, Eigen::Matrix, …)!


The boundaries between CPU code and GPU code become blurred: GPU and CPU instructions can be mixed. Calculation paths are optimized away and fully computed in registers, leading to great performance benefits. GOOPAX enables entirely new programming methods. Try it, you will be amazed!

Hardware Features

Most common hardware features are supported: local memory, shuffle/ballot/reduce functions, atomic memory access, memory pointers across devices, direct memory access on other GPUs (if supported by the hardware), hardware-accelerated math functions, special integer functions, etc.

Automatic error detection mechanisms

Do you also spend a lot of time looking for errors in your program? You are not alone. Programmers routinely spend hours looking for bugs, yet much of this effort can be automated. GOOPAX offers sophisticated automatic error detection and can find programming errors for you. This improves the reliability of your results and significantly speeds up code development, reducing the time you have to spend hunting for bugs.

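local_mem<float> A(local_size());   // one float of local memory per thread
A[local_id()] = x;                  // each thread writes its own slot
gpu_float y = A[local_id() ^ 1];    // ...and reads its neighbor's slot, with no barrier in between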
“Race condition! Thread 1 is writing a value that was read by thread 0!”
Static type checking

Many common programming mistakes are already detected at compile time. This makes it even easier for you to locate the error.


GOOPAX is designed to deliver maximum performance. It provides all the functionality needed to write high-performance programs, and sophisticated optimizations make sure that your program runs at full speed.

OpenCL / CUDA / Metal / OpenGL interoperability

GOOPAX can share resources with other GPU environments such as OpenCL, CUDA, Metal, and OpenGL.

System requirements
  • OS:
    • Linux (glibc >= 2.19, CPU: x64, x32, arm64, arm32, Power8)
    • Windows (Windows >= 7, CPU: x64, x32)
    • macOS (>= 10.14, CPU: x64, arm64)
    • iOS (>= 9.0)
    • Android (API level >= 21)
  • C++ compiler with C++17 support or later
  • GPU: most modern GPUs from AMD, NVIDIA, Intel, ARM, Apple, Vivante, Qualcomm, etc. are supported
  • OpenCL >= 1.1, CUDA >= 8.0, or Metal

Support for other systems on request.