Kernel Developer
Kernel Developer: design and optimize high-performance user-space compute kernels for AI accelerators in C/C++. Shape latency, throughput, and efficiency at the hardware–software edge.
We usually respond within a day
About the Role
We are looking for a C/C++ engineer to design and optimize high-performance compute kernels for AI accelerators.
This is not Linux kernel development, and it does not involve drivers, operating systems, or kernel modules.
In this context, “kernels” refer to user-space compute kernels for tensor operations used in deep learning workloads.
You will work at the lowest level of AI performance engineering, where software meets specialized hardware.
Responsibilities
Design and implement high-performance compute (operator) kernels in C/C++
Develop core tensor operations
Optimize performance for AI accelerators (latency, throughput, and efficiency)
Apply low-level optimization techniques
Profile, benchmark, and tune kernels to eliminate performance bottlenecks
Contribute to internal libraries and runtime systems for AI workloads
Requirements
Strong proficiency in C/C++
Experience with performance-critical software development
Strong understanding of low-level optimization techniques
Understanding of CPU/GPU or accelerator architecture fundamentals
Ability to analyze and debug complex systems
Experience working with large, complex codebases
Strong communication and teamwork skills
Nice to Have
Experience with GPU kernel programming (CUDA / ROCm / OpenCL)
Experience with Triton or similar kernel programming frameworks
Knowledge of instruction set architectures (ISA)
Familiarity with compiler technologies (e.g., LLVM-based stacks)
Experience with distributed communication frameworks (NCCL, MPI, libfabric)
Understanding of deep learning models
What We Offer
A highly competitive salary, an employment contract (Umowa o Pracę), and a comprehensive benefits package, including Medicover healthcare coverage
Work on the performance-critical compute layer for next-generation AI accelerators
Direct impact on deep learning model efficiency and latency
Collaboration with experts in hardware, compilers, and systems
Challenging low-level performance engineering problems at the hardware–software boundary
- Location: Gdańsk
- Remote status: Hybrid
- Employment type: Full-time