Workloads Engineer
Workloads Engineer for AI systems: translate models into fast, scalable workloads on next-gen HW/SW; optimize from model to hardware and deploy robust, efficient runtimes.
Workloads Engineer (AI Systems / HW-SW Optimization)
Role Overview
This is not a traditional software engineering role.
We are looking for a Workloads Engineer responsible for translating AI models into efficient, production-ready execution on a new hardware and software stack. The role sits at the intersection of AI model understanding, systems engineering, and low-level performance optimization.
You will work across the full stack — from AI model structure down to hardware execution, ensuring that workloads are efficient, scalable, accurate, and robust on next-generation compute platforms.
Key Responsibilities
Analyze AI model architectures (including LLMs) and translate them into optimized execution workloads for custom HW/SW platforms
Design and implement high-performance software components for AI frameworks and runtime environments
Optimize AI workloads for performance (latency / throughput), memory efficiency, parallel execution, and numerical accuracy and stability
Identify and remove performance bottlenecks across the stack (model → runtime → hardware)
Contribute to design decisions for the AI execution stack and system architecture
Support deployment and scaling of AI workloads in real-world environments
Required Qualifications
Bachelor’s or Master’s degree in Computer Science, Mathematics, Engineering, or a related field
5+ years of hands-on software engineering experience (or AI model development experience)
Strong programming skills in Python and C++
Strong algorithmic thinking and ability to solve complex computational problems
Solid understanding of AI model architectures, especially transformers and LLMs
Experience in performance optimization (compute, memory, and parallelization techniques)
Strong communication skills and ability to work in cross-functional teams
Nice to Have
Experience with AI frameworks such as PyTorch, JAX, or TensorFlow (training or inference)
GPU programming experience (CUDA, OpenCL) or experience with parallel computing systems
Experience with AI performance tuning (latency, throughput, memory footprint optimization)
Familiarity with distributed systems and model deployment pipelines
Understanding of computer architecture (CPUs, GPUs, accelerators, memory hierarchies)
Experience working close to hardware, compilers, or runtime systems
What We Offer
Highly competitive salary, employment contract (Umowa o Pracę), and a comprehensive benefits package, including Medicover healthcare coverage.
Work on the performance-critical compute layer for next-generation AI accelerators
Direct impact on deep learning model efficiency and latency
Collaboration with experts in hardware, compilers, and systems
Challenging low-level performance engineering problems at the hardware–software boundary
Locations: Gdańsk
Remote status: Hybrid
Employment type: Full-time