Workloads Engineer

(AI Systems / HW-SW Optimization)

Role Overview

This is not a traditional software engineering role.

We are looking for a Workloads Engineer responsible for translating AI models into efficient, production-ready execution on a new hardware + software stack. The role sits at the intersection of AI model understanding, systems engineering, and low-level performance optimization.

You will work across the full stack, from AI model structure down to hardware execution, ensuring that workloads are efficient, scalable, accurate, and robust on next-generation compute platforms.

Key Responsibilities

Analyze AI model architectures (including LLMs) and translate them into optimized execution workloads for custom HW/SW platforms
Design and implement high-performance software components for AI frameworks and runtime environments
Optimize AI workloads for:
- performance (latency / throughput)
- memory efficiency
- parallel execution
- numerical accuracy and stability
Identify and remove performance bottlenecks across the stack (model → runtime → hardware)
Contribute to design decisions for the AI execution stack and system architecture
Support deployment and scaling of AI workloads in real-world environments

Required Qualifications

Bachelor’s or Master’s degree in Computer Science, Mathematics, Engineering, or a related field
5+ years of hands-on experience in software engineering or AI model development
Strong programming skills in Python and C++
Strong algorithmic thinking and ability to solve complex computational problems
Solid understanding of AI model architectures, especially transformers and LLMs
Experience in performance optimization (compute, memory, and parallelization techniques)
Strong communication skills and ability to work in cross-functional teams

Nice to Have

Experience with AI frameworks such as PyTorch, JAX, or TensorFlow (training or inference)
GPU programming experience (CUDA, OpenCL) or parallel computing systems
Experience with AI performance tuning (latency, throughput, memory footprint optimization)
Familiarity with distributed systems and model deployment pipelines
Understanding of computer architecture (CPUs, GPUs, accelerators, memory hierarchies)
Experience working close to hardware / compilers / runtime systems

What We Offer

Highly competitive salary, employment contract (Umowa o Pracę), and a comprehensive benefits package, including Medicover healthcare coverage.
Work on the performance-critical compute layer for next-generation AI accelerators
Direct impact on deep learning model efficiency and latency
Collaboration with experts in hardware, compilers, and systems
Challenging low-level performance engineering problems at the hardware–software boundary
