Principal Machine Learning Engineer, Mobile AI Inference Optimization
Negotiable
The opportunity
We are building the next generation of mobile game AI experiences by deploying world models on-device. As our Principal Machine Learning Engineer, you will be the foremost technical authority on bringing state-of-the-art multi-modal models (transformers, diffusion networks, and JAPE-style architectures) from research to production on mobile hardware.
This is a deeply hands-on, high-impact role. You will define the inference strategy, drive architectural decisions across the full mobile ML stack, and mentor a team of senior and mid-level engineers. Your work will directly determine the latency, quality, and power profile of AI-driven features experienced by billions of mobile game players.
What you'll be doing
- Technical Leadership:
  - Set the technical vision and roadmap for deploying multi-modal AI models to iOS and Android, spanning transformers, diffusion models, and JAPE-style generative architectures.
  - Make authoritative decisions on model compression, quantization, pruning, and knowledge distillation strategies to meet mobile latency and memory budgets.
  - Evaluate and select inference runtimes (e.g., CoreML, ONNX Runtime Mobile, TFLite, ExecuTorch) and drive adoption across the team.
  - Own the end-to-end optimization pipeline, from model export and graph transformation to hardware-specific kernel tuning on NPU, GPU, and CPU.
- Architecture & Research Translation:
  - Collaborate directly with research scientists to translate novel model architectures into deployable, mobile-optimized implementations.
  - Design scalable systems for multi-modal inference that process diverse inputs (images, text, primitives, and metadata) and produce pixel-level outputs with real-time performance.
  - Pioneer new approaches to dynamic resolution, token reduction, and speculative decoding tailored to mobile constraints.
  - Track and rapidly adopt breakthroughs in efficient diffusion (e.g., consistency models, flow matching) and efficient attention (e.g., FlashAttention, linear attention variants).
- Team & Cross-Functional Leadership:
  - Lead and mentor a team of ML engineers; define engineering best practices, code review standards, and on-device benchmarking methodology.
  - Partner with platform engineers, product managers, and runtime teams to align ML capabilities with device SKU constraints and product roadmaps.
  - Champion a culture of measurement: define KPIs for latency, accuracy, memory, and power consumption, and ensure the team tracks them rigorously.
What we're looking for
- 8+ years in ML engineering, with at least 3 years focused on on-device / edge inference optimization.
- Proven production deployment of transformer-based models (e.g., ViT, LLaMA, Stable Diffusion) and/or JAPE-style generative architectures on mobile or embedded hardware.
- Hands-on expertise with CoreML, TFLite, ONNX Runtime, and/or ExecuTorch; deep understanding of operator fusion, memory layout, and runtime scheduling.
- Expert-level command of INT8/INT4/FP16 quantization, weight sharing, structured/unstructured pruning, and knowledge distillation.
- Strong understanding of mobile SoC architectures (Apple Neural Engine, Qualcomm Hexagon/Adreno, ARM Mali) and how to target each for peak throughput.
- Proficiency in C++ / Objective-C / Swift for runtime integration; solid Python for training-side tooling and export pipelines.
- Ability to read, imple
Full-time
By Unity Technologies, May 26, 2026