Unlocking Multimodal Reasoning with Apriel-1.5-15B-Thinker
The ServiceNow AI Research Lab has introduced a multimodal reasoning model, Apriel-1.5-15B-Thinker. This 15-billion-parameter model is designed to deliver strong reasoning performance while remaining far cheaper to deploy than frontier-scale systems.
What’s New in Apriel-1.5-15B-Thinker?
Frontier-Level Performance
Apriel-1.5-15B-Thinker achieves an Artificial Analysis Intelligence Index (AAI) score of 52, demonstrating competitive performance while being significantly smaller than models such as DeepSeek-R1-0528. The score is derived from ten third-party evaluations, lending it reliability across a range of benchmarks.
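To see how a single index can summarize many evaluations, here is a minimal sketch. It assumes, for illustration only, an equal-weight average — this is not the official AAI formula — and the per-benchmark scores below are hypothetical, chosen only to show how ten results roll up into one number:

```python
# Illustrative sketch only: an index like AAI aggregates scores from ten
# third-party evaluations into a single number. The real weighting is defined
# by Artificial Analysis; an equal-weight average is assumed here.

def intelligence_index(scores: list[float]) -> float:
    """Aggregate per-benchmark scores (0-100) into a single index."""
    if len(scores) != 10:
        raise ValueError("expected scores from ten evaluations")
    return sum(scores) / len(scores)

# Hypothetical per-benchmark scores (not real results) averaging to 52.
example_scores = [88, 71, 62, 68, 73, 30, 25, 20, 42, 41]
print(intelligence_index(example_scores))  # → 52.0
```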
Single-GPU Deployability
Designed for efficiency, the model can be deployed on a single GPU, making it well suited to on-premises and air-gapped environments with fixed memory and latency budgets.
Open Weights and Reproducibility
For transparency and independent verification, Apriel-1.5-15B-Thinker ships with open weights and a reproducible training pipeline. Users can access the weights, training recipes, and evaluation protocols, promoting collaborative advancements in AI research.
Training Mechanism: A Closer Look
Base and Upscaling
Apriel-1.5-15B-Thinker builds on Mistral's Pixtral-12B-Base-2409, a multimodal stack pairing a vision encoder with a text decoder. The research team applied depth upscaling, increasing the number of decoder layers, while keeping the vision encoder largely intact to preserve the model's deployability.
Continual Pretraining (CPT)
Continual pretraining proceeds in two stages:
- Mixed text and image data to establish foundational reasoning and understanding, with sequence lengths of up to 32k tokens.
- Synthetic visual tasks that refine spatial and compositional reasoning, with sequence lengths of up to 16k tokens.
Supervised Fine-Tuning (SFT)
High-quality, reasoning-focused instruction data spanning domains such as math and coding is used during the fine-tuning phase. Notably, no reinforcement learning is applied — including reinforcement learning from AI feedback — keeping the training pipeline simple.
Impressive Benchmark Results
Apriel-1.5-15B-Thinker has demonstrated remarkable performance across various benchmarks, including:
- AIME 2025: 87.5% – 88%
- GPQA Diamond: ~71%
- IFBench: ~62%
- τ²-Bench (Telecom): ~68%
- LiveCodeBench: 72.8%
Evaluated with VLMEvalKit for reproducibility, Apriel posts competitive scores across multiple multimodal understanding tasks and excels at document and diagram processing.
Conclusion
Apriel-1.5-15B-Thinker exemplifies the potential of mid-training techniques to achieve high performance on critical benchmarks while remaining deployable on a single GPU. The model’s open weights and training methods make it an attractive option for enterprises looking to explore effective AI solutions without the overhead of larger, closed systems.
Related Keywords
- Multimodal reasoning
- Artificial Intelligence Index
- Open-source AI models
- GPU deployment
- Continual pretraining
- Supervised fine-tuning
- AI benchmarks