Saturday, April 5th 2025
MangoBoost Achieves Record-Breaking MLPerf Inference v5.0 Results with AMD Instinct MI300X
MangoBoost, a provider of cutting-edge system solutions designed to maximize AI data center efficiency, has set a new industry benchmark with its latest MLPerf Inference v5.0 submission. The company's Mango LLMBoost AI Enterprise MLOps software has demonstrated unparalleled performance on AMD Instinct MI300X GPUs, delivering the highest-ever recorded results for Llama2-70B in the offline inference category. This milestone marks the first-ever multi-node MLPerf inference result on AMD Instinct MI300X GPUs. By harnessing the power of 32 MI300X GPUs across four server nodes, Mango LLMBoost has surpassed all previous MLPerf inference results, including those from competitors using NVIDIA H100 GPUs.
Source:
MangoBoost PR
Unmatched Performance and Cost Efficiency
MangoBoost's MLPerf submission demonstrates a 24% performance advantage over the best published MLPerf result from Juniper Networks utilizing 32 NVIDIA H100 GPUs. Mango LLMBoost achieved 103,182 tokens per second (TPS) in the offline scenario and 93,039 TPS in the server scenario on AMD MI300X GPUs, outperforming the previous best result of 82,749 TPS on NVIDIA H100 GPUs. In addition to superior performance, Mango LLMBoost + MI300X offers significant cost advantages. With AMD MI300X GPUs priced between $15,000 and $17,000, compared to $32,000 to $40,000 for NVIDIA H100 GPUs (source: Tom's Hardware, H100 vs. MI300X pricing), Mango LLMBoost delivers up to 62% cost savings while maintaining industry-leading inference throughput. In terms of cost efficiency, the Mango LLMBoost + MI300X system delivers approximately 2.8× more inference throughput per $1,000 spent than the H100-based system, making it a clear choice for high-performance, budget-conscious deployments.
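The throughput-per-dollar claim above can be sanity-checked with simple arithmetic. The sketch below uses the quoted TPS figures and midpoint GPU prices; taking midpoints is an assumption, and real system cost also includes CPUs, memory, and networking:

```python
# Sanity-check of the "throughput per $1,000" comparison, using the
# figures quoted above and midpoint GPU prices (an approximation:
# full system cost also includes CPUs, memory, and networking).

def tps_per_kusd(total_tps: float, num_gpus: int, gpu_price_usd: float) -> float:
    """Offline tokens/sec delivered per $1,000 of GPU spend."""
    return total_tps / (num_gpus * gpu_price_usd / 1_000)

mi300x = tps_per_kusd(103_182, 32, 16_000)  # midpoint of $15k-$17k
h100   = tps_per_kusd(82_749, 32, 36_000)   # midpoint of $32k-$40k

print(f"MI300X: {mi300x:.1f} TPS per $1,000")
print(f"H100:   {h100:.1f} TPS per $1,000")
print(f"ratio:  {mi300x / h100:.1f}x")      # comes out near the claimed 2.8x
```

With these midpoint prices the ratio lands at roughly 2.8×, matching the figure stated above.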
Mango LLMBoost: A Scalable and Hardware-Flexible MLOps Solution
Mango LLMBoost is enterprise-grade AI inference software that provides seamless scalability and cross-platform compatibility. It supports over 50 open models, including Llama, Qwen, and DeepSeek, with one-line deployment via Docker and built-in OpenAI-compatible APIs. The software is cloud-ready, available on AWS Marketplace, Microsoft Azure Marketplace, and Google Cloud Platform, and can also be deployed on-premises for enterprises requiring full control and security.
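Because the server exposes OpenAI-compatible APIs, any OpenAI-style client can talk to it. A minimal sketch of a chat-completions request body follows; the endpoint URL and model id are hypothetical placeholders, not documented values:

```python
import json

# Sketch of a request to an OpenAI-compatible chat-completions
# endpoint. ENDPOINT and the model id are assumed placeholders for a
# local deployment, not values documented by MangoBoost.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "llama-3.1-70b",  # placeholder model id
    "messages": [
        {"role": "user", "content": "Summarize MLPerf Inference in one sentence."}
    ],
    "max_tokens": 128,
}

body = json.dumps(payload)
# POST `body` to ENDPOINT with any HTTP client (requests, httpx, curl, ...)
# or point the official `openai` Python client's base_url at the server.
print(body)
```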
Key capabilities of Mango LLMBoost include:
- Auto Parallelization - Efficiently distributes large models across GPUs and nodes.
- Auto Config Tuning - Optimizes runtime parameters based on workload characteristics.
- Auto Context Scaling - Dynamically adapts memory usage to maximize GPU utilization.
- Auto Disaggregated Deployment - Ensures flexible deployment across multiple inference stages.
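To make the first capability concrete, here is a generic illustration of what automatic model parallelism involves: splitting a model's layers as evenly as possible across the available GPUs. This is a simplified sketch of the general technique, not MangoBoost's actual placement algorithm:

```python
# Generic sketch of layer-wise model parallelism: assign contiguous
# layer ranges to GPUs, spreading any remainder across the first few.
# Illustrative only; not MangoBoost's implementation.

def shard_layers(num_layers: int, num_gpus: int) -> list[range]:
    """Return one contiguous layer range per GPU, balanced in size."""
    base, extra = divmod(num_layers, num_gpus)
    shards, start = [], 0
    for gpu in range(num_gpus):
        size = base + (1 if gpu < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards

# e.g. an 80-layer Llama2-70B-class model spread over 8 GPUs:
for gpu, layers in enumerate(shard_layers(80, 8)):
    print(f"GPU {gpu}: layers {layers.start}-{layers.stop - 1}")
```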
MangoBoost's record-breaking results were achieved through a close partnership with AMD, leveraging the ROCm software stack to maximize MI300X GPU performance. This collaboration has resulted in a scalable and efficient AI inference solution that can be deployed across single-node or multi-node clusters with ease.
Extending Performance Leadership to AWS and Beyond
Beyond the MLPerf results, Mango LLMBoost has been extensively tested on various cloud and on-premises configurations. On an 8×NVIDIA A100 GPU setup from AWS, Mango LLMBoost achieved up to 138x faster inference compared to Ollama and significantly outperformed HuggingFace TGI and vLLM across multiple model sizes, including LLaMA3.1-70B, DeepSeek-R1-Distill-Qwen-32B, and LLaMA3.1-8B. In terms of cost-efficiency, Mango LLMBoost also leads the pack with the lowest GPU cost per million tokens, reducing inference cost by over 99% compared to Ollama, and by over 30% even compared to vLLM on high-throughput workloads.
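The "GPU cost per million tokens" metric used above reduces to simple arithmetic on an instance's hourly rate and its sustained throughput. The hourly rate and throughput numbers in this sketch are illustrative assumptions, not the measured values behind the comparison:

```python
# How a "GPU cost per million tokens" figure is derived. The hourly
# rate and throughput values below are illustrative assumptions, not
# the measured AWS numbers from the comparison above.

def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_sec: float) -> float:
    """Dollars spent per million generated tokens at a given throughput."""
    tokens_per_hour = tokens_per_sec * 3_600
    return gpu_hourly_usd / (tokens_per_hour / 1_000_000)

# Example: an 8xA100 instance at an assumed ~$32/hour on-demand rate.
fast = cost_per_million_tokens(32.0, 5_000)  # optimized, batched engine
slow = cost_per_million_tokens(32.0, 40)     # unoptimized single-stream baseline
print(f"${fast:.2f} vs ${slow:.2f} per million tokens")
```

The point of the example: at a fixed hourly rate, cost per token is inversely proportional to throughput, which is why two-orders-of-magnitude throughput gaps translate directly into 99%+ cost reductions.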
Expanding AI Infrastructure Solutions
In addition to the Mango LLMBoost software, MangoBoost offers hardware acceleration solutions based on Data Processing Units (DPUs) to enhance AI and cloud infrastructure, including:
- Mango GPUBoost - RDMA acceleration for multi-node inference and training via RoCEv2.
- Mango NetworkBoost - TCP/IP stack offloading for enhanced CPU efficiency.
- Mango StorageBoost - High-performance NVMe/TCP initiator and target solutions for scalable AI storage.