Real-World Performance on Common AI Pipelines Shows Up to 90% Cost Savings and 15x Better Energy Efficiency Over Today’s AI Inference Servers

NeuReality, a leader in AI technology, has announced remarkable performance results from its commercially available NR1-S™ AI Inference Appliance, which significantly cuts costs and energy use in AI data centers, offering a much-needed solution to the growing concerns over AI’s high expenses and energy consumption. As governments, environmental organizations, and businesses raise alarms over AI’s unsustainable power consumption and exorbitant costs, NeuReality's breakthrough comes at a critical time with the explosive growth of generative AI. The NR1-S solution provides a responsible and affordable option for the 65% of global and 75% of U.S. businesses and governments struggling to adopt AI today.

The NR1-S does not compete with GPUs or other AI accelerators but rather boosts their output and complements them. NeuReality’s published results compare the NR1-S inference appliance paired with Qualcomm® Cloud AI 100 Ultra and Pro accelerators against traditional CPU-centric inference servers with Nvidia® H100 or L40S GPUs. The NR1-S achieves dramatically improved cost savings and energy efficiency in AI data centers across common AI applications compared to the CPU-centric systems currently relied upon by large-scale cloud service providers (hyperscalers), server OEMs and manufacturers such as Nvidia.

Key Benefits from NR1-S Performance

According to a technical blog shared on Medium this morning, NeuReality’s real-world performance findings show the following improvements:

  • Massive Cost Savings: When paired with AI 100 Ultra, NR1-S achieves up to 90% cost savings across various AI data types, such as image, audio and text. These are the key building blocks for generative AI, including large language models, mixture of experts (MoE), retrieval-augmented generation (RAG) and multimodality.
  • Critical Energy Efficiency: Besides saving on the capital expenditure (CAPEX) of AI use cases, the NR1-S shows up to 15 times better energy efficiency compared to traditional CPU-centric systems, further reducing operational expenditure (OPEX).
  • Optimal AI Accelerator Use: Unlike traditional CPU-centric systems, NR1-S ensures 100% utilization of the integrated AI accelerators without performance drop-offs or delays observed in today’s CPU-reliant systems.

Critical Impact for Ever-Evolving Real-World AI Applications

The performance data include key metrics such as AI queries per dollar, queries per watt, and total cost of 1 million queries (both CAPEX and OPEX). The data focus on natural language processing (NLP), automatic speech recognition (ASR), and computer vision (CV) workloads commonly used in medical imaging, fraud detection, customer call centers, online assistants and much more:

  • Cost Efficiency: One of the ASR tests shows NR1-S cutting the cost of processing 1 million audio seconds from 43 cents to only 5 cents, making voice bots and other audio-based NLP applications more affordable and capable of handling more intelligence per query.
  • Energy Savings: The tests also measured energy consumption, with ASR showing seven seconds of audio processing per watt with NR1-S, compared to 0.7 seconds in traditional CPU-centric systems. This translates to a 10-fold increase in performance for the energy used.
  • Linear Scalability: The NR1-S demonstrates the same per-accelerator performance output regardless of the number of AI accelerators used, allowing customers to efficiently scale their AI infrastructure up or down with zero performance loss. This ensures maximum return on investment without the diminishing returns typically caused by adding more GPUs or other accelerators in CPU-centric servers.
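The arithmetic behind the ASR figures above can be checked with a short sketch. It uses only the numbers quoted in this release (43 cents versus 5 cents per 1 million audio seconds, and 7 versus 0.7 audio seconds per watt). The function names are illustrative and are not taken from NeuReality's published methodology.

```python
def percent_savings(baseline_cost: float, new_cost: float) -> float:
    """Relative cost reduction, as a percentage of the baseline cost."""
    return (baseline_cost - new_cost) / baseline_cost * 100.0


def efficiency_gain(new_per_watt: float, baseline_per_watt: float) -> float:
    """Ratio of work done per watt (new system over baseline)."""
    return new_per_watt / baseline_per_watt


# Figures quoted in the release for the ASR test (cents per 1M audio seconds).
asr_savings = percent_savings(baseline_cost=43.0, new_cost=5.0)

# Figures quoted in the release (audio seconds processed per watt).
asr_gain = efficiency_gain(new_per_watt=7.0, baseline_per_watt=0.7)

print(f"ASR cost savings: {asr_savings:.1f}%")
print(f"ASR energy-efficiency gain: {asr_gain:.1f}x")
```

Under these inputs, the ASR cost reduction works out to roughly 88%, consistent with the "up to 90%" headline figure, and the efficiency ratio is the 10-fold gain stated above.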

The NR1-S offers a practical solution for businesses and governments looking to adopt AI without breaking the bank or overloading power grids. It supports a variety of AI applications commonly used in the financial services, healthcare, biotechnology, entertainment, content creation, government, public safety and transportation sectors.

These real-world performance results provide a welcome remedy to the energy crisis facing AI infrastructure providers and next-generation hyperscalers’ supercomputers. “While faster and faster GPUs drive innovation in new AI capabilities, the current systems that support them also move us further away from the budget and carbon reduction goals of most companies,” said NeuReality Chief R&D Officer Ilan Avital. “Our NR1-S is designed to reverse that trend, enabling sustainable AI growth without sacrificing performance.”

“As the industry keeps racing forward with a narrow focus on raw performance for the biggest AI models, energy consumption and costs keep skyrocketing,” said NeuReality co-founder and CEO Moshe Tanach. “The NR1-S technology allows our customers to scale AI applications affordably and sustainably, ensuring they can achieve their business objectives and environmental targets. NeuReality was built from inception to solve the cost and energy problem in AI inferencing, and our new data clearly show we have developed a viable solution. It’s an exciting step forward for the AI industry.”

About NeuReality

Founded in 2019, NeuReality creates AI systems that are easy to use, affordable, and energy efficient. Its goal is to make AI accessible to everyone, from small businesses to large governments.

Learn More: For detailed test results and more information, visit the Full Blog.

Media Contact: Treble, Will Kruisbrink, neureality@treblepr.com