– By Arpit Agrawal, CEO, Elastiq & Anil Khichar, CTO, Elastiq
When it comes to AI systems, we’ve seen one trend time and time again: out-of-the-box solutions shine in proofs of concept and pilots, but they hit a wall when it’s time to scale in production. That’s where most of our customer engagements focus: building AI systems that can truly scale in production while addressing governance and performance needs.
But what does it really take to put an AI system in production?
Identifying AI-Ready Problems
Before thinking about planning, first determine whether your problem really requires an AI solution. While Generative AI has captured attention across industries, it’s important to stay focused on the business problem you’re trying to solve and identify the best solution for it. AI may or may not be the answer.
Before writing a single line of code, ask yourself:
- What’s the business challenge that you’re trying to solve?
- What’s the best solution for this challenge?
Use AI where the challenge cannot be solved with a deterministic solution, such as a simple piece of code.
You have to think hard – “What is the right solution to my problem?” And not – “Oh I have an LLM, where can I use it?”
Arpit Agrawal, CEO, Elastiq
If you can’t articulate a clear and compelling link between your AI system and tangible business value, it’s time to reassess.
Testing AI Systems
When is an AI system ready for production?
The Testing Paradigm
Traditionally, AI systems have been tested using technical metrics like ROC AUC, precision, and recall. These metrics are important, but they aren’t enough on their own. Business testing is crucial for AI systems, yet it’s often forgotten.
Large Language Models (LLMs) hallucinate, and other AI models have varying degrees of accuracy. It’s therefore imperative to test these systems not only for performance but also for business impact.
A model that’s only 75% accurate but can help you reduce the operations cost by 50% is far more valuable than a model that’s 99% accurate but can’t be put into production!
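The comparison above can be sketched as a simple ranking by estimated business value rather than raw accuracy. All figures here (the baseline operations cost, the automation fractions, the model names) are illustrative assumptions, not measurements:

```python
# Hypothetical sketch: rank candidate models by the business value they
# deliver in production, not by accuracy alone.

def annual_value(accuracy, cost_reduction, baseline_ops_cost, deployable):
    """Estimated yearly savings a model delivers once in production."""
    if not deployable:  # a model that can't ship is worth nothing
        return 0.0
    return baseline_ops_cost * cost_reduction * accuracy

BASELINE_OPS_COST = 2_000_000  # assumed yearly operations cost in dollars

candidates = {
    # name: (accuracy, fraction of ops cost it can reduce, deployable?)
    "fast_75pct": (0.75, 0.50, True),
    "slow_99pct": (0.99, 0.60, False),  # too slow/expensive to deploy
}

ranked = sorted(
    candidates.items(),
    key=lambda kv: annual_value(kv[1][0], kv[1][1], BASELINE_OPS_COST, kv[1][2]),
    reverse=True,
)
best = ranked[0][0]  # the 75%-accurate but deployable model wins
```

Under these assumptions the "worse" model tops the ranking, because deployability gates everything else.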
The Human Parallel
Here’s a useful way to think about AI system testing: Consider how you’d measure the accuracy of a human doing the same job. If you have a human generating content or searching through records, what metrics would you use? You wouldn’t take a junior intern and immediately put them on complex tasks without validation. The same applies to AI systems.
Start with human-in-the-loop oversight, similar to supervising a junior employee. Give the system smaller tasks, build confidence gradually, and then scale up to more complex operations. This approach helps establish trust and reliability before full production deployment.
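A minimal sketch of this human-in-the-loop gating might route low-confidence outputs to a reviewer queue instead of releasing them automatically. The 0.85 threshold and the record fields are assumptions for illustration:

```python
# Hedged sketch: confident outputs are auto-approved, the rest are
# escalated to a human reviewer -- like supervising a junior employee.

CONFIDENCE_THRESHOLD = 0.85  # assumed cut-off; tune per use case

review_queue = []  # items awaiting human review

def route(prediction):
    """Auto-approve confident outputs; escalate the rest to a human."""
    if prediction["confidence"] >= CONFIDENCE_THRESHOLD:
        return "auto_approved"
    review_queue.append(prediction)
    return "needs_human_review"

results = [route(p) for p in [
    {"id": 1, "confidence": 0.97},
    {"id": 2, "confidence": 0.60},
]]
```

As confidence in the system grows, the threshold can be lowered gradually, shrinking the human review load while keeping an audit trail.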
Handling Subjectivity in Outputs
One of the biggest challenges with Generative AI is testing subjectivity in outputs. If you ask for a travel itinerary or an RFP response, how do you determine if it’s good or bad?
One person might love the output while another may find it useless. You need structured evaluation criteria to make objective assessments.
We’ve developed a framework that breaks down feedback into four major categories:
- Context: Does the output contain relevant information?
- Wordiness: Is the response appropriately concise or detailed?
- Authenticity: Does it feel genuine and appropriate?
- Repetitiveness: Does it avoid unnecessary duplication?
This framework requires diverse feedback to avoid individual bias. Collect feedback from at least five customer stakeholders to validate the subjective performance of the AI system.
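A scoring harness for this framework can be sketched as follows: each reviewer scores an output from 1 to 5 per category, and scores are averaged across reviewers to dampen individual bias. The reviewer scores below are made-up examples:

```python
# Illustrative sketch of the four-category feedback framework with
# multiple stakeholder reviewers. Scores are invented for demonstration.

from statistics import mean

CATEGORIES = ("context", "wordiness", "authenticity", "repetitiveness")

def aggregate(reviews):
    """Average each category's 1-5 score across all reviewers."""
    return {c: round(mean(r[c] for r in reviews), 2) for c in CATEGORIES}

reviews = [  # e.g. five stakeholder reviews of one generated RFP response
    {"context": 4, "wordiness": 3, "authenticity": 5, "repetitiveness": 4},
    {"context": 5, "wordiness": 4, "authenticity": 4, "repetitiveness": 5},
    {"context": 3, "wordiness": 4, "authenticity": 4, "repetitiveness": 4},
    {"context": 4, "wordiness": 5, "authenticity": 3, "repetitiveness": 5},
    {"context": 5, "wordiness": 3, "authenticity": 4, "repetitiveness": 4},
]

scores = aggregate(reviews)
overall = round(mean(scores.values()), 2)
```

Per-category averages also show *where* an output falls short (say, strong context but poor wordiness), which is more actionable than a single pass/fail verdict.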
Scaling: The Production Challenge
When clients tell us their systems aren’t scaling fast enough, can’t handle sufficient concurrent requests, or become prohibitively expensive at scale, we focus on three critical components:
Data Processing Scale
Think about processing millions of datapoints and documents, different data formats, and varying content types. Your system needs to handle this efficiently and reliably.
Query Processing Scale
Ensure your LLM deployment can scale to millions of user queries while maintaining performance.
System Infrastructure Scale
This includes your compute resources, GPUs, CPUs, and overall architecture. As Anil Kumar Khichar, CTO at Elastiq puts it, “Design a system for failure.” This mindset helps create truly scalable solutions.
Vector Database Selection: A Real-World Example
Let’s take our experience with Elastic Pixels, our multi-modal semantic search solution. When choosing a vector database, we had to select from many options across cloud providers and open-source solutions.
After extensive benchmarking with 1 billion vector embeddings across PGVector (PostgreSQL), Pinecone, OpenSearch, BigQuery and various other Vector DBs, we landed on OpenSearch. Here’s why:
- It’s a true distributed system by design
- Scales both horizontally and vertically
- Maintains performance with billion-scale datasets
- Offers excellent price-performance ratio
- Provides fast ingestion and reindexing capabilities
Many systems fall apart at scale – too slow, too resource-hungry, or too expensive. OpenSearch hit the sweet spot for production workloads.
This was just one example, and each use case is different. Remember, there are a number of options for each component in your architecture; benchmark and evaluate what works best for you based on your data volume, query volume, scale, and cost requirements.
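The shape of such a benchmark can be sketched with a common interface and a timing loop. The toy in-memory index and brute-force search below stand in for real clients (OpenSearch, PGVector, Pinecone, etc.) – they are assumptions for illustration, not a real benchmark:

```python
# Hedged sketch of a benchmarking harness for comparing vector-store
# candidates behind one interface. The brute-force store is a stand-in
# for a real client; swap it out per candidate and compare latencies.

import time

def brute_force_search(index, query, k):
    """Toy nearest-neighbour search by squared L2 distance."""
    dist = lambda i: sum((a - b) ** 2 for a, b in zip(index[i], query))
    return sorted(range(len(index)), key=dist)[:k]

def benchmark(index, queries, k=3):
    """Return (mean query latency in ms, results) for a candidate store."""
    start = time.perf_counter()
    results = [brute_force_search(index, q, k) for q in queries]
    elapsed_ms = (time.perf_counter() - start) * 1000 / len(queries)
    return elapsed_ms, results

# Tiny synthetic index: 100 two-dimensional vectors.
index = [[float(i), float(i)] for i in range(100)]
latency_ms, results = benchmark(index, queries=[[0.0, 0.0], [99.0, 99.0]])
```

A real harness would also record ingestion throughput, recall against ground truth, and cost per query – the dimensions that decided our OpenSearch choice.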
The Cost Equation
A token is the unit used to price an AI system’s cost. For text processing, the general rule of thumb is about four characters per token; for quick estimates, you can treat one word as roughly one token.
The cost of AI models is generally documented per million tokens (see the pricing pages of Gemini and OpenAI, for example). If you can estimate the number of input and output tokens across your data volume and query volume, you can estimate your costs.
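A back-of-the-envelope estimator built from these heuristics might look like the sketch below. The per-million-token prices and query volumes are placeholder assumptions – always check the provider’s current pricing page:

```python
# Hedged cost-estimation sketch using the ~4-characters-per-token rule
# of thumb. All prices and volumes below are illustrative placeholders.

CHARS_PER_TOKEN = 4  # rough heuristic for English text

def estimate_tokens(text):
    """Approximate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def monthly_cost(queries_per_month, input_tokens_per_query,
                 output_tokens_per_query, price_in_per_m, price_out_per_m):
    """Dollars per month given token volumes and per-million-token prices."""
    tokens_in = queries_per_month * input_tokens_per_query
    tokens_out = queries_per_month * output_tokens_per_query
    return (tokens_in * price_in_per_m + tokens_out * price_out_per_m) / 1_000_000

# Assumed workload: 1M queries/month, 500 input / 300 output tokens each,
# at $0.50 per 1M input tokens and $1.50 per 1M output tokens.
cost = monthly_cost(1_000_000, 500, 300, 0.50, 1.50)
```

Running the numbers this way before deployment makes the levers below (model choice, quotas, architecture) concrete rather than abstract.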
Several levers allow you to optimize the cost of an AI system:
Model Selection
Don’t default to the most powerful LLM for everything. Choose the right model for the task at hand: for example, you may use cheaper models for one-time data processing and the premium flagship model for user queries. Also consider open-source models where the volume and cost make sense.
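This routing rule can be sketched as a simple task-to-model table. The model names and task categories below are illustrative assumptions, not recommendations for specific products:

```python
# Hedged sketch of task-based model routing: cheap models for bulk
# one-time processing, the flagship model for interactive user queries.

ROUTES = {
    "bulk_ingestion": "cheap-small-model",       # one-time data processing
    "user_query": "premium-flagship-model",      # latency/quality sensitive
    "classification": "open-source-model",       # high volume, self-hosted
}

def pick_model(task_type):
    """Route each task to the cheapest model that meets its quality bar."""
    # Default to the flagship model so unknown tasks fail safe on quality.
    return ROUTES.get(task_type, "premium-flagship-model")

chosen = pick_model("bulk_ingestion")
```

In practice the table would live in configuration and be revisited as model prices and capabilities shift.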
Governance and FinOps
- Implement quotas and thresholds at the department, project, and user-group level
- Implement API interceptor layers
- Monitor and control token usage
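The quota-plus-interceptor pattern above can be sketched as a check that runs before each request reaches the model. The department names and monthly budgets are illustrative assumptions:

```python
# Hedged sketch of a token-quota interceptor: per-department monthly
# budgets are checked (and metered) before a call is forwarded to the
# model. Budgets below are placeholders, not recommendations.

monthly_quota = {"marketing": 5_000_000, "engineering": 20_000_000}
usage = {dept: 0 for dept in monthly_quota}

def intercept(dept, requested_tokens):
    """Allow the call only if the department stays within its quota."""
    if usage[dept] + requested_tokens > monthly_quota[dept]:
        return False                    # block and surface to FinOps
    usage[dept] += requested_tokens     # meter the spend
    return True

ok = intercept("marketing", 4_000_000)       # within quota
blocked = intercept("marketing", 2_000_000)  # would exceed the 5M quota
```

A production interceptor would sit in an API gateway and emit usage metrics rather than hold counters in memory, but the control point is the same.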
Architecture Optimization
Decouple the components of your architecture so they can scale independently. This lets you optimize the price-performance of each component, scale components up or down based on actual usage patterns, and even shut down resources when they’re not needed.
Team Investment
Invest in upskilling solution architects and enterprise architects. Create a “Gen AI architect champion team” for architectural reviews. Skilled architects can save millions in unnecessary AI costs.
To all CXOs of enterprise customers: Upskill your Solution Architects & Enterprise Architects. They will review and validate whether models are right for specific use cases. They will be your gateway to save a lot of costs – their ROI is high!
Anil Khichar, CTO, Elastiq
Conclusion
While AI offers powerful capabilities, not every problem requires an AI solution. It should be used only where it truly justifies the business case and adds real, measurable business value.
Building production-ready AI systems requires careful consideration of quality, scale, and cost. The key is starting with the right problem framing, implementing comprehensive testing frameworks, choosing appropriate infrastructure components, and managing price-performance trade-offs.