Posts

Showing posts from November, 2025

How to Use Vibe Coding and LLMs to Write Reliable, Production Ready Code

Vibe coding is a simple way of building software with the help of an LLM. You guide the model with clear instructions, it generates code, and you refine it. This fast loop makes development quicker and more flexible: you can test ideas faster and build with more momentum. But many developers still raise concerns about LLM-generated code. Common complaints include:

- The model misunderstands requirements
- Code quality is inconsistent
- Long-term maintenance becomes hard
- Bugs slip through easily

LLM-based coding is new, and it comes with limitations. Still, you can get reliable, production-ready code by following a few simple techniques. These tips work with any AI coding tool, including:

- Cursor
- GitHub Copilot
- Google Antigravity
- Windsurf

I use GitHub Copilot and Python, but these techniques apply to any coding agent or language.

Techniques to Improve Code Quality

Plan Before You Code

Write a clear plan ...

Semantic caching for LLM Applications and AI Agents

Caching is one of the easiest ways to speed up applications and control cost. But LLM-based systems don't work well with traditional caching because users phrase the same idea in many different ways, so most queries turn into cache misses. Still, caching is important: LLM agents can take time to run, and inference is expensive. A smarter caching method is needed.

Why Traditional Caching Fails

- It works on exact matching.
- Natural language rarely matches exactly.
- The same intent written differently becomes a cache miss.
- Result: almost no benefit for LLM workloads.

What Semantic Caching Does

- Focuses on meaning instead of exact text.
- Convert each query into an embedding.
- Store embeddings in a vector database like Redis, Qdrant, or Milvus.
- Add a TTL to control freshness.

For each new query:

- Convert it to an embedding.
- Run a similarity search.
- If a stored query is similar enough, return the cached output.
- Otherwise, run the agent.

Redis Code Sample for Semantic Caching

Belo...
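The lookup loop above can be sketched in plain Python. This is a minimal in-memory illustration, not the post's Redis sample: the `embed` function is a toy character-count stand-in for a real embedding model, and the similarity threshold and TTL values are arbitrary assumptions.

```python
import math
import time

def embed(text):
    # Toy bag-of-letters embedding; a real system would call an
    # embedding model instead (this function is a stand-in).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.9, ttl_seconds=3600):
        self.threshold = threshold          # minimum similarity to count as a hit
        self.ttl = ttl_seconds              # freshness control, as in the post
        self.entries = []                   # (embedding, answer, stored_at)

    def lookup(self, query):
        now = time.time()
        # Drop expired entries, then find the closest stored query.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(e[0], q), default=None)
        if best and cosine(best[0], q) >= self.threshold:
            return best[1]                  # similar enough: reuse cached output
        return None                         # miss: caller runs the agent

    def store(self, query, answer):
        self.entries.append((embed(query), answer, time.time()))

cache = SemanticCache()
cache.store("how do I reset my password", "Visit the account settings page.")
hit = cache.lookup("how do I reset my password?")   # same intent, phrased slightly differently
miss = cache.lookup("what is the weather today")    # unrelated intent
```

In a production setup the list scan is replaced by a vector-index similarity search (e.g. in Redis, Qdrant, or Milvus), but the hit/miss logic stays the same.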

Deploying AI Agents in Production Using Open Source Architecture

AI agents are becoming core parts of modern applications, but deploying them in production at scale is challenging. The diversity of agent frameworks, unpredictable latency, and the need for streaming responses make it hard to use simple REST-only patterns. Most proof-of-concept deployments break down when real workloads, traffic, and observability requirements hit. This blog post explains how to design a production-ready, open-source architecture for AI agents using FastAPI, Celery, Redis, Kubernetes, KEDA, Prometheus, Grafana, LangFuse, and LangGraph.

Why Deploying AI Agents Is Hard

- Agent frameworks differ widely in required infrastructure.
- Latency varies from milliseconds to minutes depending on workflow complexity.
- Real-time streaming is needed for modern AI UX.
- REST-only patterns can't handle long execution, retries, or async scheduling.
- Scaling compute-heavy agents is fundamentally different from scaling API servers.

Why the Traditional REST Architecture Fail...
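The core pattern that FastAPI + Celery + Redis implement for long-running agents is "submit a task, get an id, poll for the result". A minimal stdlib-only sketch of that pattern, with a sleep standing in for a slow agent call (the endpoint names and `run_agent` function are illustrative assumptions, not the post's actual code):

```python
import queue
import threading
import time
import uuid

tasks = queue.Queue()   # in production: a Redis-backed Celery broker
results = {}            # in production: a Celery result backend

def run_agent(prompt):
    # Stand-in for a long-running agent workflow (e.g. a LangGraph graph).
    time.sleep(0.1)
    return f"answer to: {prompt}"

def worker():
    # In production this is a Celery worker process, scaled by KEDA.
    while True:
        task_id, prompt = tasks.get()
        results[task_id] = {"status": "done", "output": run_agent(prompt)}
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(prompt):
    # What a POST /agent endpoint would do: enqueue and return immediately,
    # instead of holding a REST connection open for minutes.
    task_id = str(uuid.uuid4())
    results[task_id] = {"status": "pending"}
    tasks.put((task_id, prompt))
    return task_id

def poll(task_id):
    # What a GET /agent/{task_id} endpoint would return.
    return results[task_id]

tid = submit("summarize this document")
tasks.join()  # a real client would poll repeatedly rather than block
```

The decoupling is the point: the API tier stays fast and stateless, while the worker tier absorbs the minutes-long, compute-heavy agent runs and can scale independently.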