Posts

Showing posts from 2025

How to Use Vibe Coding and LLMs to Write Reliable, Production Ready Code

Vibe coding is a simple way of building software with the help of an LLM: you guide the model with clear instructions, it generates code, and you refine the result. This fast feedback loop makes development quicker and more flexible, helps you test ideas sooner, and lets you build with more momentum. But many developers still raise concerns about LLM-generated code. Common complaints include:

- The model misunderstands requirements
- Code quality is inconsistent
- Long-term maintenance becomes hard
- Bugs slip through easily

LLM-based coding is new and it comes with limitations. Still, you can get reliable, production-ready code by following a few simple techniques. These tips work with any AI coding tool, including:

- Cursor
- GitHub Copilot
- Google Antigravity
- Windsurf

I use GitHub Copilot and Python, but these techniques apply to any coding agent or language.

Techniques to Improve Code Quality

Plan Before You Code

Write a clear plan ...

Semantic caching for LLM Applications and AI Agents

Caching is one of the easiest ways to speed up applications and control cost. But LLM-based systems don't work well with traditional caching, because users phrase the same idea in many different ways, so most queries turn into cache misses. Still, caching is important: LLM agents can take time to run, and inference is expensive. A smarter caching method is needed.

Why Traditional Caching Fails

- It works on exact matching, and natural language rarely matches exactly.
- The same intent written differently becomes a cache miss.
- Result: almost no benefit for LLM workloads.

What Semantic Caching Does

- Focuses on meaning instead of exact text.
- Convert each query into an embedding.
- Store embeddings in a vector database like Redis, Qdrant, or Milvus.
- Add a TTL to control freshness.

For each new query:

1. Convert it to an embedding.
2. Run a similarity search.
3. If a stored query is similar enough, return the cached output. Otherwise, run the agent.

Redis Code Sample for Semantic Caching

Belo...
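The lookup steps described in this preview can be sketched with a tiny in-memory stand-in. The toy bag-of-characters "embedding" and the `SemanticCache` class below are purely illustrative assumptions; a real system would use an embedding model and a vector store such as Redis, Qdrant, or Milvus:

```python
import math
import time

def embed(text):
    # Toy embedding: bag-of-characters vector. Stand-in for a real
    # embedding model; purely illustrative.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """In-memory stand-in for a vector database with TTL support."""

    def __init__(self, threshold=0.95, ttl=300):
        self.threshold = threshold  # minimum similarity for a hit
        self.ttl = ttl              # freshness window in seconds
        self.entries = []           # (embedding, response, expires_at)

    def get(self, query):
        now = time.time()
        # Drop stale entries (TTL enforcement).
        self.entries = [e for e in self.entries if e[2] > now]
        emb = embed(query)
        best = max(self.entries, key=lambda e: cosine(emb, e[0]), default=None)
        if best and cosine(emb, best[0]) >= self.threshold:
            return best[1]  # semantically similar enough: cache hit
        return None         # cache miss: caller runs the agent

    def put(self, query, response):
        self.entries.append((embed(query), response, time.time() + self.ttl))

cache = SemanticCache(threshold=0.95)
cache.put("What is the capital of France?", "Paris")
print(cache.get("what is the capital of france"))  # similar phrasing: prints Paris
```

Swapping the list for a real vector database changes only `get` and `put`; the hit/miss logic around the similarity threshold stays the same.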

Deploying AI Agents in Production Using Open Source Architecture

AI agents are becoming core parts of modern applications, but deploying them in production at scale is challenging. The diversity of agent frameworks, unpredictable latency, and the need for streaming responses make it hard to rely on simple REST-only patterns. Most proof-of-concept deployments break down when real workloads, traffic, and observability requirements hit. This blog post explains how to design a production-ready, open-source architecture for AI agents using FastAPI, Celery, Redis, Kubernetes, KEDA, Prometheus, Grafana, LangFuse, and LangGraph.

Why Deploying AI Agents Is Hard

- Agent frameworks differ widely in the infrastructure they require.
- Latency varies from milliseconds to minutes depending on workflow complexity.
- Real-time streaming is needed for a modern AI UX.
- REST-only patterns can't handle long execution, retries, or async scheduling.
- Scaling compute-heavy agents is fundamentally different from scaling API servers.

Why the Traditional REST Architecture Fail...
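The queue-based pattern this preview argues for (enqueue the job, let a worker run the slow agent, let the client poll for the result) can be sketched in plain Python. `queue.Queue` and a thread stand in for Redis and a Celery worker here; the function names and the placeholder agent call are illustrative assumptions, not the post's actual code:

```python
import queue
import threading
import uuid

# In-memory stand-ins for the broker and result store. In production these
# would be Redis + Celery, with FastAPI exposing the two functions as endpoints.
task_queue = queue.Queue()
results = {}

def submit_agent_job(payload):
    """POST /jobs stand-in: enqueue instead of running the agent inline."""
    job_id = str(uuid.uuid4())
    results[job_id] = {"status": "pending", "output": None}
    task_queue.put((job_id, payload))
    return job_id  # returned immediately; the request never blocks

def get_job(job_id):
    """GET /jobs/{id} stand-in: client polls (or streams) until done."""
    return results[job_id]

def worker():
    """Celery-worker stand-in: pulls jobs and runs the (slow) agent."""
    while True:
        job_id, payload = task_queue.get()
        output = f"agent result for {payload!r}"  # placeholder agent call
        results[job_id] = {"status": "done", "output": output}
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

job_id = submit_agent_job("summarise this document")
task_queue.join()  # wait for the worker; a real client would poll instead
print(get_job(job_id)["status"])  # prints done
```

Because the API layer only enqueues and polls, it scales like any stateless web tier, while the workers (the expensive part) can be scaled independently, e.g. by KEDA on queue length.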

Welcome to Pods and Prompts

Hi everyone 👋, welcome to my blog, Pods and Prompts! This is the space where I'll be sharing my thoughts and experiences on Generative AI, Large Language Models (LLMs), and the infrastructure that powers AI applications. Everything I write here reflects my personal views and is in no way related to my past or present employers.

A little about me: I'm Nithin Anil, an AI/ML engineer with over 13 years in the software industry. Over the years, I've worked with AWS, Azure, and GCP, built and managed large-scale distributed big data systems, and designed microservice-based architectures. For the past couple of years, my focus has been on Generative AI applications, and one of my strengths is managing infrastructure for highly available, low-latency AI systems. I'm also deeply experienced in Kubernetes and Infrastructure as Code. System design for millisecond or even microsecond latency is one of my favourite challenges.

Beyond work, I'm pass...