<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Cloud, Data & AI]]></title><description><![CDATA[I write technical blogs on Cloud, Data, and AI, based on real-world production experience, focusing on practical architecture, performance, scalability, and ope]]></description><link>https://ragstack.in</link><generator>RSS for Node</generator><lastBuildDate>Fri, 15 May 2026 06:11:01 GMT</lastBuildDate><atom:link href="https://ragstack.in/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Anatomy of a RAG Pipeline: From Ingestion to Augmented Response]]></title><description><![CDATA[INTRODUCTION
In the rapidly evolving landscape of Generative AI, Retrieval-Augmented Generation (RAG) has emerged as a game-changing architecture that addresses one of the most critical challenges in Large Language Models (LLMs): hallucinations and k...]]></description><link>https://ragstack.in/anatomy-of-a-rag-pipeline-from-ingestion-to-augmented-response</link><guid isPermaLink="true">https://ragstack.in/anatomy-of-a-rag-pipeline-from-ingestion-to-augmented-response</guid><category><![CDATA[Embedding]]></category><category><![CDATA[llm]]></category><category><![CDATA[genai]]></category><category><![CDATA[AI Engineering]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[vector database]]></category><dc:creator><![CDATA[Skugan V]]></dc:creator><pubDate>Wed, 11 Feb 2026 08:15:39 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>INTRODUCTION</strong></h2>
<p>In the rapidly evolving landscape of Generative AI, Retrieval-Augmented Generation (RAG) has emerged as a game-changing architecture that addresses one of the most critical challenges in Large Language Models (LLMs): hallucinations and knowledge limitations. Having implemented RAG systems across multiple production environments, I've witnessed first-hand how this architecture transforms generic LLMs into domain-specific powerhouses.</p>
<p>According to recent industry reports from sources such as Gartner and McKinsey, RAG-based systems have achieved up to <strong>87% accuracy</strong> improvements over standalone LLM implementations, while reducing operational costs by <strong>60%</strong> compared to fine-tuning approaches. More importantly, RAG systems can be updated in real time without requiring model retraining, making them ideal for dynamic knowledge bases.</p>
<p>Let me walk you through the three fundamental pillars of a production-grade RAG pipeline and the technical considerations that make or break implementations. We'll cover practical examples, code snippets, and actionable insights to make this accessible whether you're a beginner or an experienced practitioner.</p>
<h2 id="heading-unpacking-the-architecture-flow"><strong>UNPACKING THE ARCHITECTURE FLOW</strong></h2>
<p>Building on our introduction to RAG systems, it's helpful to see the big picture before unpacking the details. Below is a high-level diagram of the end-to-end RAG flow, highlighting the interconnected phases that power this architecture. This overview will serve as our roadmap as we explore each component in depth, starting with data ingestion and moving through embedding, vector storage, retrieval, re-ranking, and monitoring.</p>
<p><img src="https://media.licdn.com/dms/image/v2/D5612AQG5jsLLVHfRqg/article-inline_image-shrink_1500_2232/B56ZwSKgdQIQAU-/0/1769831271958?e=1772668800&amp;v=beta&amp;t=cDSwRLcd2WtZb0kzBc5Zc2svkmjmQDsZ-PYAIYlDVVk" alt="Article content" /></p>
<p>This diagram shows the flow from raw data sources to final generated responses, emphasizing how retrieval augments the LLM to produce grounded, accurate outputs. Now, let's break it down pillar by pillar.</p>
<h2 id="heading-pillar-1-document-ingestion-amp-vectorization"><strong>Pillar 1: Document Ingestion &amp; Vectorization</strong></h2>
<h3 id="heading-the-foundation-of-knowledge-retrieval"><strong>The Foundation of Knowledge Retrieval</strong></h3>
<p>The ingestion phase is where your knowledge base comes to life. This isn't just about dumping documents into a database; it's about creating a sophisticated information retrieval system that understands context and semantics.</p>
<h3 id="heading-data-sources-amp-collection"><strong>Data Sources &amp; Collection</strong></h3>
<p>Modern RAG systems must handle diverse data sources:</p>
<ul>
<li><p><strong>Structured sources</strong>: SQL/NoSQL databases (e.g., PostgreSQL, MongoDB), data warehouses (e.g., Snowflake, BigQuery).</p>
</li>
<li><p><strong>Unstructured documents</strong>: PDFs, Word docs, presentations, spreadsheets – use libraries like PyPDF2 or Apache Tika for extraction.</p>
</li>
<li><p><strong>Web content</strong>: Websites, wikis (like Wikipedia), knowledge bases, APIs – tools like BeautifulSoup or Scrapy for scraping.</p>
</li>
<li><p><strong>Real-time streams</strong>: Chat logs, support tickets, social media feeds – integrate with Kafka or AWS Kinesis for streaming.</p>
</li>
</ul>
<blockquote>
<p>We haven't covered LangChain or LangGraph in detail yet (we'll dive deep into those in later articles), but for easy understanding, here's how you can perform document loading using LangChain:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.document_loaders <span class="hljs-keyword">import</span> PyPDFLoader

<span class="hljs-comment"># Load a PDF document</span>
loader = PyPDFLoader(<span class="hljs-string">"your_document.pdf"</span>)
documents = loader.load()
print(<span class="hljs-string">f"Loaded <span class="hljs-subst">{len(documents)}</span> pages from the PDF."</span>)
</code></pre>
</blockquote>
<p>The above code loads the document into manageable pages, ready for further processing.</p>
<h3 id="heading-intelligent-document-splitting"><strong>Intelligent Document Splitting</strong></h3>
<p>Here's where most implementations falter. Chunking strategy directly impacts retrieval quality. Based on extensive benchmarking, I've found that:</p>
<ul>
<li><p><strong>Optimal chunk size:</strong> 512-1024 tokens (not characters) for most use cases</p>
</li>
<li><p><strong>Overlap strategy:</strong> 10-20% overlap between chunks preserves context boundaries</p>
</li>
<li><p><strong>Semantic chunking</strong> outperforms fixed-size splitting by 23% in retrieval accuracy</p>
</li>
<li><p><strong>Metadata enrichment</strong> (source, timestamps, hierarchy) improves filtering precision</p>
</li>
</ul>
<p><strong>Pro tip:</strong> Don't split mid-sentence or mid-paragraph. Respect document structure. A chunk that starts with <em>'Therefore, we conclude...'</em> without context is worthless.</p>
<p><em>Example Splitting using RecursiveCharacterTextSplitter (LangChain in-built Splitter)</em></p>
<pre><code class="lang-python">from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1024,      # measured in characters by default; pass a token-based length_function for token counts
    chunk_overlap=200,    # ~20% overlap
    separators=["\n\n", "\n", ".", " "]  # respect document structure
)
chunks = text_splitter.split_documents(documents)
</code></pre>
<h3 id="heading-embedding-generation-amp-vector-storage"><strong>Embedding Generation &amp; Vector Storage</strong></h3>
<p>Each chunk is transformed into a high-dimensional vector representation using embedding models. The choice of model significantly impacts performance:</p>
<ul>
<li><p><strong>OpenAI's text-embedding-3-large (3072 dimensions)</strong>: Industry standard, excellent semantic understanding</p>
</li>
<li><p><strong>Cohere Embed v3:</strong> Multilingual support with compression capabilities</p>
</li>
<li><p><strong>Open-source alternatives (BGE, E5):</strong> Cost-effective for high-volume deployments</p>
</li>
</ul>
<pre><code class="lang-python"># Generate an embedding for a chunk (OpenAI Python SDK v1.x)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.embeddings.create(
    input=chunks[0].page_content,
    model="text-embedding-3-large"
)
embedding = response.data[0].embedding
</code></pre>
<ul>
<li><p>These embeddings are stored in vector databases optimized for similarity search. Below are some of the market-leading databases I have personally worked with:</p>
<ul>
<li><p><strong>Pinecone</strong>: Fully managed, handles billions of vectors, excellent for production</p>
</li>
<li><p><strong>Qdrant:</strong> Open-source, 10x faster filtering, payload-based search</p>
</li>
<li><p><strong>ChromaDB</strong>: Perfect for prototyping and small-to-medium deployments</p>
</li>
<li><p><strong>FAISS:</strong> Open-source vector search library by Meta, extremely fast in-memory similarity search, ideal for large-scale embedding search with custom infrastructure.</p>
</li>
</ul>
</li>
</ul>
<p><strong>Critical insight:</strong> Vector databases aren't just storage; they're the retrieval engine. <strong>HNSW (Hierarchical Navigable Small World)</strong>, an <strong>approximate nearest neighbor (ANN) algorithm</strong> for quickly finding similar vectors in high-dimensional space, enables sub-millisecond searches across millions of vectors.</p>
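<p><em>To make the HNSW idea concrete, here is a minimal sketch using FAISS's IndexHNSWFlat; the dimension, the M parameter, and the random vectors are illustrative assumptions, not values from a production system.</em></p>
<pre><code class="lang-python">import numpy as np
import faiss  # pip install faiss-cpu

dim = 384  # embedding dimensionality (illustrative)
db_vectors = np.random.rand(10000, dim).astype("float32")  # stand-in for chunk embeddings
query = np.random.rand(1, dim).astype("float32")           # stand-in for a query embedding

index = faiss.IndexHNSWFlat(dim, 32)   # 32 = M, the number of graph neighbors per node
index.hnsw.efSearch = 64               # search-time breadth vs. recall trade-off
index.add(db_vectors)                  # builds the HNSW graph

distances, ids = index.search(query, 5)  # approximate top-5 neighbors
print(ids[0], distances[0])
</code></pre>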
<ul>
<li><p><em>For upserting embeddings into Pinecone (example):</em></p>
<pre><code class="lang-python">  import pinecone  # legacy pinecone-client (&lt;3.0) interface

  pinecone.init(api_key="your_pinecone_key", environment="your_env")
  index = pinecone.Index("rag-index")

  # embeddings[i] is the vector generated for chunks[i]; metadata must be a flat dict
  vectors = [(f"chunk_{i}", embeddings[i], dict(chunk.metadata)) for i, chunk in enumerate(chunks)]
  index.upsert(vectors)
</code></pre>
</li>
</ul>
<h2 id="heading-pillar-2-query-processing-amp-intelligent-retrieval"><strong>Pillar 2: Query Processing &amp; Intelligent Retrieval</strong></h2>
<h3 id="heading-where-semantic-search-meets-precision"><strong>Where Semantic Search Meets Precision</strong></h3>
<p>When a user asks <em>'What is an AI Agent?'</em>, the system doesn't just perform keyword matching. This is where the magic happens.</p>
<h3 id="heading-query-embedding-amp-similarity-search"><strong>Query Embedding &amp; Similarity Search</strong></h3>
<p>The user's query undergoes the same embedding transformation as the documents, creating a vector that exists in the same semantic space. The vector database then performs a similarity search using cosine similarity or dot product to find the most relevant chunks.</p>
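<p><em>For intuition, cosine similarity is simply the dot product of the two vectors after normalization; a minimal sketch (with made-up example vectors) looks like this:</em></p>
<pre><code class="lang-python">import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -&gt; float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.1, 0.7, 0.2])   # illustrative 3-d "embedding"
chunk_vec = np.array([0.2, 0.6, 0.1])
print(cosine_similarity(query_vec, chunk_vec))  # closer to 1.0 = more similar
</code></pre>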
<p>Key performance metrics from production systems:</p>
<ul>
<li><p><strong>Top-K retrieval:</strong> Typically <strong>5-10 chunks</strong> strike the balance between context richness and noise</p>
</li>
<li><p><strong>Similarity threshold:</strong> <strong>0.7-0.8 cosine similarity</strong> filters out low-quality matches</p>
</li>
<li><p><strong>Hybrid search</strong> (semantic + keyword BM25) improves recall by <strong>31%</strong></p>
</li>
<li><p><strong>Query latency:</strong> Under <strong>100ms for p95</strong> in well-optimized systems</p>
</li>
</ul>
<p><em>Example query in Pinecone:</em></p>
<pre><code class="lang-python"># Reuse the OpenAI client created earlier
query_embedding = client.embeddings.create(
    input="What is an AI Agent?", model="text-embedding-3-large"
).data[0].embedding

results = index.query(
    vector=query_embedding,
    top_k=<span class="hljs-number">5</span>,
    include_metadata=<span class="hljs-literal">True</span>,
    filter={<span class="hljs-string">"source"</span>: {<span class="hljs-string">"$eq"</span>: <span class="hljs-string">"your_document.pdf"</span>}}  <span class="hljs-comment"># Optional metadata filter</span>
)
</code></pre>
<h3 id="heading-advanced-retrieval-strategies"><strong>Advanced Retrieval Strategies</strong></h3>
<p>Basic vector search is just the starting point. Production systems employ sophisticated techniques:</p>
<ul>
<li><p><strong>Query expansion:</strong> Automatically generate related queries to improve coverage</p>
</li>
<li><p><strong>Re-ranking</strong>: Use cross-encoder models (like Cohere Rerank) to re-score initial results (see the sketch after this list)</p>
</li>
<li><p><strong>Metadata filtering:</strong> Narrow results by date, source, department, or custom tags</p>
</li>
<li><p><strong>Multi-query retrieval:</strong> Generate multiple query variations for comprehensive coverage</p>
</li>
</ul>
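<p><em>As a concrete illustration of re-ranking, here is a minimal sketch using a sentence-transformers cross-encoder; the model name and the toy passages are assumptions for illustration rather than a recommendation.</em></p>
<pre><code class="lang-python">from sentence_transformers import CrossEncoder  # pip install sentence-transformers

query = "What is an AI Agent?"
candidates = [
    "An AI agent perceives its environment and takes actions to achieve goals.",
    "Binance lists thousands of crypto trading pairs.",
    "Agents often combine an LLM with tools and memory.",
]

# A cross-encoder scores each (query, passage) pair jointly: slower than bi-encoders, but more accurate
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, passage) for passage in candidates])

# Keep the highest-scoring passages first
reranked = [p for _, p in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
</code></pre>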
<h3 id="heading-context-assembly-amp-augmentation"><strong>Context Assembly &amp; Augmentation</strong></h3>
<p>The retrieved chunks are now assembled into a coherent context. This step involves:</p>
<ul>
<li><p><strong>Deduplication</strong>: Remove redundant information from overlapping chunks</p>
</li>
<li><p><strong>Relevance ordering:</strong> Place the most relevant context first (LLMs tend to weight the beginning and end of the context more heavily than the middle)</p>
</li>
<li><p><strong>Token budget management:</strong> Ensure context fits within LLM limits (4K-128K tokens)</p>
</li>
<li><p><strong>Source attribution:</strong> Track which chunks came from which documents for citations</p>
</li>
</ul>
<p>The augmented prompt now contains: [System Instructions] + [Retrieved Context] + [User Query]. This structured approach ensures the LLM has all necessary information while maintaining clarity.</p>
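<p><em>A minimal sketch of this assembly step might look like the following; it assumes the Pinecone query response from earlier exposes a matches list whose entries carry a score and metadata (with the chunk text stored under a text key), which is an illustrative assumption about how the data was indexed.</em></p>
<pre><code class="lang-python">import tiktoken

def assemble_context(matches, max_tokens=3000):
    """Deduplicate, order by score, and trim retrieved chunks to a token budget."""
    enc = tiktoken.get_encoding("cl100k_base")
    seen, parts, used = set(), [], 0
    # Most relevant chunks first
    for m in sorted(matches, key=lambda m: m["score"], reverse=True):
        text = m["metadata"]["text"]
        if text in seen:                       # crude deduplication of overlapping chunks
            continue
        n_tokens = len(enc.encode(text))
        if used + n_tokens &gt; max_tokens:       # respect the LLM's context budget
            break
        seen.add(text)
        used += n_tokens
        parts.append(f"[Source: {m['metadata'].get('source', 'unknown')}]\n{text}")
    return "\n\n".join(parts)

assembled_context = assemble_context(results["matches"])
</code></pre>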
<h2 id="heading-pillar-3-answer-generation-amp-quality-assurance"><strong>Pillar 3: Answer Generation &amp; Quality Assurance</strong></h2>
<h3 id="heading-transforming-context-into-coherent-responses"><strong>Transforming Context into Coherent Responses</strong></h3>
<p>The final pillar is where retrieved knowledge transforms into human-readable answers. This is more nuanced than simply calling an LLM API.</p>
<h3 id="heading-llm-selection-amp-configuration"><strong>LLM Selection &amp; Configuration</strong></h3>
<p>Different LLMs excel at different tasks. Here's what I've learned from production deployments:</p>
<ul>
<li><p><strong>GPT-4 Turbo</strong>: Best for complex reasoning, 128K context window handles extensive documents</p>
</li>
<li><p><strong>Claude 3 Opus:</strong> Superior at following instructions, excellent for structured outputs</p>
</li>
<li><p><strong>Llama 3 70B:</strong> Cost-effective for high-volume, lower-complexity queries</p>
</li>
<li><p><strong>Mixtral 8x7B</strong>: Open-source alternative with strong multilingual capabilities</p>
</li>
</ul>
<h3 id="heading-prompt-engineering-for-rag"><strong>Prompt Engineering for RAG</strong></h3>
<p>The prompt structure is critical for grounded generation. A production-grade RAG prompt includes:</p>
<ul>
<li><p><strong>Role definition:</strong> 'You are an expert assistant with access to specific documents'</p>
</li>
<li><p><strong>Grounding instructions:</strong> 'Only use information from the provided context. If not found, explicitly state that.'</p>
</li>
<li><p><strong>Citation requirements:</strong> 'Include source references for each claim using [Source: document_name]'</p>
</li>
<li><p><strong>Output formatting:</strong> Specify tone, structure, and length expectations</p>
</li>
</ul>
<p><em>Example Prompt Engineering</em></p>
<pre><code class="lang-python">prompt = <span class="hljs-string">f"""
You are an expert assistant.
Context: <span class="hljs-subst">{assembled_context}</span>
Query: <span class="hljs-subst">{user_query}</span>
Answer based only on the context, citing sources.
"""</span>
# OpenAI Python SDK v1.x interface (reusing the client created earlier)
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "system", "content": prompt}]
)
answer = response.choices[0].message.content
</code></pre>
<h3 id="heading-parameter-optimization"><strong>Parameter Optimization</strong></h3>
<p>Fine-tuning generation parameters dramatically affects output quality; a short example follows the list below:</p>
<ul>
<li><p><strong>Temperature:</strong> 0.0-0.3 for factual responses (higher = more creative)</p>
</li>
<li><p><strong>Max tokens:</strong> Conservative limits prevent rambling (500-1500 for most queries)</p>
</li>
<li><p><strong>Top-p sampling</strong>: 0.9-0.95 balances quality and diversity</p>
</li>
<li><p><strong>Stop sequences:</strong> Prevent generation beyond desired boundaries</p>
</li>
</ul>
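<p><em>Putting those knobs together, a hedged sketch (the values and the stop sequence are illustrative starting points, not universal defaults) looks like this:</em></p>
<pre><code class="lang-python"># Reuses `client` and `prompt` from the previous example
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "system", "content": prompt}],
    temperature=0.2,      # low temperature for factual, grounded answers
    max_tokens=1000,      # cap answer length to avoid rambling
    top_p=0.95,           # nucleus sampling
    stop=["\nUser:"],     # illustrative stop sequence to bound generation
)
print(response.choices[0].message.content)
</code></pre>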
<h3 id="heading-quality-assurance-amp-validation"><strong>Quality Assurance &amp; Validation</strong></h3>
<p>Generation is not the end. Production systems implement multiple validation layers (a small citation-check sketch follows the list):</p>
<ul>
<li><p><strong>Hallucination detection:</strong> Compare generated content against retrieved context</p>
</li>
<li><p><strong>Relevance scoring:</strong> Ensure answer addresses the original query</p>
</li>
<li><p><strong>Safety filtering:</strong> Screen for harmful, biased, or inappropriate content</p>
</li>
<li><p><strong>Citation validation:</strong> Verify all cited sources exist in retrieved context</p>
</li>
</ul>
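<p><em>As one concrete example of citation validation, a minimal sketch could check that every [Source: ...] tag in the answer refers to a document that was actually retrieved; the regex pattern and data shapes are assumptions tied to the prompt and query examples above.</em></p>
<pre><code class="lang-python">import re

def validate_citations(answer: str, matches) -&gt; list:
    """Return any cited sources that were not present in the retrieved context."""
    retrieved_sources = {m["metadata"].get("source", "unknown") for m in matches}
    cited_sources = set(re.findall(r"\[Source:\s*([^\]]+)\]", answer))
    return sorted(cited_sources - retrieved_sources)

unknown = validate_citations(answer, results["matches"])
if unknown:
    print(f"Warning: answer cites sources not in the retrieved context: {unknown}")
</code></pre>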
<h3 id="heading-real-world-impact-amp-performance-metrics"><strong>Real-World Impact &amp; Performance Metrics</strong></h3>
<p>The proof is in production. Well-architected RAG systems deliver measurable business value:</p>
<ul>
<li><p><strong>Customer support automation:</strong> 70% reduction in ticket resolution time</p>
</li>
<li><p><strong>Enterprise search:</strong> 4x faster information discovery compared to traditional search</p>
</li>
<li><p><strong>Knowledge management:</strong> 95% accuracy on domain-specific queries</p>
</li>
<li><p><strong>Developer productivity:</strong> 40% faster code documentation searches</p>
</li>
</ul>
<p>Cost considerations are equally important. While GPT-4 queries cost approximately $0.03-0.10 per 1K tokens (approximate figures; check OpenAI's official pricing page for current rates), a well-optimized RAG system with intelligent caching and retrieval can reduce per-query costs to under $0.01 while maintaining high accuracy.</p>
<h3 id="heading-key-takeaways-for-implementation"><strong>Key Takeaways for Implementation</strong></h3>
<p>If you're building a RAG system, focus on these critical success factors:</p>
<ul>
<li><p><strong>Chunk intelligently</strong>: Semantic chunking &gt; fixed-size splitting -- experiment with libraries like spaCy for NLP-based splits.</p>
</li>
<li><p><strong>Invest in embeddings</strong>: Quality embeddings are non-negotiable -- test multiple models on your data.</p>
</li>
<li><p><strong>Implement hybrid search</strong>: Combine semantic and keyword approaches -- e.g., via Qdrant's hybrid mode (see the fusion sketch after this list).</p>
</li>
<li><p><strong>Re-rank religiously</strong>: Initial retrieval is never perfect -- always apply a second pass.</p>
</li>
<li><p><strong>Prompt for grounding</strong>: Force the LLM to cite sources -- reduces hallucinations by 50%+.</p>
</li>
<li><p><strong>Monitor continuously</strong>: Track retrieval accuracy (e.g., NDCG), generation quality (e.g., ROUGE scores), and latency -- use Prometheus/Grafana, plus LangSmith/LangFuse for step-by-step trace and API cost/usage tracking.</p>
</li>
</ul>
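<p><em>On the hybrid-search point above, one simple, database-agnostic way to combine a keyword (BM25) ranking with a vector ranking is reciprocal rank fusion; the sketch below is illustrative and assumes you already have the two ranked ID lists.</em></p>
<pre><code class="lang-python">def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of chunk IDs into one, using RRF scoring."""
    scores = {}
    for ranked_ids in rankings:
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["c3", "c1", "c7"]      # IDs from a keyword/BM25 search (illustrative)
vector_ranking = ["c1", "c4", "c3"]    # IDs from the vector search (illustrative)
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))  # fused order
</code></pre>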
<p><strong>Remember</strong>: <em>RAG is not a silver bullet.</em> It's an architecture pattern that requires careful engineering, continuous optimization, and domain-specific tuning. But when done right, it transforms LLMs from general-purpose chatbots into specialized knowledge systems that deliver real business value.</p>
<h3 id="heading-disclaimer-and-recommendations-overview"><strong>Disclaimer and Recommendations Overview</strong></h3>
<p>The following recommendations for components and tools in an AI or machine learning pipeline are based purely on my personal experience and observations from working with various technologies. These suggestions are not exhaustive or universally optimal; they should be adapted to your specific needs, budget, scalability requirements, and use case. Every organization or individual brings their own expertise and preferred tool stack, and the AI landscape evolves rapidly, so there may be other solutions available in the market that offer superior capabilities, better integration, or cost efficiencies compared to the ones mentioned here. I strongly advise conducting thorough research, including evaluating alternatives, reading recent reviews, testing proofs-of-concept, and considering factors like data privacy, compliance, and vendor support before adopting any tool. Always consult with domain experts or perform a needs assessment to ensure the chosen solutions align with your goals.</p>
<ul>
<li><p>Embedding Model - OpenAI text-embedding-3-large, Cohere Embed v3</p>
</li>
<li><p>Vector Database - Pinecone (managed), Qdrant (self-hosted), ChromaDB (prototyping)</p>
</li>
<li><p>LLM - GPT-4 Turbo, Claude 3 Opus, Llama 3 70B, Mixtral 8x7B</p>
</li>
<li><p>Orchestration - LangChain, LlamaIndex, Haystack</p>
</li>
<li><p>Re-ranking - Cohere Rerank, Cross-encoder models</p>
</li>
<li><p>Monitoring - LangSmith, Weights &amp; Biases, Azure AI</p>
</li>
</ul>
<p><strong>The Path Forward</strong></p>
<p>As we move deeper into 2025, RAG architectures are evolving rapidly. We're seeing innovations in multi-modal RAG (incorporating images, audio, video), graph-based retrieval for complex knowledge graphs, and agentic RAG systems that can reason about which documents to retrieve.</p>
<p>The fundamentals, however, remain constant: high-quality embeddings, intelligent retrieval, and grounded generation. Master these three pillars, and you'll build RAG systems that don't just answer questions—they become trusted knowledge partners.</p>
<blockquote>
<p><em>Have you implemented RAG in your organization? I'd love to hear about your experiences, challenges, and lessons learned. Drop a comment below or reach out directly—let's push the boundaries of what's possible with retrieval-augmented generation.</em></p>
</blockquote>
<p>#AI #MachineLearning #RAG #LLM #GenerativeAI #VectorDatabases #NLP #ArtificialIntelligence #TechLeadership</p>
]]></content:encoded></item><item><title><![CDATA[From Demo to Production: The Enterprise RAG Roadmap]]></title><description><![CDATA[Over the past months, drawing from my knowledge and practical experience of designing and deploying internal RAG pipelines, it’s clear that moving from a compelling demo to a reliable, governed production system is a significant leap. The difference ...]]></description><link>https://ragstack.in/from-demo-to-production-the-enterprise-rag-roadmap</link><guid isPermaLink="true">https://ragstack.in/from-demo-to-production-the-enterprise-rag-roadmap</guid><category><![CDATA[llm]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[AI]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[genai]]></category><dc:creator><![CDATA[Skugan V]]></dc:creator><pubDate>Wed, 11 Feb 2026 06:14:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770790302491/93a88b8f-5c95-4a21-88fc-a492d09a9e17.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the past months, drawing from my knowledge and practical experience of designing and deploying internal RAG pipelines, it’s clear that moving from a compelling demo to a reliable, governed production system is a significant leap. The difference is rarely the model itself. It's the "<strong><em>Boring</em></strong>" but critical engineering layers: data security, latency optimization, retrieval accuracy, cost control, traceability, and governance.</p>
<p>This series will go far beyond the basics. We'll dive deep into advanced architectures like <strong>Agentic RAG</strong>, <strong>hybrid search with GraphDB integrations</strong>, <strong>self-reflecting and self-correcting agents</strong>, <strong>multi-hop reasoning</strong>, <strong>evaluation frameworks</strong>, and production patterns for scalability and observability.</p>
<p>But every strong building needs a solid foundation. So, let's begin with the fundamentals.</p>
<h3 id="heading-why-rag-has-become-the-de-facto-standard-for-enterprise-genai"><strong>Why RAG Has Become the De Facto Standard for Enterprise GenAI</strong></h3>
<p>We're well past the 2023 hype of "Look what ChatGPT can do!" Serious enterprises are now asking tougher, more practical questions:</p>
<ul>
<li><p>How do we make GenAI <strong>accurate</strong> on our proprietary data?</p>
</li>
<li><p>How do we make it <strong>safe</strong> and compliant?</p>
</li>
<li><p>How do we make it <strong>maintainable</strong> without constant retraining?</p>
</li>
</ul>
<p>The core challenge is <strong>trust</strong>. Large Language Models are extraordinary pattern matchers, but they are also confident hallucinators. Without grounding, they will happily invent Q4 revenue numbers, misinterpret internal policy documents, or confidently provide outdated compliance guidance.</p>
<p>This isn't a theoretical risk; it's a daily reality in enterprises attempting to roll out GenAI at scale.</p>
<p><strong>RAG solves this by anchoring every response in verified, retrieved context</strong></p>
<p>Think of a vanilla LLM as a brilliant consultant taking a closed-book exam: they can reason impressively from what they've memorized during training, but they have no access to your latest information.</p>
<p>A RAG system is that same consultant with secure, real-time access to your company's private library, your internal wikis, contract databases, CRM records, financial reports, and compliance docs. The model is <strong>forced</strong> to cite and reason only over the retrieved documents before generating a response.</p>
<h3 id="heading-the-strategic-advantages-of-rag"><strong>The Strategic Advantages of RAG</strong></h3>
<p>✅ <strong>Grounded Truth &amp; Reduced Hallucinations</strong>: Responses are constrained to retrieved evidence. Studies (e.g., from Stanford and various enterprise benchmarks in 2024–2025) consistently show RAG reduces factual errors by 60–90% compared to vanilla LLMs on domain-specific tasks.</p>
<p>✅ <strong>Data Sovereignty &amp; Governance</strong>: Your proprietary data never leaves your environment or gets used to train public models. You maintain full control and audit trails, essential for GDPR, HIPAA, SOC 2, and other regulations.</p>
<p>✅ <strong>Agility &amp; Low Maintenance</strong>: Unlike fine-tuning (which requires expensive retraining whenever data changes), RAG allows instant updates. Add a new policy document or quarterly report to your knowledge base, re-index, and the system immediately reflects the latest truth.</p>
<p>✅ <strong>Cost Efficiency</strong>: Fine-tuning large models is expensive and time-consuming. RAG leverages pre-trained models while keeping operational costs predictable, consisting mostly of vector DB storage and retrieval queries.</p>
<p>✅ <strong>Scalability to Institutional Knowledge</strong>: The true power emerges when you connect AI not just to documents, but to structured data (SQL + vector hybrid), knowledge graphs, and real-time APIs. This is where we move from simple Q&amp;A to sophisticated reasoning agents.</p>
<p>The future of work isn't about replacing humans with generic chatbots. It's about <strong>augmenting</strong> experts with AI that deeply understands your organization's unique knowledge, processes, and data.</p>
<p>In the coming articles, I'll break down the full architecture stack: chunking strategies, embedding models, hybrid retrieval, reranking, evaluation (RAGAS, ARES, etc.), agentic patterns, guardrails, and deployment blueprints.</p>
<p>If you're building or planning Enterprise GenAI systems, follow along. I'll be sharing battle-tested patterns, code snippets, pitfalls to avoid, and practical implementation details purely based on my knowledge and implementation techniques that I followed.</p>
<p>What challenges are you facing with RAG or Enterprise GenAI right now? Drop a comment; I'd love to hear about them and may cover them in the series. 🚀</p>
]]></content:encoded></item><item><title><![CDATA[How To Create API Key for Google Gemini]]></title><description><![CDATA[API Key Creation Steps

Go to Google AI Studio: Navigate to aistudio.google.com and sign in with your Google account.

Accept Terms: If it's your first time, you'll be prompted to review and accept the Google AI and Gemini API terms of service.

Find...]]></description><link>https://ragstack.in/how-to-create-api-key-for-google-gemini</link><guid isPermaLink="true">https://ragstack.in/how-to-create-api-key-for-google-gemini</guid><category><![CDATA[gemini apikey]]></category><category><![CDATA[google gemini]]></category><dc:creator><![CDATA[Skugan V]]></dc:creator><pubDate>Tue, 02 Dec 2025 16:50:26 GMT</pubDate><content:encoded><![CDATA[<p>API Key Creation Steps</p>
<ol>
<li><p><strong>Go to Google AI Studio</strong>: Navigate to <a target="_blank" href="http://aistudio.google.com">aistudio.google.com</a> and sign in with your Google account.</p>
</li>
<li><p><strong>Accept Terms</strong>: If it's your first time, you'll be prompted to review and accept the Google AI and Gemini API terms of service.</p>
</li>
<li><p><strong>Find the API Key Section</strong>: Look for the "Get API key" button or link, often located in the navigation menu or on the main dashboard.</p>
</li>
<li><p><strong>Create the Key</strong>: Click "Create API key". You will typically have an option to create the key in a new project (recommended for quick starts and beginners) or an existing Google Cloud project.</p>
</li>
<li><p><strong>Copy and Secure the Key</strong>:</p>
<ul>
<li><p>Once generated, the API key string will be displayed. Copy this key immediately and store it in a secure location.</p>
</li>
<li><p>For security, it is best practice to use the key as an environment variable in your code rather than hardcoding it directly (see the short usage sketch after these steps). You can also manage and view your existing keys, check usage, and set restrictions (like limiting the key to certain APIs) within the Google AI Studio interface.</p>
</li>
</ul>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764694186925/a78c8387-6ec1-40b9-ad20-c42f1e213862.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
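<p><em>Once you have the key, a minimal sketch of using it from an environment variable with the google-generativeai Python package might look like this; the environment variable name and the model name are assumptions you can adapt.</em></p>
<pre><code class="lang-python">import os
import google.generativeai as genai  # pip install google-generativeai

# Read the key from an environment variable instead of hardcoding it
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Say hello in one short sentence.")
print(response.text)
</code></pre>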
]]></content:encoded></item><item><title><![CDATA[Python FastMCP()]]></title><description><![CDATA[What is Python FastMCP ?
Python FastMCP is a high-level, Pythonic framework designed for building MCP (Model Context Protocol) servers and clients easily and efficiently. MCP is a standardized protocol that allows servers to expose data and functiona...]]></description><link>https://ragstack.in/python-fastmcp</link><guid isPermaLink="true">https://ragstack.in/python-fastmcp</guid><category><![CDATA[fastmcp]]></category><dc:creator><![CDATA[Skugan V]]></dc:creator><pubDate>Tue, 02 Dec 2025 15:19:04 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-what-is-python-fastmcp">What is Python FastMCP ?</h2>
<p>Python FastMCP is a high-level, Pythonic framework designed for building MCP (Model Context Protocol) servers and clients easily and efficiently. MCP is a standardized protocol that allows servers to expose data and functionality specifically tailored for interactions with large language models (LLMs). FastMCP handles the complex details of the MCP protocol and server management, letting developers focus on creating tools, resources, and prompts with minimal boilerplate code.</p>
<p>Key features of FastMCP include:</p>
<ul>
<li><p>Creating MCP servers that expose data ("Resources") and functionality ("Tools") to LLMs.</p>
</li>
<li><p>Defining reusable interaction patterns through "Prompts."</p>
</li>
<li><p>Proxying and composing servers for complex applications.</p>
</li>
<li><p>Generating servers from OpenAPI specs or FastAPI objects.</p>
</li>
<li><p>Enterprise authentication options (Google, GitHub, Azure, Auth0, and more).</p>
</li>
<li><p>Deployment tools, client libraries, and testing utilities.</p>
</li>
<li><p>High-level, Pythonic API designed to accelerate development.</p>
</li>
</ul>
<p>Typical usage involves decorating Python functions with @mcp.tool to expose them as callable tools in the MCP server environment, making it intuitive for Python developers. FastMCP 2.0 is actively maintained and considered the standard framework for developing production-grade MCP applications.</p>
<p><em>Example:</em></p>
<pre><code class="lang-python">from fastmcp import FastMCP

# Create an MCP instance with a name
mcp = FastMCP("Demo")

# Define a tool that adds two numbers
@mcp.tool
def add(a: int, b: int) -&gt; int:
    """Add two numbers"""
    return a + b

# Run the MCP server
if __name__ == "__main__":
    mcp.run()
</code></pre>
<p>This framework is ideal for building servers that integrate with AI applications by providing standardized, secure, and efficient endpoints designed specifically for LLMs.</p>
<h2 id="heading-will-i-also-be-creating-mcp-client-with-fastmcp">Can I Also Create an MCP Client with FastMCP?</h2>
<p>Yes, with FastMCP, you can also create MCP clients in addition to servers. FastMCP provides a built-in Client class that lets you interact programmatically with any MCP server. This client handles all the connection management and MCP protocol details automatically, allowing deterministic and controlled operations such as calling tools, listing available resources, and sending requests. The client supports various transport mechanisms, including in-memory servers (useful for testing), HTTP servers, and local Python scripts. It is designed for explicit function calls rather than autonomous agent behavior, making it ideal for testing MCP servers and building reliable applications. Example usage of the FastMCP client:</p>
<pre><code class="lang-python">import asyncio
from fastmcp import Client

async def main():
    # Connect to the MCP server
    async with Client("https://example.com/mcp") as client:
        # List available tools
        tools = await client.list_tools()
        print(f"Available tools: {tools}")

        # Call the "add" tool with arguments
        result = await client.call_tool("add", {"a": 5, "b": 3})

        # The result comes back as structured content
        print(f"Result: {result.content[0].text}")

# Run the async main function
asyncio.run(main())
</code></pre>
<h3 id="heading-how-it-works">How it works</h3>
<ul>
<li><p><code>Client("</code><a target="_blank" href="https://example.com/mcp"><code>https://example.com/mcp</code></a><code>")</code> → Connects to your MCP server (the one we created above; provide the MCP server URL, including the port the server is listening on).</p>
</li>
<li><p><code>list_tools()</code> → Queries the server for all registered tools (like <code>add</code>).</p>
</li>
<li><p><code>call_tool("add", {"a": 5, "b": 3})</code> → Calls the <code>add</code> tool with arguments <code>a=5</code> and <code>b=3</code>.</p>
</li>
<li><p><code>result.content[0].text</code> → Extracts the returned text from the tool’s response.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[All About MCP Resource]]></title><description><![CDATA[What is a MCP Resource? How it is related to MCP Tooling?
An MCP Resource is a read-only, addressable content entity exposed by the MCP server. Resources provide structured, contextual data that MCP clients can retrieve and deliver to LLMs for reason...]]></description><link>https://ragstack.in/all-about-mcp-resource</link><guid isPermaLink="true">https://ragstack.in/all-about-mcp-resource</guid><category><![CDATA[mcp]]></category><category><![CDATA[Model Context Protocol]]></category><category><![CDATA[Model Context Protocol (MCP)]]></category><category><![CDATA[mcp-resource]]></category><dc:creator><![CDATA[Skugan V]]></dc:creator><pubDate>Tue, 02 Dec 2025 14:41:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764687017667/c77ddb90-a472-4640-b8f4-ed9ac803a668.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-is-a-mcp-resource-how-it-is-related-to-mcp-tooling">What is an MCP Resource? How is it Related to MCP Tooling?</h2>
<p>An MCP Resource is a read-only, addressable content entity exposed by the MCP server. Resources provide structured, contextual data that MCP clients can retrieve and deliver to LLMs for reasoning or enhanced understanding. They typically include items like logs, configuration data, files, real-time statistics, or any other data that can be represented as text, JSON, or binary blobs (e.g., PDFs, images).</p>
<p>Key points about MCP Resources:</p>
<ul>
<li><p>They are read-only and deterministic, meaning no side effects or changes occur when accessed.</p>
</li>
<li><p>Resources are identified via unique URIs (e.g., note://, config://, file://).</p>
</li>
<li><p>Access is done via standard MCP requests like resources/list to discover and resources/read to fetch content.</p>
</li>
<li><p>They allow LLMs to have contextual information without executing commands or changing state.</p>
</li>
</ul>
<p><strong>How Resources relate to Tools:</strong></p>
<ul>
<li><p>Tools are actionable and executable commands exposed by MCP servers. They perform operations such as calculations, database writes, or API calls that can change state or provide dynamic outputs.</p>
</li>
<li><p>Resources are passive data sources that provide context to the LLM but do not execute or trigger actions.</p>
</li>
<li><p>Typically, an MCP server exposes both resources (context/data) and tools (actions/operations).</p>
</li>
<li><p>Resources can be used by tools to provide necessary context or input data for execution.</p>
</li>
<li><p>From the LLM’s perspective, tools enable it to <em>do</em> things while resources help it <em>know</em> things.</p>
</li>
</ul>
<p><strong>Example:</strong></p>
<ul>
<li><p>A resource could be a log file or pricing data accessible to the LLM's context.</p>
</li>
<li><p>A tool could be a function calculating the current price or executing a trade.</p>
</li>
</ul>
<p>Together, resources and tools empower LLMs with both rich context and actionable capabilities via the MCP protocol.</p>
<p>This clear distinction lets the MCP server expose data and functions cleanly and predictably, letting clients and LLMs consume context and invoke actions seamlessly.</p>
<h2 id="heading-what-mcp-resources-offer">What MCP Resources Offer</h2>
<p>In the context of MCP and FastMCP, resources are a core primitive that provides read-only access to data or content. Unlike tools (which are invocable functions for performing actions, like API calls or computations), resources expose static or dynamically generated data that clients can directly read and use as context in conversations or reasoning. This could include configurations, lists, files, database queries, or any structured information.</p>
<p>Resources help by:</p>
<ul>
<li><p>Providing scoped, persistent context to reduce the need for repeated tool calls or large prompts.</p>
</li>
<li><p>Allowing efficient data access without the overhead of function invocation (e.g., no need for the model to "decide" to call something).</p>
</li>
<li><p>Supporting metadata like descriptions, tags, and annotations for better discoverability.</p>
</li>
<li><p>Enabling dynamic generation (e.g., fetching fresh data on request) while remaining read-only.</p>
</li>
<li><p>Reducing token usage and latency in LLM interactions by injecting data directly into the context.</p>
</li>
</ul>
<p>They are particularly useful in scenarios like your code, where tools handle actions (e.g., getting specific prices), but resources can offer supplementary data (e.g., a list of available symbols) to guide or inform those actions.</p>
<p><strong>Communication Flow for Resources</strong></p>
<p>The flow involves a client-server interaction over the MCP protocol:</p>
<ol>
<li><p><strong>Client Request</strong>: An MCP client (e.g., an LLM application) sends a resources/read request to the server, specifying the resource's unique URI (e.g., "data://symbols"). This can happen automatically if the LLM references the URI in its prompt or reasoning, or explicitly via the client's API.</p>
</li>
<li><p><strong>Server Processing</strong>:</p>
<ul>
<li><p>The server (your FastMCP instance) matches the URI to a registered resource.</p>
</li>
<li><p>If the resource is defined as a function (dynamic), it executes the function lazily (only on request). Parameters can be passed if the URI uses templates (e.g., "data://symbols/{category}").</p>
</li>
<li><p>The function's return value is converted to MCP-compatible content: strings become text/plain, dicts/lists become application/json (auto-serialized), bytes become base64-encoded blobs.</p>
</li>
<li><p>Metadata (e.g., name, description, mime_type) is included in the response for client use.</p>
</li>
</ul>
</li>
<li><p><strong>Server Response</strong>: The server returns the content directly to the client. If the resource list changes (e.g., you add/enable/disable one during runtime), the server may send a notifications/resources/list_changed notification to active clients.</p>
</li>
<li><p><strong>Client Usage</strong>: The client (e.g., LLM) receives the data and incorporates it into its context. For example, an LLM could read a resource to get a list of valid symbols before calling your get_price tool, improving accuracy without extra steps.</p>
</li>
</ol>
<p>This flow is stateless and read-only by default (via annotations like readOnlyHint: True), ensuring safety. Errors (e.g., from API failures) are handled by raising exceptions, which translate to MCP error responses.</p>
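<p><em>From the client side, this flow can be exercised with FastMCP's built-in Client; the sketch below assumes the server from this article is reachable at the given local URL and that the resource returns text content (both are illustrative assumptions).</em></p>
<pre><code class="lang-python">import asyncio
from fastmcp import Client

async def main():
    # Connect to the MCP server that exposes the resource
    async with Client("http://localhost:8000/mcp") as client:
        # Discover the resources the server exposes
        resources = await client.list_resources()
        print([r.uri for r in resources])

        # Read a specific resource by its URI
        contents = await client.read_resource("data://available-symbols")
        print(contents[0].text)  # text/JSON payload returned by the resource function

asyncio.run(main())
</code></pre>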
<p><strong>Pros of Using Resources vs. Without Resources</strong></p>
<p><strong>With Resources</strong>:</p>
<ul>
<li><p><strong>Efficiency</strong>: Direct data access avoids the multi-step process of tool calls (e.g., model decides to call, invokes, waits for result). This reduces latency, token costs, and context overload.</p>
</li>
<li><p><strong>Better Context Management</strong>: Resources can be referenced in prompts or auto-injected, providing structured data (e.g., JSON lists) that helps the model reason more effectively without hallucinating.</p>
</li>
<li><p><strong>Flexibility</strong>: Supports dynamic data, templates for parameterization, async execution, and context access (e.g., via ctx: Context parameter for request-specific info).</p>
</li>
<li><p><strong>Scalability</strong>: Ideal for read-heavy scenarios, like exposing configs or lists that don't change often. Notifications keep clients updated on changes.</p>
</li>
<li><p><strong>Pros in Your Code Context</strong>: Complements your tools by providing upfront data (e.g., valid symbols), reducing invalid tool calls (e.g., bad symbols).</p>
</li>
</ul>
<p><strong>Without Resources (Relying Only on Tools)</strong>:</p>
<ul>
<li><p><strong>Pros</strong>: Simpler if all interactions are action-oriented—everything is a callable function, so no need to distinguish read vs. write. Tools can handle both data retrieval and mutations in one paradigm.</p>
</li>
<li><p><strong>Cons</strong>: Overkill for passive data access; each request becomes a full tool invocation, increasing steps, potential errors, and costs. For example, fetching a static list via a tool requires the model to explicitly call it every time, bloating prompts. In your code, without resources, users might misuse tools (e.g., call get_price with invalid symbols), leading to more failures.</p>
</li>
</ul>
<p>In summary, resources shine for "give me data" scenarios, while tools are for "do something." Using both (as in MCP best practices) creates a balanced server.</p>
<p>Consider the following code snippet, an example of the MCP resource decorator:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764685600251/ba8bf36c-cda2-4803-918b-60bd7b1c3d2c.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p><em>“Confusion: Can we use tools directly instead of resources? More detailed information is provided below.”</em></p>
</blockquote>
<p>We know that tools can perform the required actions, and in the above example we could fetch the trading symbols by calling <a target="_blank" href="http://api.binance.com/api/v3/exchangeInfo"><strong>api.binance.com/api/v3/exchangeInfo</strong></a> directly from a tool. So why use a resource?</p>
<h2 id="heading-why-use-an-mcp-resource-instead-of-directly-calling-the-mcp-tools">Why Use an MCP Resource Instead of Directly Calling the MCP Tools?</h2>
<p>Yes, you <em>could</em> expose get_available_symbols() as an MCP tool (e.g., @mcp.tool()), but it’s less ideal for this use case:</p>
<ul>
<li><p><strong>Tool Overhead</strong>: Tools are for actions and require the client (e.g., LLM) to decide to invoke them, pass parameters (even if none are needed here), and wait for the result. This adds unnecessary steps for simple data access.</p>
</li>
<li><p><strong>Resource Simplicity</strong>: Resources are read-only and can be directly referenced in prompts or auto-injected into the LLM’s context, making them more efficient for data like a symbol list.</p>
</li>
<li><p><strong>Client Expectation</strong>: In MCP, clients expect resources for data and tools for actions. Exposing get_available_symbols as a tool might confuse clients expecting a resource for a list.</p>
</li>
</ul>
<p><strong>Example Scenario to Illustrate the Difference</strong></p>
<p>Let’s say an LLM wants to get the price of a valid Binance symbol. Here’s how it works with and without the resource:</p>
<p><strong>With Resource (data://available-symbols):</strong></p>
<ol>
<li><p>The LLM’s prompt says: “Use data://available-symbols to pick a valid symbol, then get its price.”</p>
</li>
<li><p>The MCP server injects the symbol list (e.g., ["BTCUSDT", "ETHUSDT", ...]) into the LLM’s context automatically when data://available-symbols is referenced.</p>
</li>
<li><p>The LLM sees BTCUSDT is valid and calls the get_price("BTCUSDT") tool.</p>
</li>
<li><p><strong>Benefits</strong>: The LLM gets the symbol list without extra steps, avoids invalid inputs (e.g., get_price("INVALID")), and saves tokens by not invoking a tool for the list.</p>
</li>
</ol>
<p><strong>With Only a Function or Tool:</strong></p>
<ol>
<li><p>You’d need to expose get_available_symbols() as a tool (e.g., @mcp.tool()).</p>
</li>
<li><p>The LLM would need to:</p>
<ul>
<li><p>Decide to call the get_available_symbols tool.</p>
</li>
<li><p>Wait for the server to run it and return the list.</p>
</li>
<li><p>Process the list and then call get_price("BTCUSDT").</p>
</li>
</ul>
</li>
<li><p><strong>Drawbacks</strong>: Extra steps (tool invocation), more tokens used in the LLM’s conversation, and higher chance of errors if the LLM doesn’t call the tool first or misinterprets the list.</p>
</li>
</ol>
<p><strong>Direct Function Call:</strong></p>
<ol>
<li><p>If you call get_available_symbols() locally in a script, it works fine for you, but:</p>
<ul>
<li><p>The LLM (running remotely) can’t access it unless you manually send the data to the LLM’s environment.</p>
</li>
<li><p>You lose the benefits of the MCP server, which is designed to handle remote requests and integrate with clients.</p>
</li>
</ul>
</li>
<li><p><strong>Drawbacks</strong>: Not scalable for remote clients, no discoverability, and no integration with MCP’s resource system.</p>
</li>
</ol>
<p><strong>When Would You Call the Function Directly?</strong></p>
<p>You’d call get_available_symbols() directly if:</p>
<ul>
<li><p>You’re building a local script, not a server, and don’t need remote access.</p>
</li>
<li><p>You’re testing or debugging the function locally.</p>
</li>
<li><p>You don’t need MCP’s client-server architecture (e.g., no LLM or external clients).</p>
</li>
</ul>
<p>However, since your code uses FastMCP and runs <a target="_blank" href="http://mcp.run">mcp.run</a>(), it’s designed as a server for remote clients, so the resource approach is more appropriate.</p>
<p><strong>Simple Summary</strong></p>
<ul>
<li><p><strong>Why Use the Resource?</strong> The data://available-symbols resource makes the symbol list accessible to remote clients (like LLMs) over the MCP protocol. It’s efficient (no tool invocation), discoverable (via metadata), and fits MCP’s design for read-only data.</p>
</li>
<li><p><strong>Why Not Just Call the Function?</strong> Direct function calls work locally but don’t work for remote clients. The MCP resource lets clients like LLMs access the data seamlessly, reducing errors and improving efficiency in a client-server setup.</p>
</li>
<li><p><strong>Practical Benefit</strong>: The resource ensures an LLM can check valid symbols (e.g., BTCUSDT) before calling tools like get_price, making interactions faster and more reliable.</p>
</li>
</ul>
<p>I’m also pasting the full working code with the MCP resource Implementation</p>
<p><strong>Sample Code 1: With MCP Resource Implementation - Function acting as a local resource</strong></p>
<pre><code class="lang-python"># Official Python MCP implementation
# Abstracts away many complexities from the MCP-based protocol
from mcp.server.fastmcp import FastMCP
import requests
from typing import Any

mcp = FastMCP("Binance MCP")


@mcp.tool()
def get_price(symbol: str) -&gt; Any:
    """
    Get the current price of a crypto asset from Binance

    Args:
        symbol (str): The symbol of the crypto asset to get the price of

    Returns:
        Any: The current price of the crypto asset
    """
    symbol = get_symbol_from_name(symbol)
    url = f"https://api.binance.com/api/v3/ticker/price?symbol={symbol}"
    response = requests.get(url)
    response.raise_for_status()
    return response.json()


@mcp.tool()
def get_price_change(symbol: str) -&gt; Any:
    """
    Get the last 24 hours price change of a crypto asset from Binance

    Args:
        symbol (str): The symbol of the crypto asset

    Returns:
        Any: The 24-hour price change data
    """
    symbol = get_symbol_from_name(symbol)
    url = f"https://data-api.binance.vision/api/v3/ticker/24hr?symbol={symbol}"
    response = requests.get(url)
    response.raise_for_status()
    return response.json()


# Helper function (unchanged)
def get_symbol_from_name(name: str) -&gt; str:
    if name.lower() in ["bitcoin", "btc"]:
        return "BTCUSDT"
    elif name.lower() in ["ethereum", "eth"]:
        return "ETHUSDT"
    else:
        return name.upper()


# Resource section: Removed 'annotations' to fix TypeError
@mcp.resource(
    uri="data://available-symbols",
    name="Available Trading Symbols",
    description="Provides a list of active trading symbols available on Binance.",
    mime_type="application/json"
)
def get_available_symbols() -&gt; list:
    """
    Fetches and returns a list of active trading symbols from Binance.

    Returns:
        list: A list of strings representing active symbols (e.g., ['BTCUSDT', 'ETHUSDT']).
    """
    url = "https://api.binance.com/api/v3/exchangeInfo"
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()
    # Filter for active symbols (status == 'TRADING')
    symbols = [s['symbol'] for s in data['symbols'] if s['status'] == 'TRADING']
    return symbols


if __name__ == "__main__":
    mcp.run()
</code></pre>
<p><strong>Sample Code 2: With MCP Resource Implementation (Local CSV File acting as the resource)</strong></p>
<pre><code class="lang-python"># Official Python MCP implementation
# Abstracts away many complexities from the MCP-based protocol
from mcp.server.fastmcp import FastMCP
import requests
from typing import Any
import csv

mcp = FastMCP("Binance MCP")


@mcp.tool()
def get_price(symbol: str) -&gt; Any:
    """
    Get the current price of a crypto asset from Binance

    Args:
        symbol (str): The symbol of the crypto asset to get the price of (e.g., BTCUSDT)

    Returns:
        Any: The current price of the crypto asset
    """
    # Note: MCP clients should use the symbol-mapping resource to validate/convert the symbol
    url = f"https://api.binance.com/api/v3/ticker/price?symbol={symbol}"
    response = requests.get(url)
    response.raise_for_status()
    return response.json()


@mcp.tool()
def get_price_change(symbol: str) -&gt; Any:
    """
    Get the last 24 hours price change of a crypto asset from Binance

    Args:
        symbol (str): The symbol of the crypto asset (e.g., BTCUSDT)

    Returns:
        Any: The 24-hour price change data
    """
    # Note: MCP clients should use the symbol-mapping resource to validate/convert the symbol
    url = f"https://data-api.binance.vision/api/v3/ticker/24hr?symbol={symbol}"
    response = requests.get(url)
    response.raise_for_status()
    return response.json()


# New resource: Reads symbols from a CSV file
@mcp.resource(
    uri="data://crypto-symbols",
    name="Crypto Symbols from CSV",
    description="Provides a list of crypto trading symbols from a local CSV file.",
    mime_type="application/json"
)
def get_crypto_symbols() -&gt; list:
    """
    Reads a list of crypto trading symbols from a CSV file.

    Returns:
        list: A list of strings representing trading symbols (e.g., ['BTCUSDT', 'ETHUSDT']).
    """
    file_path = r"C:\Users\Skugan\Desktop\github-cursor\mcp-course\crypto.csv"
    symbols = []
    try:
        with open(file_path, mode='r', encoding='utf-8') as file:
            reader = csv.DictReader(file)
            for row in reader:
                if 'symbol' in row:
                    symbols.append(row['symbol'])
    except FileNotFoundError:
        raise Exception(f"CSV file not found at {file_path}")
    except Exception as e:
        raise Exception(f"Error reading CSV file: {str(e)}")
    return symbols


# New resource: Provides symbol mapping
@mcp.resource(
    uri="data://symbol-mapping",
    name="Symbol Mapping",
    description="Provides a mapping of crypto names/aliases to Binance trading symbols.",
    mime_type="application/json"
)
def get_symbol_mapping() -&gt; dict:
    """
    Returns a mapping of crypto names/aliases to Binance trading symbols.

    Returns:
        dict: A dictionary mapping names to symbols (e.g., {'bitcoin': 'BTCUSDT', 'eth': 'ETHUSDT'}).
    """
    return {
        "bitcoin": "BTCUSDT",
        "btc": "BTCUSDT",
        "ethereum": "ETHUSDT",
        "eth": "ETHUSDT"
    }


if __name__ == "__main__":
    mcp.run()
</code></pre>
]]></content:encoded></item></channel></rss>