How to build a chatbot for internal docs using RAG

How to build a chatbot for your internal knowledge base
Someone misses a meeting, another forgets where the sprint notes live, and a new team member has questions about the company's holiday policy. It happens. Teams spend way too much time searching for answers that should be readily available.
And while it’s tempting to ping that one teammate who always seems to have the answers, what if there was a better way? A way to ask questions and instantly get accurate answers from across your docs, without having to dig for 20+ minutes?
That’s exactly what we’re going to explore in this post: how to build a smart internal search experience over your company knowledge. One that helps your team move faster, stay aligned, and spend less time hunting for information.
The stack
Frontend | HTML
Backend | FastAPI
Docs source | Markdown
LLM | OpenAI, Anthropic, or Groq via Ducky
Retrieval | Ducky (handles chunking, reranking, storage)
Step-by-step walkthrough
This setup reads a meeting-transcript file and an employee handbook, loads their content into a vector store, and makes it searchable via semantic retrieval. When a question is asked, the system finds the most relevant snippets and passes them to the LLM for a grounded, contextual response. With the right tools, you can have this running in under 10 minutes.
Here's a 5-step walkthrough to get your internal search up and running with a minimal RAG setup.
Step 1 - Add your environment variables
The first step sets the required API keys for DuckyAI and the Groq LLM. These keys enable secure access to the vector store and language model for processing queries.
# API key for DuckyAI service - replace with your actual key
DUCKY_API_KEY=your-ducky-api-key-here

# API key for Groq API service - replace with your actual key
GROQ_API_KEY=your-groq-api-key-here

# Name of the index in DuckyAI - can be customized
DUCKY_INDEX_NAME=ducky-test

# Name of the model to use for Groq API - can be changed to other supported models
GROQ_MODEL_NAME=llama3-70b-8192

# Path to the transcript file - ensure this file exists or update the path
TRANSCRIPT_FILE_PATH=data/transcript.txt

# Path to the Employee Handbook file - ensure this file exists or update the path
POLICIES_FILE_PATH=data/policies.txt
Step 2 - Load your docs
The snippet first authenticates with DuckyAI by creating a client with your API key, then reads the transcript and employee-handbook files and indexes their contents so they can be searched and retrieved later.
# Import the DuckyAI library
from duckyai import DuckyAI
# Import os for accessing environment variables
import os
# Import dotenv for loading environment variables from a .env file
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Instantiate a DuckyAI client
# The API key is retrieved from the environment variables
client = DuckyAI(api_key=os.getenv("DUCKY_API_KEY"))

# Get DUCKY_INDEX_NAME from environment variables
ducky_index_name = os.getenv("DUCKY_INDEX_NAME")
if ducky_index_name is None:
    raise ValueError("DUCKY_INDEX_NAME environment variable not set")

# Get TRANSCRIPT_FILE_PATH from environment variables
transcript_file_path = os.getenv("TRANSCRIPT_FILE_PATH")
if transcript_file_path is None:
    raise ValueError("TRANSCRIPT_FILE_PATH environment variable not set")

# Get POLICIES_FILE_PATH from environment variables
policies_file_path = os.getenv("POLICIES_FILE_PATH")
if policies_file_path is None:
    raise ValueError("POLICIES_FILE_PATH environment variable not set")

# Read the entire transcript file with utf-8 encoding
with open(transcript_file_path, 'r', encoding='utf-8') as f:
    transcript_content = f.read()

# Index the transcript document using the DuckyAI client
client.documents.index(
    index_name=ducky_index_name,  # Index name, visible in the DuckyAI dashboard
    content=transcript_content,   # Content to be indexed
)

# Read the entire policies file with utf-8 encoding
with open(policies_file_path, 'r', encoding='utf-8') as f:
    policies_content = f.read()

# Index the policies document using the DuckyAI client
client.documents.index(
    index_name=ducky_index_name,  # Index name, visible in the DuckyAI dashboard
    content=policies_content,     # Content to be indexed
)
You can index content from various sources and formats: Markdown files, Notion exports, Slack messages, or any JSON data. For this app, we decided to focus on Markdown.
Note: By default, Ducky auto-chunks and embeds.
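If your internal docs live in a folder of Markdown files rather than two fixed paths, you can index them in a loop. Here's a minimal sketch reusing the same client; the docs/ folder and its layout are assumptions for illustration:

# Index every Markdown file under a hypothetical docs/ folder
from pathlib import Path

for md_file in Path("docs").glob("**/*.md"):
    client.documents.index(
        index_name=ducky_index_name,
        # Each file becomes its own document; Ducky chunks and embeds it automatically
        content=md_file.read_text(encoding="utf-8"),
    )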
Step 3 - Retrieve the right information
This step searches the specified index using the user's message as a query. It then returns the single most relevant matching document.
# Retrieve relevant documents from DuckyAI based on the user's message
results = client.documents.retrieve(
    index_name=index_name,  # The name of the DuckyAI index to search
    query=msg.message,      # The user's message to use as the search query
    top_k=1                 # Retrieve only the single most relevant document
)
You can obtain your DuckyAI API key by creating a project and generating a key in your DuckyAI dashboard; see the Ducky docs for instructions.
DuckyAI automatically handles chunking your data, embedding each chunk, and storing the embeddings in a vector database (vector store) under the specified index. This enables efficient semantic search without requiring you to manage chunking, embedding, or vector storage manually.
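If one document doesn't provide enough context, you can widen the search and merge chunks across several results. A minimal sketch, assuming the same client and response shape as above; the top_k value of 3 is just an illustration:

# Retrieve the top 3 most relevant documents instead of just one
results = client.documents.retrieve(
    index_name=index_name,
    query=msg.message,
    top_k=3,
)

# Merge content chunks across all returned documents into one context string
context = " ".join(
    chunk
    for doc in results.documents
    for chunk in (doc.content_chunks or [])
)

Keep in mind that a larger top_k sends more context tokens to the LLM, so balance recall against cost.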
Step 4 - Turn retrieved information into an answer
The fourth step checks if any documents were retrieved from the vector store. If found, it extracts and joins their content to form the context, which is then passed to the Groq LLM for generating a response. The model replies based on the context only if it’s relevant to the user’s query.
@app.post("/chat")
async def chat(msg: ChatMessage):
    # Retrieve relevant documents from DuckyAI based on the user's message
    results = client.documents.retrieve(
        index_name=index_name,  # The name of the DuckyAI index to search
        query=msg.message,      # The user's message to use as the search query
        top_k=1                 # Retrieve only the top 1 most relevant document
    )

    # Check if any documents were found
    if results.documents:
        # Join the first document's content_chunks into a single context string;
        # default to an empty string if there are no chunks
        context = " ".join(results.documents[0].content_chunks) if results.documents[0].content_chunks else ""

        # Use the Groq API to generate a chat completion (response)
        completion = groq_client.chat.completions.create(
            model=os.getenv("GROQ_MODEL_NAME", "llama3-70b-8192"),
            messages=[
                {
                    "role": "system",
                    "content": f"""You are a helpful assistant. Always respond in markdown format.
Use the provided context to answer questions accurately.
For casual greetings, general conversation, or questions unrelated to the context,
respond naturally without referencing the context.

Context (use only if relevant):
{context}"""
                },
                {"role": "user", "content": msg.message}
            ]
        )
        # Extract the reply from the model's response
        reply = completion.choices[0].message.content
    else:
        # If no relevant documents are found by DuckyAI, provide a default response
        reply = "Sorry, I don't know how to respond yet."

    # Return the reply as a JSON response
    return JSONResponse(content={"response": reply})
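The endpoint above references names that aren't defined in the snippet: app, ChatMessage, groq_client, and index_name. Here's a minimal sketch of that scaffolding using standard FastAPI, Pydantic, and Groq SDK setup; the exact structure is an assumption to make the example self-contained:

# Scaffolding assumed by the /chat endpoint above (a sketch, not the full app)
import os

from dotenv import load_dotenv
from duckyai import DuckyAI
from fastapi import FastAPI
from fastapi.responses import JSONResponse
from fastapi.staticfiles import StaticFiles
from groq import Groq
from pydantic import BaseModel

load_dotenv()

app = FastAPI()
# Serve the frontend files used in Step 5
app.mount("/static", StaticFiles(directory="static"), name="static")

client = DuckyAI(api_key=os.getenv("DUCKY_API_KEY"))
groq_client = Groq(api_key=os.getenv("GROQ_API_KEY"))
index_name = os.getenv("DUCKY_INDEX_NAME", "ducky-test")

class ChatMessage(BaseModel):
    # Request body for POST /chat: {"message": "..."}
    message: str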
Step 5 - Create a UI for semantic search
Finally, this step defines the frontend interface for the chatbot using HTML, CSS, and JavaScript. It captures user input, sends it to the backend via a POST request, and displays the AI's response in real time. The chat auto-scrolls, and messages can also be sent with the Enter key.
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Ducky Chat</title>
  <!-- Markdown parser -->
  <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
  <!-- add style.css here -->
  <link rel="stylesheet" href="/static/styles.css" />
</head>
<body>
  <div id="chat"></div>
  <form id="inputBar">
    <input id="messageInput" type="text" placeholder="Type your message…" autocomplete="off" />
    <button id="sendBtn" type="submit">Send</button>
  </form>
  <script src="/static/script.js"></script>
</body>
</html>
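The page loads /static/script.js, which the walkthrough doesn't show. Here's a minimal sketch of what it might contain: it wires the form to the /chat endpoint, renders replies with marked, and auto-scrolls the chat. The element IDs match the HTML above; everything else is an assumption:

// Hypothetical /static/script.js - wires the chat UI to the FastAPI backend
const chat = document.getElementById("chat");
const form = document.getElementById("inputBar");
const input = document.getElementById("messageInput");

// Append a message to the chat and keep it scrolled to the bottom
function addMessage(text, who) {
  const div = document.createElement("div");
  div.className = who; // e.g. "user" or "bot", styled in styles.css
  div.innerHTML = marked.parse(text); // render the markdown reply as HTML
  chat.appendChild(div);
  chat.scrollTop = chat.scrollHeight;
}

// Form submission covers both the Send button and the Enter key
form.addEventListener("submit", async (e) => {
  e.preventDefault();
  const message = input.value.trim();
  if (!message) return;
  addMessage(message, "user");
  input.value = "";

  // POST the message to the backend and display the model's reply
  const res = await fetch("/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  const data = await res.json();
  addMessage(data.response, "bot");
});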
Why Ducky leads in RAG infrastructure
Built-in chunking: Ducky automatically chunks your documents in a way that balances context and cost. This means better search results, fewer hallucinations, and more precise answers without having to fine-tune a model or guess the right chunk size.
Fully managed vector storage and retrieval: Forget about spinning up a vector DB, managing indexes, or tuning infrastructure. Ducky handles storage, retrieval, and indexing for you, so you can ship faster and skip the tedious infrastructure work.
No DevOps or ML expertise required: Our platform abstracts away the complexity of retrieval-augmented generation, so you don’t need to worry about embeddings, re-rankers, or model tuning. You get production-quality results with just a few lines of code.
Handles messy, unstructured data: Ducky handles data from multiple sources like Notion exports, meeting notes, and Slack threads. No need to pre-clean or reformat your documents before indexing.
Flexible enough to plug into any frontend: Whether you're building a Slackbot for internal Q&A or embedding semantic search into your dashboard, Ducky fits right in. It’s designed to be easy to integrate into any stack or interface.
Real-world benefits
Time to deploy: less than 10 minutes
Tokens saved: smart context filtering sends only the most relevant chunks to the model
Reduced hallucination: answers are grounded in retrieved context, compared to non-retrieval baselines
Minimal code: the core setup takes fewer than 10 lines
Want to build this RAG use case for your team?
Sign up and get your Ducky API key here to use this recipe and start using semantic search for your internal documents in minutes.
No fine-tuning or DevOps required: just simple, accurate semantic search that works.
No credit card required: we have a generous free tier to support builders.