Building a RAG Application: Lessons from Customer Support

In customer support, scattered information slows teams down and frustrates customers. Imagine if your support team could instantly retrieve accurate, relevant responses from a unified knowledge base. That’s what we set out to build at Ducky, leveraging Retrieval-Augmented Generation (RAG).
Our goal with Ducky was to integrate seamlessly with a team's knowledge bases and provide quick, coherent responses. Over the last year, we discovered four key insights that are crucial for developing effective RAG applications. Here's what we found and what every RAG developer should consider.
Lesson 1: Accurate Information Retrieval is Key to Success
It's simple: garbage in, garbage out. Good retrieval is a prerequisite for good generation. Without precise retrieval, a RAG system risks pulling in irrelevant, outdated, or incorrect information, producing responses that miss the mark or even mislead. For a customer support team, that means frustration, wasted time, and, worst of all, telling customers the wrong thing. The two critical factors we prioritized are coverage and efficiency.
Coverage is about ensuring that all necessary information is accessible. In customer support, this often means drawing from multiple sources: previous support resolutions, internal knowledge bases, engineering logs, or even structured data like customer accounts and shipment details. For Ducky, we had to build a system that integrates across these sources so we can retrieve relevant information no matter where it's stored. Without that coverage, critical data is left out, making it impossible to generate accurate responses.
Efficiency speaks to how well the retrieval system filters out noise and targets the essential information. For Ducky, this meant ensuring that, even in large datasets (100k+ support tickets), our retrieval system could zero in on the specific tickets relevant to each query. Since the hardest support tickets may have a low relevance ratio (like 1 in 10,000), it was essential to build a retrieval system that could sift through duplicates and prioritize the most relevant entries.
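To make the coverage-and-efficiency split concrete, here is a minimal sketch of a multi-source retriever with deduplication and a relevance floor. The `Doc` shape, the `source.search()` interface, and the thresholds are illustrative assumptions, not Ducky's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    source: str   # e.g. "helpdesk", "wiki", "engineering_logs"
    doc_id: str
    text: str
    score: float  # similarity score from the vector index

def retrieve(query_embedding, sources, top_k=5, min_score=0.75):
    """Fan a query out to every knowledge source, then dedupe and rank.

    `sources` is assumed to be a list of objects exposing a `search()`
    method over a vector index; swap in whatever store you use
    (pgvector, Pinecone, Elasticsearch, ...).
    """
    candidates: list[Doc] = []
    for source in sources:
        # Coverage: every integrated source gets queried, not just one.
        candidates.extend(source.search(query_embedding, limit=top_k * 4))

    # Efficiency: drop near-duplicates (e.g. the same macro pasted into
    # thousands of tickets) so one template can't crowd out the rest.
    seen_fingerprints = set()
    unique = []
    for doc in sorted(candidates, key=lambda d: d.score, reverse=True):
        fingerprint = doc.text.strip().lower()[:200]
        if fingerprint in seen_fingerprints:
            continue
        seen_fingerprints.add(fingerprint)
        unique.append(doc)

    # Keep only results above a relevance floor, then the top_k overall.
    return [d for d in unique if d.score >= min_score][:top_k]
```

The design choice this sketch illustrates is querying every source first (coverage) and filtering aggressively afterward (efficiency), rather than trusting any single index to hold the answer.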
Lesson 2: Develop Quality Metrics as Soon as Possible
In building RAG applications, it's tempting to assess performance based on a few standout cases. But to gain a reliable view of quality, it's critical to establish quantitative metrics early. Moving to quantitative metrics gave us a clear, actionable view of generation quality, which in turn focused our development priorities.
For customer support, this meant identifying specific metrics tied to business impact. We started with hallucination rate, ensuring the system wasn't generating incorrect information. Measuring retrieval quality then let us track and benchmark accuracy objectively across all queries. We measured everything from basic properties such as content length to more nuanced qualities such as tone and style, all of which helped us build a holistic but nuanced understanding of system performance beyond individual examples.
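As an illustration, a small evaluation harness along these lines might look like the sketch below. The `EvalRecord` fields and the way hallucinations or tone get judged (human labels, an LLM grader, or both) are assumptions for the example, not a description of our internal tooling.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class EvalRecord:
    query: str
    answer: str
    retrieved_ids: list[str]
    relevant_ids: list[str]     # labelled ground truth for this query
    hallucinated: bool          # e.g. judged by a human or an LLM grader
    tone_score: float           # 0-1, how close to the house style

def summarize(records: list[EvalRecord]) -> dict[str, float]:
    """Roll per-example judgments up into the metrics tracked over time."""
    def recall(r: EvalRecord) -> float:
        if not r.relevant_ids:
            return 1.0
        hits = len(set(r.retrieved_ids) & set(r.relevant_ids))
        return hits / len(r.relevant_ids)

    return {
        "hallucination_rate": mean(1.0 if r.hallucinated else 0.0 for r in records),
        "retrieval_recall": mean(recall(r) for r in records),
        "avg_answer_length": mean(len(r.answer) for r in records),
        "avg_tone_score": mean(r.tone_score for r in records),
    }
```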
Lesson 3: Prioritize Key Needs in the RAG System
When building a RAG application, it's crucial to identify the primary type of problem it needs to solve. We broke it down into knowledge, reasoning, and execution. This understanding directed our development toward the capabilities that matter most.
Knowledge is about accessing the right information. In customer support, no two companies operate the same way, so Ducky needed extensive access to each business's unique policies, workflows, and past resolutions. Without this, even the best LLM would lack relevance. We needed to ensure good coverage by building integrations with multiple sources, and good retrieval efficiency by improving how we index and search.
Reasoning involves making accurate deductions from available information. While complex reasoning may be essential in some fields, we were pleasantly surprised by the existing reasoning capability of LLMs for the customer support use case; previous resolutions offered enough precedent to solve a significant number of cases.
Execution is the ability to act, ranging from backend automation to manual work. For support agents, this can mean issuing refunds or updating customer accounts. Building a reliable execution system, however, requires careful thought about what can be automated and what demands human oversight. In customer support, tasks like tagging, data entry, or creating bug-report tickets are ideal candidates for automation, while actions that involve sensitive data, like payment processing or customer memberships, require either human supervision or flawless precision.
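One way to encode that split is to gate each action type by sensitivity before anything runs. The action names, `backend`, and `review_queue` below are hypothetical placeholders rather than Ducky's real integrations; the point is only that the safe/sensitive boundary is made explicit in code.

```python
from enum import Enum

class Action(Enum):
    TAG_TICKET = "tag_ticket"
    CREATE_BUG_REPORT = "create_bug_report"
    UPDATE_DATA_ENTRY = "update_data_entry"
    ISSUE_REFUND = "issue_refund"
    CHANGE_MEMBERSHIP = "change_membership"

# Actions that touch money or account state go through a human first.
REQUIRES_APPROVAL = {Action.ISSUE_REFUND, Action.CHANGE_MEMBERSHIP}

def execute(action: Action, payload: dict, backend, review_queue) -> str:
    """Run safe actions automatically; park sensitive ones for an agent.

    `backend` and `review_queue` are placeholders for whatever systems
    perform the action and hold items for human sign-off.
    """
    if action in REQUIRES_APPROVAL:
        review_queue.add(action=action.value, payload=payload)
        return "pending_human_approval"
    backend.perform(action.value, payload)
    return "executed"
```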
I encourage you to consider which of these areas your business problems depend on the most. If the problem depends heavily on domain-specific knowledge, then focus on retrieval quality. If the problem requires complex reasoning, then experiment with generative models and reasoning agents. If the problem requires execution, then you need to think about how to codify the actions.
Lesson 4: Communication Tone Drastically Impacts Usability
One of the first challenges in customer support is generating responses that not only provide correct information but also match the expected tone. Early on, we noticed that while Ducky generated correct answers, it struggled to capture the way a support agent would actually communicate with customers. This issue was critical; without the right tone, agents would need to rewrite messages, nullifying the value of automatically generating responses.
It's not just what you say, but also how you say it. Tone is nuanced. Some companies have a defined brand voice, specifying everything from the level of formality to the use of emojis in customer interactions. Even within the same team, agents vary in how they prefer to write. Good generation had to reflect the tone an agent would naturally use. Luckily, LLMs can already mimic tone; even the earliest models could write in diverse styles.
To harness this, we developed a tone engine for Ducky. Although it felt like a hack in some ways, we saw outsized improvements in response acceptance and software usability, even without major changes to the core retrieval or generative models.
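The internals of our tone engine are beyond the scope of this post, but a simple version of the idea is few-shot tone matching: show the model a handful of the agent's own past replies and ask it to rewrite a factually correct draft in that voice. The function and parameters below are a hypothetical sketch of that approach, not our production prompt.

```python
def build_tone_prompt(draft_answer: str, agent_examples: list[str],
                      style_guide: str | None = None) -> str:
    """Ask the model to rewrite a factually correct draft in the agent's voice.

    `agent_examples` are a few real replies the agent has sent before;
    `style_guide` is optional brand-voice guidance (formality, emojis, ...).
    """
    example_block = "\n\n".join(
        f"Example reply {i + 1}:\n{text}" for i, text in enumerate(agent_examples)
    )
    guide = f"Brand voice guidelines:\n{style_guide}\n\n" if style_guide else ""
    return (
        f"{guide}"
        f"Here are replies this support agent has written before:\n\n"
        f"{example_block}\n\n"
        f"Rewrite the following draft so it keeps every fact unchanged "
        f"but matches the tone and phrasing of the examples above:\n\n"
        f"{draft_answer}"
    )
```

The resulting prompt is then passed to whichever LLM generates the final reply; the facts come from retrieval, the voice from the examples.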
Building a RAG system for customer support is as much about understanding the problem as it is about implementing the technology. Setting up retrieval, prioritizing metrics, and ensuring that generated responses match the desired tone were essential parts of developing Ducky. This is just one approach, though; as you design your own RAG system, you may find yourself drawn to different priorities depending on your unique challenges. By applying these principles thoughtfully, you're well on your way to creating a RAG application that's both powerful and adaptable.