How to build semantic search over HubSpot in less than an hour

Picture this: You open HubSpot to find a quick answer: What’s the latest update on the Client A deal?
Instead of a direct answer, you get lots of tabs, including a contact record, the company page, an email thread from February, and a custom object labeled “Q4 Playbook – Client A” created by someone who has since left the company.
Each one holds a piece of the answer, but none of them tells you the full story.
So, you try to get more specific and type “client A negotiation terms” into the search bar. But it lists out every mention of “terms” across your CRM. Again, not helpful.
As a result, you start scrolling. And clicking. And copy-pasting. Ten minutes later, you still don’t have an answer.
It’s not a data problem. It’s a retrieval problem. HubSpot knows everything, but it can’t tell you anything, at least not without an hour of searching. That’s why we built semantic search for HubSpot: so you can finally talk to your sales data, asking questions the way you would ask a teammate and getting answers the way a teammate would give them.
Here’s your step-by-step guide to get from CRM chaos to clarity in under an hour, without custom scripts, ETL pipelines, or vector DB guesswork.
Stack overview
Component | Purpose |
---|---|
Python | Core scripting and orchestration |
HubSpot SDK | Access and interact with HubSpot data |
Ducky API | Semantic search, embeddings, retrieval |
API Framework (optional) | Build integration endpoints (e.g., FastAPI/Flask) |
Slack API | Frontend integration (slack-bolt and slack-sdk) |
Step-by-step walkthrough of building semantic search over HubSpot
The steps below build semantic search over HubSpot. The same approach applies to any CRM; HubSpot is simply the example used in this walkthrough.
1. Set up your environment
Start by preparing your local machine: install the dependencies and configure API credentials for HubSpot and Ducky.
Install Python and required libraries:
pip install hubspot-api-client ducky-client
Obtain API credentials for:
- HubSpot
- Ducky
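One way to keep those credentials out of your source code is to read them from environment variables and fail early when one is missing. The sketch below is a minimal illustration, not part of the original project. DUCKY_API_KEY and DUCKY_INDEX_NAME appear later in this walkthrough, while HUBSPOT_ACCESS_TOKEN is a name assumed here for the HubSpot credential.

```python
import os

# Variable names are assumptions for this sketch; rename them to match your setup.
REQUIRED_VARS = ("HUBSPOT_ACCESS_TOKEN", "DUCKY_API_KEY", "DUCKY_INDEX_NAME")


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_settings() once at startup surfaces a missing key immediately instead of partway through an export.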
2. Connect to HubSpot
Once your environment is ready, use the HubSpot SDK to authenticate and fetch CRM data (contacts, deals, notes, etc.).
from hubspot import HubSpot

# HubSpot has retired legacy API keys; authenticate with a private app access token
hs = HubSpot(access_token="YOUR_ACCESS_TOKEN")
Example: Fetching and exporting HubSpot data
You can also use the included fetch_hubspot.py script to handle pagination, associations, and CSV output automatically:
python src/fetch_hubspot.py # Generates data/hubspot_multi_with_activities.csv with deals, contacts, companies, and activities
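If you would rather see what the pagination part involves, the sketch below shows a generic cursor-pagination loop. It is an illustration under assumptions, not the code in fetch_hubspot.py: it assumes each page object exposes `.results` and `.paging.next.after`, which is our understanding of the shape the HubSpot SDK returns. The `fake_get_page` function is a hypothetical in-memory stand-in so the loop can run without network access.

```python
from types import SimpleNamespace


def iter_all_pages(get_page, limit=100):
    """Yield every record from a cursor-paginated endpoint.

    `get_page(limit=..., after=...)` is assumed to return an object with
    `.results` (a list) and `.paging.next.after` (the next cursor), with
    `.paging` set to None on the last page.
    """
    after = None
    while True:
        page = get_page(limit=limit, after=after)
        yield from page.results
        paging = getattr(page, "paging", None)
        next_page = getattr(paging, "next", None) if paging else None
        if next_page is None:
            break
        after = next_page.after


# Hypothetical in-memory stand-in for an API, used only to exercise the helper.
def fake_get_page(limit, after=None):
    data = ["deal-1", "deal-2", "deal-3"]
    start = int(after or 0)
    end = start + limit
    paging = None
    if end < len(data):
        paging = SimpleNamespace(next=SimpleNamespace(after=str(end)))
    return SimpleNamespace(results=data[start:end], paging=paging)
```

With a real client you would pass the SDK's page-fetching method in place of `fake_get_page` (for deals, this is likely `hs.crm.deals.basic_api.get_page`, but check the SDK documentation for the exact name and arguments).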
Understanding the CSV columns
This example outputs a CSV with fields like Deal ID, Deal Name, Amount, Deal Stage, Pipeline, Close Date, Contact ID, Activity Type, Activity Date, Activity Subject, and Activity Body. You can modify the csv_headers list in fetch_hubspot.py to fetch any HubSpot properties you need.
Snippet from fetch_hubspot.py showing header definition and row construction:
csv_headers = [
    'Deal ID', 'Deal Name', 'Amount', 'Deal Stage', 'Pipeline', 'Close Date',
    # ... other contact/company fields ...
    'Activity ID', 'Activity Type', 'Activity Date', 'Activity Subject', 'Activity Body'
]

# When writing rows:
row = {
    'Deal ID': deal.id,
    'Deal Name': get_property_value(deal, 'dealname'),
    # ...
    'Activity Date': format_date(get_property_value(activity_details, 'hs_timestamp'), '%m/%d/%Y %H:%M'),
    'Activity Subject': get_property_value(activity_details, 'hs_call_title') or get_property_value(activity_details, 'hs_meeting_title'),
    'Activity Body': get_property_value(activity_details, f"hs_{activity_type}_body"),
}
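The snippet calls two helpers, get_property_value and format_date, whose bodies are not shown here. The sketch below is one plausible way to write them, not the script's actual implementation. It assumes each CRM object carries a `properties` dict and that timestamps arrive as ISO 8601 strings. Both are assumptions to verify against your own data.

```python
from datetime import datetime


def get_property_value(obj, name, default=""):
    """Read one property from a CRM object, tolerating missing data."""
    properties = getattr(obj, "properties", None) or {}
    value = properties.get(name)
    return default if value is None else value


def format_date(value, fmt):
    """Format an ISO 8601 timestamp string; return '' if it cannot be parsed."""
    if not value:
        return ""
    try:
        # fromisoformat() before Python 3.11 does not accept a trailing 'Z'.
        parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    except ValueError:
        return ""
    return parsed.strftime(fmt)
```

Returning an empty string rather than raising keeps a single malformed record from aborting the whole export.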
3. Extract and preprocess data
The next step is to select and clean the relevant CRM fields, or skip the cleaning entirely, since Ducky will do the chunking for you.
Select relevant fields:
- Notes
- Emails
- Deal descriptions
Cleaning the data before sending it to Ducky is optional.
Quick tip: The fetch_hubspot.py script already formats and writes a comprehensive CSV file (hubspot_multi_with_activities.csv) with all necessary properties, so you can jump straight into preprocessing.
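If you do decide to clean fields first, a light pass is usually enough. The sketch below is a hypothetical helper, not part of the original project. It strips HTML tags (on the assumption that email and note bodies may contain markup), decodes entities, and collapses whitespace, using only the standard library.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def clean_activity_body(raw):
    """Strip HTML tags, decode entities, and collapse whitespace."""
    if not raw:
        return ""
    text = _TAG_RE.sub(" ", raw)      # replace tags with a space so words stay separated
    text = html.unescape(text)        # e.g. '&amp;' becomes '&'
    return _WS_RE.sub(" ", text).strip()
```

The regex approach is deliberately simple. For heavily formatted email bodies, a real HTML parser would be more robust.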
4. Send data to Ducky for semantic indexing
Use the Ducky API to upload your CRM data. Ducky will handle:
- Embedding generation
- Vector storage
- Semantic search infrastructure, including chunking and smart re-ranking
Example: Indexing HubSpot activities
Here’s a Python snippet that reads your HubSpot activity CSV and indexes each record:
import csv
import os

from dotenv import load_dotenv
from duckyai import DuckyAI

# Load environment variables
load_dotenv()

client = DuckyAI(api_key=os.getenv("DUCKY_API_KEY"))
index_name = os.getenv("DUCKY_INDEX_NAME")

with open("data/hubspot_multi_with_activities.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        content = f"Deal: {row['Deal Name']} (ID: {row['Deal ID']})\n..."
        metadata = {
            "deal_id": row['Deal ID'],
            "activity_id": row['Activity ID'],
            # ...other fields...
        }
        client.documents.index(
            index_name=index_name,
            content=content,
            metadata=metadata,
        )
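How you lay out the content string affects what the search can match, so it can help to build it in one small function. The sketch below shows one possible layout. It is an assumption for illustration, not the layout the original script uses, and the choice of metadata keys beyond deal_id and activity_id is hypothetical. It only uses column names listed in the CSV section above.

```python
def build_document(row):
    """Turn one CSV row (a dict) into (content, metadata) for indexing."""
    lines = [
        f"Deal: {row.get('Deal Name', '')} (ID: {row.get('Deal ID', '')})",
        f"Stage: {row.get('Deal Stage', '')}",
        f"Activity: {row.get('Activity Type', '')} on {row.get('Activity Date', '')}",
        f"Subject: {row.get('Activity Subject', '')}",
        row.get("Activity Body", ""),
    ]
    # Drop lines that are entirely blank (for example, a missing activity body).
    content = "\n".join(line for line in lines if line.strip())
    metadata = {
        "deal_id": row.get("Deal ID", ""),
        "activity_id": row.get("Activity ID", ""),
        "activity_type": row.get("Activity Type", ""),
    }
    return content, metadata
```

Putting the deal name and stage at the top of every document means a query such as "Client A negotiation terms" has the deal context available even when the activity body never mentions the client by name.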
5. Query Ducky for semantic search
Use the Ducky API to send search queries and receive relevant CRM records. Integrate results into your workflow, dashboard, or assistant as needed.
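Whatever client call you use to run the query, you will likely want to turn the hits into something readable before showing them to a user. The sketch below covers only that formatting step. The hit shape (a dict with 'content' and 'metadata' keys) is a hypothetical assumption, so adapt the field access to the response your Ducky client actually returns.

```python
def format_results(results, max_chars=200):
    """Render search hits as a short numbered list for a chat reply.

    Each hit is assumed to be a dict with 'content' and 'metadata' keys;
    this shape is hypothetical, so adapt it to the actual response.
    """
    if not results:
        return "No matching CRM records found."
    lines = []
    for rank, hit in enumerate(results, start=1):
        deal_id = hit.get("metadata", {}).get("deal_id", "unknown")
        snippet = " ".join(hit.get("content", "").split())  # flatten newlines
        if len(snippet) > max_chars:
            snippet = snippet[:max_chars].rstrip() + "..."
        lines.append(f"{rank}. [deal {deal_id}] {snippet}")
    return "\n".join(lines)
```

Keeping the deal ID in each line makes it easy to jump from a search hit back to the record in HubSpot.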
6. Integration with Slack as UI
Use Slack Bolt (Socket Mode) to listen for mentions and forward messages to your FastAPI endpoint:
import os
import threading

import requests
from dotenv import load_dotenv
from slack_bolt import App as SlackApp
from slack_bolt.adapter.socket_mode import SocketModeHandler

# Load environment variables
load_dotenv()

# Initialize Bolt app
bolt = SlackApp(token=os.getenv("SLACK_BOT_TOKEN"))
app_token = os.getenv("SLACK_APP_TOKEN")

@bolt.event("app_mention")
def handle_mention(body, say):
    # Drop the leading bot mention; fall back to "" if nothing follows it
    parts = body["event"]["text"].split(maxsplit=1)
    user_text = parts[1] if len(parts) > 1 else ""
    # Call local FastAPI /chat endpoint
    resp = requests.post(
        "http://localhost:8000/chat",
        json={"message": user_text},
        timeout=15,
    ).json()
    reply = resp.get("response", "Sorry, I don't know how to respond yet.")
    say(text=reply, thread_ts=body["event"]["ts"])

# Start Bolt in background
def _start_bolt():
    SocketModeHandler(bolt, app_token).start()

threading.Thread(target=_start_bolt, daemon=True).start()
This runs the Slack listener alongside your FastAPI server, enabling a fully interactive Slack UI. Feel free to customize event handlers or add interactive components as needed.
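One customization worth considering is how the question is pulled out of the mention text. Slack delivers mentions in the form `<@U012ABC>`, and the mention is not always the first word of the message. The sketch below is a hypothetical helper, not part of the original listener, that removes any mention tokens wherever they appear and returns an empty string when nothing else was typed.

```python
import re

# Matches Slack user mentions such as <@U012ABC>.
_MENTION_RE = re.compile(r"<@[A-Z0-9]+>")


def extract_question(event_text):
    """Remove Slack user mentions and trim surrounding whitespace."""
    return _MENTION_RE.sub("", event_text or "").strip()
```

Inside the handler, you could check for an empty result and reply with a short usage hint instead of calling the /chat endpoint.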
7. Test and iterate
Finally, test your setup with real queries. Use the feedback to fine-tune your data extraction and indexing to improve results over time.
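A small, repeatable check makes this iteration less subjective. The sketch below computes a simple hit rate: the fraction of test queries whose expected deal shows up in the top results. The `search` callable and the result shape (dicts with a 'deal_id' under 'metadata') are hypothetical, so wrap your actual Ducky query call to match.

```python
def hit_rate(search, test_cases, top_k=5):
    """Fraction of queries whose expected deal ID appears in the top_k hits.

    `search(query, top_k)` is a hypothetical callable returning a list of
    dicts that carry a 'deal_id' inside 'metadata'.
    """
    if not test_cases:
        return 0.0
    hits = 0
    for query, expected_deal_id in test_cases:
        results = search(query, top_k)
        found = {r.get("metadata", {}).get("deal_id") for r in results}
        if expected_deal_id in found:
            hits += 1
    return hits / len(test_cases)
```

Rerunning the same handful of real questions after each change to your extraction or content layout shows whether the change actually helped.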

How Ducky takes you from HubSpot chaos to search clarity in under an hour
Adding semantic search to your CRM doesn't have to be complicated. Ducky is built to remove the usual roadblocks, such as data cleanup, infrastructure overhead, and ML complexity, making advanced search and retrieval simple.
Here's how it all comes together in under an hour.
Doesn't require data cleaning
No need to clean data, write custom chunkers, or fiddle with vector DBs. Ducky is designed to work with the real-world messiness of CRM systems. Instead of forcing you to normalize and clean everything before it’s usable, Ducky ingests your raw data as-is. Just use the API or SDK to upload your data, and Ducky will chunk, embed, and index it automatically.
Handles the messiness of your CRM
CRM records usually include notes scribbled during calls, half-filled custom objects, and fragmented timelines. Ducky handles it all. It automatically chunks and indexes any format within a vector database for you, so your search actually works across the messy middle.
Ship in under an hour
With Ducky, there’s no waiting on infrastructure setup, provisioning databases, or spending days writing glue code. You can pull your HubSpot data, send it to our API, and start querying almost immediately. This means faster iteration, faster user feedback, and faster value delivery.
No ML expertise required
You don’t need to understand the nuances of machine learning to deliver powerful search. Ducky abstracts complex retrieval logic like reranking, embedding selection, and vector search tuning. You get high-quality semantic results without touching a single model. Just plug in your data and let Ducky handle the rest.
Works for any CRM
Finally, Ducky allows you to build a semantic search for your specific CRM, whether it’s HubSpot, Salesforce, or something custom.
While this walkthrough focused on HubSpot, the same approach applies to other CRMs too. Ducky’s infrastructure is flexible enough to handle a wide variety of CRM tools and internal data sources without changes to your underlying stack.
Need help with your specific CRM? Reach out to our experts and set up semantic search in less than an hour.
Or, get your Ducky API key and hook up your HubSpot data to semantic search.
Other RAG examples you can build today
Here are some other practical ways teams are already using Ducky today.
No credit card required - we have a generous free tier to support builders