There are plenty of RAG-as-a-service products and knowledge bases for AI, but none tackle GTM (go-to-market) data well. GTM data is a unique challenge that requires a purpose-built search index and retrieval design to cover common use cases. In this post, we cover the challenges and how we address them at DealPage.
When I first started recording my sales meetings and maintaining a CRM, the sheer amount of data I generated quickly became overwhelming. With a mountain of emails, transcripts, notes, and CRM entries, finding the right information in the haystack was a huge time sink. This not only slowed down my meeting prep but also made creating collateral a time-intensive task.
The Problem
For larger sales teams, the problem intensifies. Each deal can involve multiple stakeholders, each generating more data. Managing and utilizing this data efficiently requires a mix of manual labor, complex operational rules, and expensive software solutions.
As a result, teams spend countless hours in 1:1s, reviews, sync-ups, and fact-finding sessions just to prepare for sales meetings. This scattered approach often leads to missed insights, such as why deals are being lost, which can significantly impact the team's success.
Before founding DealPage and joining YC, I worked as a PM. It was really difficult to understand how prospects were reacting to the products and features we were developing because accessing sales team data was so cumbersome. I had to rely on 1:1s with SEs and other slow, unreliable methods.
The Market Gap and the Unique Challenge
From an outside perspective, this seems like a problem for enterprise search tools and RAG-as-a-service platforms (e.g. Glean or Guru), but none have effectively tackled the specific challenges of GTM data. This blog post briefly explores the unique challenges and our approach.
The Unique Challenges of GTM Data
Before jumping into specifics: why is this important?
In our work with customers, we've found that sales teams are already using ChatGPT for tasks like drafting emails, proposals, and documents. However, the AI lacks the deal context needed to make a substantial impact, so reps often spend as much time pasting in context, crafting prompts, and editing outputs as they would have spent writing from scratch.
This led us to consider RAG (Retrieval-Augmented Generation) as a solution, but so far no products have emerged that can handle the complexity and uniqueness of GTM data.
Compare a knowledge base in Notion or Google Drive with the typical GTM stack (email, CRM, meeting transcripts, notes, Slack, documents, contracts, etc.) and you can see the scale of the challenge.
GTM data is difficult to embed for RAG for a few reasons:
Multiple Tools and Data Sources: Sales data comes from various platforms, making it hard to centralize.
Diverse Data Handling: Different data types (emails, notes, transcripts) need specific chunking and retrieval strategies.
Frequent Data Evolution: Sales data changes often, requiring constant updates via webhooks and pipelines.
Poor CRM Search: CRMs often have inadequate search functionalities.
Opportunity-Specific Context: Each deal has unique context that can’t be mixed with others.
How Standard RAG Works
Index Design:
Common Approach: In a typical RAG system, all documents from various sources like CRMs, meeting transcripts, and emails are indexed together. This usually involves breaking down (chunking) the data into manageable pieces, regardless of the source or context. The resulting index serves as a single bucket of vectors, enabling quick search across the entire dataset.
Potential Issues: This one-size-fits-all approach can lead to significant problems:
Context Loss: Chunking strategies that don't differentiate between data types can strip away crucial context. For example, an email chain discussing a specific aspect of a deal might get split into isolated chunks, losing the thread of conversation and its relevance.
Data Overlap: Important distinctions between types of data, such as customer support emails versus sales negotiation transcripts, can be lost. This can result in irrelevant data being retrieved, complicating the search for specific information.
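To make the context-loss problem concrete, here is a minimal Python sketch of structure-blind chunking. The `fixed_size_chunks` helper, the 80-character window, and the sample thread are all illustrative assumptions, not a description of any particular product's pipeline.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive character windows, ignoring message boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical two-message email thread about a single deal.
thread = (
    "From: buyer@example.com\n"
    "Subject: Pricing for Deal X\n"
    "Can you confirm the discount applies to the annual plan?\n"
    "From: rep@example.com\n"
    "Subject: Re: Pricing for Deal X\n"
    "Yes, the 10% discount applies to annual plans only.\n"
)

chunks = fixed_size_chunks(thread, 80)

# Chunks with no "From:" header no longer record who said what.
orphans = [c for c in chunks if "From:" not in c]

for chunk in orphans:
    print(repr(chunk))
```

A chunk without its header can still match a query, but the retriever can no longer tell who wrote it or which message it was replying to.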
Sync:
Common Approach: Periodic updates are employed to keep the indexed data fresh. However, this update frequency might not match the dynamic nature of GTM data, which can change rapidly with new emails, meeting notes, and CRM updates.
Potential Issues:
Outdated Information: In fast-paced sales environments, data becomes outdated quickly. For instance, a sales rep might describe a product's feature set based on a stale index that hasn't picked up recent changes, and end up presenting inaccurate information to a prospect.
Recency Filtering: For some queries (e.g. "what questions did they ask in the last meeting?") you need only the latest content from transcripts. For others (e.g. "who are the decision makers?") you need to search through all content. A simple RAG setup with one undifferentiated index has no way to tell these two cases apart.
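The two query modes above can be sketched in a few lines of Python, assuming each chunk carries a source type and a date as metadata. The `Chunk` fields, the `latest_only` helper, and the sample data are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Chunk:
    source_type: str  # hypothetical labels: "transcript", "email", "crm_note"
    created: date
    text: str


def latest_only(chunks: list[Chunk], source_type: str) -> list[Chunk]:
    """Keep only the chunks of one source type from its most recent date."""
    typed = [c for c in chunks if c.source_type == source_type]
    if not typed:
        return []
    newest = max(c.created for c in typed)
    return [c for c in typed if c.created == newest]


chunks = [
    Chunk("transcript", date(2024, 3, 1), "They asked about SSO support."),
    Chunk("transcript", date(2024, 4, 2), "They asked about pricing tiers."),
    Chunk("email", date(2024, 3, 15), "Dana (VP Sales) signs off on budget."),
]

# "What did they ask in the last meeting?" needs the newest transcript only.
last_meeting = latest_only(chunks, "transcript")

# "Who are the decision makers?" needs the full history, with no recency filter.
full_history = chunks
```

The point of the sketch is that the filter depends on metadata stored alongside each chunk, which a single bucket of untyped vectors does not provide.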
Search:
Common Approach: A general search query will retrieve results from the entire indexed dataset, often without prioritizing the most relevant or recent information. This lack of specificity can lead to a wide range of problems.
Potential Issues:
Irrelevant Results: For example, searching for "next steps for deal X" might pull up irrelevant past communications or outdated status updates from different deals. This not only wastes time but can also lead to confusion and miscommunication.
No Prioritization: The inability to prioritize the latest information means that users might retrieve outdated documents, missing out on the most recent developments. This is particularly problematic in sales, where timing and the latest data are crucial for strategy and decision-making.
Cross-Contamination of Results: Without clear boundaries between different deals and types of data, searches can mix up information from separate contexts. For instance, insights meant for one deal could be mistakenly applied to another, potentially leading to inappropriate recommendations or actions.
How DealPage Addresses These Issues
Index Design:
Each deal’s context is isolated in its own bucket, preventing data contamination.
We handle each data type differently. For example, emails are chunked by thread, labeled by type, and tagged with date metadata.
Your CRM data, shared collateral, emails, transcripts, notes, and more all live in one dedicated search index that the AI is trained to write queries for.
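As a rough illustration of the index design described above, the following Python sketch groups emails into one chunk per thread, labels each chunk with a type and a date, and files it under its deal. It is a simplified sketch under our own assumptions, not DealPage's actual implementation: the `Email` and `Chunk` classes, the `"email_thread"` label, and both helper functions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Email:
    deal_id: str
    thread_id: str
    sent: date
    body: str


@dataclass(frozen=True)
class Chunk:
    deal_id: str
    source_type: str
    latest: date  # date of the newest message in the chunk
    text: str


def chunk_emails_by_thread(emails: list[Email]) -> list[Chunk]:
    """Build one chunk per (deal, thread), with messages in chronological order."""
    threads: dict[tuple[str, str], list[Email]] = defaultdict(list)
    for email in emails:
        threads[(email.deal_id, email.thread_id)].append(email)
    chunks = []
    for (deal_id, _thread_id), messages in threads.items():
        ordered = sorted(messages, key=lambda m: m.sent)
        text = "\n---\n".join(m.body for m in ordered)
        chunks.append(Chunk(deal_id, "email_thread", ordered[-1].sent, text))
    return chunks


def build_deal_index(chunks: list[Chunk]) -> dict[str, list[Chunk]]:
    """File every chunk under its own deal so deals never share a bucket."""
    index: dict[str, list[Chunk]] = defaultdict(list)
    for chunk in chunks:
        index[chunk.deal_id].append(chunk)
    return dict(index)


emails = [
    Email("acme", "t1", date(2024, 5, 2), "Can we get a security review?"),
    Email("acme", "t1", date(2024, 5, 1), "Sharing the draft proposal."),
    Email("globex", "t9", date(2024, 5, 3), "Please resend the contract."),
]
index = build_deal_index(chunk_emails_by_thread(emails))
```

Keeping a thread together preserves the back-and-forth of the conversation, and keying the index by deal means a query scoped to one deal cannot return another deal's emails.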
Sync:
Integrations allow data in your Deal Libraries to update automatically in real time.
Custom webhooks keep everything in sync with common CRM and email events.
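For readers curious what event-driven sync can look like, here is a minimal Python sketch of a handler that applies webhook events to a per-deal index. The event shape (`deal_id`, `record_id`, `action`, `version`, `payload`) is an assumption made for this example and does not correspond to any specific CRM's or DealPage's webhook format.

```python
from typing import Any

# Assumed layout: deal_id -> record_id -> (version, payload)
Index = dict[str, dict[str, tuple[int, Any]]]


def apply_event(index: Index, event: dict[str, Any]) -> bool:
    """Apply one webhook event to the index; return True if the index changed."""
    records = index.setdefault(event["deal_id"], {})
    record_id = event["record_id"]
    if event["action"] == "deleted":
        return records.pop(record_id, None) is not None
    current = records.get(record_id)
    if current is not None and current[0] >= event["version"]:
        return False  # stale or duplicate delivery, keep the newer copy
    records[record_id] = (event["version"], event["payload"])
    return True


index: Index = {}
events = [
    {"deal_id": "acme", "record_id": "note-1", "action": "updated",
     "version": 2, "payload": "Budget approved."},
    # Arrives late: an older version of the same note.
    {"deal_id": "acme", "record_id": "note-1", "action": "updated",
     "version": 1, "payload": "Budget pending."},
    {"deal_id": "acme", "record_id": "email-7", "action": "created",
     "version": 1, "payload": "Intro email."},
    {"deal_id": "acme", "record_id": "email-7", "action": "deleted",
     "version": 2},
]
results = [apply_event(index, e) for e in events]
```

The version check is one simple way to make the handler tolerant of duplicate or out-of-order deliveries, so a late event cannot overwrite fresher data.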
Search:
Our model first selects the appropriate Deals, based on the query.
Users can filter deals by name, size, industry, line items, and more.
Once the model confirms that it has found the right deals, the AI searches within the selected deals, filtering by resource type or recency.
We enable multi-step queries that allow you to quickly generate pipeline-level insights with the specificity of a semantic search.
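The two-stage flow above (pick the deals first, then search inside them) can be sketched as follows. This is an illustrative Python sketch, not DealPage's implementation: the `Deal` and `Chunk` fields and both functions are hypothetical, and a simple word-overlap score stands in for real semantic search.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Deal:
    deal_id: str
    name: str
    industry: str
    size: int


@dataclass(frozen=True)
class Chunk:
    deal_id: str
    source_type: str
    created: date
    text: str


def select_deals(deals: list[Deal], industry: Optional[str] = None,
                 min_size: int = 0) -> list[Deal]:
    """Stage 1: narrow the pipeline down to the deals the query is about."""
    return [d for d in deals
            if (industry is None or d.industry == industry) and d.size >= min_size]


def token_overlap(query: str, text: str) -> int:
    """Toy relevance score; a real system would compare embeddings instead."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def search_within(chunks: list[Chunk], deal_ids: set[str], query: str,
                  source_type: Optional[str] = None,
                  since: Optional[date] = None, top_k: int = 3) -> list[Chunk]:
    """Stage 2: search only the selected deals, filtered by type and recency."""
    pool = [c for c in chunks
            if c.deal_id in deal_ids
            and (source_type is None or c.source_type == source_type)
            and (since is None or c.created >= since)]
    scored = [(token_overlap(query, c.text), c) for c in pool]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


deals = [
    Deal("d1", "Acme", "fintech", 120_000),
    Deal("d2", "Globex", "fintech", 15_000),
    Deal("d3", "Initech", "healthcare", 90_000),
]
chunks = [
    Chunk("d1", "transcript", date(2024, 6, 10), "Next steps: send security questionnaire"),
    Chunk("d1", "email", date(2024, 6, 11), "Next steps are attached"),
    Chunk("d1", "transcript", date(2024, 2, 1), "Next steps: schedule intro call"),
    Chunk("d3", "transcript", date(2024, 6, 12), "Next steps: legal review"),
]

selected = {d.deal_id for d in select_deals(deals, industry="fintech", min_size=50_000)}
hits = search_within(chunks, selected, "next steps",
                     source_type="transcript", since=date(2024, 6, 1))
```

Because the deal filter runs before any text matching, the "next steps" from an unrelated deal never enter the candidate pool, however similar the wording.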
Want to see how it all works? Check out our video demo.
What This Enables You to Do
With DealPage, you can effortlessly prep for meetings, generate insights and reports on deals in your pipeline, and create various types of sales materials, like proposals, documents, emails, and presentations. And that's just the beginning.
By streamlining data management and making relevant information easily accessible, we empower sales teams to focus on what matters most—closing deals and driving success.