From 63K Contacts to a CRM the Sales Team Actually Uses
Contacts cleaned: 63K → 28K
Lifecycle accuracy: 40% → 91%
Pipeline review time: 90 → 25 min
Full RevOps Diagnostic + Foundation Rebuild | 6 weeks | B2B SaaS · 150–200 employees | HubSpot · Clay | Anonymised
The Situation
A CRM that grew too fast to stay useful
A B2B SaaS company had been running HubSpot since 2020. By the time we audited it, the database had grown to 63,000 contacts accumulated over 4+ years — a mix of prospects, customers, partners, bounced emails, and contacts assigned to people who had left the company years ago.
The sales team had quietly stopped trusting HubSpot. Reps were maintaining their own spreadsheets. Pipeline reviews on Friday afternoons had become storytelling sessions, not data reviews.
Nobody could answer a simple question: "How many active prospects do we have right now?"
Before — Audit Findings
What the audit found
Problem: Detail
Total contacts: 63,000, accumulated since 2020
Contacts inactive since 2024: ~40% of the entire database
Bounced email contacts: mixed in with valid contacts, no suppression
Duplicate contacts: estimated 8–12% duplication rate
Contacts with no active owner: ~15% assigned to departed reps
ICP property fill rate: under 35% (industry / title / country)
Contact type mapping: no distinction between prospect, partner, customer
Lifecycle stage accuracy: less than 40% correctly mapped
Channel segmentation: zero (inbound, outbound, event all mixed together)
Customer–account associations: mismatched across objects
The root cause was not rep behaviour. The CRM had been built by whoever had time — not by someone who understood what the data needed to do downstream.
The Fix
What we built — and why this order matters
The order matters. Each phase enabled the next. Doing phase 4 before phase 1 would have broken everything downstream.
01
Foundation — Relationships + Ownership
Audited all customer–account associations — ensured every customer contact was properly linked to the correct account object
Mapped contact types: Prospect / Partner / Customer — enforced as a required property
Identified all contacts assigned to inactive or departed users — bulk reassigned to active owners
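To make the ownership step concrete, here is a minimal sketch of the reassignment logic in plain Python. It is not the HubSpot API: the `owner_id` field name and the round-robin distribution across fallback owners are assumptions made for illustration, and the real reassignment rule could just as well be territory- or account-based.

```python
def reassign_orphaned_contacts(contacts, active_owner_ids, fallback_owners):
    """Return copies of the contacts in which every record has an active owner.

    A contact counts as orphaned when its owner_id is missing or is not in
    active_owner_ids. Orphans are spread round-robin across fallback_owners
    (an illustrative rule, not necessarily the one used in the project).
    """
    if not fallback_owners:
        raise ValueError("at least one fallback owner is required")

    reassigned = []
    next_slot = 0
    for contact in contacts:
        updated = dict(contact)  # never mutate the caller's records
        if updated.get("owner_id") not in active_owner_ids:
            updated["owner_id"] = fallback_owners[next_slot % len(fallback_owners)]
            next_slot += 1
        reassigned.append(updated)
    return reassigned
```

Running this on an export first, and only then pushing the result back to the CRM, keeps the bulk change reviewable before it touches live records.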
02
Volume Reduction — Remove the Noise First
Archived all contacts with no activity since 2024 and no open deals
Removed all bounced contacts from active lists and sequences
Ran full deduplication — merged duplicates, preserving the most complete record
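The archive rule and the "keep the most complete record" merge can be sketched as follows. This is a simplified illustration rather than the tooling used on the project: the field names (`last_activity`, `open_deals`, `email`), matching duplicates on normalised email only, and the exact cutoff date are all assumptions.

```python
from datetime import date


def should_archive(contact, cutoff):
    """True when the contact has no open deals and no activity on or after cutoff."""
    last = contact.get("last_activity")  # a datetime.date, or None if never active
    inactive = last is None or last < cutoff
    return inactive and contact.get("open_deals", 0) == 0


def completeness(contact):
    """Count the populated fields on a record."""
    return sum(1 for value in contact.values() if value not in (None, ""))


def merge_duplicates(contacts):
    """Group contacts by normalised email and merge each group into one record.

    The most complete record survives; its empty fields are filled from
    the other records in the group.
    """
    groups = {}
    for contact in contacts:
        key = (contact.get("email") or "").strip().lower()
        groups.setdefault(key, []).append(contact)

    merged = []
    for key, group in groups.items():
        if not key:
            # No email to match on: keep these records untouched.
            merged.extend(dict(c) for c in group)
            continue
        ranked = sorted(group, key=completeness, reverse=True)
        survivor = dict(ranked[0])
        for other in ranked[1:]:
            for field, value in other.items():
                if survivor.get(field) in (None, "") and value not in (None, ""):
                    survivor[field] = value
        merged.append(survivor)
    return merged
```

Archiving before merging means the deduplication pass only has to compare records that are worth keeping.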
03
Data Quality — Enrich What Remains
Enriched remaining contacts for ICP properties: Industry, Job Title, Country
Filled gaps on contacts that had email and company but missing firmographic data
Set validation rules to enforce these properties on all new contacts going forward
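One way to track this phase is to measure fill rate with the same check that gates new contacts. The sketch below assumes fill rate means the share of contacts with all three ICP properties populated (the case study does not define it more precisely), and the property names are hypothetical.

```python
ICP_FIELDS = ("industry", "job_title", "country")  # hypothetical property names


def missing_icp_fields(contact):
    """List the ICP properties that are absent or blank on a contact."""
    return [f for f in ICP_FIELDS if not (contact.get(f) or "").strip()]


def icp_fill_rate(contacts):
    """Share of contacts (0.0 to 1.0) with every ICP property populated."""
    if not contacts:
        return 0.0
    complete = sum(1 for c in contacts if not missing_icp_fields(c))
    return complete / len(contacts)
```

The same `missing_icp_fields` check can serve as the validation rule for new records: a contact is accepted only when the returned list is empty.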
04
Classification — Make It Usable
Mapped all contacts to correct lifecycle stage and lead status — enforced via workflow, not manual rep input
Built contact segmentation by channel: Inbound / Outbound / Event
Added proper event tagging — every event-sourced contact tagged with event name, date, and type
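The idea behind "enforced via workflow, not manual rep input" is that stage and channel are derived from facts already on the record. The rules below are illustrative only: the `source` values, the property names and the three-stage mapping are assumptions, not the actual workflow logic from this project.

```python
CHANNELS = ("Inbound", "Outbound", "Event")
EVENT_TAG_FIELDS = ("event_name", "event_date", "event_type")


def classify_channel(contact):
    """Return one of CHANNELS, or None when the source is unknown."""
    if contact.get("event_name"):
        return "Event"
    source = contact.get("source")  # hypothetical property and values
    if source == "outbound_sequence":
        return "Outbound"
    if source in {"form", "demo_request", "organic"}:
        return "Inbound"
    return None


def missing_event_tags(contact):
    """For event-sourced contacts, list which of name, date and type are absent."""
    if classify_channel(contact) != "Event":
        return []
    return [f for f in EVENT_TAG_FIELDS if not contact.get(f)]


def derive_lifecycle_stage(contact):
    """Derive the stage from record facts instead of rep input (illustrative rules)."""
    if contact.get("contact_type") == "Customer":
        return "Customer"
    if contact.get("open_deals", 0) > 0:
        return "Opportunity"
    return "Lead"
```

Contacts for which `classify_channel` returns `None` stay unsegmented until their source is known, rather than being forced into a channel.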
05
Commercial Layer — Revenue Visibility
Updated deal types in collaboration with the CSM team
Set up active vs churned status on the deal object — CS team could see customer health at deal level for the first time
After — What Changed
Before and after
Metric: Before → After
Total active contacts: 63,000 → ~28,000
Duplicate rate: ~10% → under 2%
ICP property fill rate: ~35% → ~88%
Contacts with active owner: ~85% → 100%
Contact type mapped: 0% → 100%
Lifecycle stage accuracy: ~40% → ~91%
Channel segmentation coverage: 0% → ~85%
Pipeline review time: 90 minutes → 25 minutes
Bounced contacts in sequences: unknown (mixed in) → zero
The Key Insight
Why the order of operations matters
"The reps were not the problem. The system was built without a clear definition of what the data needed to do. Once the architecture was right — the data started maintaining itself."
The biggest unlock was doing the work in the right order:
1. Fix associations before cleaning volume, otherwise you clean the wrong records.
2. Remove volume before enriching: at a flat per-record price, enriching 63K contacts costs about 2.25× what enriching 28K costs.
3. Enrich before classifying, because you need firmographic data to segment correctly.
4. Classify before reporting, because lifecycle and channel data is the foundation of every dashboard. Most teams skip to step 4 and wonder why their dashboards lie.
What This Unlocked
Business outcomes
Sales team stopped using spreadsheets within 3 weeks of go-live
Pipeline review time dropped from 90 minutes to 25 minutes — data was trustworthy
Marketing could report MQLs by channel for the first time
CS team had live visibility into active vs churned customers at deal level
Lead scoring became possible — ICP properties now populated for scoring model
Recognise any of this?
If your HubSpot is bloated, your reps have gone back to spreadsheets, or your pipeline report is a Friday afternoon ritual — this is the exact work I do.