CRM Cleanup & Data Architecture

From 63K Contacts to a CRM the Sales Team Actually Uses

Contacts cleaned: 63K → 28K
Lifecycle accuracy: 40% → 91%
Pipeline review time: 90 → 25 min
Full RevOps Diagnostic + Foundation Rebuild · 6 weeks · B2B SaaS, 150–200 employees · HubSpot, Clay · Anonymised

A CRM that grew too fast to stay useful

A B2B SaaS company had been running HubSpot since 2020. By the time we audited it, the database had grown to 63,000 contacts accumulated over 4+ years — a mix of prospects, customers, partners, bounced emails, and contacts assigned to people who had left the company years ago.

The sales team had quietly stopped trusting HubSpot. Reps were maintaining their own spreadsheets. Pipeline reviews on Friday afternoons had become storytelling sessions, not data reviews.

Nobody could answer a simple question: "How many active prospects do we have right now?"


What the audit found

Problem | Detail
Total contacts | 63,000, accumulated since 2020
Contacts inactive since 2024 | ~40% of the entire database
Bounced email contacts | Mixed in with valid contacts, no suppression
Duplicate contacts | Estimated 8–12% duplication rate
Contacts with no active owner | ~15% assigned to departed reps
ICP property fill rate | Under 35% (industry / title / country)
Contact type mapping | No distinction between prospect, partner, and customer
Lifecycle stage accuracy | Less than 40% correctly mapped
Channel segmentation | Zero: inbound, outbound, and event contacts all mixed together
Customer–account associations | Mismatched across objects
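Most of the audit figures above are simple ratios over an exported contact list. The Python below is a minimal sketch of how the inactive share, the share without an active owner, the ICP fill rate, and a crude duplicate rate could be computed. It is not the tooling used on this project: it assumes a flat export with hypothetical field names (not HubSpot property names), reads "inactive since 2024" as no activity on or after 1 January 2024, and counts duplicates by exact normalised email only; real duplicate detection usually also matches on name and company, which this sketch does not attempt.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical flat export of a contact; field names are illustrative,
# not HubSpot's internal property names.
@dataclass
class Contact:
    email: str
    owner: Optional[str] = None           # owner user id; None if unassigned
    last_activity: Optional[date] = None  # None if never active
    industry: Optional[str] = None
    job_title: Optional[str] = None
    country: Optional[str] = None

ICP_FIELDS = ("industry", "job_title", "country")

def audit(contacts, active_owners, cutoff=date(2024, 1, 1)):
    """Return audit ratios (each between 0 and 1) for a list of contacts."""
    n = len(contacts)
    if n == 0:
        return {}
    # Inactive: no recorded activity on or after the cutoff date (assumed rule).
    inactive = sum(1 for c in contacts
                   if c.last_activity is None or c.last_activity < cutoff)
    # No active owner: unassigned, or owned by a user not in the active set.
    orphaned = sum(1 for c in contacts if c.owner not in active_owners)
    # ICP fill rate: share of (contact, ICP field) cells that are populated.
    filled = sum(1 for c in contacts for f in ICP_FIELDS if getattr(c, f))
    # Crude duplicate rate: exact matches on trimmed, lower-cased email only.
    unique_emails = {c.email.strip().lower() for c in contacts}
    return {
        "inactive_share": inactive / n,
        "no_active_owner_share": orphaned / n,
        "icp_fill_rate": filled / (n * len(ICP_FIELDS)),
        "duplicate_rate": (n - len(unique_emails)) / n,
    }
```

`active_owners` is assumed to be the set of user ids for people still on the team, so contacts owned by departed reps and unassigned contacts both count as having no active owner.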

The root cause was not rep behaviour. The CRM had been built by whoever had time — not by someone who understood what the data needed to do downstream.


What we built — and why this order matters

The order matters. Each phase enabled the next. Doing phase 4 before phase 1 would have broken everything downstream.

01 Foundation: Relationships + Ownership
  • Audited all customer–account associations — ensured every customer contact was properly linked to the correct account object
  • Mapped contact types: Prospect / Partner / Customer — enforced as a required property
  • Identified all contacts assigned to inactive or departed users — bulk reassigned to active owners
02 Volume Reduction: Remove the Noise First
  • Archived all contacts with no activity since 2024 and no open deals
  • Removed all bounced contacts from active lists and sequences
  • Ran full deduplication — merged duplicates, preserving the most complete record
03 Data Quality: Enrich What Remains
  • Enriched remaining contacts for ICP properties: Industry, Job Title, Country
  • Filled gaps on contacts that had an email and company but were missing firmographic data
  • Set validation rules to enforce these properties on all new contacts going forward
04 Classification: Make It Usable
  • Mapped all contacts to correct lifecycle stage and lead status — enforced via workflow, not manual rep input
  • Built contact segmentation by channel: Inbound / Outbound / Event
  • Added proper event tagging — every event-sourced contact tagged with event name, date, and type
05 Commercial Layer: Revenue Visibility
  • Updated deal types in collaboration with the CSM team
  • Set up active vs churned status on the deal object — CS team could see customer health at deal level for the first time
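Phase 02's archive rule and "merge duplicates, preserving the most complete record" are mechanical enough to sketch. The Python below is a minimal illustration, not the tooling used on this project. It assumes a flat contact export with hypothetical fields, treats duplicates as exact matches on normalised email, defines "most complete" as having the most populated ICP fields (ties going to the most recent activity), and reads "no activity since 2024" as no activity on or after 1 January 2024. It merges before archiving so that a duplicate's recent activity or open deal is not lost; the case study does not say which of the two ran first.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

# Hypothetical contact record; field names are illustrative only.
@dataclass
class Contact:
    email: str
    last_activity: Optional[date] = None
    has_open_deal: bool = False
    industry: Optional[str] = None
    job_title: Optional[str] = None
    country: Optional[str] = None

PROFILE_FIELDS = ("industry", "job_title", "country")

def completeness(c):
    """Count populated profile fields; used to choose the 'most complete' record."""
    return sum(1 for f in PROFILE_FIELDS if getattr(c, f))

def merge_duplicates(contacts):
    """Merge contacts sharing a normalised email, keeping the most complete record."""
    groups = {}
    for c in contacts:
        groups.setdefault(c.email.strip().lower(), []).append(c)
    merged = []
    for key, group in groups.items():
        # Most complete first; ties go to the most recently active record.
        group.sort(key=lambda c: (completeness(c), c.last_activity or date.min),
                   reverse=True)
        primary = replace(group[0], email=key)  # copy, so inputs are not mutated
        for other in group[1:]:
            # Backfill empty profile fields from the other duplicates.
            for f in PROFILE_FIELDS:
                if not getattr(primary, f) and getattr(other, f):
                    setattr(primary, f, getattr(other, f))
            # Keep any open deal and the latest activity seen across duplicates.
            primary.has_open_deal = primary.has_open_deal or other.has_open_deal
            if other.last_activity and (primary.last_activity is None
                                        or other.last_activity > primary.last_activity):
                primary.last_activity = other.last_activity
        merged.append(primary)
    return merged

def should_archive(c, cutoff=date(2024, 1, 1)):
    """Archive rule from phase 02: no activity since the cutoff and no open deals."""
    inactive = c.last_activity is None or c.last_activity < cutoff
    return inactive and not c.has_open_deal

def reduce_volume(contacts, cutoff=date(2024, 1, 1)):
    """Merge duplicates, then split the result into (kept, archived)."""
    merged = merge_duplicates(contacts)
    kept = [c for c in merged if not should_archive(c, cutoff)]
    archived = [c for c in merged if should_archive(c, cutoff)]
    return kept, archived
```

This only shows the decision logic; it says nothing about how records are actually merged or archived inside HubSpot.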

Before and after

Metric | Before | After
Total active contacts | 63,000 | ~28,000
Duplicate rate | ~10% | Under 2%
ICP property fill rate | ~35% | ~88%
Contacts with active owner | ~85% | 100%
Contact type mapped | 0% | 100%
Lifecycle stage accuracy | ~40% | ~91%
Channel segmentation coverage | 0% | ~85%
Pipeline review time | 90 minutes | 25 minutes
Bounced contacts in sequences | Unknown (mixed in) | Zero

Why the order of operations matters

"The reps were not the problem. The system was built without a clear definition of what the data needed to do. Once the architecture was right — the data started maintaining itself."

The biggest unlock was doing the work in the right order:

1. Fix associations before cleaning volume; otherwise you clean the wrong records.
2. Remove volume before enriching: enriching 63K contacts costs more than twice as much as enriching 28K (63,000 ÷ 28,000 ≈ 2.25).
3. Enrich before classifying: you need firmographic data to segment correctly.
4. Classify before reporting: lifecycle and channel data are the foundation of every dashboard. Most teams skip straight to step 4 and wonder why their dashboards lie.

Business outcomes

  • Sales team stopped using spreadsheets within 3 weeks of go-live
  • Pipeline review time dropped from 90 minutes to 25 minutes because the data was now trustworthy
  • Marketing could report MQLs by channel for the first time
  • CS team had live visibility into active vs churned customers at account level
  • Outbound sequences stopped hitting bounced contacts, and deliverability improved immediately
  • Lead scoring became possible, with ICP properties now populated for the scoring model

Recognise any of this?

If your HubSpot is bloated, your reps have gone back to spreadsheets, or your pipeline report is a Friday afternoon ritual — this is the exact work I do.

Book a free 20-min HubSpot teardown →

Fixed scope. Fixed price. You'll know exactly what's broken before committing to anything.