CRM Data Hygiene · Account Hierarchy

One company. 450 contacts.
No structure underneath.

Contacts were associated with duplicate company records. No parent-child hierarchy. No regional ownership. No way to manage enterprise accounts at scale. Here is how we built an agent to fix it — automatically.

500+: Records fixed automatically
4-tier: Fallback logic, zero manual guesswork
Zero: Incorrect merges, confidence tiers prevented bad data
CRM Data Hygiene · Account Hierarchy · HubSpot API · Python · Enterprise B2B · Anonymised

The Situation

The problem no one talks about: duplicate company associations.

Take an enterprise account with 450 contacts in the CRM. One company — but spread across multiple duplicate company records, no regional structure, and no hierarchy. Contacts associated with two different company records simultaneously. Deals owned by the wrong region. Account-based reporting giving nonsense numbers.

This is not a one-off data entry error. It is the default state of almost every CRM that has been running for more than two years without active governance. HubSpot auto-creates company records from email domains. Enrichment tools add more. Reps create their own. And nobody cleans up.

The result: a contact like travis.lock@cmtgroup.co.uk ends up associated with two separate company records — cmt.co.uk and cmtgroup.global — that are the same organisation. The CRM thinks they are different companies. The reports are wrong. The sales team is confused. And nobody has time to fix 500 records manually.

Why This Happens

It is not a data entry problem. It is a structural one.

At enterprise scale, a company like Wates does not have one domain. It has a global parent, regional subsidiaries, and country-specific domains — all representing the same group. The CRM has no idea they are related unless someone tells it.

What the CRM sees | What is actually true
wates.com, standalone company | Global parent, the canonical brand
wates.co.uk, separate company | UK subsidiary, child of wates.com
wates.ae, separate company | Middle East subsidiary, child of wates.com
wates.in, separate company | India subsidiary, child of wates.com
Contact with @wates.co.uk email, associated with 2 companies | Should be under wates.co.uk only, rolls up to parent automatically

Without this structure, you cannot assign contacts by region, report by territory, or manage account ownership cleanly. You are flying blind on your most important accounts.
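
The parent-child mapping in the table above can be sketched as a small lookup. This is a minimal illustration, assuming the hierarchy is held in an in-memory dict; `PARENT_OF`, `root_parent`, and `rollup` are hypothetical names, not part of HubSpot or of the agent itself.

```python
from __future__ import annotations

# Hypothetical in-memory model of the hierarchy in the table above.
# Keys are child company domains, values are their parent domain.
PARENT_OF = {
    "wates.co.uk": "wates.com",
    "wates.ae": "wates.com",
    "wates.in": "wates.com",
}


def root_parent(domain: str) -> str:
    """Walk up the parent chain until a domain with no parent is reached."""
    seen = set()
    while domain in PARENT_OF and domain not in seen:
        seen.add(domain)  # guard against accidental cycles in the mapping
        domain = PARENT_OF[domain]
    return domain


def rollup(contact_emails: list[str]) -> dict[str, list[str]]:
    """Group contact emails under the root parent of their email domain."""
    groups: dict[str, list[str]] = {}
    for email in contact_emails:
        domain = email.rsplit("@", 1)[-1].lower()
        groups.setdefault(root_parent(domain), []).append(email)
    return groups
```

With this shape, a contact on @wates.co.uk and a contact on @wates.in both report under wates.com, which is the rollup that account-based reporting depends on.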

The Agent Logic

A decision engine, not a script.

The key insight was that you cannot brute-force this with a simple dedup rule. Every company record needs to be evaluated on its own signals before any action is taken. We built a decision engine that works through a defined signal hierarchy before touching a single record.

START: Find all contacts with 2+ company associations
↓ Pull both company records (name, domain, URL)
↓ Check Company A domain (primary): is the URL live?
    URL WORKING → Does it redirect to Company B? If yes, merge A into B (parent)
    URL DEAD → Check Company B domain
        Both dead → Flag for manual review
        Company B live → Check contact LinkedIn company URL
↓ Does LinkedIn company URL = Company B domain?
    NO → Flag for manual review
    YES → Set Company B as Parent, Company A as Child, remove direct B association from contact
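
The flow above can be sketched as a pure function over signals that have already been collected. This is a sketch under stated assumptions, not the production agent: `Action`, `decide`, and `contacts_with_multiple_companies` are hypothetical names, and the flow does not spell out what happens when Company A is live but does not redirect, so this version assumes that case continues to the Company B and LinkedIn checks.

```python
from enum import Enum


class Action(Enum):
    MERGE_A_INTO_B = "merge A into B"
    SET_B_PARENT_A_CHILD = "set B as parent, A as child"
    MANUAL_REVIEW = "flag for manual review"


def contacts_with_multiple_companies(associations):
    """START step: keep only contacts associated with two or more companies.

    `associations` maps a contact id to a list of company ids.
    """
    return {cid: ids for cid, ids in associations.items() if len(ids) >= 2}


def decide(a_live, a_redirects_to_b, b_live, linkedin_matches_b):
    """Walk the signal hierarchy for one contact's pair of company records."""
    if a_live and a_redirects_to_b:
        # Same organisation behind one site: a merge, not a hierarchy.
        return Action.MERGE_A_INTO_B
    # Assumption: a live A that does not redirect falls through to the
    # same Company B checks as a dead A.
    if not b_live:
        # Covers "both dead"; B dead with A live is also held for review.
        return Action.MANUAL_REVIEW
    if linkedin_matches_b:
        return Action.SET_B_PARENT_A_CHILD
    return Action.MANUAL_REVIEW
```

Keeping the decision separate from the network and API calls means every branch can be checked against known cases before the agent is allowed to touch a record.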

The redirect check was the most important addition. If cmt.co.uk simply redirects to cmtgroup.global, there is no need to create a parent-child relationship. The contact belongs directly on the parent. No child record needed, no extra hierarchy layer, clean CRM.
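
A redirect check of this kind might look like the sketch below. It uses only the Python standard library; `host_of`, `final_url`, and `redirects_to` are hypothetical helper names, and the comparison is a simple host match that ignores a leading "www.". It also treats any HTTP error as a dead URL, which is an assumption: a site that blocks automated requests would be misread as dead.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def host_of(url_or_domain: str) -> str:
    """Return the lowercase host, without a leading 'www.'."""
    value = url_or_domain.strip().lower()
    if "//" not in value:
        value = "//" + value  # let urlsplit treat a bare domain as a host
    host = urlsplit(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def final_url(domain: str, timeout: float = 10.0) -> Optional[str]:
    """Follow redirects; return the final URL, or None if unreachable."""
    request = Request(f"https://{domain}", headers={"User-Agent": "hierarchy-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.geturl()
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, so DNS failures,
        # timeouts, and 4xx/5xx responses all count as "dead" here.
        return None


def redirects_to(final: Optional[str], target_domain: str) -> bool:
    """True when the final URL sits on the target company's domain."""
    return final is not None and host_of(final) == host_of(target_domain)
```

Usage would be along the lines of `redirects_to(final_url("cmt.co.uk"), "cmtgroup.global")`; a `True` result is the signal for a merge rather than a parent-child link.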

Regional Segmentation

How contacts get assigned to the right child company.

Once the parent is confirmed, the agent needs to decide which child company each contact belongs to. We built a four-tier fallback so that every contact lands somewhere — with no gaps and no ambiguity.

Signal 1 — Strongest
Email Domain Suffix
  • @wates.co.uk → UK child company
  • @wates.ae → Middle East child company
  • @wates.in → India child company
  • Most reliable signal — directly tied to the company's own domain structure
Signal 2 — If Email Domain Is Ambiguous
HubSpot Country Property
  • Contact has a Country field populated in HubSpot
  • Country maps directly to the correct child company (UK → wates.co.uk, India → wates.in, UAE → wates.ae)
  • Covers contacts with a global domain email (@wates.com) who are region-specific people
Signal 3 — If Country Is Blank
Assigned Sales Owner's Region
  • Each sales rep is assigned to a territory in HubSpot
  • If the contact has a sales owner, the owner's territory determines the child company
  • Ensures sales ownership and CRM structure stay in sync
Signal 4 — Fallback
Flag for Manual Review
  • No email domain signal, no country property, no assigned sales owner
  • Contact is written to the review queue — not touched by the agent
  • RevOps assigns manually with full context visible
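
The four signals above translate naturally into an ordered fallback. The sketch below is illustrative only: the `Contact` fields, the `assign_child` function, and the territory labels in `CHILD_BY_TERRITORY` are assumptions made for the example, while the email-domain and country mappings come from the examples above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CHILD_BY_EMAIL_DOMAIN = {
    "wates.co.uk": "wates.co.uk",
    "wates.ae": "wates.ae",
    "wates.in": "wates.in",
}
CHILD_BY_COUNTRY = {"UK": "wates.co.uk", "India": "wates.in", "UAE": "wates.ae"}
# Hypothetical territory labels; real values depend on how reps are set up.
CHILD_BY_TERRITORY = {"UK": "wates.co.uk", "India": "wates.in", "Middle East": "wates.ae"}


@dataclass
class Contact:
    email: str
    country: Optional[str] = None
    owner_territory: Optional[str] = None


def assign_child(contact: Contact) -> Tuple[Optional[str], str]:
    """Return (child company domain, signal used); child is None when flagged."""
    domain = contact.email.rsplit("@", 1)[-1].lower()
    if domain in CHILD_BY_EMAIL_DOMAIN:           # Signal 1: email domain suffix
        return CHILD_BY_EMAIL_DOMAIN[domain], "email_domain"
    if contact.country in CHILD_BY_COUNTRY:       # Signal 2: country property
        return CHILD_BY_COUNTRY[contact.country], "country"
    if contact.owner_territory in CHILD_BY_TERRITORY:  # Signal 3: owner's region
        return CHILD_BY_TERRITORY[contact.owner_territory], "owner_territory"
    return None, "manual_review"                  # Signal 4: leave untouched
```

Returning the signal name alongside the result keeps the reason for each assignment visible, which is useful when RevOps reviews the flagged contacts.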

Safety Architecture

The agent does not act unless it is confident.

The biggest risk with any automated CRM operation is a confident wrong decision. We built three confidence tiers so the agent only executes when it has corroborating signals — and flags everything else for human review.

✅ Auto-Fix
URL check + LinkedIn both confirm: execute automatically
The live URL matches the contact's LinkedIn company, and the redirect check is clean. Agent sets parent-child, removes duplicate association, flags record as "Created by Agent" for auditability.
⚠️ Review Queue
Only one signal matches: hold for human review
URL is live but LinkedIn is missing. Or LinkedIn matches but URL redirects unexpectedly. Agent writes the record to a review CSV with all signals shown, and a human makes the final call in minutes.
🚨 Manual Only
Both signals conflict or are unavailable: do not touch
Both URLs dead, or LinkedIn company doesn't match either record. Agent flags with full context. RevOps investigates. Nothing is changed automatically.
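
The three tiers reduce to counting how many of the two signals confirm. The sketch below shows that, plus a review CSV writer; the tier names, `REVIEW_FIELDS` columns, and function names are hypothetical, and writing Manual Only records to the same CSV as the Review Queue is an assumption made to keep the example short.

```python
import csv

TIERS = ("manual_only", "review_queue", "auto_fix")
REVIEW_FIELDS = [
    "contact_id", "company_a", "company_b",
    "url_confirms", "linkedin_confirms", "tier",
]


def confidence_tier(url_confirms, linkedin_confirms):
    """Zero confirming signals: manual only. One: review. Two: auto-fix.

    A missing signal (None) counts the same as a non-confirming one.
    """
    return TIERS[int(bool(url_confirms)) + int(bool(linkedin_confirms))]


def write_review_queue(records, handle):
    """Write every record that is not auto-fixable; return the row count."""
    writer = csv.DictWriter(handle, fieldnames=REVIEW_FIELDS)
    writer.writeheader()
    written = 0
    for record in records:
        tier = confidence_tier(record["url_confirms"], record["linkedin_confirms"])
        if tier != "auto_fix":
            writer.writerow({**record, "tier": tier})
            written += 1
    return written
```

Each row carries both raw signals next to the tier, so the reviewer sees why the agent held back rather than just that it did.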

Every record the agent modifies carries a "Created by Agent" flag in HubSpot. This makes the entire operation auditable and fully reversible — you always know what was automated and what was manual.
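
One way to set such a flag is a property update through HubSpot's CRM v3 objects endpoint. The sketch below only builds the request and does not send it. The property name `created_by_agent` is hypothetical (a custom property would need to exist in the portal first), and treating the flag as a checkbox-style property with the value "true" is an assumption about how it is modelled.

```python
import json
from urllib.request import Request

HUBSPOT_BASE = "https://api.hubapi.com"
AUDIT_PROPERTY = "created_by_agent"  # hypothetical custom property name


def build_audit_patch(company_id: str, token: str) -> Request:
    """Build (but do not send) a PATCH that marks a company as agent-modified."""
    body = json.dumps({"properties": {AUDIT_PROPERTY: "true"}}).encode("utf-8")
    return Request(
        f"{HUBSPOT_BASE}/crm/v3/objects/companies/{company_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`. Filtering companies on the same property later gives the list of everything the agent touched.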

Before & After

What the CRM looked like before and after.

Problem | Before | After
Company associations per contact | 2+ (duplicates common) | 1, a clean single association
Account hierarchy | None, all companies flat | Parent-child structure across all enterprise accounts
Regional ownership | No structure, contacts unassigned by region | Child companies per region, contacts correctly segmented
ABM reporting | Broken, contacts couldn't roll up to account | Clean rollup from contact → child → parent
Sales territory assignment | Manual, inconsistent | Auto-assigned via email domain → country → owner fallback
Duplicate company records | Unknown, no audit trail | Resolved or flagged, every decision logged
Time to fix 500 records | Days of manual work | Hours (agent run)

The Key Insight

Data hygiene is not a cleanup task. It is an architecture decision.

"Most teams treat duplicate records as something to fix manually when it gets bad enough. The real fix is building a system that makes the right decision automatically — and knows when to stop and ask a human."

The agent works because it mirrors how a trained RevOps person would think through each record: check the URL, check LinkedIn, check the region signals, then act — or flag. The difference is it does this for 500 records in the time it takes a human to do five.

1. Validate before acting. A live URL check is the cheapest signal available. It filters out half the ambiguous cases before any enrichment API is called.
2. Redirect logic changes everything. If a company domain redirects to another, you do not need a parent-child structure. You need a merge. Getting this wrong creates hierarchy where none is needed.
3. Region fallback prevents orphans. Email domain covers most cases, but country property and sales owner assignment catch the contacts who would otherwise fall through and stay unassigned forever.
4. Confidence tiers protect the CRM. Automation without confidence scoring is how you create new data problems while solving old ones. If the agent is not sure, it stops and flags. Human oversight is built into the design.

What This Unlocked

Business outcomes

🏗️
Account hierarchy live across all enterprise accounts
Every subsidiary and regional company now sits correctly under its global parent. ABM reporting works for the first time.
🌍
Regional sales ownership clean
Contacts segmented by region automatically. Sales reps work their territory without overlap or confusion over who owns which account.
500+ records fixed, no manual work
What would have taken days of manual CRM updates ran as an agent operation. Review queue gave the team visibility without the grind.
🔒
Zero incorrect merges
Confidence tiers meant the agent only acted on records it could verify. Ambiguous cases were flagged, not guessed. No new data problems created.
📋
Full audit trail in HubSpot
"Created by Agent" flag on every touched record. The entire operation is visible, attributable, and reversible if needed.
🔄
Repeatable: runs on any new account
The same agent logic applies to every new enterprise account added to the CRM. Hierarchy is built correctly from day one, not fixed years later.

Is your CRM carrying the same weight?

If your contacts have duplicate company associations, no account hierarchy, or regional ownership that nobody trusts — this is exactly the kind of work I build for.

Book a free 20-min HubSpot teardown →

Fixed scope. Fixed price. You will know exactly what is broken before committing to anything.