The Situation

The marketing team had all the tools. Salesforce for CRM. Marketo for automation. DemandBase for ABM and intent data. LinkedIn Ads for paid targeting. Four platforms, significant annual spend, and a dashboard that could tell you anything you wanted to know.

Except it couldn't tell you which accounts were actually likely to buy.

The propensity model existed—technically. But sales didn't trust it. "Your 'hot' leads are garbage" was the polite version of the feedback. Marketing would point to the data showing high intent scores. Sales would point to the conversations that went nowhere.

Both were right. The model was mathematically sound. It was also practically useless.

I was brought in to fix demand gen, but within the first two weeks, it became clear that the problem wasn't campaigns or content or targeting. The problem was data.

The Insight

Propensity models are only as good as the data feeding them. Garbage in, garbage out—everyone knows this. But the failure mode I found wasn't garbage data. It was inconsistent data.

The four systems had been implemented at different times, by different teams, each with its own naming conventions.

The model was mathematically weighting data—but it was doing math on records that weren't actually the same entities.
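To make that failure concrete, here is a hypothetical illustration (company names and field names invented) of how one real account can look like four unrelated entities across systems:

```python
# Hypothetical records for ONE real company, as four systems might store it.
# All names and values are invented for illustration.
records = {
    "salesforce": {"account_name": "Acme Corp.",       "industry": "Banking"},
    "marketo":    {"company":      "ACME Corporation", "industry": "Financial Services"},
    "demandbase": {"account":      "acme corp",        "industry": "Finance"},
    "linkedin":   {"company_name": "Acme",             "industry": "Financial Svcs"},
}

# A naive exact-match join treats these as four unrelated entities,
# so the model scores each fragment separately.
names = {list(r.values())[0] for r in records.values()}
print(len(names))  # 4 distinct "companies"
```

Any scoring model joining on these raw values splits one buying signal across four thin, unconvincing records.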

The insight: The prediction problem was actually a data governance problem.

The System

Layer 1: Data Audit and Mapping

First, we mapped every field across all four systems. The output was brutal.

Layer 2: Standardization Rules

Company Identification

Industry Taxonomy

Layer 3: Integration Architecture

DemandBase (Intent Data)
    │
    ├──→ Salesforce Account (nightly batch + real-time for surges)
    │        │
    │        └──→ Marketo (account-level intent visible in automation)
    │
LinkedIn Ads
    │
    ├──→ Marketo (engagement tracking)
    │        │
    │        └──→ Salesforce Lead/Contact
    │
Marketo (Behavioral Data)
    │
    └──→ Salesforce (bi-directional sync, 15-minute intervals)
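The topology above can be expressed as a simple sync-job config. System names and cadences come from the diagram; the field names (and the LinkedIn cadence, which the diagram doesn't specify) are assumptions:

```python
# The sync topology from the diagram, as a hypothetical job config.
# Field names are invented; cadences marked "assumed" are not from the diagram.
SYNC_JOBS = [
    {"source": "DemandBase", "target": "Salesforce",
     "mode": "batch", "schedule": "nightly",
     "realtime_trigger": "intent_surge"},   # surges bypass the batch
    {"source": "LinkedIn Ads", "target": "Marketo",
     "mode": "batch", "schedule": "hourly"},  # cadence assumed
    {"source": "Marketo", "target": "Salesforce",
     "mode": "bidirectional", "interval_minutes": 15},
]
```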

Layer 4: The Propensity Model

With clean, synchronized data, we rebuilt the scoring model:

Input signals were weighted by predictive power, and the resulting score mapped to a classification and a sales action:

| Score Range | Classification | Sales Action |
|-------------|----------------|--------------|
| 80-100 | Sales-Ready | Immediate outreach, high priority |
| 60-79 | Marketing Qualified | Accelerated nurture, SDR qualification |
| 40-59 | Engaged | Standard nurture, monitor for surge |
| 20-39 | Aware | Light-touch digital, brand awareness |
| 0-19 | Cold | Exclude from active campaigns |
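The score-to-action mapping above is just a threshold function. Thresholds come straight from the table; the upstream weighting that produces the score is omitted:

```python
# Map a propensity score (0-100) to its classification tier,
# using the thresholds from the table above.
def classify(score: float) -> str:
    if score >= 80:
        return "Sales-Ready"
    if score >= 60:
        return "Marketing Qualified"
    if score >= 40:
        return "Engaged"
    if score >= 20:
        return "Aware"
    return "Cold"

print(classify(85))  # Sales-Ready
print(classify(45))  # Engaged
```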

Layer 5: Vertical Segmentation

Rather than one model to rule them all, we calibrated for high-value verticals:

Banking/Financial Services: Heavier weight on compliance content, longer lookback window

Retail: Seasonal adjustment, customer experience content weight, peak period multiplier
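One way to implement this calibration is a base parameter set with per-vertical overrides, as in the sketch below. The structure mirrors the adjustments described above, but all numbers are invented for illustration:

```python
# Hypothetical per-vertical calibration: base model parameters overridden
# by vertical-specific adjustments. All values are invented.
BASE = {"lookback_days": 90, "content_weights": {"general": 1.0}}

VERTICAL_OVERRIDES = {
    "banking": {"lookback_days": 180,                  # longer lookback window
                "content_weights": {"compliance": 2.0}},
    "retail":  {"seasonal_multiplier": 1.5,            # peak period multiplier
                "content_weights": {"customer_experience": 1.8}},
}

def params_for(vertical: str) -> dict:
    """Merge the base parameters with a vertical's overrides."""
    p = {**BASE, "content_weights": dict(BASE["content_weights"])}
    for key, value in VERTICAL_OVERRIDES.get(vertical, {}).items():
        if key == "content_weights":
            p["content_weights"].update(value)
        else:
            p[key] = value
    return p

print(params_for("banking")["lookback_days"])  # 180
```

Keeping overrides as data rather than separate models means new verticals can be calibrated without retraining everything from scratch.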

The Takeaway

Most companies don't have a prediction problem. They have a data problem masquerading as a prediction problem.

1. Audit before you build. We lost a month before I realized the model wasn't the problem. That month would have been saved by starting with a data audit.
2. Domain is your friend. Email domain as the primary key for matching solved most of our account matching challenges.
3. Vertical calibration beats universal models. A model trained on all accounts will be mediocre at predicting any specific segment.