How Broken Data Misleads Attribution Models and Wastes Budget

Published: November 17, 2025

8 min read

By David Kirchhoff

Introduction: The $15,000 Mistake

You may have seen this happen: you check GA4 and notice a google / cpc campaign falling behind. It is not closing deals, so you shift its $15,000 budget to a channel that looks stronger. A week later, your pipeline collapses. What happened?

The root cause often lies in the data itself. Even small tracking errors can distort performance numbers, making successful campaigns appear to fail. When the data is broken, any decision based on it can backfire.

In this post, we will explore how common attribution approaches can tell very different stories and how data quality issues amplify these conflicts. Using interactive figures, we will visualize these differences and highlight why fixing the underlying data must come first.

This builds on our earlier guides on GA4 events as the foundation of analytics and how sessions group those events into meaningful insights.

This post includes interactive graphs to help explain the concepts. A full demo of the attribution models and the impact of data quality is available in the resources section of the website.

Key Takeaways

  • Small tracking errors can make campaigns that are performing well look like failures.
  • Conflicting interpretations from different attribution approaches can lead to costly decisions.
  • Cleaning your data is more important and urgent than debating which attribution approach is correct.

Understanding Attribution: The Last-Touch Model

When we try to understand which marketing channels drive results, we rely on attribution. Attribution is simply the method used to assign credit for a sale or conversion to the channels and touchpoints a user interacted with along their journey. How we assign this credit changes what we think is working and what is not.

The most common default approach is Last-Touch Attribution. This model gives all the credit to the final touchpoint a user interacted with before converting.

This means it rewards the channel that closed the deal. The downside is that it ignores everything else in the customer journey, such as the blog post someone read weeks ago or the social media ad that first introduced them to your brand.
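
To make the mechanics concrete, here is a minimal Python sketch of last-touch credit assignment. The journeys and revenue figures are illustrative placeholders, not the demo dataset used in the figures below.

```python
# Minimal sketch of Last-Touch Attribution (illustrative data only).
# Each journey is the ordered list of channels a user touched before converting.
journeys = [
    {"touches": ["google / organic", "newsletter / email", "(direct) / (none)"], "revenue": 5000},
    {"touches": ["google / cpc"], "revenue": 3000},
]

credit = {}
for journey in journeys:
    closer = journey["touches"][-1]  # Last-Touch: the final touchpoint gets 100% of the revenue
    credit[closer] = credit.get(closer, 0) + journey["revenue"]

print(credit)  # {'(direct) / (none)': 5000, 'google / cpc': 3000}
```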

In Figure 1 you can see how the Last-Touch model assigns credit for a demo dataset. This dataset contains data that you would get from GA4 and a CRM. In the Last-Touch model the biggest channel is (direct) / (none) at $55,000, followed by newsletter / email at $42,000 and google / cpc at $33,000. These numbers suggest clear winners, but the story changes when we switch to data with attribution gaps, for example from missing or inconsistently configured UTM parameters.

Interactive Demo: Last Touch

Figure 1: Last-Touch Attribution. This report shows the clean data, where (direct) / (none) and google / cpc are the main drivers. Click the ‘Dirty Data’ toggle to see how a few broken UTMs instantly change the results.

With unclean data, (direct) / (none) jumps by $15,000, pulling revenue away from google / cpc. A single broken UTM caused this entire shift. Even this straightforward model can give misleading results when the underlying tracking is flawed. Campaigns that are performing well can appear to fail, putting budget decisions at risk.
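
The mechanism behind this shift is easy to sketch. When a touch arrives without usable campaign parameters, analytics tools typically bucket it into the catch-all channel. The `resolve_channel` helper below is a hypothetical simplification, and GA4's real fallback rules are more nuanced, but the principle is the same:

```python
def resolve_channel(touch):
    # A touch with missing or broken UTM tagging collapses into the catch-all
    # bucket -- this is how paid revenue "moves" to (direct) / (none).
    if not touch.get("utm_source") or not touch.get("utm_medium"):
        return "(direct) / (none)"
    return f'{touch["utm_source"]} / {touch["utm_medium"]}'

# A paid click whose UTM parameters were stripped somewhere along the way:
broken = {"utm_source": None, "utm_medium": None}
print(resolve_channel(broken))  # (direct) / (none)
```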

The First-Touch Model: Rewarding the Opener

The opposite of Last-Touch is First-Touch Attribution. This model gives all the credit to the very first touchpoint a user interacted with in their journey, regardless of what actually closed the sale.

This approach highlights the channels that bring new users into your world, like SEO, social media, or other discovery channels. The downside is that it ignores the touches that happened later, such as newsletters, retargeting ads, or sales calls that actually nurtured and converted the lead.
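
In code, the only difference from the last-touch sketch is which element of the journey receives the credit. Again, the data is illustrative:

```python
# Minimal sketch of First-Touch Attribution: same loop, index flipped.
journeys = [
    {"touches": ["google / organic", "newsletter / email", "(direct) / (none)"], "revenue": 5000},
    {"touches": ["google / cpc"], "revenue": 3000},
]

credit = {}
for journey in journeys:
    opener = journey["touches"][0]  # First-Touch: the opening touchpoint gets 100% of the revenue
    credit[opener] = credit.get(opener, 0) + journey["revenue"]

print(credit)  # {'google / organic': 5000, 'google / cpc': 3000}
```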

In Figure 2, we can see the First-Touch model applied to the same demo dataset. Here, google / organic emerges as the top channel at $55,000. Channels that were dominant in the Last-Touch view, like (direct) / (none) and google / cpc, now drop significantly to $0 and $18,000 respectively. Some channels, like newsletter / email at $42,000 and partner-xyz.com / referral at $27,000, remain unchanged because the user journeys were simple and direct.

Interactive Demo: First Touch

Figure 2: First-Touch Attribution. This report shows the clean data, where google / organic is highlighted as the main driver. Click the ‘Dirty Data’ toggle to see how missing first-touch information shifts credit to (direct) / (none).

When we switch to unclean data, google / organic loses more than half its revenue, dropping by $30,000. That revenue is reassigned to (direct) / (none) due to missing or misconfigured tracking. Even channels that originally looked like top performers can appear to underperform if the first touch is misrecorded. This demonstrates how sensitive First-Touch models are to tracking gaps, and why data quality is critical before making any budget decisions.

The Reporting Conflict: First vs. Last

At this point, we have two very different views of the same marketing performance. Last-Touch highlights channels that closed deals, while First-Touch emphasizes the channels that opened the journey. Both reports are technically correct, but they tell completely opposite stories.

In clean data, Last-Touch shows (direct) / (none) at the top, followed by newsletter / email, google / cpc, and partner-xyz.com / referral. This view suggests that the direct and closing channels drive the most revenue. First-Touch flips the perspective, highlighting google / organic as the top channel, followed by newsletter / email, partner-xyz.com / referral, google / cpc, and linkedin / social. Channels like (direct) / (none) now play a much smaller role.

Interactive Demo: First Vs Last

Figure 3: Last vs. First. This view compares revenue by channel between the Last-Touch and First-Touch models. Positive bars indicate channels that gain credit in Last-Touch, negative bars indicate channels that lose credit. Toggle between ‘Clean Data’ and ‘Dirty Data’ to see the impact of broken tracking.

This comparison shows the strategic problem: one report suggests prioritizing direct and closing channels, while the other emphasizes discovery channels like SEO and social. Decisions based solely on one model could lead to investing in one set of channels while ignoring others that are actually contributing to growth.

When we add unclean data, the differences become even more pronounced. Channels like (direct) / (none) inflate further, taking credit from other channels. This makes it clear that both simple models are highly sensitive to data quality. Without accurate tracking, neither report can be trusted, and budget decisions based on these views carry real risk.

A Solution: Multi-Touch Attribution

To address the conflict between First-Touch and Last-Touch, we can move to multi-touch attribution. Instead of giving all the credit to a single touchpoint, multi-touch models divide revenue across the customer journey, recognizing every interaction that contributed to the conversion.

Linear Attribution is the simplest of these approaches. It divides credit equally across all touchpoints. For example, if a user had four interactions before converting, each touch receives 25 percent of the revenue. Linear acknowledges the opener, the closer, and all the assists in between. Other multi-touch approaches, such as time-decay or U-shaped models, assign credit differently, but they all rely on the same underlying data.
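
A minimal sketch of the linear split, using the same illustrative journey format as the earlier examples:

```python
# Minimal sketch of Linear Attribution: every touch gets an equal share.
journeys = [
    {"touches": ["google / organic", "facebook / social", "newsletter / email", "(direct) / (none)"], "revenue": 8000},
]

credit = {}
for journey in journeys:
    share = journey["revenue"] / len(journey["touches"])  # 4 touches -> 25% each
    for channel in journey["touches"]:
        credit[channel] = credit.get(channel, 0) + share

print(credit)  # each of the four channels receives 2000.0
```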

In Figure 4, we see the Linear model applied to the demo dataset with clean data. The result is a more balanced view: channels that were invisible in the simpler models because they are neither the first nor the last touchpoint, like facebook / social, now appear with $10,000 in credit. Channels that swung dramatically in the First- and Last-Touch reports, like linkedin / social, stabilize at $10,000. google / organic and (direct) / (none) each receive $22,500.

Interactive Demo: Linear

Figure 4: Linear Attribution. This report shows how revenue is distributed across all touches in the journey. Toggle between ‘Clean Data’ and ‘Dirty Data’ to see the effect of broken tracking on this model.

Even this more balanced approach is sensitive to data quality. Switching to dirty data inflates (direct) / (none) from $22,500 to $37,500, pulling credit from other channels. This reinforces the central point: no model can fix broken tracking. The reliability of any attribution report depends entirely on clean underlying data.

Common Objections (FAQs)

After reviewing these attribution models, a few questions often come up.

  • But my (direct) / (none) traffic is high because we have a strong brand. - Partly true. But (direct) / (none) also acts as a catch-all for broken tracking. If a paid ad’s UTM fails or an email link is untagged, that revenue ends up here. Without detailed investigation, it is impossible to separate genuine direct traffic from misattributed conversions; a simple UTM audit, sketched after this list, is a good first check.
  • Why not just use the default GA4 attribution reports? - Standard GA4 reports hide many data quality issues. They show the symptom, like high direct traffic, but not the cause, such as a specific broken link or misconfigured UTM. To make informed decisions, we need visibility into the raw, uncorrected data.
  • This seems too complicated. My report looks fine. - This is the biggest risk. A report may appear accurate because errors are hidden. You could be cutting budgets for top-performing channels without realizing it, simply because their revenue is misattributed.
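
As a practical first check, referenced in the first answer above, you can audit your landing-page URLs for missing UTM parameters before trusting any channel report. A minimal sketch, assuming the landing-page URLs are available as strings:

```python
from urllib.parse import parse_qs, urlparse

REQUIRED = {"utm_source", "utm_medium"}

def missing_utms(url):
    # Returns the required UTM parameters absent from a landing-page URL.
    params = parse_qs(urlparse(url).query)
    return REQUIRED - params.keys()

# A paid landing page that lost part of its tagging in a redirect chain:
print(missing_utms("https://example.com/pricing?utm_source=google"))
# {'utm_medium'} -> this click cannot be attributed cleanly
```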

Conclusion: Stop Debating Models. Start Cleaning Your Data

We have now seen how different attribution approaches can tell completely different stories from the same data, and how all models are vulnerable to tracking gaps and unclean data. Even more balanced multi-touch approaches, like Linear Attribution, only provide a clearer view when the underlying data is accurate.

The key takeaway is this: attribution issues are not primarily about which model to choose. They are a Garbage In, Garbage Out problem. If event and session tracking are broken, every model built on top of that data will be misleading. A more sophisticated model applied to flawed data simply produces a more polished, but still incorrect, result.

The solution is clear: audit and fix your data first. Only with reliable tracking can you trust the insights any attribution model provides.



David Kirchhoff

Founder & Engineer

Building smarter marketing analytics tools to turn fragmented data into actionable insights.