Lagged Deals and the Dynamic Nature of Venture Capital Databases

Venture capital databases are great, but they are not perfect. One of the key problems is their dynamic nature: new information comes in all the time. Early-stage activity in particular is systematically lagged until additional rounds are raised, either because the earlier rounds were unpriced or because they were unannounced.

For this reason, I try not to report annual tabulations of deals too soon after the year in which they occurred. For example, we are just two months into 2019, so I’m uneasy making claims about what occurred in 2018 because the data are still coming in. Many others do it anyway, for any number of reasons. Most of the major venture data providers publish annual statistical reports shortly before or after year-end, and many interested parties, such as corporations and venture firms, routinely produce similar analyses.

I recently had an editor of a major news outlet refuse to cover a study of mine because I wouldn’t include data for 2018 for the reason stated above. The study went back more than a dozen years and had something new to say, but none of that mattered because it wasn’t “fresh” enough for him. I think he missed an opportunity. That experience left me wondering: just how big a problem is the lagged nature of venture data?

Yoram Wijngaarde of Dealroom (a leading venture data provider in Europe) answers with a compelling post. He dug into the data, analyzing several recent vintages of PitchBook year-end reports and his own Dealroom database. His conclusion is that these deals are significantly lagged in a systematic way. I wanted to understand this a bit better, so I dug into the historical PitchBook data releases as Yoram did, but took a deeper look across a broader range of round types and measures. Here’s what that looks like.

Let’s start by looking at the number of deals by year as reported by PitchBook’s annual yearbooks for 2014 through 2018. The deals here have been segmented into three stages: Angel/Seed, Early VC, and Later VC. Each line represents a particular vintage of the yearbook (e.g., 2014 or 2015). Values indicate the number of deals reported for that year in that vintage of the yearbook.


A few insights jump out.

First, we see that PitchBook is systematically capturing more deals over time. This is particularly the case at the Angel/Seed stage and especially after the 2014 and 2015 vintages. I attribute much of this to the fact that PitchBook was scaling production and simply getting better at capturing deals overall. In other words, much of this I suspect is transitory and outside the bounds of what one might expect from typical year to year revisions moving forward.

Second, putting these transitory effects aside, there are noticeable revisions in the numbers of deals across the round types—though again the impact is strongest at the early stages (ie, Angel/Seed then Early VC).

Third, these revisions tend to lessen the further one moves out from a year’s first reporting: they are strongest after year one and drop off sharply afterwards. For example, revisions for deals in 2016 are largest in the 2017 year-end report and smaller in the 2018 year-end report, and so on.
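To make the idea of a revision by time from first reporting concrete, here is a minimal sketch in Python. The deal counts are made up for illustration and are not PitchBook figures, and the function name revisions_by_lag is my own. It assumes each deal year is first reported in the vintage of the same year and that the vintages are consecutive.

```python
# Hypothetical deal counts, NOT PitchBook figures.
# Layout: counts[deal_year][vintage] = deals reported for deal_year in that vintage.
counts = {
    2015: {2015: 1000, 2016: 1150, 2017: 1170},
    2016: {2016: 900, 2017: 1020},
}

def revisions_by_lag(counts):
    """Return {deal_year: {lag: pct_change}}.

    lag is the number of years since first reporting, and pct_change is the
    percent change relative to the immediately preceding vintage.
    """
    out = {}
    for deal_year, by_vintage in counts.items():
        vintages = sorted(by_vintage)
        out[deal_year] = {}
        # Compare each vintage with the one published just before it.
        for prev, curr in zip(vintages, vintages[1:]):
            lag = curr - deal_year
            out[deal_year][lag] = (by_vintage[curr] / by_vintage[prev] - 1) * 100
    return out

for deal_year, by_lag in revisions_by_lag(counts).items():
    for lag, pct in by_lag.items():
        print(f"{deal_year} deals, {lag} year(s) after first report: {pct:+.1f}%")
```

With these made-up numbers, the 2015 count is revised up by about 15% in the next vintage and by less than 2% in the one after, which is the shape of the pattern described above.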

Here’s another way of looking at that impact. I’ve aggregated weighted and unweighted average changes across the years by time from first reporting. Since we are dealing with a relatively small number of vintages, and because the averages across the time lags are not based on a balanced panel of years, these numbers should be interpreted with some caution. Also, because there appears to be a clear methodological change in collecting deals after 2014, I’ll exclude that vintage from this portion of the analysis. I also reduce vintage 2015 changes for Angel/Seed only for the same reason.
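For readers who want to see the difference between the two averages, here is a rough sketch. I am assuming that "weighted" means weighting each year by its previously reported deal count, which is the same as comparing pooled totals; other weighting schemes are possible. The counts are hypothetical and the function names are mine.

```python
from statistics import mean

# Hypothetical (previous_count, revised_count) pairs for a single lag,
# one pair per deal year. These are NOT PitchBook figures.
pairs = [(1000, 1150), (900, 1020), (2000, 2200)]

def unweighted_change(pairs):
    """Simple mean of each year's percent revision."""
    return mean(new / old - 1 for old, new in pairs) * 100

def weighted_change(pairs):
    """Percent revision of the pooled totals.

    Equivalent to averaging the yearly revisions with weights equal to
    each year's previously reported count (an assumption on my part).
    """
    total_old = sum(old for old, _ in pairs)
    total_new = sum(new for _, new in pairs)
    return (total_new / total_old - 1) * 100

print(f"unweighted: {unweighted_change(pairs):.2f}%")
print(f"weighted:   {weighted_change(pairs):.2f}%")
```

In this example the weighted figure comes out lower than the unweighted one because the year with the most deals has the smallest revision. With only a handful of vintages, gaps like this are one reason to read the averages with caution.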


The data show that the number of Angel/Seed deals increases 15-18% one year after the figures are first published (first publication usually comes about a month after the year ends). The revisions then taper down to one or two percent in the years that follow. A similar pattern occurs for Early VC, which is revised up an average of 14% in the next vintage report, and then by a few additional percent in the following years. Later VC has the smallest revision in the year after first release, at about 11%, and it too is revised up by a few percentage points in the following years.

The lesson here is clear—venture capital databases are highly dynamic as the reporting of deals is lagged, and data are significantly revised upward in the years that follow the real-time release of annual tabulations. This effect is strongest in the next vintage release after a year of data has first appeared and is also largest at the Angel/Seed stage. For these reasons, it’s important to be cautious about interpretations of data released too soon after a calendar year has completed, unless systematic adjustments have been made. This analysis is suggestive of the magnitude of those adjustments, though more work will need to be done to better estimate them for responsible use in practice.
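As a rough illustration of what a systematic adjustment might look like, the sketch below grosses up a first-release count by the average first-year revisions reported above. This is a back-of-the-envelope approach, not an estimated model: it ignores the smaller revisions in later years and any uncertainty around the averages. The first-release count in the example is hypothetical, and the function name adjusted_range is my own.

```python
# Average first-year upward revisions from this post, as fractions.
# Angel/Seed is reported as a range (15-18%), so each stage is stored as (low, high).
FIRST_YEAR_REVISION = {
    "Angel/Seed": (0.15, 0.18),
    "Early VC": (0.14, 0.14),
    "Later VC": (0.11, 0.11),
}

def adjusted_range(stage, first_release_count):
    """Return (low, high) estimates of the deal count after one year of revisions."""
    low, high = FIRST_YEAR_REVISION[stage]
    return (
        round(first_release_count * (1 + low)),
        round(first_release_count * (1 + high)),
    )

# Hypothetical first-release count of 4,000 Angel/Seed deals.
print(adjusted_range("Angel/Seed", 4000))  # (4600, 4720)
```

A first-release count of 4,000 Angel/Seed deals would, under these assumptions, be expected to land between 4,600 and 4,720 a year later.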