Using Benford's Law to analyze US Covid Data Quality

Benford's law is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. A leading 1 occurs about 30.1% of the time and a leading 2 about 17.6% of the time, with the frequency falling further for each higher digit.
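The percentages above come from the standard statement of Benford's law, P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. A minimal sketch that tabulates the expected shares follows; the function name `benford_expected` is ours, chosen only for illustration.

```python
import math


def benford_expected():
    """Return {digit: probability} for leading digits 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Print the expected share of each leading digit as a percentage.
for digit, share in benford_expected().items():
    print(f"{digit}: {share:.1%}")
```

The nine shares sum to 1, because the product of (1 + 1/d) for d = 1..9 telescopes to 10.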

Like many real-world data sets, we expect the reported COVID data, specifically the number of daily deaths and the number of new positive cases, to match the expected Benford distribution.
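One way to check this expectation is to take the first digit of every reported count and compare the observed shares with the Benford shares. The sketch below assumes the counts are non-negative integers and that zeros are skipped, since they have no leading digit; the helper names and the sample counts are made up for illustration and are not taken from the tracking data.

```python
import math
from collections import Counter


def leading_digit(n):
    """Return the first digit of a positive integer, or None for zero or negatives."""
    if n <= 0:
        return None
    return int(str(n)[0])


def leading_digit_shares(values):
    """Return {digit: observed share} for digits 1..9, ignoring non-positive values."""
    digits = (leading_digit(v) for v in values)
    counts = Counter(d for d in digits if d is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no positive values to analyze")
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Hypothetical daily counts, used only to exercise the functions.
sample_counts = [12, 105, 0, 19, 230, 1450, 31, 18]
observed = leading_digit_shares(sample_counts)

# Show observed and expected shares side by side for each digit.
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed[d]:.3f}, expected {expected:.3f}")
```

With only a handful of values the observed shares will be noisy; the comparison becomes meaningful only with many observations, such as daily counts across all states.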

Benford's law has been used to detect fraud and data quality problems in many areas, including accounting, election data, and price manipulation.

Getting the data

The fine folks at https://covidtracking.com/ have been publishing daily state-level data. We copy their data from 24th October and store it where we can load it into a dataframe.

References

  1. https://mathworld.wolfram.com/BenfordsLaw.html
  2. https://covidtracking.com/data
  3. https://www.journalofaccountancy.com/issues/2017/apr/excel-and-benfords-law-to-detect-fraud.html