Analysis of Confirmed COVID-19 Cases in the EU (2020–2022)

By Ursula Tamen, conducted in Mar 2025.
Last updated 20 May 2025

Disclaimer: This analysis was conducted as part of an academic exercise and is not meant to be interpreted as definitive. While it is based on credible data sources, limitations exist regarding the accuracy of case reporting, and no claims are made about the correctness of any assumptions or statements drawn from the data. I, the author, am not an accredited public health professional or analyst, and the interpretations are intended for educational and illustrative purposes only.

Introduction: A Data-Centric Look at a Global Crisis

Data Management and Refinement

Handling Missing Data

Summary Statistics and Interpretation

With the data cleaned and a relevant data subset created, I ran detailed descriptive analyses using .describe() to understand the spread and behaviour of key variables.
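As a minimal sketch of this step, the snippet below calls .describe() on a small made-up table. The column names (location, new_cases, stringency_index) and the values are assumptions for illustration only, not the actual dataset or schema used in the analysis.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are assumptions.
df = pd.DataFrame({
    "location": ["France", "France", "Austria", "Austria"],
    "new_cases": [100, 150, 20, 30],
    "stringency_index": [70.0, 65.0, 40.0, 45.0],
})

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
summary = df[["new_cases", "stringency_index"]].describe()
print(summary)
```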

Categorical Variables and Time Dimensions:

To better explore trends, I created:
– Month
– Year
– Month-year
– Stringency Category (Low: 0–40, Medium: 41–69, High: 70–100)
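The derived columns listed above could be built roughly as follows. This is a sketch under assumptions: the column names date and stringency_index, and the helper stringency_category, are hypothetical. The stated ranges leave small gaps (for example, a value of 40.5), so the sketch assumes anything above 40 and below 70 counts as Medium.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-15", "2021-07-01", "2022-01-20"]),
    "stringency_index": [85.0, 41.0, 40.0],
})

# Time dimensions derived from the date column.
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year
df["month_year"] = df["date"].dt.to_period("M")


def stringency_category(value):
    """Map a stringency index (0-100) to Low / Medium / High.

    Assumption: values strictly between 40 and 70 are treated as Medium.
    """
    if pd.isna(value):
        return None
    if value >= 70:
        return "High"
    if value > 40:
        return "Medium"
    return "Low"


df["stringency_category"] = df["stringency_index"].apply(stringency_category)
print(df)
```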

All 27 EU countries had equal representation. Time was also evenly distributed, with consistent daily records throughout the three-year span, so no country or period is over-represented in the data.

The insights gained so far guided the next layer of exploratory questions and statistical testing.

Exploratory Analysis

How Many Total Confirmed Cases Were Reported in the EU Between 2020 and 2022?
What Was the Cumulative Trend of Confirmed COVID-19 Cases Over Time in the EU Between 2020 and 2022?
What Was the Trend of New Cases in the EU Over the Months (2020–2022)?

I analysed the trend of new COVID-19 cases over time by grouping the data by month_year and plotting new cases month by month. This provides a clearer picture of how new infections surged or declined throughout the period.
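The grouping described here might look like the sketch below. The column names date and new_cases are assumptions, and the data is made up. The plot itself is left out so the example runs without a plotting library.

```python
import pandas as pd

# Hypothetical daily records; the column names and values are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-04-01", "2020-04-02"]
    ),
    "new_cases": [10, 20, 5, 15],
})

# Collapse daily rows into one total per calendar month.
df["month_year"] = df["date"].dt.to_period("M")
monthly_new_cases = df.groupby("month_year")["new_cases"].sum()
print(monthly_new_cases)
```

With matplotlib installed, calling monthly_new_cases.plot() on the result would draw the month-by-month line.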

How Did Different Stringency Categories Compare in Case Numbers?

For this analysis, the aim was to uncover how different stringency categories impacted the total number of new confirmed COVID-19 cases over the period from 2020 to 2022. To do this, I first grouped the data by the stringency category column and summed up the new_cases within each category. This allowed me to get the total number of cases for each stringency category (Low, Medium, High). After that, I used a bar plot to visualize the total new cases for each stringency category, with the x-axis representing the stringency categories and the y-axis representing the number of confirmed cases.
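A rough sketch of the group-and-sum step is shown below, using made-up numbers. The column name stringency_category is an assumption. The result is reindexed so the categories appear in Low, Medium, High order rather than alphabetically.

```python
import pandas as pd

# Hypothetical rows; the column names and values are assumptions.
df = pd.DataFrame({
    "stringency_category": ["Low", "High", "Medium", "High", "Low"],
    "new_cases": [5, 40, 20, 10, 15],
})

# Total new cases per category, in a fixed, meaningful order.
category_order = ["Low", "Medium", "High"]
cases_by_category = (
    df.groupby("stringency_category")["new_cases"]
    .sum()
    .reindex(category_order)
)
print(cases_by_category)
```

With matplotlib installed, cases_by_category.plot(kind="bar") would produce a bar chart with categories on the x-axis and case totals on the y-axis.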

What Was the Trend of Government Response (Stringency Index) Over Time?

This was obtained by computing the mean stringency index by month-year and plotting the trends, allowing me to compare how different countries adjusted their measures over time.
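One way to compute these monthly means is sketched below, both for the EU as a whole and per country. The column names location, date, and stringency_index are assumptions, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical records; the column names and values are assumptions.
df = pd.DataFrame({
    "location": ["France", "France", "Greece", "France", "Greece", "Greece"],
    "date": pd.to_datetime([
        "2020-03-01", "2020-03-02", "2020-03-01",
        "2020-04-01", "2020-04-01", "2020-04-02",
    ]),
    "stringency_index": [80.0, 60.0, 100.0, 50.0, 70.0, 90.0],
})
df["month_year"] = df["date"].dt.to_period("M")

# Mean stringency per month across all rows.
eu_mean = df.groupby("month_year")["stringency_index"].mean()

# Mean stringency per month and country: one column per country.
by_country = (
    df.groupby(["month_year", "location"])["stringency_index"]
    .mean()
    .unstack("location")
)
print(eu_mean)
print(by_country)
```

The wide by_country table is convenient for comparison because each country becomes its own line when plotted.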

Exploring Correlation Between Stringency Index and COVID-19 Confirmed Case Counts

As a preliminary step toward understanding how government responses may have influenced the pandemic’s spread, I ran a basic correlation analysis between the stringency index and confirmed COVID-19 cases. While this approach is limited, as it does not account for interactions with other variables or time-lagged effects, it provides a useful high-level view. Two case metrics were tested: daily new cases and total cases per million, helping to distinguish between short-term volatility and long-term patterns.
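A Pearson correlation of this kind can be sketched with pandas as follows. The column names (stringency_index, new_cases, total_cases_per_million) are assumptions, and the numbers are invented, so the output has no bearing on the real results.

```python
import pandas as pd

# Hypothetical data; the column names and values are assumptions.
df = pd.DataFrame({
    "stringency_index": [10.0, 20.0, 30.0, 40.0],
    "new_cases": [400, 300, 200, 100],
    "total_cases_per_million": [1000.0, 2500.0, 1800.0, 900.0],
})

# Series.corr uses the Pearson coefficient by default.
r_new_cases = df["stringency_index"].corr(df["new_cases"])
r_per_million = df["stringency_index"].corr(df["total_cases_per_million"])

print(f"stringency vs. daily new cases: {r_new_cases:.4f}")
print(f"stringency vs. total cases per million: {r_per_million:.4f}")
```

Pearson's coefficient captures only linear association across pooled rows, which is consistent with the limitations noted above.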

Final Reflections and Conclusion

The correlation results revealed a very weak negative relationship between the stringency index and daily new cases (Pearson: -0.0076). This suggests that short-term policy strictness had little immediate impact on case fluctuations, which were likely influenced by other factors such as testing practices and virus variants.

In contrast, the relationship between the stringency index and total cases per million was moderately negative (Pearson: -0.5764), implying that countries with stricter measures had fewer cumulative infections per capita. While this doesn’t confirm causality, it points to potential long-term benefits of rigorous interventions.

Key insights from the data include:

  • The EU recorded 178,836,027 confirmed cases from 2020 to 2022—likely an undercount.
  • France had the highest number of total cases, while Austria had the highest number of cases per million.
  • Greece had the highest average stringency, and Estonia the lowest, showing different national responses.
  • The Omicron surge in early 2022 caused the steepest rise in cases.
  • Reporting of the stringency index dropped significantly after 2022, likely due to lifted restrictions.

The Python code used to generate the statistics and create the charts is not included in this written piece. However, the code file and the original report are available upon request for anyone who would like a closer look.