Causal inference in data science, session 2


Continuing the “Data Science across Disciplines” series, this session returned to causal inference in data science with two further perspectives and applications. We heard from experts in government policy development and in clinical and public health.


Talks:

Understanding the role of causal inference from observational datasets in developing government policy

Dr Elena Tartaglia, CSIRO’s Data61

One of the aims of public policy is to encourage behaviours towards a desired outcome. To develop effective and evidence-based policy, policymakers need to understand the likely impact of a policy. In other words, they need to understand the causal effect of an intervention. The ‘gold standard’ for showing causal effects is the randomised controlled trial (RCT). However, there are many situations where RCTs are impossible or unethical. Instead, governments rely on administrative data, which is a type of observational data that typically contains various biases. Controlling for biases is critical when estimating causal effects. Although it is impossible to guarantee perfect bias removal, using causal diagrams and adjustment techniques can help us identify data limitations and better estimate causal effects. I recently worked with the Department of Education, Skills and Employment (DESE) to incorporate causal inference techniques into the analytics underpinning their policy advice and formulation. Example questions of interest to DESE are ‘what is the effect of childcare attendance on student readiness to enter primary school?’ and ‘what is the effect of high school completion on income later in life?’. In this talk, I will highlight the utility of causal inference in policy formulation and address some implementation challenges.

Causal machine learning in health data science

A/Prof Margarita Moreno-Betancur, Paediatrics, Royal Children’s Hospital

The ultimate goal of medical and health research is to improve patient outcomes and population health. As a result, the overwhelming majority of clinical and public health research studies ask “causal” questions, concerning the effect of treatments, policies, behaviours and other exposures on health outcomes. In many cases, especially in the current era of data deluge, these studies rely on observational (non-experimental) data to address causal questions. Unfortunately, for a long time the statistics discipline largely shunned the possibility of causal inference beyond randomised trials, and instead focused on the development of tools such as regression models without clarity regarding their usefulness and limitations for addressing the causal questions that substantive areas continued to ask. In recent decades, however, the discipline has seen the rise of a new area focused on determining the settings and approaches that could allow causal inference from observational data. This talk will first provide an overview of some of the fundamental contributions of this statistical area to enable and improve the study of causality in health research, and then describe the role of machine learning within this causal inference paradigm, including recent methodological advances.

This was a hybrid event.