Machine Learning aims, under an assumed probability distribution over certain variables, to filter meaningful detail out of data and uncover features. These features can then be used to classify data or make predictions.

The “learning” is based on statistical inference: data is collected, probability distributions are developed and refined, and the structures that link the variables together are worked out.

Judea Pearl builds on this with what he calls Causal Analysis. Nature hands us data through some mechanism; we assume a probability distribution and posit relationships between the different variables. This is called a Causal Model. The model allows us to predict two things:

- The relationship between the variables given the underlying distributions
- How the probability distributions of the different variables affect each other

In this schema there are two levels of inference: on causal mechanisms and on probability distributions. Given this schema, we can inquire in many ways: we can ask questions about observations, interventions, and counterfactuals. Pearl claims that the causal schema’s strength lies in how it deals with counterfactual questions.

Pearl contrasts his position with that of those who rely on pure statistical inference. He likens their position to that of the prisoners in Plato’s Allegory of the Cave. In the Allegory, an entire community is chained in a cave, restricted in movement, seeing only shadows on the wall opposite the entrance. The shadows move as they do, so the prisoners eventually identify themselves with their shadows. As a result, they lose all understanding of depth, light, and colour.

In the absence of causal models, Machine Learning works only with what can be observed, and loses sight of the mechanisms behind the observations, just as the prisoners lose sight of what casts the shadows.

For a Causal Model, you can develop a graph or decision network in which many causal relationships are mapped out, so that the holistic consequences of an intervention can be traced. The network is Bayesian, with variables that can stand for possible interventions. The model has four types of components:

- Endogenous variables
- System variables (with a probability distribution)
- Background variables (interesting because they can be postulated without being observed)
- Functions (how an observed variable relates to the other observed and unobserved variables)

You can then map out the relationships between the variables, and use the underlying probability distribution on the background variables to induce a probability distribution on the endogenous variables. If one variable depends on another, you might write y = beta*x + alpha + noise, where the noise term captures error or uncertainty.
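A minimal sketch of how such a model induces a distribution on the endogenous variables; the coefficients, noise scales, and variable names below are illustrative assumptions, not Pearl's:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Background (exogenous) variables: postulated, not observed directly.
u_x = rng.normal(0.0, 1.0, n)
u_y = rng.normal(0.0, 1.0, n)

# Structural functions: each endogenous variable is a function of
# its parents plus its own background noise.
alpha, beta = 2.0, 1.5            # illustrative coefficients
x = u_x                           # x has no observed parents
y = beta * x + alpha + u_y        # y = beta*x + alpha + noise

# The distribution over (u_x, u_y) induces a distribution on (x, y):
print(round(y.mean(), 2))             # ≈ alpha, since E[x] = 0
print(round(np.cov(x, y)[0, 1], 2))   # ≈ beta, since Var(x) = 1
```

Sampling the background variables and pushing them through the functions is exactly the "induced distribution" the notes describe.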

How do you work with Counterfactuals?

Counterfactual situations allow us to consider situations that have not occurred. Using the schema, we can see how one variable would affect another across different situations. You can take the model, with all the causal relationships in its network, replace certain elements, and create what Pearl calls a “mutilated” model. Counterfactuals can be studied through model restrictions or expansions; Pearl's term “mutilated” reflects that he primarily studied restrictions, though the theory has developed from there. He states two fundamental principles of counterfactuals:

- Law of Structural Counterfactuals
- Law of Structural Independence
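One common reading of the structural-counterfactual principle is a three-step recipe (abduction, action, prediction) for evaluating a counterfactual in a structural model. The toy linear model below is an illustrative assumption:

```python
# Counterfactual: "what would y have been had x been x_new?" in a toy
# linear structural model y = beta*x + alpha + u_y. The recipe:
# 1. Abduction: use the observed (x, y) to infer the background noise u_y.
# 2. Action: mutilate the model, replacing x's equation with x := x_new.
# 3. Prediction: recompute y in the mutilated model with the same u_y.

alpha, beta = 2.0, 1.5   # illustrative coefficients

def counterfactual_y(x_obs, y_obs, x_new):
    u_y = y_obs - (beta * x_obs + alpha)   # abduction
    x = x_new                              # action (mutilation)
    return beta * x + alpha + u_y          # prediction

# An individual observed with x = 1, y = 4 (so their u_y = 0.5):
print(counterfactual_y(1.0, 4.0, x_new=3.0))   # 7.0
```

Note that step 2 is exactly the "mutilated model": the original equation for x is surgically replaced while everything else, including the inferred background noise, is kept.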

A tight influence between variables characterizes a causal relationship. The absence of a direct arrow in the causal graph implies independence: not unconditional statistical independence, but independence conditional on a separating factor. Observed and background variables can summarize millions of background processes, and by modelling these processes the model can learn.
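The "independence conditional on a separating factor" claim can be checked by simulation. In the chain x → z → y there is no direct arrow from x to y, and conditioning on the separator z should remove their dependence; the coefficients below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Chain x -> z -> y: no direct arrow from x to y.
x = rng.normal(size=n)
z = 2.0 * x + rng.normal(size=n)
y = 3.0 * z + rng.normal(size=n)

# Marginally, x and y are strongly correlated...
print(round(np.corrcoef(x, y)[0, 1], 2))

# ...but their partial correlation given z (correlation of the
# residuals after regressing each on z) vanishes.
a, b = np.polyfit(z, x, 1)
c, d = np.polyfit(z, y, 1)
rx = x - (a * z + b)
ry = y - (c * z + d)
print(round(np.corrcoef(rx, ry)[0, 1], 2))   # ≈ 0
```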

By arranging the model this way, one can see the consequences of actions within it. The model can develop variables and probability distributions, and an action can be simulated. All of this requires causal analysis. From the assumptions one can derive implications, specifically testable implications. The causal framework lets one relate various actions within a complicated network of variables; the approach lays a statistical overlay (goodness of fit, testable implications, etc.) on a model-based approach.

He develops a causal calculus (the Do-calculus) based on three fundamental rules, titled: Ignoring Observations, Action/Observation Exchange, and Ignoring Interventions.
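As stated in Pearl's work, the three rules take roughly the following form, where $G_{\bar{X}}$ denotes the graph with arrows *into* X removed and $G_{\underline{Z}}$ the graph with arrows *out of* Z removed:

```latex
% Rule 1 (ignoring observations):
P(y \mid \mathrm{do}(x), z, w) = P(y \mid \mathrm{do}(x), w)
  \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W) \text{ in } G_{\bar{X}}

% Rule 2 (action/observation exchange):
P(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = P(y \mid \mathrm{do}(x), z, w)
  \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W) \text{ in } G_{\bar{X}\underline{Z}}

% Rule 3 (ignoring interventions):
P(y \mid \mathrm{do}(x), \mathrm{do}(z), w) = P(y \mid \mathrm{do}(x), w)
  \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W) \text{ in } G_{\bar{X}\,\overline{Z(W)}}
```

Here $Z(W)$ is the set of Z-nodes that are not ancestors of any W-node in $G_{\bar{X}}$. Each rule licenses a syntactic rewrite of an interventional expression, conditional on a d-separation test in a suitably pruned graph.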

External validity in causal models:

How much can you extrapolate or generalize?

Pearl says that the causal assumptions determine how much you can take from the model. The strength of causal models is that they can support counterfactuals.

E.g.: what is the effect of educating a person on their income? The causal model you develop (a world view) will determine what the effect is.

Incorrect causal assumptions will lead one astray, but the assumptions can be tuned to improve the model.

All you have is data and inputs, but by altering the causal story you change the probability distributions, leading to the creation of a causal graph. You can find the factors that create differences, which complicates the model and makes it richer.

There are theorems that relate the Do-calculus to applications in the general population.

Missing data:

How does this approach deal with missing data? Pearl argues that causality is fundamental to inferring missing data. To justify this, he invokes a theorem stating that there is no universal algorithm to recover missing data: without investigating the model behind the missingness, one cannot infer the lost data, so causal inference is essential in recovering it.
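A small simulation of why the missingness mechanism matters. The setup, income-like values that go missing either at random or *because* they are high, is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
values = rng.normal(50.0, 10.0, n)   # the complete data we never fully see

# Mechanism A: missing completely at random -- dropping cases is harmless.
miss_a = rng.random(n) < 0.3
# Mechanism B: missingness caused by the value itself (high values hidden).
miss_b = values > 55.0

mean_a = values[~miss_a].mean()
mean_b = values[~miss_b].mean()

print(round(mean_a, 1))   # ≈ 50.0: recoverable without modelling the cause
print(round(mean_b, 1))   # well below 50: the naive estimate is biased
```

The same observed dataset can come from either mechanism; only a model of *why* the data are missing tells you whether the naive estimate can be trusted, which is the point of Pearl's argument.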