Conditional Probability for Data Science¶

Let’s consider a fundamental question: how does our understanding of a system change when we acquire new information?
In data science, conditional probability often feels like an abstract mathematical hurdle, introduced through dry, theoretical formulas. However, in practice, it is the backbone of dynamic decision-making—from routing pipeline logic and evaluating a fraud model's true effectiveness, to understanding feature interactions.
In this blog, we will walk through a series of interactive visualisations that reframe conditional probability not as a mere equation, but as a practical, operational mechanism. We will explore five distinct views:
- Filtering
- Rescaling
- Population Flow
- Distribution Shift
- Density Redistribution

import numpy as np
import plotly.graph_objects as go
import scipy.stats as stats

Conditional Probability as Filtering¶

Operational Context¶

Imagine we are analysing customer conversion. If a user belongs to a highly engaged segment, how does that specific context change their baseline likelihood to convert?

The Theory¶

Visually, conditioning is simply filtering. When we apply a condition (e.g., $X > t$), we eliminate outcomes that no longer matter. We are not altering the underlying data or fabricating new outcomes, we are strictly narrowing our operational focus to a relevant subspace.
Once the condition is applied, the probability updates because we are now counting within a smaller, targeted group:
$$ P(Y > 0.5 \mid X > t) = \frac{\text{Number of remaining points with } Y > 0.5}{\text{Number of remaining points}} $$

Implementation¶

In the visualisation below, start with the full scatter of data points representing our sample space. As you move the slider to tighten the condition: - Grey points: Outcomes that are filtered out and no longer matter.
- Remaining points: The new, restricted sample space.
- Highlighted outcomes: Those that satisfy our event of interest within the new space.

import numpy as np
import plotly.graph_objects as go

np.random.seed(0)
points = np.random.rand(2000, 2)

thresholds = np.linspace(0.1, 0.9, 6)
steps = []

for t in thresholds:
    mask = points[:, 0] > t
    prob = np.mean(points[mask, 1] > 0.5)

    steps.append(dict(
        method="update",
        args=[
            {"x": [points[mask, 0]],
             "y": [points[mask, 1]]},
            {"title": f"P(Y > 0.5 | X > {t:.2f}) = {prob:.2f}"}
        ],
        label=f"{t:.2f}"
    ))

fig = go.Figure(
    data=[go.Scatter(
        x=points[:, 0],
        y=points[:, 1],
        mode="markers",
        marker=dict(opacity=0.6)
    )]
)

fig.update_layout(
    sliders=[dict(
        currentvalue={"prefix": "Condition X > "},
        steps=steps
    )],
    xaxis_title="Feature X",
    yaxis_title="Feature Y"
)

fig

This filtering concept directly underpins how we build Funnel Metrics and conduct basic Exploratory Data Analysis (EDA).

Conditional Probability as Rescaling¶

Operational Context¶

Why does a probability metric sometimes increase dramatically even though the absolute number of positive outcomes has decreased?

The Theory¶

Conditioning performs a two-step operation: it removes irrelevant outcomes, and then it rescales the remaining data so that the new total probability equals 1. The remaining points are now treated as the entire world.
We can read this directly from the classic formula. The denominator acts as our scaling factor to renormalise the probability:
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} $$

Implementation¶

In this countable dataset, pay attention to the probability mass. - Ignore colours and count total points (8 total).
- After changing the filter, grey points fall outside the condition. Stop counting them entirely.
- Light blue and dark blue form the new world. The new probability is simply the dark blue count divided by the total blue count.

import numpy as np
import plotly.graph_objects as go

# Manually construct a small, countable dataset
# Each point is intentional
x = np.array([0.15, 0.25, 0.35, 0.55, 0.65, 0.75, 0.85, 0.90])
y = np.array([0.30, 0.60, 0.45, 0.70, 0.40, 0.80, 0.55, 0.90])

thresholds = [0, 0.3, 0.5, 0.7]
steps = []

for t in thresholds:
    B = x > t
    A = y > 0.5

    colours = np.where(
        ~B, "lightgrey",
        np.where(A, "royalblue", "deepskyblue")
    )

    opacity = np.where(~B, 0.2, 0.9)

    count_A_and_B = np.sum(A & B)
    count_B = np.sum(B)

    percentage = (count_A_and_B / count_B) * 100 if count_B > 0 else 0

    annotation_text = (
        f"P(A | B) = P(A ∩ B)/P(B) = ({count_A_and_B} / {8})/({count_B}/8) = {count_A_and_B} / {count_B} = {percentage:.0f}%"
    )

    steps.append(dict(
        method="update",
        args=[
            {
                "marker.color": [colours],
                "marker.opacity": [opacity]
            },
            {
                "annotations": [
                    dict(
                        text="Grey: Not B",
                        x=0.9, y=1.15,
                        xref="paper", yref="paper",
                        showarrow=False
                    ),
                    dict(
                        text="Light blue: B but not A",
                        x=0.9, y=1.1,
                        xref="paper", yref="paper",
                        showarrow=False
                    ),
                    dict(
                        text="Dark blue: A and B",
                        x=0.9, y=1.05,
                        xref="paper", yref="paper",
                        showarrow=False
                    ),
                    dict(
                        text=annotation_text,
                        x=0.5, y=1.1,
                        xref="paper", yref="paper",
                        showarrow=False,
                        font=dict(size=14)
                    )
                ],
                # "title": f"Conditional Probability as Countable Outcomes: Condition: X > {t}"
            }
        ],
        label=f"{t}"
    ))

fig = go.Figure()

fig.add_scatter(
    x=x,
    y=y,
    mode="markers+text",
    text=[str(i+1) for i in range(len(x))],  # label points for counting
    textposition="middle center",
    marker=dict(size=18),
)

fig.update_layout(
    sliders=[dict(
        currentvalue={"prefix": "Threshold X > "},
        steps=steps
    )],
    xaxis_title="X",
    yaxis_title="Y",
    title="Conditional Probability as Countable Outcomes",
    height=600
)

fig

This rescaling principle is vital for interpreting normalisation in models, sample reweighting, and softmax outputs.

Conditional Probability as Population Flow¶

Operational Context¶

Let's translate this to a deployed machine learning scenario. A model flags a transaction as fraudulent. What proportion of those flagged cases are truly fraudulent? We refer to this as Precision (or Positive Predictive Value).
We are not asking how good the model is at catching fraud overall. We are asking: given that an alert was generated, can it be trusted?

The Theory¶

Mathematically, this is a conditional probability conditioned on the model's output:
$$ P(\text{Actually Fraud} \mid \text{Flagged as Fraud}) $$

In terms of the confusion matrix, we restrict our view to the "Flagged" column:
$$ \textbf{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} = \frac{TP}{TP + FP} $$

Conditional probability here answers a population flow question: "Where does the population go once we know the prediction is positive?"
Once flagged, the population splits into True Positives and False Positives. Precision evaluates that exact split.

Implementation¶

Watch what happens to the False Positives when the underlying prevalence (the base rate) changes. The model's inherent behaviour (Sensitivity and False Positive Rate) remains strictly fixed, and accuracy stays high. Yet, the Precision swings dramatically.

import plotly.graph_objects as go

# Fixed model parameters
POPULATION = 10_000
SENSITIVITY = 0.90        # Used to compute TP / FN
FALSE_POS_RATE = 0.05

prevalences = [0.01, 0.05, 0.10, 0.20]
steps = []

for p in prevalences:
    # Actual counts
    actual_fraud = int(POPULATION * p)
    actual_not_fraud = POPULATION - actual_fraud

    # Confusion matrix values
    TP = int(actual_fraud * SENSITIVITY)
    FN = actual_fraud - TP

    FP = int(actual_not_fraud * FALSE_POS_RATE)
    TN = actual_not_fraud - FP

    accuracy = (TP + TN) / POPULATION
    precision = TP / (TP + FP) if (TP + FP) > 0 else 0

    matrix = [
        [TP, FP],
        [FN, TN]
    ]

    text_matrix = [
        [f"FN = {FN}", f"TN = {TN}"],
        [f"TP = {TP}", f"FP = {FP}"]
    ]

    steps.append(dict(
        method="update",
        args=[
            {
                "z": [matrix],
                "text": [text_matrix]
            },
            {
                "title.text": (
                    f"Fraud Prevalence: {int(p*100)}%<br>"
                    f"Accuracy: {accuracy:.2f} | "
                    f"Precision: {TP}/({TP}+{FP}) = {precision:.2f}"
                )
            }
        ],
        label=f"{int(p*100)}%"
    ))

fig = go.Figure(
    data=[
        go.Heatmap(
            z=[[0, 0], [0, 0]],
            x=["Fraud (Actual)", "Not Fraud (Actual)"],
            y=["Not Fraud (Predicted)", "Fraud (Predicted)"],
            text=[["FN = 10", "TN = 9405"], ["TP = 90", "FP = 495"]],
            texttemplate="%{text}",
            textfont={"size": 16},
            colorscale="Blues",
            showscale=False
        )
    ]
)

fig.update_layout(
    sliders=[dict(
        currentvalue={"prefix": "Fraud prevalence: "},
        steps=steps
    )],
    title="Confusion Matrix (TP, FP / FN, TN)",
    margin=dict(l=90, r=90, t=140, b=60),
    height=560
)

fig

This exposes the base-rate fallacy. Accuracy answers: “How often is the model right overall?” Precision answers: “When the model alerts, is it worth acting on?” This gap dictates how we handle imbalanced data problems and medical testing.

Conditional Probability as Distribution Shift¶

Operational Context¶

So far, we have dealt with discrete counts. But what happens when we observe a continuous variable? Given a specific condition, how does the behaviour of that variable change?

The Theory¶

When we apply a condition to continuous data, we are extracting a structured subset, not a random sample. Consequently, the entire distribution changes. We move from measuring a marginal probability $P(\text{Height})$ to a conditional one $P(\text{Height} \mid \text{Gender})$.
Conditioning on a segment can shift the mean, alter the spread, and completely change the shape of our data.

Implementation¶

Toggle between the distributions below. The variable (Height) remains exactly the same; only the condition (Gender) changes.

import numpy as np
import plotly.graph_objects as go

np.random.seed(42)

# Synthetic height data (in cm)
n = 3000

heights_male = np.random.normal(loc=175, scale=7, size=n)
heights_female = np.random.normal(loc=162, scale=6, size=n)
heights_all = np.concatenate([heights_male, heights_female])

groups = {
    "All": heights_all,
    "Male": heights_male,
    "Female": heights_female
}

steps = []

for label, data in groups.items():
    steps.append(dict(
        method="update",
        args=[
            {"x": [data]},
            {
                "title.text": (
                    f"Height Distribution | Condition: {label}<br>"
                    "Same variable, different distribution"
                )
            }
        ],
        label=label
    ))

fig = go.Figure(
    data=[
        go.Histogram(
            x=heights_all,
            nbinsx=40,
            histnorm="probability density",
            marker=dict(color="royalblue", opacity=0.7)
        )
    ]
)

fig.update_layout(
    sliders=[dict(
        currentvalue={"prefix": "Condition: "},
        steps=steps
    )],
    xaxis_title="Height (cm)",
    yaxis_title="Density",
    title="Distribution of heights",
    height=550,
    bargap=0.05
)
fig.update_xaxes(range=[140, 200])
fig

Whenever a model's behaviour fluctuates across segments, you are observing this shift. Regression models estimate these conditional distributions ($\mathbb{E}[Y \mid X]$), which are critical for Partial Dependence Plots and fairness analyses.

Conditional Probability as Density Redistribution¶

Operational Context¶

Finally, what actually happens to the continuous probability mass after we condition on a related feature?

The Theory¶

Think of probability mass as a limited resource. Conditioning redistributes it. When we restrict $X$, the mass outside our condition instantly drops to zero. To ensure the total area remains 1, the probability density inside the condition intensifies and rebalances.
Because $X$ and $Y$ are correlated, placing a restriction on $X$ implicitly restructures the likely values of $Y$. The condition acts on the x-axis, but the consequences materialise along the y-axis.
Mathematically, we extract a slice of the joint distribution and renormalise it:
$$ P(Y \in \text{bin}_j \mid X \in \text{bin}_i, X > t) = \frac{P(Y, X > t)}{P(X > t)} $$

Implementation¶

In this heatmap, watch how the conditional density adapts. By sliding the Standard Deviation threshold on $X$, you can see the density of $Y$ shift entirely.

colorscale = [
    [0.0, "white"],
    [0.2, "#fde0dd"],
    [0.4, "#fcbba1"],
    [0.6, "#fc9272"],
    [0.8, "#fb6a4a"],
    [1.0, "#cb181d"]
]

import numpy as np
import plotly.graph_objects as go

np.random.seed(2)

# Generate correlated data
x = np.random.randn(5000)
y = x + np.random.randn(5000) * 0.5

# Joint density
heat, xedges, yedges = np.histogram2d(x, y, bins=30, density=True)

thresholds = [-1.0, 0.0, 1.0]
steps = []

for t in thresholds:
    mask = xedges[:-1] > t

    conditioned = heat.copy()
    conditioned[~mask, :] = 0
    conditioned = conditioned / conditioned.sum()  # rescale

    steps.append(dict(
        method="update",
        args=[
            {"z": [conditioned.T]},
            {"title.text": f"Conditional Density | X > ({t} standard deviations from mean)"}
        ],
        label=str(t)
    ))

fig = go.Figure(
    data=[
        go.Heatmap(
            z=heat.T,
            colorscale=[
                [0.0, "white"],
                [0.2, "#fde0dd"],
                [0.4, "#fcbba1"],
                [0.6, "#fc9272"],
                [0.8, "#fb6a4a"],
                [1.0, "#cb181d"]
            ],
            showscale=True,
            colorbar=dict(title="Probability density")
        )
    ]
)

fig.update_layout(
    sliders=[dict(
        currentvalue={"prefix": "Condition (Standard Deviation threshold): "},
        steps=steps
    )],
    xaxis_title="X",
    yaxis_title="Y",
    height=550
)

fig

This concept is foundational for Generative Modelling, likelihood estimation, and diagnosing covariate shift in production models.

Final Takeaway¶

Conditional probability is not a single, rigid formula. In the context of data science, it is a multi-faceted tool. By understanding it as Filtering, Rescaling, Population Flow, Distribution Shift, and Density Redistribution, we transition from solving textbook equations to designing robust, context-aware machine learning architectures.