Probability: Building Intuition by Counting Worlds¶
In my years architecting AI solutions, I've noticed a recurring intuition among senior practitioners. They might not have touched a probability textbook in ten years, yet they often have a visceral feel for the fundamental mechanics of probability. They don't treat formulas like black boxes.
In a production environment, probability isn't just a formula; it's a counting problem. Whether you are sampling data for a training set or simulating agentic workflows, you are essentially "counting worlds."
This post aims to rebuild that foundation from the ground up, using interactive visualisations to turn abstract ratios into tangible architecture.
import numpy as np
import pandas as pd
import plotly.graph_objects as go
The Glossary of the Architect¶
Before we dive into the visualisations, let's align on the terminology we'll be using:
- Sample Space (\(\Omega\)): The set of all possible "worlds" or outcomes.
- Event (\(E\)): A specific subset of those worlds that we care about.
- Probability (\(P\)): Simply the ratio of the "worlds we want" to the "total worlds available."
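To make the ratio concrete, here is a minimal sketch that enumerates the two-dice sample space and counts the worlds in one illustrative event (the sum being 7):

```python
from fractions import Fraction

# Sample space: every (d1, d2) pair is one "world".
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

# Event: the subset of worlds where the sum is 7.
event = [w for w in omega if sum(w) == 7]

# Probability: worlds we want / total worlds available.
p = Fraction(len(event), len(omega))
print(p)  # 1/6
```

No formula was applied here; we literally counted 6 worlds out of 36.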
The Die Grid: Probability as "Counting Worlds"¶
Most of us were taught probability through fractions. However, to truly analyse a system, you need to see the entire sample space. When we roll two dice, there are exactly 36 possible worlds (\(6 \times 6\)). In the visual below, the grid remains fixed because the total number of possibilities never changes. When you move the slider to a specific sum, we aren't "calculating" a probability; we are simply filtering the grid to count which worlds satisfy that condition.
df = pd.DataFrame([(i, j) for i in range(1, 7) for j in range(1, 7)], columns=['d1', 'd2'])

frames = []
for s in range(2, 13):
    # Mark the worlds whose sum equals s; the 36-cell grid itself never changes.
    ok = (df.d1 + df.d2) == s
    frames.append(go.Frame(
        name=str(s),
        data=[go.Scatter(
            x=df.d1, y=df.d2,
            mode='markers',
            marker=dict(color=['red' if v else 'lightgrey' for v in ok])
        )]
    ))

fig = go.Figure(
    data=[frames[0].data[0]],
    frames=frames
)
fig.update_layout(
    title='Die grid: worlds where the sum of two dice is X',
    xaxis_title='Die 1', yaxis_title='Die 2',
    sliders=[{
        'steps': [{'args': [[str(s)], {'frame': {'duration': 300, 'redraw': True}}],
                   'label': str(s), 'method': 'animate'} for s in range(2, 13)],
        'currentvalue': {'prefix': 'Sum = '}
    }]
)
fig
Why this matters: When you view probability as a filter on a fixed grid, the "Law of Total Probability" becomes intuitive. You aren't adding abstract fractions; you're just counting different areas of the same map.
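The "counting different areas of the same map" claim can be checked directly: the events "sum = 2" through "sum = 12" partition the 36 worlds, so their probabilities must add to exactly one. A quick sketch:

```python
from collections import Counter
from fractions import Fraction

# Count worlds per sum: the sums 2..12 partition the two-dice grid.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

total = sum(counts.values())  # the fixed grid: 36 worlds
probs = {s: Fraction(c, total) for s, c in counts.items()}
print(total)               # 36
print(sum(probs.values())) # 1
```

That is the Law of Total Probability with no algebra: disjoint areas of a fixed map always add up to the whole map.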
The Slot System: Permutations and Sampling without Replacement¶
Permutations (\(nPr\)) often confuse people because the formula \(n! / (n-r)!\) feels arbitrary. But if you think like a system designer, a permutation is just a sequential resource-consumption problem.
When we talk about Sampling without Replacement where the order matters, we are filling "slots." As you fill the first slot, you use up one option. The bars in the visualisation below shrink because your pool of choices is being consumed.
- Slot 1: \(n\) choices available.
- Slot 2: \(n-1\) choices remain.
- Slot 3: \(n-2\) choices remain.
import plotly.graph_objects as go
import math

frames = []
for n in range(3, 9):
    r = 3
    # nPr as consumption: each slot uses up one option, so the counts are n, n-1, n-2.
    choices = [n - i for i in range(r)]
    choices_str = " * ".join(map(str, choices))
    total_choices = math.prod(choices)
    num_ = str(n) + "!"
    den_ = "(" + str(n) + " - " + str(r) + ")!"
    subtitle = f"Number of choices = {choices_str} = {num_}/{den_} = {n}P{r} = {total_choices}"
    frames.append(go.Frame(
        name=str(n),
        data=[go.Bar(x=[f'Slot {i+1}' for i in range(r)], y=choices)],
        # We update the layout for each frame to change the annotation
        layout=go.Layout(annotations=[{
            'text': subtitle,
            'xref': 'paper', 'yref': 'paper',
            'x': 0.5, 'y': 1.1,  # Positioned just under the main title
            'showarrow': False,
            'font': {'size': 14}
        }])
    ))

# Initialize the figure with the first frame's data and layout
fig = go.Figure(
    data=frames[0].data,
    layout=go.Layout(
        title='Choices per slot as n increases (fixed r)',
        yaxis_title='Available choices',
        yaxis=dict(range=[0, 9]),
        annotations=frames[0].layout.annotations,  # Set initial subtitle
        sliders=[{
            'steps': [{
                'args': [[f.name], {
                    'frame': {'duration': 300, 'redraw': True},
                    'mode': 'immediate'
                }],
                'label': f.name,
                'method': 'animate'
            } for f in frames],
            'currentvalue': {'prefix': 'n = '}
        }]
    ),
    frames=frames
)
fig.show()
Operational Insight: We use permutations when the sequence of consumption defines the outcome. In a pipeline, if Step A followed by Step B is a different "world" than Step B followed by Step A, you are in the realm of permutations.
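As a sketch of that idea (the step names A through D are hypothetical), each distinct ordering of pipeline steps is its own world, and `itertools.permutations` enumerates exactly \(nPr\) of them:

```python
import itertools
import math

steps = ['A', 'B', 'C', 'D']

# Each distinct ordering of 2 steps out of 4 is its own "world":
# ('A', 'B') and ('B', 'A') count separately.
orderings = list(itertools.permutations(steps, 2))
print(len(orderings))   # 12
print(math.perm(4, 2))  # 12: 4!/(4-2)! = 4 * 3
```

The enumeration and the formula agree because they are counting the same thing: slot 1 has 4 choices, slot 2 has 3.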
Combinations: Selecting the Subset¶
Often, we don't care about the sequence; we only care about the subset. This is Sampling without Replacement where order is irrelevant (e.g., choosing 5 users for a pilot study out of a pool of 100).
The Combination formula (\(nCr\)) is effectively the Permutation formula with a "correction factor." We divide by \(r!\) to collapse all the different sequences that result in the same group into a single "world."
Architectural View: If Permutations are about the path taken, Combinations are about the destination. When you are sub-sampling a dataset for validation, you are almost always dealing with combinations.
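A quick numeric check of the correction factor, using the pilot-study example above (the small six-item pool at the end is only there so we can enumerate the subsets directly):

```python
import itertools
import math

# Choosing 5 pilot users out of 100: order is irrelevant.
ncr = math.comb(100, 5)

# Same count via the "correction factor": permutations collapsed by r!.
npr_over_rfact = math.perm(100, 5) // math.factorial(5)
print(ncr == npr_over_rfact)  # True

# Sanity check on a pool small enough to enumerate: 6C3 subsets.
subsets = list(itertools.combinations(range(6), 3))
print(len(subsets))  # 20
```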
import plotly.graph_objects as go
import math

n = 8  # Fixed n
max_r = 6

frames = []
for r in range(1, max_r + 1):
    choices = [n - i for i in range(r)]
    npr = math.perm(n, r)
    r_fact = math.factorial(r)
    ncr = math.comb(n, r)
    # Mathematical expansion of r!
    r_expansion = " * ".join([str(i) for i in range(r, 0, -1)])
    # Labels for the subtitle
    subtitle = f"For n={n}, r={r}, the unique combinations (nCr) = {n}P{r}/{r}! = {npr} / {r}! = {ncr}"
    redundancy_explanation = (
        f"<b>r = {r} slots filled</b><br>"
        f"Redundancy per set: {r}! = <span style='color:red'>{r_expansion}</span> = <b>{r_fact}</b><br>"
        f"<i>We divide the total permutations by {r_fact} to remove order.</i>"
    )
    frames.append(go.Frame(
        name=str(r),
        data=[go.Bar(
            x=[f'Slot {i+1}' for i in range(r)],
            y=choices,
            marker_color='rgb(55, 83, 109)',
            text=choices,
            textposition='outside'
        )],
        layout=go.Layout(
            annotations=[
                # Subtitle
                {'text': subtitle, 'xref': 'paper', 'yref': 'paper', 'x': 0.5, 'y': 1.15,
                 'showarrow': False, 'font': {'size': 14}},
                # Redundancy explanation
                {'text': redundancy_explanation, 'xref': 'paper', 'yref': 'paper', 'x': 0.5, 'y': 0.7,
                 'showarrow': False, 'font': {'size': 16}, 'align': 'center',
                 'bgcolor': 'rgba(255,255,255,0.9)', 'bordercolor': 'black', 'borderwidth': 1}
            ]
        )
    ))

# Build the figure
fig = go.Figure(
    data=frames[0].data,
    layout=go.Layout(
        title=f"Visualizing Combination Redundancy (Fixed n={n})",
        margin=dict(t=120),
        # A fixed x-axis prevents the 'Slot 1' bar from moving between frames
        xaxis=dict(
            title="Slots",
            range=[-0.5, max_r - 0.5],
            fixedrange=True
        ),
        yaxis=dict(
            title="Available Choices",
            range=[0, n + 1],
            fixedrange=True
        ),
        annotations=frames[0].layout.annotations,
        sliders=[{
            'steps': [{
                'args': [[f.name], {'frame': {'duration': 400, 'redraw': True}, 'mode': 'immediate'}],
                'label': f'r = {f.name}', 'method': 'animate'
            } for f in frames],
            'currentvalue': {'prefix': 'Number of items chosen (r): '}
        }]
    ),
    frames=frames
)
fig.show()
Sampling with Replacement: The Infinite Resource Pool¶
In many modern simulations or when dealing with extremely large populations, we use Sampling with Replacement. Here, the "Grid" never changes. Every time you make a choice, the item is returned to the pool.
The Logic: Every "slot" has exactly \(n\) choices.
The Count: For \(r\) slots, the total worlds are simply \(n^r\).
This is the foundation of independent events. In a production monitoring system, if we assume the probability of a server failure is independent of the previous one, we are modelling our world using "Replacement" logic.
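A small sketch of the \(n^r\) count (the server names are placeholders): with replacement, `itertools.product` enumerates every sequence of choices, and the pool never shrinks.

```python
import itertools

# 3 independent observations against a pool of 4 servers, with replacement:
# the item goes back in the pool, so every slot has all 4 choices.
pool = ['s1', 's2', 's3', 's4']
worlds = list(itertools.product(pool, repeat=3))
print(len(worlds))  # 64 = 4**3
```

Repeated worlds like `('s1', 's1', 's1')` are legal here; that is exactly what "replacement" means.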
import plotly.graph_objects as go
import math

frames = []
r = 5  # Number of slots/steps
for n in range(5, 9):
    # Data calculations
    with_rep = [n] * r
    without_rep = [n - i for i in range(r)]
    # Logic for the "With Replacement" subtitle string
    with_rep_str = " * ".join([str(n)] * r)
    total_with = n ** r
    # Logic for the "Without Replacement" subtitle string
    without_rep_str = " * ".join([str(val) for val in without_rep])
    total_without = math.prod(without_rep)
    # Combine into a multi-line subtitle
    subtitle = (
        f"<b>With Replacement:</b> {with_rep_str} = {n}^{r} = {total_with:,}<br>"
        f"<b>Without Replacement:</b> {without_rep_str} = {total_without:,}"
    )
    frames.append(go.Frame(
        name=str(n),
        data=[
            go.Bar(x=[f'Slot {i+1}' for i in range(r)], y=with_rep, name='With Replacement', marker_color='rgb(55, 83, 109)'),
            go.Bar(x=[f'Slot {i+1}' for i in range(r)], y=without_rep, name='Without Replacement', marker_color='rgb(26, 118, 255)')
        ],
        layout=go.Layout(
            annotations=[{
                'text': subtitle,
                'xref': 'paper', 'yref': 'paper',
                'x': 0.5, 'y': 1.15,
                'showarrow': False,
                'font': {'size': 14},
                'align': 'center'
            }]
        )
    ))

fig = go.Figure(
    data=frames[0].data,
    layout=go.Layout(
        title='How Replacement Alters the Choice Pool',
        barmode='group',
        margin=dict(t=150),  # Extra top margin for the double-line subtitle
        xaxis_title='Step / Slot',
        yaxis_title='Available Choices',
        yaxis=dict(range=[0, 9]),
        legend=dict(x=0.8, y=1),
        annotations=frames[0].layout.annotations,
        sliders=[{
            'steps': [{
                'args': [[f.name], {'frame': {'duration': 300, 'redraw': True}, 'mode': 'immediate'}],
                'label': f.name,
                'method': 'animate'
            } for f in frames],
            'currentvalue': {'prefix': 'Pool size (n) = '}
        }]
    ),
    frames=frames
)
fig.show()
The Law of Large Numbers: Trusting the Convergence¶
Theoretical probability (the "Counting Worlds" approach) tells us what should happen in a perfect model. The Law of Large Numbers (LLN) is the bridge to the messy, real world.
As we increase the number of trials, say, tossing a coin thousands of times, the observed frequency of "Heads" starts to stabilise. It "forgets" the early noise and converges toward the theoretical 0.5.
import plotly.graph_objects as go
import numpy as np

np.random.seed(42)
max_trials = 5000

# Generate the data: 0 = tails, 1 = heads
tosses = np.random.choice([0, 1], size=max_trials)
heads_count = np.cumsum(tosses)
running_heads_prob = heads_count / np.arange(1, max_trials + 1)

steps = []
for i in [10, 50, 100, 500, 1000, 2000, 5000]:
    current_heads = heads_count[i-1]
    current_prob = running_heads_prob[i-1]
    # Subtitle showing the mapping: Counting -> Probability
    subtitle = f"Count of Heads: {current_heads} | Total Tosses: {i} | P(Heads) ≈ {current_heads}/{i} = <b>{current_prob:.4f}</b>"
    steps.append(dict(
        method='update',
        args=[
            {'x': [list(range(1, i + 1))], 'y': [running_heads_prob[:i]]},
            {
                'title': f'Law of Large Numbers: {i} Tosses',
                'annotations': [{
                    'text': subtitle,
                    'xref': 'paper', 'yref': 'paper',
                    'x': 0.5, 'y': 1.1,
                    'showarrow': False,
                    'font': {'size': 14}
                }]
            }
        ],
        label=str(i)
    ))

# Initial figure state
fig = go.Figure(
    data=[go.Scatter(x=[1], y=[running_heads_prob[0]], mode='lines', name='Observed Prob')]
)
# Add a dashed line at 0.5 to show the theoretical target
fig.add_hline(y=0.5, line_dash="dash", line_color="red", annotation_text="Theoretical P=0.5")
fig.update_layout(
    sliders=[dict(steps=steps, currentvalue={'prefix': 'Number of tosses: '})],
    yaxis=dict(range=[0, 1]),
    xaxis=dict(range=[0, 5000]),
    xaxis_title='Trial number',
    yaxis_title='Estimated probability',
    margin=dict(t=120)
)
# Set the initial annotation for the first step (10 tosses)
fig.update_layout(annotations=steps[0]['args'][1]['annotations'])
fig.show()
Operational Insight: In Data Science, we rely on this convergence for every A/B test. If your sample size is too small, you are merely looking at "early noise." If it's large enough, the "noise" cancels out, and the true signal of the underlying world-count emerges.
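One way to see this numerically (the sample sizes, seed, and number of repeated experiments below are arbitrary choices for illustration): simulate many repeated "A/B measurements" at two sample sizes and compare the spread of the estimates. The spread shrinks roughly like \(1/\sqrt{n}\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Run 1,000 simulated experiments at each sample size and record
# how much the estimated P(heads) varies across experiments.
spread = {}
for n in (100, 10_000):
    estimates = rng.integers(0, 2, size=(1000, n)).mean(axis=1)
    spread[n] = estimates.std()
    print(n, round(spread[n], 4))
```

At n = 100 the estimates still wander noticeably around 0.5; at n = 10,000 they cluster tightly, which is the convergence an adequately powered A/B test relies on.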
Summary: The Operational Takeaway¶
Probability is about organising your sampling strategy:
- Define your Sample Space (The Grid).
- Determine your Consumption (Replacement vs. No Replacement).
- Decide if Sequence Matters (Permutations vs. Combinations).
- Await Convergence (The LLN).
When you build with these mental models, you stop "guessing" at statistics and start architecting certainties.