📊 Mathematics & Statistics · 24 February 2026
Statistics Fundamentals
Statistics is the grammar of science. Whether you're analysing experimental data, building machine-learning models, or simply reading a poll, a handful of core ideas are all you need to make sense of numbers. This article covers the essentials — with interactive simulations to build real intuition.
1. Central Tendency — Where Is the Data?
When you have a dataset, the first question is: what is a typical value? Three measures answer this differently.
Mean — the arithmetic average. Sum all values and divide by the number of values $n$:
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The mean is sensitive to outliers. One extreme value can pull it far from the bulk of the data.
Median — the middle value when the data is sorted. If $n$ is even, it's the average of the two middle values. The median is robust: a single outlier can't move it much. This is why economists report median household income rather than the mean: a handful of billionaires would inflate the mean beyond any useful meaning.
Mode — the most frequently occurring value. Useful for categorical data (e.g., most popular shoe size) and for identifying peaks in a distribution.
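A minimal sketch of the three measures using Python's standard `statistics` module. The income figures and the helper name `summarise` are made up for illustration; the last value added is a deliberate outlier, so you can see the mean jump while the median and mode barely move.

```python
import statistics

def summarise(data):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

# Hypothetical incomes in thousands (illustrative values only).
incomes = [32, 35, 35, 38, 41, 44, 47]
# The same data plus one extreme value.
with_outlier = incomes + [5000]

for label, data in [("without outlier", incomes), ("with outlier", with_outlier)]:
    mean, median, mode = summarise(data)
    print(f"{label}: mean={mean:.2f}, median={median}, mode={mode}")
```

In this toy data the outlier moves the mean from about 39 to 659, while the median only shifts from 38 to 39.5.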
Try it: Switch between Symmetric, Right-skewed, and Left-skewed distributions. Notice how the mean chases the long tail while the median stays closer to the centre of mass.
2. Standard Deviation — How Spread Out Is the Data?
The mean tells you where the data sits. The standard deviation ($\sigma$) tells you how spread out it is around that centre.
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$$
The formula first computes the average squared distance from the mean $\mu$ (the variance, $\sigma^2$), then takes the square root to return to the original units.
A small $\sigma$ means values cluster tightly around the mean. A large $\sigma$ means they're spread wide. On a normal distribution, increasing $\sigma$ flattens and widens the bell curve, but the total area underneath always stays exactly 1 (it's a probability distribution).
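The two steps (average squared distance, then square root) can be sketched directly. This assumes the population form, which divides by $n$; the function name `population_std` and both datasets are illustrative.

```python
import math
import statistics

def population_std(values):
    """Square root of the average squared distance from the mean."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n  # divides by n, not n - 1
    return math.sqrt(variance)

# Two made-up datasets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for label, data in [("tight", tight), ("wide", wide)]:
    print(f"{label}: sigma={population_std(data):.3f}")
```

The standard library's `statistics.pstdev` computes the same quantity. `statistics.stdev` divides by $n - 1$ instead, which is the usual choice when the data is a sample rather than the whole population.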
Try it: Set σ₁ = 1 and σ₂ = 4. The wider curve has the same centre and the same total area — it just distributes probability over a larger range.
3. The Empirical Rule — 68–95–99.7
For a normal distribution (the symmetric bell curve), the proportion of data within each σ band follows a remarkably clean pattern:
| Range | Approx. | Exact |
|---|---|---|
| $\mu \pm 1\sigma$ | ~68% | 68.27% |
| $\mu \pm 2\sigma$ | ~95% | 95.45% |
| $\mu \pm 3\sigma$ | ~99.7% | 99.73% |
This is sometimes called the 68-95-99.7 rule or the three-sigma rule. It shows up everywhere:
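- In manufacturing, "six sigma" quality means no more than 3.4 defects per million opportunities. That figure assumes the conventional 1.5σ long-term drift of the process mean, so it corresponds to a 4.5σ one-sided tail rather than a literal ±6σ.
- In particle physics, a "5σ discovery threshold" means that, if there were no real effect, a fluctuation at least this large would occur with probability less than about 1 in 3.5 million (one-sided tail).
- In finance, if daily asset returns were normally distributed, a move beyond ±2σ would be expected on roughly 1 in 20 trading days (more precisely, about 1 in 22).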
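The percentages and tail odds above can be checked numerically. The sketch below assumes a standard normal distribution and uses the identity $P(|Z| \le k) = \operatorname{erf}(k/\sqrt{2})$; the function names are illustrative.

```python
import math

def within_k_sigma(k):
    """P(|Z| <= k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

def upper_tail(k):
    """P(Z > k): the one-sided tail beyond k standard deviations."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k}σ: {within_k_sigma(k):.2%}")

# Two-sided probability of landing outside ±2σ, expressed as "1 in N".
outside_2 = 1 - within_k_sigma(2)
print(f"outside ±2σ: about 1 in {1 / outside_2:.0f}")

# One-sided probability of exceeding +5σ, expressed as "1 in N".
print(f"above +5σ: about 1 in {1 / upper_tail(5):,.0f}")
```

Using `math.erfc` for the tail avoids the loss of precision you would get from computing `1 - erf(...)` when the result is very close to zero.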
Try it: Toggle between 1σ, 2σ, and 3σ ranges. The annotation shows the exact percentage of the distribution contained in each band.
4. Z-Scores — Standardising Any Value
A Z-score converts any measurement into a universal scale: how many standard deviations from the mean is this value?
$$z = \frac{x - \mu}{\sigma}$$
This lets you compare values from completely different scales. A student scoring 72 on a test with mean $\mu$ and standard deviation $\sigma$ has $z = (72 - \mu)/\sigma$; a $z$ between 1 and 2 would put the score moderately above average. A patient's blood pressure reading can be standardised to the same scale for comparison.
The colour indicator in the simulation follows a common convention:
- $|z| \le 1$: typical — most values land here (68% of the distribution)
- $1 < |z| \le 2$: moderate
- $2 < |z| \le 3$: unusual — only ~5% of values lie beyond 2σ
- $|z| > 3$: extreme — less than 0.3% probability under a normal distribution
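A small sketch of standardising a value and labelling it. It assumes the band edges sit at $|z|$ = 1, 2 and 3, which is a guess at how the simulation classifies values. The test parameters (mean 65, standard deviation 5) are hypothetical values chosen only for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

def classify(z):
    """Label a z-score using assumed band edges at |z| = 1, 2, 3."""
    distance = abs(z)
    if distance <= 1:
        return "typical"
    if distance <= 2:
        return "moderate"
    if distance <= 3:
        return "unusual"
    return "extreme"

# Hypothetical test: a score of 72 with an assumed mean of 65 and std dev of 5.
z = z_score(72, mu=65, sigma=5)
print(f"z = {z:.3f} ({classify(z)})")
```

Because the classification uses `abs(z)`, a value far below the mean is flagged in the same way as one far above it.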
Try it: Move the value while keeping μ and σ fixed. Watch the z-score and classification update in real time.
5. Confidence Intervals — Estimating the Unknown
In practice, you rarely know the true population mean $\mu$. You draw a sample and compute its mean $\bar{x}$. The question is: how close is $\bar{x}$ to $\mu$?
A confidence interval gives a plausible range for the true parameter:
$$\bar{x} \pm z^{*} \cdot \frac{\sigma}{\sqrt{n}}$$
where:
- $\sigma/\sqrt{n}$ is the standard error (SE) — the standard deviation of the sample mean
- $z^{*}$ is the critical value: 1.645 for 90%, 1.96 for 95%, 2.576 for 99% confidence
A common misconception: "95% confidence" does not mean there's a 95% chance the true mean falls in this particular interval. The true mean either is or isn't in the interval — it's fixed. What it means is: if you repeated the experiment many times and built a CI each time, 95% of those intervals would contain the true mean.
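The repeated-experiment reading can be illustrated with a small simulation. This is a sketch under stated assumptions: normally distributed data, a known population standard deviation, and made-up values for the true mean, spread, and sample size. It builds many 95% intervals and counts how often they contain the true mean.

```python
import math
import random

def coverage(trials=2000, n=30, mu=100.0, sigma=15.0, z_star=1.96, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    se = sigma / math.sqrt(n)  # standard error, with sigma treated as known
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - z_star * se <= mu <= sample_mean + z_star * se:
            hits += 1
    return hits / trials

print(f"fraction of intervals containing the true mean: {coverage():.3f}")
```

The printed fraction should land close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the long-run behaviour of the procedure.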
Two levers narrow the interval:
- Increase $n$ (more data) — SE shrinks as $1/\sqrt{n}$, so quadrupling the sample size halves the interval width
- Decrease confidence level — a 90% CI is narrower than a 99% CI, but less reliable
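Both levers can be seen in a short sketch. It assumes the population standard deviation is known and takes the critical value from the standard normal distribution via `statistics.NormalDist` (Python 3.8+). The inputs (sample mean 100, standard deviation 15, sample size 30) are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for a z-based confidence interval."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    margin = z_star * se
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs for illustration.
for n in (30, 120):
    for level in (0.90, 0.95, 0.99):
        lo, hi = confidence_interval(100, sigma=15, n=n, confidence=level)
        print(f"n={n:>3}, {level:.0%}: [{lo:.2f}, {hi:.2f}], width={hi - lo:.2f}")
```

Going from a sample size of 30 to 120 halves every width, and at a fixed sample size the 90% interval is the narrowest of the three.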
Try it: Drag the sample size slider from 5 to 500. The interval collapses as data accumulates.
Interactive Simulations
[Interactive panels: 1. Central Tendency; 2. Standard Deviation; 3. Empirical Rule (68-95-99.7); 4. Z-Scores; 5. Confidence Intervals; 6. Playground]
Quick Reference
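| Concept | Formula | What it tells you |
|---|---|---|
| Mean | $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$ | Arithmetic average |
| Median | Middle value (sorted) | Robust centre (outlier-resistant) |
| Std Deviation | $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$ | Spread in original units |
| Z-score | $z = \frac{x - \mu}{\sigma}$ | Standard deviations from the mean |
| Standard Error | $SE = \frac{\sigma}{\sqrt{n}}$ | Uncertainty of the sample mean |
| 95% CI | $\bar{x} \pm 1.96 \cdot SE$ | Plausible range for the true mean |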
Further Reading
- Khan Academy — Statistics and Probability
- 3Blue1Brown — But what is a normal distribution?
- Seeing Theory — Brown University