📊 Mathematics & Statistics  ·  24 February 2026

Statistics Fundamentals

Statistics is the grammar of science. Whether you're analysing experimental data, building machine-learning models, or simply reading a poll, a handful of core ideas are all you need to make sense of numbers. This article covers the essentials — with interactive simulations to build real intuition.


1. Central Tendency — Where Is the Data?

When you have a dataset, the first question is: what is a typical value? Three measures answer this differently.

Mean: the arithmetic average. Sum all values and divide by $n$:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The mean is sensitive to outliers. One extreme value can pull it far from the bulk of the data.

Median: the middle value when the data is sorted. If $n$ is even, it's the average of the two middle values. The median is robust: a single outlier can't move it much. This is why economists report median household income rather than the mean: a handful of billionaires would inflate the mean beyond any useful meaning.

Mode: the most frequently occurring value. Useful for categorical data (e.g., most popular shoe size) and for identifying peaks in a distribution.
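To see the outlier effect in numbers, here is a minimal sketch using Python's standard `statistics` module. The income and shoe-size lists are made-up illustrative data, not taken from the simulations.

```python
from statistics import mean, median, mode

# Hypothetical incomes (in thousands); the extra value is a deliberate outlier.
incomes = [32, 38, 41, 45, 52]
with_outlier = incomes + [5000]

# The mean jumps from 41.6 to 868, while the median only moves from 41 to 43.
mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)
print(mean_before, median_before)
print(mean_after, median_after)

# The mode suits categorical-style data such as shoe sizes.
shoe_sizes = [41, 42, 42, 43, 42, 44]
most_common_size = mode(shoe_sizes)
print(most_common_size)  # 42
```

One extreme value multiplies the mean by about twenty but shifts the median by only two units, which is the robustness described above.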

Try it: Switch between Symmetric, Right-skewed, and Left-skewed distributions. Notice how the mean chases the long tail while the median stays closer to the centre of mass.


2. Standard Deviation — How Spread Out Is the Data?

The mean tells you where the data sits. The standard deviation ($\sigma$) tells you how spread out it is around that centre.

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The formula first computes the average squared distance from the mean (the variance, $\sigma^2$), then takes the square root to return to the original units.

A small $\sigma$ means values cluster tightly around the mean. A large $\sigma$ means they're spread wide. On a normal distribution, increasing $\sigma$ flattens and widens the bell curve, but the total area underneath always stays exactly 1 (it's a probability distribution).
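The formula translates almost line for line into code. The sketch below uses a small made-up dataset and a hypothetical helper name, `population_std`. It checks the result against `statistics.pstdev`, which uses the same $1/n$ definition. For comparison, `statistics.stdev` divides by $n - 1$ instead, the usual choice when the data is a sample from a larger population.

```python
from math import sqrt
from statistics import pstdev, stdev

def population_std(values):
    """Standard deviation with the 1/n divisor, as in the formula above."""
    n = len(values)
    centre = sum(values) / n
    variance = sum((x - centre) ** 2 for x in values) / n  # average squared distance
    return sqrt(variance)                                  # back to original units

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values: mean 5, variance 4
sigma = population_std(data)
print(sigma)          # 2.0
print(pstdev(data))   # same 1/n definition
print(stdev(data))    # n - 1 divisor, slightly larger
```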

Try it: Set σ₁ = 1 and σ₂ = 4. The wider curve has the same centre and the same total area — it just distributes probability over a larger range.


3. The Empirical Rule — 68–95–99.7

For a normal distribution (the symmetric bell curve), the proportion of data within each σ band follows a remarkably clean pattern:

| Range | Approx. | Exact |
|---|---|---|
| $\mu \pm 1\sigma$ | ~68% | 68.27% |
| $\mu \pm 2\sigma$ | ~95% | 95.45% |
| $\mu \pm 3\sigma$ | ~99.7% | 99.73% |
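The exact percentages can be reproduced from the error function, since for a normal distribution the probability of landing within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. A minimal sketch (the helper name `fraction_within` is just for illustration):

```python
from math import erf, sqrt

def fraction_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))

coverage = {k: round(100 * fraction_within(k), 2) for k in (1, 2, 3)}
print(coverage)  # {1: 68.27, 2: 95.45, 3: 99.73}
```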

This is sometimes called the 68-95-99.7 rule or the three-sigma rule. It shows up everywhere:

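  • In manufacturing, "six sigma" quality means about 3.4 defects per million opportunities. That figure assumes the process mean may drift by 1.5σ, leaving 4.5σ between the mean and the nearest specification limit.
  • In particle physics, the "5σ discovery threshold" means that the probability of a fluctuation at least that large arising by chance alone is about 1 in 3.5 million (one-tailed).
  • In finance, a "2σ event" in asset returns would occur on roughly 1 in 20 trading days, counting moves in either direction, if returns were normally distributed.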

Try it: Toggle between 1σ, 2σ, and 3σ ranges. The annotation shows the exact percentage of the distribution contained in each band.


4. Z-Scores — Standardising Any Value

A Z-score converts any measurement into a universal scale: how many standard deviations from the mean is this value?

$$z = \frac{x - \mu}{\sigma}$$

This lets you compare values from completely different scales. A student scoring 72 on a test with $\mu = 65$, $\sigma = 5$ has $z = (72 - 65)/5 = 1.4$, moderately above average. A patient's blood pressure reading can be standardised the same way, which puts both measurements on a common scale.

The colour indicator in the simulation follows a common convention:

  • $|z| < 1$: typical, most values land here (about 68% of the distribution)
  • $|z| \geq 1$: moderate
  • $|z| \geq 2$: unusual, only ~5% of values are this far out
  • $|z| \geq 3$: extreme, less than 0.3% probability under a normal distribution
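The standardisation and the colour convention can be sketched in a few lines. The function names `z_score` and `classify` are hypothetical and are not taken from the simulation's source. The thresholds are checked from the largest down, so each value gets the most extreme label it qualifies for.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

def classify(z):
    """Label a z-score using the thresholds listed above."""
    size = abs(z)
    if size >= 3:
        return "extreme"
    if size >= 2:
        return "unusual"
    if size >= 1:
        return "moderate"
    return "typical"

z = z_score(72, mu=65, sigma=5)   # the test-score example from the text
print(z, classify(z))             # 1.4 moderate
```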

Try it: Move the value $x$ while keeping μ and σ fixed. Watch the z-score and classification update in real time.


5. Confidence Intervals — Estimating the Unknown

In practice, you rarely know the true population mean $\mu$. You draw a sample and compute its mean $\bar{x}$. The question is: how close is $\bar{x}$ to $\mu$?

A confidence interval gives a plausible range for the true parameter:

$$\text{CI} = \bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$$

where:

  • $\sigma / \sqrt{n}$ is the standard error (SE), the standard deviation of the sample mean
  • $z^*$ is the critical value: 1.645 for 90%, 1.96 for 95%, 2.576 for 99% confidence
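As a rough sketch, the interval can be computed with the standard library alone. `NormalDist().inv_cdf` supplies the critical value $z^*$ for any confidence level. The function name and the inputs ($\bar{x} = 100$, $\sigma = 15$, $n = 30$) are illustrative assumptions, and $\sigma$ is treated as known, as in the formula above.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, level=0.95):
    """z-based interval for the mean, assuming the population sigma is known."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    margin = z_star * sigma / sqrt(n)                # z* times the standard error
    return sample_mean - margin, sample_mean + margin

low, high = confidence_interval(100, sigma=15, n=30)
print(f"[{low:.2f}, {high:.2f}]")   # [94.63, 105.37]
```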

A common misconception: "95% confidence" does not mean there's a 95% chance the true mean falls in this particular interval. The true mean either is or isn't in the interval — it's fixed. What it means is: if you repeated the experiment many times and built a CI each time, 95% of those intervals would contain the true mean.
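The "repeat the experiment many times" reading can be tested directly by simulation. The sketch below draws many samples from a normal population with a known mean, builds a 95% interval from each, and counts how often the interval contains the true mean. The population parameters, trial count and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt

def coverage_rate(mu=100.0, sigma=15.0, n=30, trials=4000, z_star=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)
    margin = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # The interval sample_mean ± margin contains mu exactly when this holds.
        if abs(sample_mean - mu) <= margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"{rate:.1%} of intervals contained the true mean")
```

The printed fraction should land close to 95%. Any single interval either contains the true mean or does not. The 95% describes the procedure's long-run hit rate.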

Two levers narrow the interval:

  1. Increase $n$ (more data): SE shrinks as $1/\sqrt{n}$
  2. Decrease confidence level: a 90% CI is narrower than a 99% CI, but less reliable
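Both levers show up when the margin of error is tabulated. In this sketch (the function name and $\sigma = 15$ are illustrative assumptions), quadrupling $n$ halves the margin, and lowering the confidence level shrinks it further.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, level=0.95):
    """Half-width of the z-based confidence interval for the mean."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    return z_star * sigma / sqrt(n)

# Lever 1: each fourfold increase in n halves the margin.
for n in (5, 20, 80, 320):
    print(n, round(margin_of_error(15, n), 3))

# Lever 2: the same data gives a narrower interval at lower confidence.
for level in (0.90, 0.95, 0.99):
    print(level, round(margin_of_error(15, 30, level), 3))
```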

Try it: Drag the sample size slider from 5 to 500. The interval collapses as data accumulates.


Interactive Simulations

1. Central Tendency

[Interactive chart] Sample readout: Mean 49.87 · Median 49.32 · Mode 48.03 · Std Dev 10.09

2. Standard Deviation

[Interactive chart]

3. Empirical Rule (68-95-99.7)

[Interactive chart]

4. Z-Scores

[Interactive chart] Sample readout: z = (x − μ) / σ = 1.400 · Moderate (|z| ≥ 1)

5. Confidence Intervals

[Interactive chart] Sample readout: Std Error 2.739 · Margin of Error ±5.368 · CI [94.63, 105.37]

6. Playground

[Interactive chart] Sample readout: Sample Mean -0.13 · Sample Std 10.09 · Sample Median -0.68 · True μ 0.0 · True σ 10.0


Quick Reference

| Concept | Formula | What it tells you |
|---|---|---|
| Mean | $\bar{x} = \sum x_i / n$ | Arithmetic average |
| Median | Middle value (sorted) | Robust centre (outlier-resistant) |
| Std Deviation | $\sigma = \sqrt{\sum(x_i-\bar{x})^2/n}$ | Spread in original units |
| Z-score | $z = (x - \mu)/\sigma$ | Standard deviations from the mean |
| Standard Error | $SE = \sigma/\sqrt{n}$ | Uncertainty of the sample mean |
| 95% CI | $\bar{x} \pm 1.96 \cdot SE$ | Plausible range for the true mean |
