## What is Randomization?

Randomization is the process of using chance methods to assign subjects to treatment groups. In an A/B test the subjects are usually users (potential clients) or existing clients. If the target group sizes are equal, randomization gives each participant in an experiment an equal probability of being assigned to any of the groups.
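
As a minimal sketch of equal-allocation assignment, the snippet below gives each user the same chance of landing in either group. The user IDs, the group labels `A` and `B`, and the helper name `assign_group` are illustrative assumptions, not part of any particular A/B testing tool.

```python
import random
from collections import Counter


def assign_group(rng, groups=("A", "B")):
    """Assign one subject to a group; every group is equally likely."""
    return rng.choice(groups)


# A fixed seed is used only so that this sketch is reproducible.
rng = random.Random(42)

# Hypothetical user IDs standing in for website visitors.
user_ids = [f"user_{i}" for i in range(10_000)]

assignments = {uid: assign_group(rng) for uid in user_ids}
counts = Counter(assignments.values())
print(counts)
```

The group sizes come out close to, but usually not exactly, 5,000 each, since each assignment is an independent chance event.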

In this sense randomization is not haphazard: it is a process whose outcomes do not follow a deterministic pattern but are described by a probability distribution. Thus, a random sample of users from your website visitors is a sample in which every individual has a known probability of being sampled. The users were not arbitrarily selected.

Randomization is a **key part of any randomized controlled experiment**, including an online controlled experiment, because it assures the validity of the statistical calculations performed afterwards (e.g. a significance test). Many statistical methods assume that randomization has been performed and that any error-inducing factors are randomly dispersed. Its importance was first stressed by Ronald Fisher, who introduced it as a method for controlling the unknown causes of variation of the parameter of interest. Using randomization we can produce a statistical model in which the outcome variable can be modeled as a random variable. This is because any unknown confounding variables have an equal probability of affecting any test group (assuming equal allocation).

Randomization also ensures that the assignment of users to test groups is independent of user characteristics: no user or group of users is more likely to be assigned to a particular group because of desirable or undesirable characteristics (e.g. location, browser, connection speed).

Note that even though randomization tends towards an equal distribution of factors across groups as sample sizes grow, it **does not guarantee equal distribution** of all relevant factors (e.g. traffic source, location, device, browser). An equal distribution is not a necessary prerequisite for a valid statistical analysis, since the chance of an unequal distribution happening is taken into account in the resulting statistics.
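
A small simulation can illustrate this point. The sketch below randomly splits simulated users into two groups and measures how far apart the groups end up on a single factor. It assumes one binary factor ("mobile user") with a made-up 60% prevalence, and the function names are hypothetical.

```python
import random


def mobile_share_gap(n_users, rng, p_mobile=0.6):
    """Randomly split n_users into groups A and B and return the absolute
    difference between the two groups' shares of mobile users."""
    mobile = {"A": 0, "B": 0}
    total = {"A": 0, "B": 0}
    for _ in range(n_users):
        group = rng.choice(("A", "B"))
        total[group] += 1
        if rng.random() < p_mobile:  # assumed 60% of users are on mobile
            mobile[group] += 1
    # max(..., 1) avoids division by zero if a group happens to be empty.
    share_a = mobile["A"] / max(total["A"], 1)
    share_b = mobile["B"] / max(total["B"], 1)
    return abs(share_a - share_b)


def average_gap(n_users, n_runs=100, seed=0):
    """Average the imbalance over many simulated experiments."""
    rng = random.Random(seed)
    return sum(mobile_share_gap(n_users, rng) for _ in range(n_runs)) / n_runs


small = average_gap(100)
large = average_gap(10_000)
print(f"average gap with 100 users:    {small:.4f}")
print(f"average gap with 10,000 users: {large:.4f}")
```

The average gap shrinks as the sample grows, but it never reaches exactly zero: some imbalance remains in any single experiment.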

Randomized blocking can be employed when one or more factors are known to be causally linked to the parameter of interest. However, given the continuous nature of data gathering in A/B testing, it is often hard to balance the factors in practice. In addition, blocking and pure randomization lead to essentially the same distribution on major factors given the sample size of most online A/B tests. If one uses a block design, then appropriate analysis methods should be used, since a naive p-value calculation that doesn't take blocking into account will likely substantially understate how unexpected the result is.
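
For illustration only, here is a minimal sketch of randomizing within blocks. It assumes, unlike a typical online test, that all users and their blocking factor are known before assignment. The `device` field and the function `blocked_assignment` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def blocked_assignment(users, block_key, rng, groups=("A", "B")):
    """Randomize within each block so that groups are balanced on block_key."""
    blocks = defaultdict(list)
    for user in users:
        blocks[user[block_key]].append(user["id"])

    assignment = {}
    for ids in blocks.values():
        rng.shuffle(ids)  # random order within the block
        for position, uid in enumerate(ids):
            # Alternating over the shuffled order keeps group sizes within
            # one of each other inside every block.
            assignment[uid] = groups[position % len(groups)]
    return assignment


rng = random.Random(7)  # fixed seed only for reproducibility
# Simulated users with a made-up blocking factor.
users = [{"id": i, "device": rng.choice(("mobile", "desktop"))} for i in range(1000)]

assignment = blocked_assignment(users, "device", rng)
balance = Counter((u["device"], assignment[u["id"]]) for u in users)
print(balance)
```

Within each device type the two groups differ in size by at most one user, which simple randomization does not guarantee.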

Like this glossary entry? For an in-depth and comprehensive reading on A/B testing stats, check out the book "Statistical Methods in Online A/B Testing" by the author of this glossary, Georgi Georgiev.