Random Distributions

One of the first things I did when I started building out my framework to make art was to develop some tools for producing numbers on various distributions. At the base of it all is a function that produces uniformly distributed random (or pseudo-random) numbers in the range [0, 1); that is, greater than or equal to zero but less than one.

The standard uniform distribution is fine for some things, but sometimes you want a different range of values, or a different distribution entirely. Given the standard uniform distribution, it is easy enough to produce numbers in other distributions.

Uniform, different ranges

A uniform distribution is useful if you want an even selection, say, of colours, or of a choice of shapes. It is also handy early in a project, when it is hard to get a sense of the parameter space for a particular algorithm.

Sometimes you want numbers from, say, -0.5 to 0.5 instead of 0 to 1, or from 10 to 100. This is a matter of multiplying by the range and adding the minimum value:

min + random() * (max - min)

What's more, I generally find I want inclusive ranges (ones that include the top number). That's a little trickier: add a tiny ε to the range, then treat values in [max, max + ε) as equal to max. This does slightly over-weight the max value, depending on how small ε is.
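As a sketch of both the range mapping and the inclusive-range trick, assuming a standard [0, 1) uniform source (the function name and parameters here are my own):

```python
import random

def uniform_range(lo, hi, inclusive=False, eps=1e-9):
    """Uniform value in [lo, hi), or in [lo, hi] when inclusive.

    The inclusive variant samples from [lo, hi + eps) and treats
    anything landing in [hi, hi + eps) as hi, which slightly
    over-weights the max value as noted above.
    """
    span = (hi - lo) + (eps if inclusive else 0.0)
    x = lo + random.random() * span
    return min(x, hi) if inclusive else x
```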

Normal

A normal distribution is useful when you have locked down the good points in a parameter space but still want a little variety. I use it a lot for turning angles, or position jitters.

To compute the normal distribution from the standard uniform distribution, I use the Marsaglia polar method. Two (uniform) random values u and v in the range (-1, 1) are generated such that s = u² + v² < 1. The random value is then μ + σu√(-2 ln(s)/s), where μ is the mean and σ is the standard deviation.
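A minimal Python sketch of the polar method; the rejection loop regenerates u and v until s falls in (0, 1):

```python
import math
import random

def normal(mu=0.0, sigma=1.0):
    """Marsaglia polar method: reject (u, v) pairs until 0 < s < 1,
    then return mu + sigma * u * sqrt(-2 ln(s) / s)."""
    while True:
        u = random.random() * 2 - 1
        v = random.random() * 2 - 1
        s = u * u + v * v
        if 0 < s < 1:
            return mu + sigma * u * math.sqrt(-2 * math.log(s) / s)
```

Note that v contributes only through s here; a fuller implementation would cache the second independent value v·√(-2 ln(s)/s) and return it on the next call.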

Skewed Normal

A skewed normal distribution is useful for a lot of the same cases as the normal distribution, but when you want to inject a certain amount of bias into the outcome: most angles get bigger, but a few get smaller, say.

A skewed normal distribution is defined by the parameters α, ξ, and ω. α is the skew: positive for a rightward skew above the mean, negative for a leftward skew below the mean. ξ is the location, a shift along the x-axis. ω is the scale, a stretching of the distribution analogous to the standard deviation.

I use the approximation in [1], which is based on the observation that if you have two independent standard normal distributions u and v then s = (α|u|+v)/√(1+α²) is a skewed normal distribution with a skew of α. The location and scaling are applied to this to get the desired distribution: ωs + ξ.

I find it more convenient to think of the distribution in terms of mean and standard deviation, like a normal distribution. Given δ = (α/√(1+α²)), the mean is μ = ξ + ωδ√(2/π) and the standard deviation is σ = √(ω²(1-2δ²/π)), so the scale is ω = σ/√(1-2δ²/π) and the location is ξ = μ - ωδ√(2/π).
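Putting the transform and the mean/standard-deviation conversion together, a sketch (function names are mine; `_std_normal` is the polar method from the previous section):

```python
import math
import random

def _std_normal():
    # Standard normal draw via the Marsaglia polar method.
    while True:
        u = random.random() * 2 - 1
        v = random.random() * 2 - 1
        s = u * u + v * v
        if 0 < s < 1:
            return u * math.sqrt(-2 * math.log(s) / s)

def skew_normal(mean, std, alpha):
    """Skew-normal draw parameterized by mean and standard deviation."""
    delta = alpha / math.sqrt(1 + alpha * alpha)
    omega = std / math.sqrt(1 - 2 * delta * delta / math.pi)
    xi = mean - omega * delta * math.sqrt(2 / math.pi)
    # (alpha|u| + v) / sqrt(1 + alpha^2) is skew-normal with shape alpha.
    u, v = _std_normal(), _std_normal()
    s = (alpha * abs(u) + v) / math.sqrt(1 + alpha * alpha)
    return omega * s + xi
```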

Bernoulli

The Bernoulli distribution returns either a 0 or a 1 with a certain probability. This is useful for coin-flip kind of decisions: should I turn left or right? Should this be shaded or not?

The implementation is easy: compare a value from the standard uniform distribution to the probability. If the value is below, return 1; if it is above, return 0.
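In Python, assuming the usual [0, 1) uniform source:

```python
import random

def bernoulli(p):
    """Return 1 with probability p, otherwise 0."""
    return 1 if random.random() < p else 0
```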

Zipf

The Zipf distribution is useful for natural scalings: most things are small, some are a little bigger, and a few are very big. I use it for sizes, or when I want a strong bias toward certain selections while allowing for the others.

I use a truncated Zipf distribution (1 to N) with a precomputed cumulative probability table. The probability for the jth value is (1/j^α) / Σ[i=1:N](1/i^α). The cumulative probability is just the sum of the probabilities through j.

To produce a value, a probe probability is generated from the standard uniform distribution, and a binary search finds the lowest index in the cumulative probability table whose value is greater than the probe.
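A sketch of the table construction and lookup, using the standard-library bisect module for the binary search (the function names are mine):

```python
import bisect
import random

def zipf_cumulative(n, alpha):
    """Cumulative probability table for a truncated Zipf on 1..n."""
    weights = [1.0 / (j ** alpha) for j in range(1, n + 1)]
    total = sum(weights)
    cum, acc = [], 0.0
    for w in weights:
        acc += w / total
        cum.append(acc)
    return cum

def zipf_sample(cum):
    """Find the lowest index whose cumulative probability exceeds the probe."""
    probe = random.random()
    idx = bisect.bisect_right(cum, probe)
    return min(idx, len(cum) - 1) + 1  # values are 1-based; clamp guards float slop
```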

Ad hoc probability

Cumulative probability tables can be constructed from ad hoc values just as easily. I have a helper function that takes a set of percentages and computes the cumulative probability table from them. This is handy when you want to make a biased selection of components or colours, e.g. red 20% of the time, blue 50% of the time, and green 30% of the time.

The same binary search used for the Zipf distribution then turns the table into index values with the appropriate distribution.
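A sketch of the helper and the shared lookup; I've assumed the percentages need not sum exactly to 100, so they are normalized:

```python
import bisect
import random

def cumulative_from_percentages(percentages):
    """e.g. [20, 50, 30] -> [0.2, 0.7, 1.0]."""
    total = sum(percentages)
    cum, acc = [], 0.0
    for p in percentages:
        acc += p / total
        cum.append(acc)
    return cum

def pick(cum, choices):
    """Binary-search the cumulative table and return the matching choice."""
    i = min(bisect.bisect_right(cum, random.random()), len(cum) - 1)
    return choices[i]
```

For example, `pick(cumulative_from_percentages([20, 50, 30]), ["red", "blue", "green"])` returns "blue" about half the time.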

Markov chain

A simple Markov chain can be created using essentially the same cumulative probability table as before. Here, however, the table is a table of tables, indexed by the prior index value. Markov chains come up when there is a sequence of components that you want to correlate with one another in some fashion.
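A sketch, where the transition table is a list of cumulative probability tables indexed by the prior value (names are mine):

```python
import bisect
import random

def markov_next(tables, prior):
    """tables[prior] is a cumulative probability table over the next index."""
    cum = tables[prior]
    return min(bisect.bisect_right(cum, random.random()), len(cum) - 1)

def markov_sequence(tables, start, length):
    """Generate a sequence of index values correlated via the tables."""
    seq = [start]
    for _ in range(length - 1):
        seq.append(markov_next(tables, seq[-1]))
    return seq
```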

Multimodal

A multimodal distribution is simply the combination of two or more other distributions. I've used it when I want peaks around a couple of values but an excluded middle.

This is implemented as a uniform selection of which distribution to select from, followed by a selection from that distribution.
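A sketch, using Python's built-in random.gauss for the component distributions; the two narrow peaks near the extremes match the excluded-middle example:

```python
import random

def multimodal(samplers):
    """Uniformly pick a component distribution, then sample from it."""
    return random.choice(samplers)()

# Two narrow normal peaks near the extremes, with an excluded middle.
low_high = [lambda: random.gauss(0.1, 0.02),
            lambda: random.gauss(0.9, 0.02)]
```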

Constant

The constant distribution is obvious to implement and obviously boring. It isn't a "random" distribution at all. Yet I have an implementation for it. Why? Read on.

Implementation by descriptor

For my purposes in art, I am often tinkering with the distributions I use. It is useful, therefore, to have them represented as data rather than embedded deep in the code as a bunch of function calls and mathematical operations. This also allows me to dump the distribution information out into metadata, so I know what distributions were used for a particular result.

The descriptors are maps with keys for the distribution type and its basic parameters (mean, Zipf alpha, and so on). A generic function takes one of these descriptors and produces a value.

Here is where the "constant" distribution comes in: I can fix certain parts of the algorithm to test the effects of other parts by using a constant distribution, without having to change anything in the code but a descriptor in a table.
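A minimal sketch of the idea; the descriptor keys and the set of types here are illustrative, not the actual ones:

```python
import random

def generate(desc):
    """Produce a value from a distribution descriptor (a plain map)."""
    kind = desc["type"]
    if kind == "constant":
        return desc["value"]  # pins a parameter without code changes
    if kind == "uniform":
        return desc["min"] + random.random() * (desc["max"] - desc["min"])
    if kind == "normal":
        return random.gauss(desc["mean"], desc["std"])
    if kind == "bernoulli":
        return 1 if random.random() < desc["p"] else 0
    raise ValueError("unknown distribution type: " + kind)
```

Swapping a "normal" descriptor for a "constant" one in a parameter table then freezes that part of the algorithm without touching the code.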

The descriptors also allow for some other features wrapped into the generic randomization functions.

Range truncation

There are a couple of ways to handle values outside of a desired minimum/maximum range with various pros and cons:

  1. truncation/rounding: clamp the out-of-bounds value to the desired minimum or maximum. Quick and easy, but it can give you value spikes at these extremes.
  2. resampling: keep trying until you find a value that works. Can take an arbitrary amount of time, unless you cap the number of retries and fall back on another method, but preserves the shape of the distribution.
  3. no value: don't return the bad value. Only appropriate if you are getting a pool of values and some slop in the number returned is OK.
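The first two strategies can be sketched as a wrapper around any sampling function (a hypothetical helper, not the actual implementation):

```python
import random

def truncated(sample, lo, hi, mode="clamp", max_tries=100):
    """Keep a sampled value inside [lo, hi].

    clamp:    fast, but piles probability mass at the bounds.
    resample: preserves the distribution's shape, with a capped
              number of retries falling back to clamping.
    """
    x = sample()
    if mode == "resample":
        for _ in range(max_tries):
            if lo <= x <= hi:
                return x
            x = sample()
    return min(max(x, lo), hi)
```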

Scaling

For distributions that return an index value, such as the Zipf distribution, sometimes you take that value directly as an index into a table of values, but often you want to scale it by some factor. Scaling can be done either before or after range truncation.

Casting, rounding

Sometimes you want an integer, sometimes you want a float, sometimes you want a boolean value. Since the output of a lot of this, for me, is an SVG file, it is also useful to be able to eliminate a lot of the extra trailing digits, and just keep 2 or 3. The descriptors have properties for all this.

This grid shows the output from various distributions. The range of all values is the same; only the distributions differ. Left to right, top to bottom, the distributions for each cell are: uniform, normal with a high standard deviation, Zipf, left-skewed normal, Bernoulli (50%), right-skewed normal, a particular Markov table, normal with a small standard deviation, and a multimodal distribution with narrow normal distributions at the extremes. Sizes are ordered top to bottom, colours left to right, and the number of circles is left sequential.