http://mathling.com/core/random library module
http://mathling.com/core/random
Module with functions producing random distributions of various kinds.
This version is XQuery 3.1 compliant. It is not as performant as the
MarkLogic version because random-number-generator() just isn't as usable as
xdmp:random. This works through the ugly hack of relying on generate-id()
on a constructed node to produce new seeds. That isn't entirely random
because whenever Saxon decides that one expression is the same as another
it will optimize it to produce the same result, but for the most part
this works ok. With Saxon-PE or Saxon-EE I use timestamp() which is more
reliable. Saxon-EE optimizer is often too clever regardless, and you get
repeated random numbers. Running with -opt:-l seems to fix this.
You may need more for some cases: -opt:-gl
Even generating a sequence of numbers off random-number-generator isn't
that great because you need to construct maps to hold both the next
generator function and the number, or use recursion in a way highly
likely to generate stack overflows. (The example in the spec suffers this
problem.)
The problem is that for my purposes I need to be able to create multiple
random sequences in the same query that are different from the sequences
in subsequent episodes. Where the sequences are partitioned into separate
modules so even if I did the ugly thing of making every function everywhere
have to cart around and return both the values I cared about and the
current randomizer (which XQuery is so very not good at), and used explicit
distinct seeds: whence those seeds? They need to be different based on
global program context and different from each run of the program. So
current-dateTime() concatenated with a value from a value range that is
preassigned to each component module. Or: cart around yet another randomizer
to provide the seeds. It is ugly, ugly, ugly any way you slice it.
Usage:
You can either use the explicit distribution functions here, e.g.
rand:normal(10.0, 2.0), or construct a distribution descriptor using
the distributions API and then pass it into a call to rand:randomize(),
e.g. rand:randomize(dist:normal(10.0, 2.0))
The latter approach allows for more fine control over the behaviour of
the final result (e.g. you can set a minimum value, a post-multiplier, etc.)
and is useful when you have a randomization algorithm that is getting
reused a lot, or that you might want to change up without having to find
all the places where it is called in the code.
Copyright© Mary Holstege 2020-2023
CC-BY (https://creativecommons.org/licenses/by/4.0/)
Status: Stable
Imports
http://mathling.com/core/utilitiesimport module namespace util="http://mathling.com/core/utilities" at "../core/utilities.xqy"http://mathling.com/type/distribution
import module namespace dist="http://mathling.com/type/distribution" at "../types/distributions.xqy"http://mathling.com/core/errors
import module namespace errors="http://mathling.com/core/errors" at "../core/errors.xqy"
Variables
Variable: $SCALING as xs:integer
ε is used to handle the switch from exclusive to inclusive bounds.
Variable: $DECIMAL-DIGITS as xs:integer
Number of fractional decimal digits to keep when we cast to decimal.
Variable: $π as xs:double
Pi
Math looks better with a real letter in there. Pity about having to put in
the module prefix.
Variable: $RESAMPLE-LIMIT as xs:integer
Number of resamplings will we try before we give up
Variable: $STD-UNIFORM as map(xs:string,item()*)
Uniform distribution [0,1.0)
Variable: $STD-UNIT as map(xs:string,item()*)
Uniform distribution (-1.0,1.0)
Variable: $STD-NORMAL as map(xs:string,item()*)
Normal distribution mean=0 std=1
Variable: $STD-DEGREES as map(xs:string,item()*)
Uniform angles (degrees) [0, 360.0)
Functions
Function: seed
declare function seed() as xs:anyAtomicType
declare function seed() as xs:anyAtomicType
Function to hand out seeds:
Ugly hack alert: browbeat r-n-g into using a new seed => new number
See top comment. If you start getting samey-samey sequences all I
can suggest is to spring for Saxon-PE.
Returns
- xs:anyAtomicType
declare %art:non-deterministic function this:seed() as xs:anyAtomicType { (: Function lookup in the closure of module imported into XSLT : doesn't appear to work and we end up with the fallback. : We could move the lookup and text inside the function body, : but that will slow down normal randomization calls in XQuery with : repeated checks. :) let $timestamp := function-lookup(QName("http://saxon.sf.net/", "timestamp"), 0) let $rand-double := function-lookup(QName("http://basex.org/modules/random","double"), 0) return ( if (exists($timestamp)) then $timestamp() else if (exists($rand-double)) then $rand-double() else ( concat(current-dateTime(), generate-id(<n/>)) ) ) }
Function: flip
declare function flip($percent as xs:double) as xs:boolean
declare function flip($percent as xs:double) as xs:boolean
flip()
Returns random Bernoulli distributed data as a boolean
Params
- percent as xs:double: Percent that should come up true()
Returns
- xs:boolean: true() or false()
declare %art:non-deterministic function this:flip($percent as xs:double) as xs:boolean { (: 100*random-number-generator(this:seed())("number") lt $percent :) 100*$this:RANDOM_0_1() lt $percent }
Function: constant
declare function constant($min as xs:double?, $max as xs:double?) as xs:double
declare function constant($min as xs:double?, $max as xs:double?) as xs:double
constant()
Returns a constant value, either the average of $min and $max (if they
exist) or whichever one does exist. Failing everything else, 1.
This is the degenerate case of uniform() and exists as a convenience
for data-driven randomization in randomize().
Params
- min as xs:double?: minimum value
- max as xs:double?: maximum value
Returns
- xs:double: constant
declare function this:constant($min as xs:double?, $max as xs:double?) as xs:double { if (exists($min) and exists($max)) then ( if ($max < $min) then errors:error("RAND-BADMINMAX",($min,$max)) else if ($max = $min) then $min else ($max + $min) div 2 ) else ( ($min,$max,1)[1] ) }
Function: uniform
declare function uniform($min as xs:numeric, $max as xs:numeric) as xs:double
declare function uniform($min as xs:numeric, $max as xs:numeric) as xs:double
uniform()
Return random uniformly distributed data in the range.
Because the base randomizer is exclusive on the upper bound, add ε
and then cap. This creates a slight over-weighting of the max value,
depending on fineness of ε.
When we are using integers, we need to avoid chopping off the top of
the range, or we lose. We can get it, but it is unlikely and for small
integer ranges spectacularly unlikely. So we add 1-ε instead of ε and
quantize at this level. You want to avoid this and get, say, floating
point values between 5 and 10 (inc), pass floating point arguments.
Base randomizer gives [0,1) we want [x,y]
[0,1) * (ε+y-x) => [0,ε+y-x)
[0,ε+y-x) + x => [x, y+ε)
min(([x, y+ε),y) => [x,y]
For integers:
[0,1) * (1-ε+y-x) => [0,1-ε+y-x)
[0,1-ε+y-x) + x => [x, y+1-ε)
[x,y+1-ε) idiv 1 => [x, y]
Params
- min as xs:numeric: Lower bound (inclusive) of range
- max as xs:numeric: Upper bound (inclusive) of range
Returns
- xs:double: random number in the range
declare %art:non-deterministic function this:uniform($min as xs:numeric, $max as xs:numeric) as xs:double { if ($max < $min) then errors:error("RAND-BADMINMAX",($min,$max)) else if ($max = $min) then $min else if ( ($max instance of xs:integer) and ($min instance of xs:integer) ) then ( (: let $random_0_1 := random-number-generator(this:seed())("number") return ((1 - $this:ε + $max - $min)*$random_0_1 + $min) idiv 1 :) ((1 - $this:ε + $max - $min)*$this:RANDOM_0_1() + $min) idiv 1 ) else ( (: let $random_0_1 := random-number-generator(this:seed())("number") return min(( ($this:ε + $max - $min)*$random_0_1 + $min, $max )) :) min(( ($this:ε + $max - $min)*$this:RANDOM_0_1() + $min, $max )) ) }
Function: normal
declare function normal($mean as xs:double, $std as xs:double) as xs:double
declare function normal($mean as xs:double, $std as xs:double) as xs:double
normal()
Return random normally distributed data.
Params
- mean as xs:double: Mean of range of values
- std as xs:double: Standard deviation of range of values
Returns
- xs:double: random number
declare %art:non-deterministic function this:normal($mean as xs:double, $std as xs:double) as xs:double { let $randomizer := random-number-generator(this:seed()) return $mean + this:gauss($randomizer) * $std }
Function: skewed
declare function skewed($mean as xs:double, $std as xs:double, $skew as xs:double) as xs:double
declare function skewed($mean as xs:double, $std as xs:double, $skew as xs:double) as xs:double
skewed()
Return random data in skewed normal distribution.
Params
- mean as xs:double: Mean of distribution
- std as xs:double: Standard deviation of distribution
- skew as xs:double: Skew of distribution
Returns
- xs:double: random number Skewed defined by parameters α, ξ, and ω α is skew α = 0 => normal, α > 0 right skewed, α < 0 left skewed right skewed => long tail on right ξ = location = shift along x ω = scaling along y ω is positive mean = ξ + ω*δ*√(2/π) δ=α/√(1+α²) std = √(ω²*(1 - 2*δ²/π)) So: ω = std / √(1 - 2*δ²/π)) ξ = mean - ω*δ*√(2/π)
declare %art:non-deterministic function this:skewed($mean as xs:double, $std as xs:double, $skew as xs:double) as xs:double { let $α := $skew let $δ := $α div math:sqrt(1 + $α*$α) let $ω := $std div math:sqrt(1 - (2*$δ*$δ div $this:π)) let $ξ := $mean - $ω*$δ*math:sqrt(2 div $this:π) let $σ := 1 div math:sqrt(1 + $α*$α) let $u := this:normal(0, 1) let $v := this:normal(0, 1) let $u1 := $σ*($α*abs($u) + $v) return ( $ω * $u1 + $ξ ) }
Function: binomial
declare function binomial($n as xs:integer,
$percent as xs:double) as xs:integer
declare function binomial($n as xs:integer, $percent as xs:double) as xs:integer
binomial()
Return random data in a binomial distribution.
Params
- n as xs:integer: the n distribution parameter = max value
- percent as xs:double: probability distribution parameter as a percent [0,100]
Returns
- xs:integer: random integer in range μ = np σ² = np(1-p)
declare %art:non-deterministic function this:binomial( $n as xs:integer, $percent as xs:double ) as xs:integer { let $p := $percent div 100 let $seq := dist:simple-sums( for $k in 1 to $n return util:binomial($n, $k)*math:pow($p, $k)*math:pow(1 - $p, $n - $k) ) let $u := this:uniform(0E0, 1E0) return util:rangeindex($seq, $u) }
Function: binomial-beta
declare function binomial-beta($n as xs:integer,
$α as xs:double,
$β as xs:double) as xs:integer
declare function binomial-beta($n as xs:integer, $α as xs:double, $β as xs:double) as xs:integer
binomial-beta()
Return random data in a Beta binomial distribution.
Params
- n as xs:integer n distribution parameter, max value
- α as xs:double alpha distribution parameter
- β as xs:double beta distribution parameter
Returns
- xs:integer: random integer
declare %art:non-deterministic function this:binomial-beta( $n as xs:integer, $α as xs:double, $β as xs:double ) as xs:integer { util:assert($n > 0, "n <= 0"), util:assert($α > 0, "α <= 0"), util:assert($β > 0, "β <= 0"), let $p := this:beta($α, $β) return this:binomial($n, $p * 100) }
Function: binomial-poisson
declare function binomial-poisson($probabilities as xs:double*) as xs:double
declare function binomial-poisson($probabilities as xs:double*) as xs:double
binomial-poisson()
Return random data in a Poisson binomial distribution.
Params
- probabilities as xs:double*: probabilities of the base Bernoulli variables (as percents [0,100])
Returns
- xs:double: random number μ = Σp[i] σ = √(Σ(1 - p[i])p[i])
declare %art:non-deterministic function this:binomial-poisson($probabilities as xs:double*) as xs:double { sum(for $p in $probabilities return if (this:flip($p)) then 1 else 0) }
Function: poisson
declare function poisson($mean as xs:double) as xs:double
declare function poisson($mean as xs:double) as xs:double
poisson()
Return random data in a Poisson distribution.
Using inverse transform sampling method. If λ is large and e^-λ underflows;
this will not work well.
Params
- mean as xs:double: the lambda parameter
Returns
- xs:double: random number μ = λ σ = λ
declare %art:non-deterministic function this:poisson($mean as xs:double) as xs:double { let $u := this:uniform(0E0, 1E0) return util:while( function($x as xs:double, $p as xs:double, $s as xs:double) as xs:boolean { $u > $s }, function($x as xs:double, $p as xs:double, $s as xs:double) as xs:double* { let $x := $x + 1 let $p := $p * $mean div $x let $s := $s + $p return ( $x, $p, $s ) }, 0E0, math:exp(-$mean), math:exp(-$mean) )=>head() }
Function: exponential
declare function exponential($λ as xs:double) as xs:double
declare function exponential($λ as xs:double) as xs:double
exponential()
Return random data in an exponential distribution.
Params
- λ as xs:double: the distribution parameter
Returns
- xs:double: random number μ = 1/λ Using inverse transform sampling -ln(U)/λ
declare %art:non-deterministic function this:exponential($λ as xs:double) as xs:double { let $u := this:uniform(0E0, 1E0) return -math:log($u) div $λ }
Function: gamma
declare function gamma($k as xs:double,
$θ as xs:double) as xs:double
declare function gamma($k as xs:double, $θ as xs:double) as xs:double
gamma()
Return random data in a gamma distribution.
Params
- k as xs:double: the k distribution parameter > 0 ("shape")
- θ as xs:double: the θ distribution parameter > 0 ("scaling")
Returns
- xs:double: random number μ = kθ σ² = kθ² Using inverse transform sampling in combination with the Ahrens-Dieter acceptance-rejection method (performance is linear wrt k)
declare %art:non-deterministic function this:gamma( $k as xs:double, $θ as xs:double ) as xs:double { util:assert($k > 0, "k <= 0"), util:assert($θ > 0, "θ <= 0"), let $n := floor($k) cast as xs:integer let $δ := $k - $n let $ξ := if ($δ = 0) then 0 else this:ahrens-dieter($δ, random-number-generator(this:seed())) return ( $θ * ($ξ - sum(for $i in 1 to $n return math:log(this:uniform(0E0, 1E0)))) ) }
Function: beta
declare function beta($α as xs:double,
$β as xs:double) as xs:double
declare function beta($α as xs:double, $β as xs:double) as xs:double
beta()
Return random data in a beta distribution.
Params
- α as xs:double: the α distribution parameter
- β as xs:double: the β distribution parameter
Returns
- xs:double: random number μ = α/(α+β) σ² = αβ/(α+β)²(α+β+1)
declare %art:non-deterministic function this:beta( $α as xs:double, $β as xs:double ) as xs:double { util:assert($α > 0, "α <= 0"), util:assert($β > 0, "β <= 0"), let $u := this:gamma($α, 1) let $v := this:gamma($β, 1) return ( $u div ($u + $v) ) }
Function: zipf
declare function zipf($alpha as xs:double, $n as xs:integer) as xs:integer
declare function zipf($alpha as xs:double, $n as xs:integer) as xs:integer
zipf()
Return random data in a Zipf distribution as an integer index from 1 to N
Note that this constructs the sum table anew each time: not very efficient.
You'd be better off constructing the distribution or just calling
select-index() directly
p(n)=(1/k^α)/sum(i=1 to n)(1/i^α)
n = number of elements
k = rank
α = exponent
Params
- alpha as xs:double: the α parameter, should be >=1 For English words this by some estimates is around 1.6 For city sizes 1.07
- n as xs:integer: number of Zipf values in range
Returns
- xs:integer: random integer in range
declare %art:non-deterministic function this:zipf($alpha as xs:double, $n as xs:integer) as xs:integer { let $zipf-weights as xs:double* := this:zipf-sums($alpha,$n+1) return this:select-index($zipf-weights, $n) }
Function: shuffle
declare function shuffle($data as item()*)
declare function shuffle($data as item()*)
shuffle()
Return the data items in a (uniformly) random order.
Params
- data as item()*: Data items to shuffle
declare %art:non-deterministic function this:shuffle($data as item()*) { random-number-generator(this:seed())?permute($data) }
Function: percent-sums
declare function percent-sums($weights as map(xs:string,item()*)) as xs:double*
declare function percent-sums($weights as map(xs:string,item()*)) as xs:double*
percent-sums()
Returns a sequence of cumulative probabilities, to be used
by select-index() to perform selections. This is to ensure we don't have to
keep recomputing the cumulative sums for each selection.
To account for rounding, the last item is always forced to 1 and any sums
greater than 1 are also rounded down to 1. To get expected distributions,
ensure that the input percentages sum to 100.
Sums are returned in map:keys() order, so this assumes stability of that
function between calls to percent-sums() and calls to selection().
Params
- weights as map(xs:string,item()*) a map from key to percent (expressed as integer 0 to 100)
Returns
- xs:double*: cumulative probability table
declare function this:percent-sums($weights as map(xs:string,item()*)) as xs:double* { dist:percent-sums($weights) }
Function: zipf-sums
declare function zipf-sums($alpha as xs:double, $n as xs:integer) as xs:double*
declare function zipf-sums($alpha as xs:double, $n as xs:integer) as xs:double*
zipf-sums()
Returns a sequence of cumulative Zipf probabilities, to be used
by select-index() to perform selections. This is to ensure we don't have to
keep recomputing the cumulative sums for each selection.
p(k)=(1/k^α)/sum(i=1 to n)(1/i^α)
n = number of elements
k = rank
α = exponent
Params
- alpha as xs:double: the alpha parameter, should be >=1 For English words this by some estimates is around 1.6 For city sizes 1.07
- n as xs:integer: number to generate
Returns
- xs:double*: cumulative probability table
declare function this:zipf-sums($alpha as xs:double, $n as xs:integer) as xs:double* { dist:zipf-sums($alpha, $n) }
Function: select-index
declare function select-index($sums as xs:double*, $n as xs:integer) as xs:integer
declare function select-index($sums as xs:double*, $n as xs:integer) as xs:integer
select-index()
Return a rank from 1 to N of values distributed according to cumulative
probability sums. Use zipf-sums(), percent-sums(), etc. to compute those.
Params
- sums as xs:double*: cumulative probability sums, computed via zipf-sums() or percent-sums()
- n as xs:integer: maximum rank to return ($n should be <= count($sums))
Returns
- xs:integer: random integer in range
declare %art:non-deterministic function this:select-index($sums as xs:double*, $n as xs:integer) as xs:integer { (: Get value in (0,1) exclusive; underlying randomizer returns inclusive : values, so generate value within epsilon of bounds instead :) let $p := this:uniform($this:ε, 1 - $this:ε) return util:rangeindex($sums[position()=(1 to $n)], $p, $n) }
Function: selection
declare function selection($sums as xs:double*, $values as item()*) as item()
declare function selection($sums as xs:double*, $values as item()*) as item()
selection()
Return a selection using data distributed per the cumulative probability sums.
Params
- sums as xs:double*: cumulative probability sums, computed via zipf-sums(), percent-sums(), etc.
- values as item()*: values to select, in rank order
Returns
- item(): random value
declare %art:non-deterministic function this:selection($sums as xs:double*, $values as item()*) as item() { let $n := min((count($sums),count($values))) let $k := this:select-index($sums, $n) return $values[$k] }
Function: check-markov-matrix
declare function check-markov-matrix($dim as xs:integer, $matrix as xs:double*, $is-cumulative as xs:boolean) as empty-sequence()
declare function check-markov-matrix($dim as xs:integer, $matrix as xs:double*, $is-cumulative as xs:boolean) as empty-sequence()
check-markov-matrix()
Check whether the matrix is a valid Markov matrix, raise error if it is not.
Params
- dim as xs:integer matrix size (matrix is dim X dim)
- matrix as xs:double* data values
- is-cumulative as xs:boolean true() if this is a cumulative probability matrix
Returns
declare function this:check-markov-matrix($dim as xs:integer, $matrix as xs:double*, $is-cumulative as xs:boolean) as empty-sequence() { let $size := count($matrix) let $sqrt := math:sqrt($size) return ( (: Matrix must be square :) if ($sqrt * $sqrt ne $size) then errors:error("RAND-NOTSQUARE", $size) else if ($dim ne ($sqrt cast as xs:integer)) then errors:error("RAND-BADDIM", ($dim, $sqrt)) else () , (: Values must be probabilities :) for $i in 1 to $size return ( if ($matrix[$i] < 0 or $matrix[$i] > 1) then errors:error("RAND-BADPROBABILITY", ($i, $matrix[$i])) else () ) , (: Each row should have a cumulative probability of 1 :) (: If $is-cumulative, each row has been expressed as cumulative : probabilities already (for efficiency) so we check p<1 vs sum(p)<1 : but we did that check above already. :) if ($is-cumulative) then () else ( for $i in 1 to $dim return ( let $sprob := sum($matrix[position()=(($i - 1)*$dim + 1 to ($i - 1) * $dim)]) return ( if (util:twixt($sprob, 0, 1 + $this:ε)) then () else errors:error("RAND-BADSUMPROB", ($i, $sprob)) ) ) ) ) }
Errors
RAND-NOTSQUARE if data won't fit evenly in square matrix
Errors
RAND-BADDIM if dimension doesn't match amount of data
Errors
RAND-BADPROBABILITY if value is out of range [0,1]
Errors
RAND-BADSUMPROB if cumulative sums are out of range [0,1]
Function: markov-sums
declare function markov-sums($dim as xs:integer, $matrix as xs:double*) as xs:double*
declare function markov-sums($dim as xs:integer, $matrix as xs:double*) as xs:double*
markov-sums()
Return a Markov probability matrix as a matrix where each row contains
the cumulative probability sums for that row. This allows the Markov
distribution selection to run more efficiently.
Example:
markov-sums(3, (
0.2, 0.4, 0.4,
0.5, 0.5, 0,
0.4, 0.2, 0.4
)) => (
0.2, 0.6, 1,
0.5, 1, 1,
0.4, 0.6, 1
)
To account for rounding, the last item in the row is always forced to 1
and any sums greater than 1 are also rounded down to 1.
Params
- dim as xs:integer: size of each dimension of matrix, math:sqrt(count($matrix))
- matrix as xs:double*: the input non-cumulative Markov matrix
Returns
- xs:double*
declare function this:markov-sums($dim as xs:integer, $matrix as xs:double*) as xs:double* { dist:markov-sums($dim, $matrix) }
Function: markov-percent-sums
declare function markov-percent-sums($dim as xs:integer, $matrix as xs:integer*) as xs:double*
declare function markov-percent-sums($dim as xs:integer, $matrix as xs:integer*) as xs:double*
markov-percent-sums()
Return a Markov probability matrix as a matrix where each row contains
the cumulative probability sums for that row. This allows the Markov
distribution selection to run more efficiently.
Example:
markov-percent-sums(3, (
20, 40, 40,
50, 50, 0,
40, 20, 40
)) => (
0.2, 0.6, 1,
0.5, 1, 1,
0.4, 0.6, 1
)
To account for rounding, the last item in the row is always forced to 1
and any sums greater than 1 are also rounded down to 1.
Params
- dim as xs:integer: size of each dimension of matrix, math:sqrt(count($matrix))
- matrix as xs:integer*: the input non-cumulative Markov matrix
Returns
- xs:double*: cumulative probability matrix
declare function this:markov-percent-sums($dim as xs:integer, $matrix as xs:integer*) as xs:double* { dist:markov-percent-sums($dim, $matrix) }
Function: markov
declare function markov($current as xs:integer, $dim as xs:integer, $matrix as xs:double*) as xs:integer
declare function markov($current as xs:integer, $dim as xs:integer, $matrix as xs:double*) as xs:integer
markov()
Given a square matrix of Markov probabilities and a current state,
return a successor state with the probabilities as given. For efficiency,
the matrix should be given in the cumulative probability form, as created
by markov-sums() or markov-percent-sums()
Params
- current as xs:integer: Index of current state in the matrix
- dim as xs:integer: size of each dimension of matrix, math:sqrt(count($matrix))
- matrix as xs:double*: Square matrix of cumulative probabilities
Returns
- xs:integer: random index from Markov table
declare %art:non-deterministic function this:markov($current as xs:integer, $dim as xs:integer, $matrix as xs:double*) as xs:integer { if (util:twixt($current, 1, $dim)) then () else errors:error("ML-BADARGS", ("current", $current)), let $row := $matrix[position()=(($current - 1) * $dim + 1 to $current * $dim)] return this:select-index($row, $dim) }
Errors
ML-BADARGS if current index is out of range
Function: markov-selection
declare function markov-selection($current as xs:integer, $dim as xs:integer, $matrix as xs:double*, $values as item()*) as item()
declare function markov-selection($current as xs:integer, $dim as xs:integer, $matrix as xs:double*, $values as item()*) as item()
markov-selection()
Return a selection using Markov matrix distribution.
Params
- current as xs:integer: index of current selection
- dim as xs:integer: size of each dimension of matrix, math:sqrt(count($matrix))
- matrix as xs:double*: cumulative probability matrix, computed via markov-sums()
- values as item()*: values to select, in index order
Returns
- item(): value corresponding to random index from Markov table
declare %art:non-deterministic function this:markov-selection($current as xs:integer, $dim as xs:integer, $matrix as xs:double*, $values as item()*) as item() { let $k := this:markov($current, $dim, $matrix) return ( (: Make sure matrix and values are compatible :) if (util:twixt($k, 1, count($values))) then () else errors:error("ML-BADARGS", ("values", $values)), $values[$k] ) }
Errors
ML-BADARGS if current index is out of range
Errors
ML-BADARGS if computed index is out of range
Function: random-selection
declare function random-selection($weights as map(xs:string,item()*)) as xs:string
declare function random-selection($weights as map(xs:string,item()*)) as xs:string
random-selection()
Return a random selection from a set of keys, where the key is selected
per a set of probabilities. A convenience wrapper (although less efficient,
because it recomputes the sums) for selection()
Params
- weights as map(xs:string,item()*): Map of probabilities (as integer percents) from key to weight
Returns
- xs:string: random selection from the table
declare %art:non-deterministic function this:random-selection($weights as map(xs:string,item()*)) as xs:string { let $sums := this:percent-sums($weights) let $values := $weights=>map:keys() return this:selection($sums, $values) }
Function: random-assortment
declare function random-assortment($keys as xs:string*, $weights as map(xs:string,item()*)) as xs:string*
declare function random-assortment($keys as xs:string*, $weights as map(xs:string,item()*)) as xs:string*
random-assortment()
Select a subset of a set of keys, in (uniform) random order, where each key
is selected at most once, and is selected with probability given by set
of weights.
Params
- keys as xs:string*: Set of keys to choose from
- weights as map(xs:string,item()*): Map of probabilities (as integer percents) from key to weight
Returns
- xs:string*: random assortment of the keys
declare %art:non-deterministic function this:random-assortment($keys as xs:string*, $weights as map(xs:string,item()*)) as xs:string* { this:shuffle( for $key in $keys return if (this:flip($weights($key))) then $key else () ) }
Function: cast
declare function cast($value as item(), $cast-as as xs:string?)
declare function cast($value as item(), $cast-as as xs:string?)
cast()
Cast/format a value to a particular type
Service function for randomize()
Casting to decimal is actually just rounding to $DECIMAL-DIGITS fractional
digits.
Params
- value as item(): Value to cast
- cast-as as xs:string?: One of "integer", "decimal", "boolean", "string", or "double"
declare function this:cast($value as item(), $cast-as as xs:string?) { switch ($cast-as) case "integer" return $value cast as xs:integer case "decimal" return round($value * $this:DECIMAL-DIV) div $this:DECIMAL-DIV case "boolean" return boolean($value) case "string" return string($value) default return $value }
Function: range
declare function range($value as xs:numeric, $min as xs:numeric?, $max as xs:numeric?) as xs:numeric
declare function range($value as xs:numeric, $min as xs:numeric?, $max as xs:numeric?) as xs:numeric
range()
Ensure the value is in the range by changing value to minimum or maximum as
necessary.
Service function for randomize()
Assumes min < max
Params
- value as xs:numeric: The value
- min as xs:numeric?: The lowest value (optional)
- max as xs:numeric?: The highest value (optional)
Returns
- xs:numeric: value clamped to range
declare function this:range($value as xs:numeric, $min as xs:numeric?, $max as xs:numeric?) as xs:numeric { let $low := if (exists($min)) then max(($min,$value)) else $value return if (exists($max)) then min(($max,$low)) else $low }
Function: get-last
declare function get-last($algorithm as map(xs:string,item()*), $last as item()?) as item()
declare function get-last($algorithm as map(xs:string,item()*), $last as item()?) as item()
get-last()
Function available for callback use in fetch()
Params
- algorithm as map(xs:string,item()*): The algorithm to use, a map with various parameter keys (see randomize())
- last as item()?: Previous value
Returns
- item(): previous value
declare function this:get-last($algorithm as map(xs:string,item()*), $last as item()?) as item() { $last }
Function: last-fraction
declare function last-fraction($centrality as xs:double) as function(map(xs:string,item()*), item()?) as item()
declare function last-fraction($centrality as xs:double) as function(map(xs:string,item()*), item()?) as item()
last-fraction()
Return a callback function for use in fetch() that returns the a
given fraction of previous value
Params
- centrality as xs:double: the fraction to apply
Returns
- function(map(xs:string,item()*),item()?)asitem(): fraction of last value
declare function this:last-fraction($centrality as xs:double) as function(map(xs:string,item()*), item()?) as item() { %art:name("rand:last-fraction") function($algorithm as map(xs:string,item()*), $last as item()?) as item() { $centrality * $last } }
Function: select-random
declare function select-random($N as xs:integer, $last as item()?, $algorithm as map(xs:string,item()*), $values as item()*) as item()*
declare function select-random($N as xs:integer, $last as item()?, $algorithm as map(xs:string,item()*), $values as item()*) as item()*
select-random()
Given a set of values and a randomizer (see randomize()), use the output
of randomize as an index into the value set, and return the selected values.
Params
- N as xs:integer: How many values to return
- last as item()?: Previous value (needed for markov, available to key callback)
- algorithm as map(xs:string,item()*): The algorithm to use, a map with various parameter keys
- values as item()*: The value set to choose from
Returns
- item()*: random values
declare %art:non-deterministic function this:select-random($N as xs:integer, $last as item()?, $algorithm as map(xs:string,item()*), $values as item()*) as item()* { let $count := count($values) let $indices := this:randomize($N, $last, $algorithm) for $raw-index in $indices (: let $index := 1 + (($raw-index cast as xs:integer) mod $count) :) let $index := max((min(($raw-index cast as xs:integer, $count)),1)) return $values[$index] }
Function: select-random
declare function select-random($N as xs:integer, $algorithm as map(xs:string,item()*), $values as item()*) as item()*
declare function select-random($N as xs:integer, $algorithm as map(xs:string,item()*), $values as item()*) as item()*
select-random()
Given a set of values and a randomizer (see randomize()), use the output
of randomize as an index into the value set, and return the selected values.
Params
- N as xs:integer: How many values to return
- algorithm as map(xs:string,item()*): The algorithm to use, a map with various parameter keys
- values as item()*: The value set to choose from
Returns
- item()*: random values
declare %art:non-deterministic function this:select-random($N as xs:integer, $algorithm as map(xs:string,item()*), $values as item()*) as item()* { let $count := count($values) let $indices := this:randomize($N, (), $algorithm) for $raw-index in $indices let $index := max((min(($raw-index cast as xs:integer, $count)),1)) return $values[$index] }
Function: select-random
declare function select-random($algorithm as map(xs:string,item()*), $values as item()*) as item()
declare function select-random($algorithm as map(xs:string,item()*), $values as item()*) as item()
select-random()
Given a set of values and a randomizer (see randomize()), use the output
of randomize as an index into the value set, and return the selected values
Params
- algorithm as map(xs:string,item()*): The algorithm to use, a map with various parameter keys
- values as item()*: The value set to choose from
Returns
- item(): random value
declare %art:non-deterministic function this:select-random($algorithm as map(xs:string,item()*), $values as item()*) as item() { let $count := count($values) let $raw-index := this:randomize($algorithm) let $index := max((min(($raw-index cast as xs:integer, $count)),1)) return $values[$index] }
Function: select-random
declare function select-random($values as item()*) as item()
declare function select-random($values as item()*) as item()
select-random()
Given a set of values pick as (uniformly random) value from the
value set, and return it
Params
- values as item()*: The value set to choose from
Returns
- item(): random value
declare %art:non-deterministic function this:select-random($values as item()*) as item() { let $count := count($values) let $algorithm := dist:uniform(1, $count)=>dist:cast("integer")=>dist:is-complex(false()) let $raw-index := this:randomize($algorithm) (: let $index := 1 + (($raw-index cast as xs:integer) mod $count) :) let $index := max((min(($raw-index cast as xs:integer, $count)),1)) let $result := $values[$index] return ( util:assert(exists($result), "No result! Index="||$index||" raw="||$raw-index||" range=[1:"||$count||"] dist="||util:quote($algorithm)), $result ) }
Function: randomize
declare function randomize($N as xs:integer, $last as item()?, $algorithm as map(xs:string,item()*))
declare function randomize($N as xs:integer, $last as item()?, $algorithm as map(xs:string,item()*))
randomize()
Return a sequence of random data distributed as per the algorithm.
Params
- N as xs:integer: How many values to return
- last as item()?: Previous value (needed for markov, available to key callback)
- algorithm as map(xs:string,item()*): The algorithm to use, a map with various parameter keys distribution: Distribution to use, one of "constant", "uniform", "normal", "skewed", "bernoulli", "flip", "zipf", "markov", "sums", "multimodal", "binomial-poisson", "poisson", "exponential", "gamma", "beta", "binomial" "binomial-beta" min: minimum value (optional) max: maximum value (uniform, binomial, binomial-beta, else optional) pre-multiplier: multiplier before min/max (optional) post-multiplier: multiplier after min/max (optional) post-shift: shift to add after min/max (optional) cast: cast type (optional) mean: mean of distribution (normal, skewed, poisson, exponential) (default=0) std: standard deviation of distribution (normal, skewed) (default=1) skew: skew of distribution (skewed) (default=0) p: probability (bernoulli, flip, binomial) (default=50) probabilities: probabilities (binomial-poisson) sums: cumulative probability sums (zipf, markov, sums) alpha: alpha parameter (zipf) (needed if no sums) (default=1) alpha parameter (beta, binomial-beta) limit: number of sums (zipf) (needed if no sums) (default=1000) start: index of starting symbol (markov) (default=uniform[1,dim]) dim: size of each dimension of Markov matrix (markov) matrix: raw Markov matrix (used if sums not provided, not recommended) distributions: sequence of distributions to combine (multimodal) k: k parameter (gamma) theta: theta parameter (gamma) beta: beta parameter (beta, binomial-beta) lambda: lambda parameter (exponential) truncation: how to keep value in [min,max] range (not constant/uniform) ceiling => ceiling/floor to min/max drop => toss out (so may return < $N) resample (default) => resample until an in-bound value is found or resample limit is reached ($RESAMPLE-LIMIT=10) then use ceiling
declare %art:non-deterministic function this:randomize($N as xs:integer, $last as item()?, $algorithm as map(xs:string,item()*)) { if (this:is-complex($algorithm)) then ( if (this:is-complex("distribution", $algorithm) or this:is-complex("distributions", $algorithm) or this:is-complex("selector", $algorithm) ) then ( (: We change distributions, so we have to do this one value at a time :) let $result := fold-left( 1 to $N, $last, function($vals as item()*, $i as xs:integer) as item()* { $vals , this:randomize(1, $vals[last()], $algorithm) } ) return ( if (empty($last)) then $result else $result[position()>1] ) ) else ( (: Distribution is fixed, but we may alter any other parameter : so need to feed $last as appropriate to the sequence : For multimodal distributions is also fixed :) let $distribution := ($algorithm("distribution"),"uniform")[1] let $values := this:raw-distribution($N, $last, $algorithm) let $values := fold-left( $values, $last, function($vals as item()*, $val as item()?) as item()* { $vals , this:modify($val, $vals[last()], $algorithm) } ) return if (empty($last)) then $values else $values[position()>1] ) ) else ( (: Everything is fixed, just look up keys :) let $distribution := ($algorithm("distribution"),"uniform")[1] let $values := this:raw-distribution($N, $last, $algorithm) let $values := this:modify-all($values, $last, $algorithm) return ( $values ) ) }
Function: randomize
declare function randomize($N as xs:integer, $algorithm as map(xs:string,item()*))
declare function randomize($N as xs:integer, $algorithm as map(xs:string,item()*))
randomize()
Generate one random value with given previous value.
See randomize($N, $last, $algorithm) above.
Params
- N as xs:integer: How many values to return
- algorithm as map(xs:string,item()*): The algorithm to use
declare %art:non-deterministic function this:randomize($N as xs:integer, $algorithm as map(xs:string,item()*)) { this:randomize($N, (), $algorithm) }
Function: randomize
declare function randomize($algorithm as map(xs:string,item()*))
declare function randomize($algorithm as map(xs:string,item()*))
randomize()
Generate one random value.
See randomize($N, $last, $algorithm) above.
Params
- algorithm as map(xs:string,item()*): The algorithm to use
declare %art:non-deterministic function this:randomize($algorithm as map(xs:string,item()*)) { this:randomize(1, (), $algorithm) }
Function: histogram
declare function histogram($width as xs:integer, $height as xs:integer, $data as xs:double*) as xs:string*
declare function histogram($width as xs:integer, $height as xs:integer, $data as xs:double*) as xs:string*
histogram()
Print out the distribution as a histogram.
Params
- width as xs:integer: Size of each bucket (max - min)
- height as xs:integer: Number of values per star
- data as xs:double*: Data to analyze
Returns
- xs:string*: sequence of strings, one per bucket
declare function this:histogram($width as xs:integer, $height as xs:integer, $data as xs:double*) as xs:string* { let $buckets := fold-left( $data, map {}, function($buckets as map(xs:integer, xs:integer), $d as xs:double) as map(xs:integer,xs:integer) { let $key := round-half-to-even($d div $width, 0) cast as xs:integer let $num := $buckets($key) return ( if (exists($num)) then $buckets=>map:put($key, $num+1) else $buckets=>map:put($key, 1) ) } ) for $key in $buckets=>map:keys() let $num := xs:integer($buckets($key)) order by $key return ( format-number($key*$width," 00;-00")||" "|| format-number($num,"000")||" "|| string-join(for $i in 1 to $num idiv $height return "*","") ) }
Function: mean
declare function mean($algorithm as map(xs:string,item()*), $last as item()?)
declare function mean($algorithm as map(xs:string,item()*), $last as item()?)
mean()
Return the mean of the distribution. Some distributions don't have a mean;
these will raise an error.
Params
- algorithm as map(xs:string,item()*): The algorithm to use, a map with various parameter keys (see randomize())
- last as item()?: Previous value
declare function this:mean($algorithm as map(xs:string,item()*), $last as item()?) { switch ($algorithm("distribution")) case "constant" return this:randomize($algorithm) case "uniform" return let $min := $algorithm=>this:fetch("min",$last) (: OK to be empty :) let $max := $algorithm=>this:fetch("max",$last) (: OK to be empty :) return if ($max < $min) then errors:error("RAND-BADMINMAX",($min,$max)) else ($max - $min) idiv 2 case "normal" return ($algorithm=>this:fetch("mean",$last),0)[1] case "skewed" return ($algorithm=>this:fetch("mean",$last),0)[1] case "bernoulli" return let $p := ($algorithm=>this:fetch("p",$last),50)[1] return $p div 100 case "flip" return let $p := ($algorithm=>this:fetch("p",$last),50)[1] return $p div 100 case "binomial" return let $p := ($algorithm=>this:fetch("p",$last),50)[1] let $n := $algorithm=>this:fetch("max",$last) return $n * $p div 100 case "binomial-beta" return ( let $n := $algorithm=>this:fetch("max",$last) cast as xs:integer let $α := $algorithm=>this:fetch("alpha",$last) let $β := $algorithm=>this:fetch("beta",$last) return ( $n * $α div ($α + $β) ) ) case "binomial-poisson" return let $ps := $algorithm=>this:fetch("probabilities", $last) return sum($ps) div 100 case "poisson" return $algorithm=>this:fetch("mean",$last) case "exponential" return 1 div $algorithm=>this:fetch("lambda",$last) case "gamma" return ( $algorithm=>this:fetch("k",$last) * $algorithm=>this:fetch("theta",$last) ) case "beta" return ( let $α := $algorithm=>this:fetch("alpha",$last) let $β := $algorithm=>this:fetch("beta",$last) return ( $α div ($α + $β) ) ) default return errors:error("RAND-UNIMPLEMENTED", ($algorithm("distribution"), "mean")) }
Errors
RAND-BADMINMAX if uniform distribution has max < min
Errors
RAND-UNIMPLEMENTED if don't know how to compute mean for that distribution
Function: mean
declare function mean($algorithm as map(xs:string,item()*))
declare function mean($algorithm as map(xs:string,item()*))
mean()
Return the mean of the distribution. Some distributions don't have a mean;
these will raise an error.
Params
- algorithm as map(xs:string,item()*): The algorithm to use, a map with various parameter keys (see randomize())
declare function this:mean($algorithm as map(xs:string,item()*)) { this:mean($algorithm, ()) }
Errors
RAND-BADMINMAX if uniform distribution has max < min
Errors
RAND-UNIMPLEMENTED if don't know how to compute mean for that distribution
Function: id
declare function id($prefix as xs:string) as xs:string
declare function id($prefix as xs:string) as xs:string
this:id()
Construct an id of the form {$prefix}{hex-number}
Params
- prefix as xs:string: prefix for the id
Returns
- xs:string: random id
declare %art:non-deterministic function this:id($prefix as xs:string) as xs:string { $prefix||"-"||util:integer-to-hex(this:uniform(0,1E7)=>this:cast("integer")) }
Function: id
declare function id($prefix as xs:string, $n as xs:integer) as xs:string
declare function id($prefix as xs:string, $n as xs:integer) as xs:string
this:id()
Construct an id of the form {$prefix}{hex-number in range 0 to $n}
Params
- prefix as xs:string: prefix for the id
- n as xs:integer: range limit
Returns
- xs:string: random id
declare %art:non-deterministic function this:id($prefix as xs:string, $n as xs:integer) as xs:string { $prefix||"-"||util:integer-to-hex(this:uniform(0,$n)=>this:cast("integer")) }
Function: mutual-prime-pair
declare function mutual-prime-pair($max as xs:integer) as xs:integer*
declare function mutual-prime-pair($max as xs:integer) as xs:integer*
mutual-prime-pair()
A random pair of mutually prime numbers less than the given maximum,
which has to be more than 2, or you can't find two primes.
Params
- max as xs:integer: maximum value to return
Returns
- xs:integer*: random sequence of integers that are mutually prime
declare %art:non-deterministic function this:mutual-prime-pair($max as xs:integer) as xs:integer* { if ($max <= 2) then errors:error("ML-BADARGS", ("max", $max)) else (), let $base-selection := if ($max > $util:PRIMES100[last()]) then $util:CANVAS-PRIMES[. <= $max] else $util:PRIMES100[. <= $max] let $q := this:select-random($base-selection) let $r := util:while( function($r as xs:integer) as xs:boolean { max(($q,$r)) mod min(($q,$r)) = 0 }, function($r as xs:integer) as xs:integer { $r + 1 }, this:select-random(2 to 2 * util:round(math:sqrt($q) + 1)) ) return ($q, $r) }
Errors
ML-BADARGS if $max is too small
Function: partition
declare function partition($sum as xs:integer, $max as xs:integer) as xs:integer*
declare function partition($sum as xs:integer, $max as xs:integer) as xs:integer*
partition()
Make a partition (sum totalling to $sum) where no part is more than $max
Example: partition(10, 5) may be (4, 1, 2, 3) but not (6, 4)
Params
- sum as xs:integer: sum of values to divide
- max as xs:integer: maximum value to return
Returns
- xs:integer*: random sequence of integers that total to the sum
declare %art:non-deterministic function this:partition($sum as xs:integer, $max as xs:integer) as xs:integer* { if ($sum = 0 or $max = 0) then () else if ($sum = 1) then 1 else ( let $limit := min(($sum, $max)) let $first := this:normal($limit div 2, $limit div 2)=> this:range(1, $limit)=> this:cast("integer") return ( $first, this:partition($sum - $first, $max) ) ) }
Function: partition
declare function partition($sum as xs:integer, $max as xs:integer, $min as xs:integer) as xs:integer*
declare function partition($sum as xs:integer, $max as xs:integer, $min as xs:integer) as xs:integer*
partition()
Make a partition (sum totalling to $sum) where no part is more than $max
and no part is less than $min
Example: partition(10, 5) may be (4, 2, 2, 2) but not (4, 1, 2, 3)
Params
- sum as xs:integer: sum of values to divide
- max as xs:integer: maximum value to return
- min as xs:integer: minimum value to return
Returns
- xs:integer*: random sequence of integers that total to the sum
declare %art:non-deterministic function this:partition($sum as xs:integer, $max as xs:integer, $min as xs:integer) as xs:integer* { if ($sum = 0 or $max = 0) then () else if ($sum = $min) then $min else ( let $tentative := this:do-partition($sum, $max, $min) let $tentative-sum := sum($tentative) let $delta := $sum - $tentative-sum return ( util:assert($delta >= 0, "Sum of tentative partition is too high"), if ($delta = 0) then $tentative else ( fold-left(1 to $delta, $tentative, function($seq as xs:integer*, $i as xs:integer) as xs:integer* { let $pick := this:shuffle( for $x at $j in $seq where $x < $max return $j )=>head() for $x at $j in $seq return ( if ($j = $pick) then $x + 1 else $x ) } ) ) ) ) }
Function: n-partition
declare function n-partition($sum as xs:integer, $n as xs:integer) as xs:integer*
declare function n-partition($sum as xs:integer, $n as xs:integer) as xs:integer*
n-partition()
Make a partition (sum) of $sum with exactly $n parts
It is possible to get unlucky and end up with a 0 partition here: we have
no min constraint. If you care about that, you'll need to adjust the sequence
afterwards.
Params
- sum as xs:integer: sum of values to divide
- n as xs:integer: number of parts in division
Returns
- xs:integer*: random sequence of integers that total to the sum
declare %art:non-deterministic function this:n-partition($sum as xs:integer, $n as xs:integer) as xs:integer* { if ($n = 0) then () else if ($n = 1) then $sum else if ($sum <= 1) then ( $sum, this:n-partition(0, $n - 1) ) else ( let $limit := max(($sum - 1, 1)) let $first := this:normal($limit div 2, $limit div 4)=> this:range(1, $limit)=> this:cast("integer") return ( $first, this:n-partition($sum - $first, $n - 1) ) ) }
Original Source Code
xquery version "3.1"; (:~ : Module with functions producing random distributions of various kinds. : : This version is XQuery 3.1 compliant. It is not as performant as the : MarkLogic version because random-number-generator() just isn't as usable as : xdmp:random. This works through the ugly hack of relying on generate-id() : on a constructed node to produce new seeds. That isn't entirely random : because whenever Saxon decides that one expression is the same as another : it will optimize it to produce the same result, but for the most part : this works ok. With Saxon-PE or Saxon-EE I use timestamp() which is more : reliable. Saxon-EE optimizer is often too clever regardless, and you get : repeated random numbers. Running with -opt:-l seems to fix this. : You may need more for some cases: -opt:-gl : : Even generating a sequence of numbers off random-number-generator isn't : that great because you need to construct maps to hold both the next : generator function and the number, or use recursion in a way highly : likely to generate stack overflows. (The example in the spec suffers this : problem.) : : The problem is that for my purposes I need to be able to create multiple : random sequences in the same query that are different from the sequences : in subsequent episodes. Where the sequences are partitioned into separate : modules so even if I did the ugly thing of making every function everywhere : have to cart around and return both the values I cared about and the : current randomizer (which XQuery is so very not good at), and used explicit : distinct seeds: whence those seeds? They need to be different based on : global program context and different from each run of the program. So : current-dateTime() concatenated with a value from a value range that is : preassigned to each component module. Or: cart around yet another randomizer : to provide the seeds. It is ugly, ugly, ugly any way you slice it. : : Usage: : You can either use the explicit distribution functions here, e.g. : rand:normal(10.0, 2.0), or construct a distribution descriptor using : the distributions API and then pass it into a call to rand:randomize(), : e.g. rand:randomize(dist:normal(10.0, 2.0)) : The latter approach allows for more fine control over the behaviour of : the final result (e.g. you can set a minimum value, a post-multiplier, etc.) : and is useful when you have a randomization algorithm that is getting : reused a lot, or that you might want to change up without having to find : all the places where it is called in the code. : : Copyright© Mary Holstege 2020-2023 : CC-BY (https://creativecommons.org/licenses/by/4.0/) : @since August 2020 : @custom:Status Stable :) module namespace this="http://mathling.com/core/random"; import module namespace errors="http://mathling.com/core/errors" at "../core/errors.xqy"; import module namespace util="http://mathling.com/core/utilities" at "../core/utilities.xqy"; import module namespace dist="http://mathling.com/type/distribution" at "../types/distributions.xqy"; declare namespace art="http://mathling.com/art"; declare namespace map="http://www.w3.org/2005/xpath-functions/map"; declare namespace array="http://www.w3.org/2005/xpath-functions/array"; declare namespace math="http://www.w3.org/2005/xpath-functions/math"; declare %private variable $this:RANDOM_0_1 as function() as xs:double := let $rand_double := function-lookup(QName("http://basex.org/modules/random","double"), 0) return ( if (exists($rand_double)) then $rand_double else ( function () as xs:double { random-number-generator(this:seed())("number") } ) ) ; (:~ : Function to hand out seeds: : Ugly hack alert: browbeat r-n-g into using a new seed => new number : See top comment. If you start getting samey-samey sequences all I : can suggest is to spring for Saxon-PE. :) declare %art:non-deterministic function this:seed() as xs:anyAtomicType { (: Function lookup in the closure of module imported into XSLT : doesn't appear to work and we end up with the fallback. : We could move the lookup and text inside the function body, : but that will slow down normal randomization calls in XQuery with : repeated checks. :) let $timestamp := function-lookup(QName("http://saxon.sf.net/", "timestamp"), 0) let $rand-double := function-lookup(QName("http://basex.org/modules/random","double"), 0) return ( if (exists($timestamp)) then $timestamp() else if (exists($rand-double)) then $rand-double() else ( concat(current-dateTime(), generate-id(<n/>)) ) ) }; (:~ : ε is used to handle the switch from exclusive to inclusive bounds. :) declare variable $this:SCALING as xs:integer := 100000; declare %private variable $this:ε as xs:double := (1 div $this:SCALING) cast as xs:double; (:~ : Number of fractional decimal digits to keep when we cast to decimal. :) declare variable $this:DECIMAL-DIGITS as xs:integer := 2; declare %private variable $this:DECIMAL-DIV as xs:integer := math:pow(10, $this:DECIMAL-DIGITS) cast as xs:integer; (:~ : Pi : Math looks better with a real letter in there. Pity about having to put in : the module prefix. :) declare variable $this:π as xs:double := math:pi(); (:~ : Number of resamplings will we try before we give up :) declare variable $this:RESAMPLE-LIMIT as xs:integer := 10; (:~ : Uniform distribution [0,1.0) :) declare variable $this:STD-UNIFORM as map(xs:string,item()*) := dist:uniform(0.0, 1.0 - $this:ε) ; (:~ : Uniform distribution (-1.0,1.0) :) declare variable $this:STD-UNIT as map(xs:string,item()*) := dist:uniform(-1.0 + $this:ε, 1.0 - $this:ε) ; (:~ : Normal distribution mean=0 std=1 :) declare variable $this:STD-NORMAL as map(xs:string,item()*) := dist:normal(0.0, 1.0) ; (:~ : Uniform angles (degrees) [0, 360.0) :) declare variable $this:STD-DEGREES as map(xs:string,item()*) := dist:uniform(0.0, 360.0 - $this:ε) ; (:~ : flip() : Returns random Bernoulli distributed data as a boolean : : @param $percent: Percent that should come up true() : @return true() or false() :) declare %art:non-deterministic function this:flip($percent as xs:double) as xs:boolean { (: 100*random-number-generator(this:seed())("number") lt $percent :) 100*$this:RANDOM_0_1() lt $percent }; (:~ : constant() : Returns a constant value, either the average of $min and $max (if they : exist) or whichever one does exist. Failing everything else, 1. : : This is the degenerate case of uniform() and exists as a convenience : for data-driven randomization in randomize(). : : @param $min: minimum value : @param $max: maximum value : @return constant :) declare function this:constant($min as xs:double?, $max as xs:double?) as xs:double { if (exists($min) and exists($max)) then ( if ($max < $min) then errors:error("RAND-BADMINMAX",($min,$max)) else if ($max = $min) then $min else ($max + $min) div 2 ) else ( ($min,$max,1)[1] ) }; (:~ : uniform() : Return random uniformly distributed data in the range. : Because the base randomizer is exclusive on the upper bound, add ε : and then cap. This creates a slight over-weighting of the max value, : depending on fineness of ε. : : When we are using integers, we need to avoid chopping off the top of : the range, or we lose. We can get it, but it is unlikely and for small : integer ranges spectacularly unlikely. So we add 1-ε instead of ε and : quantize at this level. You want to avoid this and get, say, floating : point values between 5 and 10 (inc), pass floating point arguments. : : Base randomizer gives [0,1) we want [x,y] : [0,1) * (ε+y-x) => [0,ε+y-x) : [0,ε+y-x) + x => [x, y+ε) : min(([x, y+ε),y) => [x,y] : : For integers: : [0,1) * (1-ε+y-x) => [0,1-ε+y-x) : [0,1-ε+y-x) + x => [x, y+1-ε) : [x,y+1-ε) idiv 1 => [x, y] : : @param $min: Lower bound (inclusive) of range : @param $max: Upper bound (inclusive) of range : @return random number in the range :) declare %art:non-deterministic function this:uniform($min as xs:numeric, $max as xs:numeric) as xs:double { if ($max < $min) then errors:error("RAND-BADMINMAX",($min,$max)) else if ($max = $min) then $min else if ( ($max instance of xs:integer) and ($min instance of xs:integer) ) then ( (: let $random_0_1 := random-number-generator(this:seed())("number") return ((1 - $this:ε + $max - $min)*$random_0_1 + $min) idiv 1 :) ((1 - $this:ε + $max - $min)*$this:RANDOM_0_1() + $min) idiv 1 ) else ( (: let $random_0_1 := random-number-generator(this:seed())("number") return min(( ($this:ε + $max - $min)*$random_0_1 + $min, $max )) :) min(( ($this:ε + $max - $min)*$this:RANDOM_0_1() + $min, $max )) ) }; (:~ : normal() : Return random normally distributed data. : : @param $mean: Mean of range of values : @param $std: Standard deviation of range of values : @return random number :) declare %art:non-deterministic function this:normal($mean as xs:double, $std as xs:double) as xs:double { let $randomizer := random-number-generator(this:seed()) return $mean + this:gauss($randomizer) * $std }; (: : gauss() : Return random normally distributed data between 0 and 1 : A service function used by normal() :) declare %private %art:non-deterministic function this:gauss($randomizer as map(xs:string,item())) as xs:double { let $u as xs:double := 2 * $randomizer("number") - 1 let $next := ($randomizer("next"))() let $v as xs:double := 2 * $next("number") - 1 let $r := $u * $u + $v * $v return ( if ($r = 0 or $r >= 1) then this:gauss(($next("next"))()) else ( let $c as xs:double := math:sqrt(-2 * math:log($r) div $r) return $u * $c ) ) }; (:~ : skewed() : Return random data in skewed normal distribution. : : @param $mean: Mean of distribution : @param $std: Standard deviation of distribution : @param $skew: Skew of distribution : @return random number : : Skewed defined by parameters α, ξ, and ω : α is skew : α = 0 => normal, α > 0 right skewed, α < 0 left skewed : right skewed => long tail on right : ξ = location = shift along x : ω = scaling along y : ω is positive : : mean = ξ + ω*δ*√(2/π) δ=α/√(1+α²) : std = √(ω²*(1 - 2*δ²/π)) : : So: : ω = std / √(1 - 2*δ²/π)) : ξ = mean - ω*δ*√(2/π) :) declare %art:non-deterministic function this:skewed($mean as xs:double, $std as xs:double, $skew as xs:double) as xs:double { let $α := $skew let $δ := $α div math:sqrt(1 + $α*$α) let $ω := $std div math:sqrt(1 - (2*$δ*$δ div $this:π)) let $ξ := $mean - $ω*$δ*math:sqrt(2 div $this:π) let $σ := 1 div math:sqrt(1 + $α*$α) let $u := this:normal(0, 1) let $v := this:normal(0, 1) let $u1 := $σ*($α*abs($u) + $v) return ( $ω * $u1 + $ξ ) }; (:~ : binomial() : Return random data in a binomial distribution. : : @param $n: the n distribution parameter = max value : @param $percent: probability distribution parameter as a percent [0,100] : @return random integer in range : : μ = np : σ² = np(1-p) : :) declare %art:non-deterministic function this:binomial( $n as xs:integer, $percent as xs:double ) as xs:integer { let $p := $percent div 100 let $seq := dist:simple-sums( for $k in 1 to $n return util:binomial($n, $k)*math:pow($p, $k)*math:pow(1 - $p, $n - $k) ) let $u := this:uniform(0E0, 1E0) return util:rangeindex($seq, $u) }; (:~ : binomial-beta() : Return random data in a Beta binomial distribution. : : @param $n n distribution parameter, max value : @param $α alpha distribution parameter : @param $β beta distribution parameter : @return random integer :) declare %art:non-deterministic function this:binomial-beta( $n as xs:integer, $α as xs:double, $β as xs:double ) as xs:integer { util:assert($n > 0, "n <= 0"), util:assert($α > 0, "α <= 0"), util:assert($β > 0, "β <= 0"), let $p := this:beta($α, $β) return this:binomial($n, $p * 100) }; (:~ : binomial-poisson() : Return random data in a Poisson binomial distribution. : : @param $probabilities: probabilities of the base Bernoulli variables (as percents [0,100]) : @return random number : : μ = Σp[i] : σ = √(Σ(1 - p[i])p[i]) :) declare %art:non-deterministic function this:binomial-poisson($probabilities as xs:double*) as xs:double { sum(for $p in $probabilities return if (this:flip($p)) then 1 else 0) }; (:~ : poisson() : Return random data in a Poisson distribution. : Using inverse transform sampling method. If λ is large and e^-λ underflows; : this will not work well. : : @param $mean: the lambda parameter : @return random number : : μ = λ : σ = λ :) declare %art:non-deterministic function this:poisson($mean as xs:double) as xs:double { let $u := this:uniform(0E0, 1E0) return util:while( function($x as xs:double, $p as xs:double, $s as xs:double) as xs:boolean { $u > $s }, function($x as xs:double, $p as xs:double, $s as xs:double) as xs:double* { let $x := $x + 1 let $p := $p * $mean div $x let $s := $s + $p return ( $x, $p, $s ) }, 0E0, math:exp(-$mean), math:exp(-$mean) )=>head() }; (:~ : exponential() : Return random data in an exponential distribution. : : @param $λ: the distribution parameter : @return random number : : μ = 1/λ : : Using inverse transform sampling -ln(U)/λ :) declare %art:non-deterministic function this:exponential($λ as xs:double) as xs:double { let $u := this:uniform(0E0, 1E0) return -math:log($u) div $λ }; (:~ : ahrens-dieter() : Ahrens-Dieter method to return Γ(δ,1) distributed data for 0 < δ < 1 :) declare %private %art:non-deterministic function this:ahrens-dieter( $δ as xs:double, $randomizer as map(xs:string,item()*) ) as xs:double { let $e := math:exp(1) let $u := $randomizer("number") let $next := ($randomizer("next"))() let $v := $next("number") let $next := ($randomizer("next"))() let $w := $randomizer("number") let $test := ($u <= $e div ($e + $δ)) let $ξ := if ($test) then math:pow($v, 1 div $δ) else 1 - math:log($v) let $η := if ($test) then $w * math:pow($ξ, $δ - 1) else $w * math:exp(-$ξ) return ( if ($η > math:pow($ξ, $δ - 1)*math:exp(-$ξ)) then this:ahrens-dieter($δ, ($next("next"))()) else $ξ ) }; (:~ : gamma() : Return random data in a gamma distribution. : : @param $k: the k distribution parameter > 0 ("shape") : @param $θ: the θ distribution parameter > 0 ("scaling") : @return random number : : μ = kθ : σ² = kθ² : : Using inverse transform sampling in combination with the Ahrens-Dieter : acceptance-rejection method (performance is linear wrt k) :) declare %art:non-deterministic function this:gamma( $k as xs:double, $θ as xs:double ) as xs:double { util:assert($k > 0, "k <= 0"), util:assert($θ > 0, "θ <= 0"), let $n := floor($k) cast as xs:integer let $δ := $k - $n let $ξ := if ($δ = 0) then 0 else this:ahrens-dieter($δ, random-number-generator(this:seed())) return ( $θ * ($ξ - sum(for $i in 1 to $n return math:log(this:uniform(0E0, 1E0)))) ) }; (:~ : beta() : Return random data in a beta distribution. : : @param $α: the α distribution parameter : @param $β: the β distribution parameter : @return random number : : μ = α/(α+β) : σ² = αβ/(α+β)²(α+β+1) : :) declare %art:non-deterministic function this:beta( $α as xs:double, $β as xs:double ) as xs:double { util:assert($α > 0, "α <= 0"), util:assert($β > 0, "β <= 0"), let $u := this:gamma($α, 1) let $v := this:gamma($β, 1) return ( $u div ($u + $v) ) }; (:~ : zipf() : Return random data in a Zipf distribution as an integer index from 1 to N : Note that this constructs the sum table anew each time: not very efficient. : You'd be better off constructing the distribution or just calling : select-index() directly : : p(n)=(1/k^α)/sum(i=1 to n)(1/i^α) : n = number of elements : k = rank : α = exponent : : @param $alpha: the α parameter, should be >=1 : For English words this by some estimates is around 1.6 : For city sizes 1.07 : @param $n: number of Zipf values in range : @return random integer in range :) declare %art:non-deterministic function this:zipf($alpha as xs:double, $n as xs:integer) as xs:integer { let $zipf-weights as xs:double* := this:zipf-sums($alpha,$n+1) return this:select-index($zipf-weights, $n) }; (:~ : shuffle() : Return the data items in a (uniformly) random order. : : @param $data: Data items to shuffle : @return data, shuffled :) declare %art:non-deterministic function this:shuffle($data as item()*) { random-number-generator(this:seed())?permute($data) }; (:~ : percent-sums() : Returns a sequence of cumulative probabilities, to be used : by select-index() to perform selections. This is to ensure we don't have to : keep recomputing the cumulative sums for each selection. : : To account for rounding, the last item is always forced to 1 and any sums : greater than 1 are also rounded down to 1. To get expected distributions, : ensure that the input percentages sum to 100. : : Sums are returned in map:keys() order, so this assumes stability of that : function between calls to percent-sums() and calls to selection(). : : @param $weights a map from key to percent (expressed as integer 0 to 100) : @return cumulative probability table :) declare function this:percent-sums($weights as map(xs:string,item()*)) as xs:double* { dist:percent-sums($weights) }; (:~ : zipf-sums() : Returns a sequence of cumulative Zipf probabilities, to be used : by select-index() to perform selections. This is to ensure we don't have to : keep recomputing the cumulative sums for each selection. : : p(k)=(1/k^α)/sum(i=1 to n)(1/i^α) : n = number of elements : k = rank : α = exponent : : @param $alpha: the alpha parameter, should be >=1 : For English words this by some estimates is around 1.6 : For city sizes 1.07 : @param $n: number to generate : @return cumulative probability table :) declare function this:zipf-sums($alpha as xs:double, $n as xs:integer) as xs:double* { dist:zipf-sums($alpha, $n) }; (:~ : select-index() : Return a rank from 1 to N of values distributed according to cumulative : probability sums. Use zipf-sums(), percent-sums(), etc. to compute those. : : @param $sums: cumulative probability sums, computed via zipf-sums() or percent-sums() : @param $n: maximum rank to return ($n should be <= count($sums)) : @return random integer in range :) declare %art:non-deterministic function this:select-index($sums as xs:double*, $n as xs:integer) as xs:integer { (: Get value in (0,1) exclusive; underlying randomizer returns inclusive : values, so generate value within epsilon of bounds instead :) let $p := this:uniform($this:ε, 1 - $this:ε) return util:rangeindex($sums[position()=(1 to $n)], $p, $n) }; (:~ : selection() : Return a selection using data distributed per the cumulative probability sums. : : @param $sums: cumulative probability sums, computed via zipf-sums(), percent-sums(), etc. : @param $values: values to select, in rank order : @return random value :) declare %art:non-deterministic function this:selection($sums as xs:double*, $values as item()*) as item() { let $n := min((count($sums),count($values))) let $k := this:select-index($sums, $n) return $values[$k] }; (:~ : check-markov-matrix() : Check whether the matrix is a valid Markov matrix, raise error if it is not. : : @param $dim matrix size (matrix is dim X dim) : @param $matrix data values : @param $is-cumulative true() if this is a cumulative probability matrix : @error RAND-NOTSQUARE if data won't fit evenly in square matrix : @error RAND-BADDIM if dimension doesn't match amount of data : @error RAND-BADPROBABILITY if value is out of range [0,1] : @error RAND-BADSUMPROB if cumulative sums are out of range [0,1] :) declare function this:check-markov-matrix($dim as xs:integer, $matrix as xs:double*, $is-cumulative as xs:boolean) as empty-sequence() { let $size := count($matrix) let $sqrt := math:sqrt($size) return ( (: Matrix must be square :) if ($sqrt * $sqrt ne $size) then errors:error("RAND-NOTSQUARE", $size) else if ($dim ne ($sqrt cast as xs:integer)) then errors:error("RAND-BADDIM", ($dim, $sqrt)) else () , (: Values must be probabilities :) for $i in 1 to $size return ( if ($matrix[$i] < 0 or $matrix[$i] > 1) then errors:error("RAND-BADPROBABILITY", ($i, $matrix[$i])) else () ) , (: Each row should have a cumulative probability of 1 :) (: If $is-cumulative, each row has been expressed as cumulative : probabilities already (for efficiency) so we check p<1 vs sum(p)<1 : but we did that check above already. :) if ($is-cumulative) then () else ( for $i in 1 to $dim return ( let $sprob := sum($matrix[position()=(($i - 1)*$dim + 1 to ($i - 1) * $dim)]) return ( if (util:twixt($sprob, 0, 1 + $this:ε)) then () else errors:error("RAND-BADSUMPROB", ($i, $sprob)) ) ) ) ) }; (:~ : markov-sums() : Return a Markov probability matrix as a matrix where each row contains : the cumulative probability sums for that row. This allows the Markov : distribution selection to run more efficiently. : : Example: : markov-sums(3, ( : 0.2, 0.4, 0.4, : 0.5, 0.5, 0, : 0.4, 0.2, 0.4 : )) => ( : 0.2, 0.6, 1, : 0.5, 1, 1, : 0.4, 0.6, 1 : ) : : To account for rounding, the last item in the row is always forced to 1 : and any sums greater than 1 are also rounded down to 1. : : @param $dim: size of each dimension of matrix, math:sqrt(count($matrix)) : @param $matrix: the input non-cumulative Markov matrix :) declare function this:markov-sums($dim as xs:integer, $matrix as xs:double*) as xs:double* { dist:markov-sums($dim, $matrix) }; (:~ : markov-percent-sums() : Return a Markov probability matrix as a matrix where each row contains : the cumulative probability sums for that row. This allows the Markov : distribution selection to run more efficiently. : : Example: : markov-percent-sums(3, ( : 20, 40, 40, : 50, 50, 0, : 40, 20, 40 : )) => ( : 0.2, 0.6, 1, : 0.5, 1, 1, : 0.4, 0.6, 1 : ) : : To account for rounding, the last item in the row is always forced to 1 : and any sums greater than 1 are also rounded down to 1. : : @param $dim: size of each dimension of matrix, math:sqrt(count($matrix)) : @param $matrix: the input non-cumulative Markov matrix : @return cumulative probability matrix :) declare function this:markov-percent-sums($dim as xs:integer, $matrix as xs:integer*) as xs:double* { dist:markov-percent-sums($dim, $matrix) }; (:~ : markov() : Given a square matrix of Markov probabilities and a current state, : return a successor state with the probabilities as given. For efficiency, : the matrix should be given in the cumulative probability form, as created : by markov-sums() or markov-percent-sums() : : @param $current: Index of current state in the matrix : @param $dim: size of each dimension of matrix, math:sqrt(count($matrix)) : @param $matrix: Square matrix of cumulative probabilities : @return random index from Markov table : @error ML-BADARGS if current index is out of range :) declare %art:non-deterministic function this:markov($current as xs:integer, $dim as xs:integer, $matrix as xs:double*) as xs:integer { if (util:twixt($current, 1, $dim)) then () else errors:error("ML-BADARGS", ("current", $current)), let $row := $matrix[position()=(($current - 1) * $dim + 1 to $current * $dim)] return this:select-index($row, $dim) }; (:~ : markov-selection() : Return a selection using Markov matrix distribution. : : @param $current: index of current selection : @param $dim: size of each dimension of matrix, math:sqrt(count($matrix)) : @param $matrix: cumulative probability matrix, computed via markov-sums() : @param $values: values to select, in index order : @return value corresponding to random index from Markov table : @error ML-BADARGS if current index is out of range : @error ML-BADARGS if computed index is out of range :) declare %art:non-deterministic function this:markov-selection($current as xs:integer, $dim as xs:integer, $matrix as xs:double*, $values as item()*) as item() { let $k := this:markov($current, $dim, $matrix) return ( (: Make sure matrix and values are compatible :) if (util:twixt($k, 1, count($values))) then () else errors:error("ML-BADARGS", ("values", $values)), $values[$k] ) }; (:~ : random-selection() : Return a random selection from a set of keys, where the key is selected : per a set of probabilities. A convenience wrapper (although less efficient, : because it recomputes the sums) for selection() : : @param $weights: Map of probabilities (as integer percents) from key to weight : @return random selection from the table :) declare %art:non-deterministic function this:random-selection($weights as map(xs:string,item()*)) as xs:string { let $sums := this:percent-sums($weights) let $values := $weights=>map:keys() return this:selection($sums, $values) }; (:~ : random-assortment() : Select a subset of a set of keys, in (uniform) random order, where each key : is selected at most once, and is selected with probability given by set : of weights. : : @param $keys: Set of keys to choose from : @param $weights: Map of probabilities (as integer percents) from key to weight : @return random assortment of the keys :) declare %art:non-deterministic function this:random-assortment($keys as xs:string*, $weights as map(xs:string,item()*)) as xs:string* { this:shuffle( for $key in $keys return if (this:flip($weights($key))) then $key else () ) }; (:~ : cast() : Cast/format a value to a particular type : Service function for randomize() : : Casting to decimal is actually just rounding to $DECIMAL-DIGITS fractional : digits. : : @param $value: Value to cast : @param $cast-as: One of "integer", "decimal", "boolean", "string", or "double" : @return value cast appropriately :) declare function this:cast($value as item(), $cast-as as xs:string?) { switch ($cast-as) case "integer" return $value cast as xs:integer case "decimal" return round($value * $this:DECIMAL-DIV) div $this:DECIMAL-DIV case "boolean" return boolean($value) case "string" return string($value) default return $value }; (:~ : range() : Ensure the value is in the range by changing value to minimum or maximum as : necessary. : Service function for randomize() : Assumes min < max : : @param $value: The value : @param $min: The lowest value (optional) : @param $max: The highest value (optional) : @return value clamped to range :) declare function this:range($value as xs:numeric, $min as xs:numeric?, $max as xs:numeric?) as xs:numeric { let $low := if (exists($min)) then max(($min,$value)) else $value return if (exists($max)) then min(($max,$low)) else $low }; (:~ : get-last() : Function available for callback use in fetch() : : @param $algorithm: The algorithm to use, a map with various parameter keys : (see randomize()) : @param $last: Previous value : @return previous value :) declare function this:get-last($algorithm as map(xs:string,item()*), $last as item()?) as item() { $last }; (:~ : last-fraction() : Return a callback function for use in fetch() that returns the a : given fraction of previous value : : @param $centrality: the fraction to apply : @return fraction of last value :) declare function this:last-fraction($centrality as xs:double) as function(map(xs:string,item()*), item()?) as item() { %art:name("rand:last-fraction") function($algorithm as map(xs:string,item()*), $last as item()?) as item() { $centrality * $last } }; (:~ : fetch() : Lookup the key in the map. If it is a two-arg function or a QName of : a two-arg function, call that function, passing the algorithm map and : the last value to it. Otherwise return the value for the key. : : @param $algorithm: The algorithm to use, a map with various parameter keys : (see randomize()) : @param $key: Key to lookup in map : @param $last: Previous value : @return value for the key :) declare %private function this:fetch($algorithm as map(xs:string,item()*), $key as xs:string, $last as item()?) as item()* { let $v := $algorithm($key) return typeswitch ($v) case function(map(xs:string,item()*), item()?) as item()* return $v($algorithm, $last) case xs:QName return ( let $f := function-lookup($v, 2) return $f($algorithm, $last) ) default return $v }; (:~ : is-complex() : Helper function for randomize() to allow for optimization of case where : we're using a static algorithm with no use of a function :) declare %private function this:is-complex($algorithm as map(xs:string,item()*)) as xs:boolean { if ($algorithm=>map:contains("is_complex")) then ( $algorithm("is_complex") ) else ( some $k in map:keys($algorithm) satisfies ( ($algorithm($k) instance of xs:QName) or ($algorithm($k) instance of function(map(*), item()?) as item()) ) ) }; (:~ : is-complex() : Helper function for randomize() to allow for optimization of case where : we're using a static algorithm with no use of a function :) declare %private function this:is-complex($key as xs:string, $algorithm as map(xs:string,item()*)) as xs:boolean { ($algorithm($key) instance of xs:QName) or ($algorithm($key) instance of function(map(*), item()?) as item()) }; (:~ : modify-all() : Apply modifiers to all values. Assumes all key values are static. : Service routine for randomize() :) declare %private function this:modify-all($values as item()*, $last as item()?, $algorithm as map(xs:string,item()*)) { let $distribution := $algorithm("distribution") let $min := $algorithm("min") let $max := $algorithm("max") let $pre-multiplier := $algorithm("pre-multiplier") let $values := if (exists($pre-multiplier)) then $values!(. * $pre-multiplier) else $values let $values := for $value in $values return ( if ($distribution = ("uniform","flip","constant") or util:twixt($value, $min, $max) ) then ( $value ) else ( switch($algorithm("truncation")) case "ceiling" return this:range($value, $min, $max) case "drop" return () default (: "resample" :) return ( let $new := fold-left( 1 to $this:RESAMPLE-LIMIT, (), function($v as item()?, $i as xs:integer) as item()? { if (empty($v)) then ( let $v := this:raw-distribution(1, $last, $algorithm) let $v2 := if (exists($pre-multiplier)) then $v * $pre-multiplier else $v return if (util:twixt($v2, $min, $max)) then $v2 else () ) else ( $v ) } ) return ( if (empty($new)) then ( util:log("Gave up resampling "||util:quote($algorithm)), this:range($value, $min, $max) ) else ( $new ) ) ) ) ) let $values := let $post-multiplier := $algorithm("post-multiplier") return if (exists($post-multiplier)) then $values!(. * $post-multiplier) else $values let $values := let $post-shift := $algorithm("post-shift") return if (exists($post-shift)) then $values!(. + $post-shift) else $values let $values := let $cast-as := $algorithm("cast") return if (exists($cast-as)) then $values!this:cast(., $cast-as) else $values return $values }; (:~ : modify() : Apply modifiers to values. Assumes all key values may be dynamic. : Service routine for randomize() :) declare %private function this:modify($value as item(), $last as item()?, $algorithm as map(xs:string,item()*)) { let $distribution := $algorithm=>this:fetch("distribution",$last) let $min := $algorithm=>this:fetch("min",$last) let $max := $algorithm=>this:fetch("max",$last) let $pre-multiplier := $algorithm=>this:fetch("pre-multiplier", $last) let $value := if (exists($pre-multiplier)) then $value * $pre-multiplier else $value let $value := if ($distribution = ("uniform","flip","constant") or util:twixt($value, $min, $max) ) then ( $value ) else ( switch($algorithm=>this:fetch("truncation",$last)) case "drop" return () case "ceiling" return this:range($value, $min, $max) default (: "resample" :) return ( let $new := fold-left( 1 to $this:RESAMPLE-LIMIT, (), function($v as item()?, $i as xs:integer) as item()? { if (empty($v)) then ( let $v := this:raw-distribution(1, $last, $algorithm) let $v2 := if (exists($pre-multiplier)) then $v * $pre-multiplier else $v return if (util:twixt($v2, $min, $max)) then $v2 else () ) else ( $v ) } ) return ( if (empty($new)) then ( trace((),"gave up resampling"), this:range($value, $min, $max) ) else ( $new ) ) ) ) let $value := let $post-multiplier := $algorithm=>this:fetch("post-multiplier",$last) return if (exists($post-multiplier)) then $value * $post-multiplier else $value let $value := let $post-shift := $algorithm=>this:fetch("post-shift",$last) return if (exists($post-shift)) then $value + $post-shift else $value let $value := let $cast-as := $algorithm=>this:fetch("cast",$last) return if (exists($cast-as)) then this:cast($value, $cast-as) else $value return $value }; (:~ : raw-distribution() : Service function to randomize(). Creates the raw values from the : distributions, without applying the modifications (min/max, multipliers, : casts) : : Assumes distribution is fixed (or distributions for multi-modal) : @param $N: How many values to return : @param $last: Previous value (needed for markov, available to key callback) : @param $algorithm: The algorithm to use, a map with various parameter keys : distribution: Distribution to use, one of "constant", "uniform", "normal", : "skewed", "bernoulli", "flip", "zipf", "markov", "sums", "multimodal", : "binomial-poisson", "poisson", "exponential", "gamma", "beta", "binomial" : "binomial-beta" : min: mimumum value (uniform, constant) : max: maximum value (uniform, constant, binomial=n, binomial-beta=n) : mean: mean of distribution (normal, skewed, poisson, exponential) (default=0) : std: standard deviation of distribution (normal, skewed) (default=1) : skew: skew of distribution (skewed) (default=0) : p: probability (bernoulli, flip, binomial) (default=50) : probabilities: probabilities (binomial-poisson) : sums: cumulative probability sums (zipf, markov, sums, binomial, binomial-beta) : alpha: alpha parameter (zipf) (needed if no sums) (default=1) : alpha parameter (beta, binomial-beta) : limit: number of sums (zipf) (needed if no sums) (default=1000) : start: index of starting symbol (markov) (default=uniform[1,dim]) : dim: size of each dimension of Markov matrix (markov) : matrix: raw Markov matrix (used if sums not provided, not recommended) : distributions: sequence of distributions to combine (multimodal) : lambda: lambda parameter (exponential) : k: k parameter (gamma) : theta: theta parameter (gamma) : beta: beta parameter (beta, binomial-beta) :) declare %private %art:non-deterministic function this:raw-distribution($N as xs:integer, $last as item()?, $algorithm as map(xs:string,item()*)) { if (this:is-complex($algorithm)) then ( (: Distribution is fixed, but we may alter any other parameter : so need to feed $last as appropriate to the sequence : For multimodal distributions is also fixed :) let $distribution := ($algorithm("distribution"),"uniform")[1] let $values := switch ($distribution) case "constant" return let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $min := $algorithm=>this:fetch("min",$values[last()]) let $max := $algorithm=>this:fetch("max",$values[last()]) let $val := this:constant($min, $max) return $val } ) return if (empty($last)) then $result else $result[position()>1] case "uniform" return let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $min := $algorithm=>this:fetch("min",$values[last()]) let $max := $algorithm=>this:fetch("max",$values[last()]) let $val := this:uniform($min, $max) return $val } ) return if (empty($last)) then $result else $result[position()>1] case "normal" return let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $μ := ($algorithm=>this:fetch("mean",$last),0)[1] let $σ := ($algorithm=>this:fetch("std",$last),1)[1] let $val := this:normal($μ, $σ) return $val } ) return if (empty($last)) then $result else $result[position()>1] case "skewed" return let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $μ := ($algorithm=>this:fetch("mean",$last),0)[1] let $σ := ($algorithm=>this:fetch("std",$last),1)[1] let $α := ($algorithm=>this:fetch("skew",$last),0)[1] let $val := this:skewed($μ, $σ, $α) return $val } ) return if (empty($last)) then $result else $result[position()>1] case "bernoulli" return let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $p := ($algorithm=>this:fetch("p",$last),50)[1] let $val := if (this:flip($p)) then 1 else 0 return $val } ) return if (empty($last)) then $result else $result[position()>1] case "flip" return let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $p := ($algorithm=>this:fetch("p",$last),50)[1] let $val := this:flip($p) return $val } ) return if (empty($last)) then $result else $result[position()>1] case "multimodal" return ( (: multimodal has sequence of sub randomizers in 'distributions' :) (: $last applies to each subrandomizer independently generally : except post-sum modifications apply to combined last :) let $distributions := $algorithm("distributions") let $selector := $algorithm("selector") (: let $numd := count($distributions) let $flips := for $i in 1 to $N return this:cast(this:uniform(1, $numd),"integer") :) let $flips := for $i in 1 to $N return this:randomize($selector) (: RAW: v11, v12, ..., v1N, v21, ..., v2N, ...., vd1, ... vdN :) (: Run this way so $last applies :) let $raw := for $subdistribution in $distributions return this:randomize($N, $last, $subdistribution) let $result := for $i in 1 to $N return $raw[($flips[$i] - 1)*$N + $i] return $result ) case "binomial" return ( (: It's probably a bad idea to have non-static parameters : An even worse idea to have to recompute probability sums : for each time through. : Precompute these and put them in the map or you'll be sorry : API constructors use constants, so no worries :) let $fixed-sums := if (this:is-complex("sums", $algorithm)) then () else ( if (exists($algorithm("sums"))) then $algorithm("sums") else ( if (this:is-complex("max", $algorithm) or this:is-complex("p", $algorithm) ) then () else ( let $n := $algorithm("max") cast as xs:integer let $p := ($algorithm("p"),50)[1] div 100 return ( dist:simple-sums( for $k in 1 to $n return ( util:binomial($n, $k)*math:pow($p, $k)*math:pow(1 - $p, $n - $k) ) ) ) ) ) ) let $fixed-max-rank := if (exists($fixed-sums)) then count($fixed-sums) else () let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $sums := if (exists($fixed-sums)) then $fixed-sums else if (exists($algorithm=>this:fetch("sums",$values[last()]))) then $algorithm=>this:fetch("sums",$values[last()]) else ( let $n := $algorithm=>this:fetch("max", $values[last()]) cast as xs:integer let $p := ($algorithm=>this:fetch("p", $values[last()]),50)[1] div 100 return ( dist:simple-sums( for $k in 1 to $n return ( util:binomial($n, $k)*math:pow($p, $k)*math:pow(1 - $p, $n - $k) ) ) ) ) let $max-rank := ( if (exists($fixed-max-rank)) then $fixed-max-rank else count($sums) ) let $val := this:select-index($sums, $max-rank) return $val } ) return if (empty($last)) then $result else $result[position()>1] ) case "binomial-beta" return ( let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $n := $algorithm=>this:fetch("max",$last) cast as xs:integer let $α := $algorithm=>this:fetch("alpha",$last) let $β := $algorithm=>this:fetch("beta",$last) let $val := this:binomial-beta($n, $α, $β) return $val } ) return if (empty($last)) then $result else $result[position()>1] ) case "binomial-poisson" return ( let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $ps := $algorithm=>this:fetch("probabilities",$last) let $val := this:binomial-poisson($ps) return $val } ) return if (empty($last)) then $result else $result[position()>1] ) case "poisson" return ( let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $λ := $algorithm=>this:fetch("mean",$last) let $val := this:poisson($λ) return $val } ) return if (empty($last)) then $result else $result[position()>1] ) case "exponential" return ( let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $λ := $algorithm=>this:fetch("lambda",$last) let $val := this:exponential($λ) return $val } ) return if (empty($last)) then $result else $result[position()>1] ) case "gamma" return ( let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $k := $algorithm=>this:fetch("k",$last) let $θ := $algorithm=>this:fetch("theta",$last) let $val := this:gamma($k, $θ) return $val } ) return if (empty($last)) then $result else $result[position()>1] ) case "beta" return ( let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $last := $values[last()] let $α := $algorithm=>this:fetch("alpha",$last) let $β := $algorithm=>this:fetch("beta",$last) let $val := this:beta($α, $β) return $val } ) return if (empty($last)) then $result else $result[position()>1] ) case "sums" return (: It's probably a bad idea to have non-static probability sums :) (: Precompute these and include them in the algorithm map :) let $fixed-sums := if (this:is-complex("sums", $algorithm)) then () else $algorithm("sums") let $fixed-max-rank := if (exists($fixed-sums)) then count($fixed-sums) else () let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $sums := if (exists($fixed-sums)) then $fixed-sums else $algorithm=>this:fetch("sums",$values[last()]) let $max-rank := if (exists($fixed-max-rank)) then $fixed-max-rank else count($sums) let $val := this:select-index($sums, $max-rank) return $val } ) return if (empty($last)) then $result else $result[position()>1] case "zipf" return (: It's probably a bad idea to have non-static Zipf parameters : An even worse idea to have to recompute probability sums : for each time through. : Precompute these and put them in the map or you'll be sorry : API constructors use constants, so no worries :) let $fixed-sums := if (this:is-complex("sums", $algorithm)) then () else ( if (exists($algorithm("sums"))) then $algorithm("sums") else ( if (this:is-complex("alpha", $algorithm) or this:is-complex("limit", $algorithm) ) then () else ( this:zipf-sums( ($algorithm("alpha"),1)[1], ($algorithm("limit"),1000)[1] + 1 ) ) ) ) let $fixed-max-rank := if (exists($fixed-sums)) then count($fixed-sums) else () let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $sums := if (exists($fixed-sums)) then $fixed-sums else if (exists($algorithm=>this:fetch("sums",$values[last()]))) then $algorithm=>this:fetch("sums",$values[last()]) else this:zipf-sums( ($algorithm=>this:fetch("alpha",$values[last()]),1)[1], ($algorithm=>this:fetch("limit",$values[last()]),1000)[1] + 1 ) let $max-rank := ( if (exists($fixed-max-rank)) then $fixed-max-rank else count($sums) ) - 1 (: We added one to sum calc :) let $val := this:select-index($sums, $max-rank) return $val } ) return if (empty($last)) then $result else $result[position()>1] case "markov" return (: It's probably a bad idea to have non-static Markov parameters :) (: Precompute them and include them in the algorithm map :) (: API constructors use constants so no worries :) let $fixed-sums := if (this:is-complex("sums", $algorithm)) then () else ( if (exists($algorithm("sums"))) then $algorithm("sums") else ( if (this:is-complex("matrix", $algorithm) or this:is-complex("dim", $algorithm) ) then () else ( let $matrix := $algorithm("matrix") let $dim := ( $algorithm("dim"), math:sqrt(count($matrix)) cast as xs:integer )[1] return this:markov-sums($dim, $matrix) ) ) ) let $fixed-dim := if (this:is-complex("dim", $algorithm)) then () else if ($algorithm=>map:contains("dim")) then $algorithm("dim") else if (exists($fixed-sums)) then math:sqrt(count($fixed-sums)) cast as xs:integer else () let $start := ($last,$algorithm=>this:fetch("start",$last),this:cast(this:uniform(1,$fixed-dim),"integer"))[1] return fold-left( 1 to $N, $start, function($values as item()*, $i as xs:integer) as item()* { $values , let $sums := if (exists($fixed-sums)) then $fixed-sums else if (exists($algorithm=>this:fetch("sums",$values[last()]))) then $algorithm=>this:fetch("sums",$values[last()]) else ( let $matrix := $algorithm=>this:fetch("matrix",$values[last()]) let $dim := ( $algorithm=>this:fetch("dim",$values[last()]), math:sqrt(count($matrix)) cast as xs:integer )[1] return this:markov-sums($dim, $matrix) ) let $dim := ( $fixed-dim, $algorithm=>this:fetch("dim",$values[last()]), math:sqrt(count($sums)) cast as xs:integer )[1] let $val := this:markov($values[last()],$dim,$sums) return $val } )[position()>1] (: start is never empty :) default return ( let $result := fold-left( 1 to $N, $last, function($values as item()*, $i as xs:integer) as item()* { $values , let $min := $algorithm=>this:fetch("min",$values[last()]) let $max := $algorithm=>this:fetch("max",$values[last()]) let $val := this:constant($min, $max) return $val } ) return if (empty($last)) then $result else $result[position()>1] ) return $values ) else ( (: Everything is fixed, just look up keys :) let $distribution := ($algorithm("distribution"),"uniform")[1] let $min := $algorithm("min") (: OK to be empty :) let $max := $algorithm("max") (: OK to be empty :) let $values := switch ($distribution) case "constant" return for $i in 1 to $N return this:constant($min, $max) case "uniform" return for $i in 1 to $N return this:uniform($min, $max) case "normal" return let $μ := ($algorithm("mean"),0)[1] let $σ := ($algorithm("std"),1)[1] for $i in 1 to $N return this:normal($μ, $σ) case "skewed" return let $μ := ($algorithm("mean"),0)[1] let $σ := ($algorithm("std"),1)[1] let $α := ($algorithm("skew"),0)[1] for $i in 1 to $N return this:skewed($μ, $σ, $α) case "bernoulli" return let $p := ($algorithm("p"),50)[1] for $i in 1 to $N return if (this:flip($p)) then 1 else 0 case "flip" return let $p := ($algorithm("p"),50)[1] for $i in 1 to $N return this:flip($p) case "sums" return let $sums := $algorithm("sums") let $max-rank := count($sums) for $i in 1 to $N return this:select-index($sums, $max-rank) case "multimodal" return ( let $distributions := $algorithm("distributions") let $selector := $algorithm("selector") let $flips := for $i in 1 to $N return this:randomize($selector) (: let $numd := count($distributions) let $flips := for $i in 1 to $N return this:cast(this:uniform(1, $numd),"integer") :) (: RAW: v11, v12, ..., v1N, v21, ..., v2N, ...., vd1, ... vdN :) (: Run this way so that $last applies :) let $raw := for $subdistribution in $distributions return this:randomize($N, $last, $subdistribution) for $i in 1 to $N return ( $raw[($flips[$i] - 1)*$N + $i] ) ) case "binomial" return let $sums := if (exists($algorithm("sums"))) then $algorithm("sums") else ( let $n := $algorithm("max") cast as xs:integer let $p := ($algorithm("p"),50)[1] div 100 return ( dist:simple-sums( for $k in 1 to $n return ( util:binomial($n, $k)*math:pow($p, $k)*math:pow(1 - $p, $n - $k) ) ) ) ) let $max-rank := count($sums) for $i in 1 to $N return this:select-index($sums, $max-rank) case "binomial-beta" return let $n := $algorithm("max") cast as xs:integer let $α := $algorithm("alpha") let $β := $algorithm("beta") for $i in 1 to $N return this:binomial-beta($n, $α, $β) case "binomial-poisson" return let $ps := $algorithm("probabilities") for $i in 1 to $N return this:binomial-poisson($ps) case "poisson" return let $λ := $algorithm("mean") for $i in 1 to $N return this:poisson($λ) case "exponential" return let $λ := $algorithm("lambda") for $i in 1 to $N return this:exponential($λ) case "gamma" return let $k := $algorithm("k") let $θ := $algorithm("theta") for $i in 1 to $N return this:gamma($k, $θ) case "beta" return let $α := $algorithm("alpha") let $β := $algorithm("beta") for $i in 1 to $N return this:beta($α, $β) case "zipf" return let $sums := if (exists($algorithm("sums"))) then $algorithm("sums") else this:zipf-sums( ($algorithm("alpha"),1)[1], ($algorithm("limit"),1000)[1] + 1 ) let $max-rank := count($sums) - 1 for $i in 1 to $N return this:select-index($sums, $max-rank) case "markov" return let $sums := if (exists($algorithm("sums"))) then $algorithm("sums") else ( let $matrix := $algorithm("matrix") let $dim := ( $algorithm("dim"), math:sqrt(count($matrix)) cast as xs:integer )[1] return this:markov-sums($dim, $matrix) ) let $dim := ( $algorithm("dim"), math:sqrt(count($sums)) cast as xs:integer )[1] let $start := ($last,$algorithm("start"),this:cast(this:uniform(1,$dim),"integer"))[1] return fold-left( 1 to $N, $start, function($z as item()*, $i as xs:integer) as item()* { $z, this:markov($z[last()],$dim,$sums) } )[position()>1] default return for $i in 1 to $N return this:constant($min, $max) return $values ) }; (:~ : select-random() : Given a set of values and a randomizer (see randomize()), use the output : of randomize as an index into the value set, and return the selected values. : : @param $N: How many values to return : @param $last: Previous value (needed for markov, available to key callback) : @param $algorithm: The algorithm to use, a map with various parameter keys : @param $values: The value set to choose from : @return random values :) declare %art:non-deterministic function this:select-random($N as xs:integer, $last as item()?, $algorithm as map(xs:string,item()*), $values as item()*) as item()* { let $count := count($values) let $indices := this:randomize($N, $last, $algorithm) for $raw-index in $indices (: let $index := 1 + (($raw-index cast as xs:integer) mod $count) :) let $index := max((min(($raw-index cast as xs:integer, $count)),1)) return $values[$index] }; (:~ : select-random() : Given a set of values and a randomizer (see randomize()), use the output : of randomize as an index into the value set, and return the selected values. : : @param $N: How many values to return : @param $algorithm: The algorithm to use, a map with various parameter keys : @param $values: The value set to choose from : @return random values :) declare %art:non-deterministic function this:select-random($N as xs:integer, $algorithm as map(xs:string,item()*), $values as item()*) as item()* { let $count := count($values) let $indices := this:randomize($N, (), $algorithm) for $raw-index in $indices let $index := max((min(($raw-index cast as xs:integer, $count)),1)) return $values[$index] }; (:~ : select-random() : Given a set of values and a randomizer (see randomize()), use the output : of randomize as an index into the value set, and return the selected values : : @param $algorithm: The algorithm to use, a map with various parameter keys : @param $values: The value set to choose from : @return random value :) declare %art:non-deterministic function this:select-random($algorithm as map(xs:string,item()*), $values as item()*) as item() { let $count := count($values) let $raw-index := this:randomize($algorithm) let $index := max((min(($raw-index cast as xs:integer, $count)),1)) return $values[$index] }; (:~ : select-random() : Given a set of values pick as (uniformly random) value from the : value set, and return it : : @param $values: The value set to choose from : @return random value :) declare %art:non-deterministic function this:select-random($values as item()*) as item() { let $count := count($values) let $algorithm := dist:uniform(1, $count)=>dist:cast("integer")=>dist:is-complex(false()) let $raw-index := this:randomize($algorithm) (: let $index := 1 + (($raw-index cast as xs:integer) mod $count) :) let $index := max((min(($raw-index cast as xs:integer, $count)),1)) let $result := $values[$index] return ( util:assert(exists($result), "No result! Index="||$index||" raw="||$raw-index||" range=[1:"||$count||"] dist="||util:quote($algorithm)), $result ) }; (:~ : randomize() : Return a sequence of random data distributed as per the algorithm. : : @param $N: How many values to return : @param $last: Previous value (needed for markov, available to key callback) : @param $algorithm: The algorithm to use, a map with various parameter keys : distribution: Distribution to use, one of "constant", "uniform", "normal", : "skewed", "bernoulli", "flip", "zipf", "markov", "sums", "multimodal", : "binomial-poisson", "poisson", "exponential", "gamma", "beta", "binomial" : "binomial-beta" : min: minimum value (optional) : max: maximum value (uniform, binomial, binomial-beta, else optional) : pre-multiplier: multiplier before min/max (optional) : post-multiplier: multiplier after min/max (optional) : post-shift: shift to add after min/max (optional) : cast: cast type (optional) : mean: mean of distribution (normal, skewed, poisson, exponential) (default=0) : std: standard deviation of distribution (normal, skewed) (default=1) : skew: skew of distribution (skewed) (default=0) : p: probability (bernoulli, flip, binomial) (default=50) : probabilities: probabilities (binomial-poisson) : sums: cumulative probability sums (zipf, markov, sums) : alpha: alpha parameter (zipf) (needed if no sums) (default=1) : alpha parameter (beta, binomial-beta) : limit: number of sums (zipf) (needed if no sums) (default=1000) : start: index of starting symbol (markov) (default=uniform[1,dim]) : dim: size of each dimension of Markov matrix (markov) : matrix: raw Markov matrix (used if sums not provided, not recommended) : distributions: sequence of distributions to combine (multimodal) : k: k parameter (gamma) : theta: theta parameter (gamma) : beta: beta parameter (beta, binomial-beta) : lambda: lambda parameter (exponential) : truncation: how to keep value in [min,max] range (not constant/uniform) : ceiling => ceiling/floor to min/max : drop => toss out (so may return < $N) : resample (default) => resample until an in-bound value is found or : resample limit is reached ($RESAMPLE-LIMIT=10) then use ceiling : @return random values :) declare %art:non-deterministic function this:randomize($N as xs:integer, $last as item()?, $algorithm as map(xs:string,item()*)) { if (this:is-complex($algorithm)) then ( if (this:is-complex("distribution", $algorithm) or this:is-complex("distributions", $algorithm) or this:is-complex("selector", $algorithm) ) then ( (: We change distributions, so we have to do this one value at a time :) let $result := fold-left( 1 to $N, $last, function($vals as item()*, $i as xs:integer) as item()* { $vals , this:randomize(1, $vals[last()], $algorithm) } ) return ( if (empty($last)) then $result else $result[position()>1] ) ) else ( (: Distribution is fixed, but we may alter any other parameter : so need to feed $last as appropriate to the sequence : For multimodal distributions is also fixed :) let $distribution := ($algorithm("distribution"),"uniform")[1] let $values := this:raw-distribution($N, $last, $algorithm) let $values := fold-left( $values, $last, function($vals as item()*, $val as item()?) as item()* { $vals , this:modify($val, $vals[last()], $algorithm) } ) return if (empty($last)) then $values else $values[position()>1] ) ) else ( (: Everything is fixed, just look up keys :) let $distribution := ($algorithm("distribution"),"uniform")[1] let $values := this:raw-distribution($N, $last, $algorithm) let $values := this:modify-all($values, $last, $algorithm) return ( $values ) ) }; (:~ : randomize() : Generate one random value with given previous value. : See randomize($N, $last, $algorithm) above. : : @param $N: How many values to return : @param $algorithm: The algorithm to use : @return random values :) declare %art:non-deterministic function this:randomize($N as xs:integer, $algorithm as map(xs:string,item()*)) { this:randomize($N, (), $algorithm) }; (:~ : randomize() : Generate one random value. : See randomize($N, $last, $algorithm) above. : : @param $algorithm: The algorithm to use : @return random value :) declare %art:non-deterministic function this:randomize($algorithm as map(xs:string,item()*)) { this:randomize(1, (), $algorithm) }; (:~ : histogram() : Print out the distribution as a histogram. : : @param $width: Size of each bucket (max - min) : @param $height: Number of values per star : @param $data: Data to analyze : @return sequence of strings, one per bucket :) declare function this:histogram($width as xs:integer, $height as xs:integer, $data as xs:double*) as xs:string* { let $buckets := fold-left( $data, map {}, function($buckets as map(xs:integer, xs:integer), $d as xs:double) as map(xs:integer,xs:integer) { let $key := round-half-to-even($d div $width, 0) cast as xs:integer let $num := $buckets($key) return ( if (exists($num)) then $buckets=>map:put($key, $num+1) else $buckets=>map:put($key, 1) ) } ) for $key in $buckets=>map:keys() let $num := xs:integer($buckets($key)) order by $key return ( format-number($key*$width," 00;-00")||" "|| format-number($num,"000")||" "|| string-join(for $i in 1 to $num idiv $height return "*","") ) }; (:~ : mean() : Return the mean of the distribution. Some distributions don't have a mean; : these will raise an error. : : @param $algorithm: The algorithm to use, a map with various parameter keys : (see randomize()) : @param $last: Previous value : @return mean : @error RAND-BADMINMAX if uniform distribution has max < min : @error RAND-UNIMPLEMENTED if don't know how to compute mean for that distribution :) declare function this:mean($algorithm as map(xs:string,item()*), $last as item()?) { switch ($algorithm("distribution")) case "constant" return this:randomize($algorithm) case "uniform" return let $min := $algorithm=>this:fetch("min",$last) (: OK to be empty :) let $max := $algorithm=>this:fetch("max",$last) (: OK to be empty :) return if ($max < $min) then errors:error("RAND-BADMINMAX",($min,$max)) else ($max - $min) idiv 2 case "normal" return ($algorithm=>this:fetch("mean",$last),0)[1] case "skewed" return ($algorithm=>this:fetch("mean",$last),0)[1] case "bernoulli" return let $p := ($algorithm=>this:fetch("p",$last),50)[1] return $p div 100 case "flip" return let $p := ($algorithm=>this:fetch("p",$last),50)[1] return $p div 100 case "binomial" return let $p := ($algorithm=>this:fetch("p",$last),50)[1] let $n := $algorithm=>this:fetch("max",$last) return $n * $p div 100 case "binomial-beta" return ( let $n := $algorithm=>this:fetch("max",$last) cast as xs:integer let $α := $algorithm=>this:fetch("alpha",$last) let $β := $algorithm=>this:fetch("beta",$last) return ( $n * $α div ($α + $β) ) ) case "binomial-poisson" return let $ps := $algorithm=>this:fetch("probabilities", $last) return sum($ps) div 100 case "poisson" return $algorithm=>this:fetch("mean",$last) case "exponential" return 1 div $algorithm=>this:fetch("lambda",$last) case "gamma" return ( $algorithm=>this:fetch("k",$last) * $algorithm=>this:fetch("theta",$last) ) case "beta" return ( let $α := $algorithm=>this:fetch("alpha",$last) let $β := $algorithm=>this:fetch("beta",$last) return ( $α div ($α + $β) ) ) default return errors:error("RAND-UNIMPLEMENTED", ($algorithm("distribution"), "mean")) }; (:~ : mean() : Return the mean of the distribution. Some distributions don't have a mean; : these will raise an error. : : @param $algorithm: The algorithm to use, a map with various parameter keys : (see randomize()) : @return mean : @error RAND-BADMINMAX if uniform distribution has max < min : @error RAND-UNIMPLEMENTED if don't know how to compute mean for that distribution :) declare function this:mean($algorithm as map(xs:string,item()*)) { this:mean($algorithm, ()) }; (:~ : this:id() : Construct an id of the form {$prefix}{hex-number} : : @param $prefix: prefix for the id : @return random id :) declare %art:non-deterministic function this:id($prefix as xs:string) as xs:string { $prefix||"-"||util:integer-to-hex(this:uniform(0,1E7)=>this:cast("integer")) }; (:~ : this:id() : Construct an id of the form {$prefix}{hex-number in range 0 to $n} : : @param $prefix: prefix for the id : @param $n: range limit : @return random id :) declare %art:non-deterministic function this:id($prefix as xs:string, $n as xs:integer) as xs:string { $prefix||"-"||util:integer-to-hex(this:uniform(0,$n)=>this:cast("integer")) }; (:~ : mutual-prime-pair() : A random pair of mutually prime numbers less than the given maximum, : which has to be more than 2, or you can't find two primes. : : @param $max: maximum value to return : @return random sequence of integers that are mutually prime : @error ML-BADARGS if $max is too small :) declare %art:non-deterministic function this:mutual-prime-pair($max as xs:integer) as xs:integer* { if ($max <= 2) then errors:error("ML-BADARGS", ("max", $max)) else (), let $base-selection := if ($max > $util:PRIMES100[last()]) then $util:CANVAS-PRIMES[. <= $max] else $util:PRIMES100[. <= $max] let $q := this:select-random($base-selection) let $r := util:while( function($r as xs:integer) as xs:boolean { max(($q,$r)) mod min(($q,$r)) = 0 }, function($r as xs:integer) as xs:integer { $r + 1 }, this:select-random(2 to 2 * util:round(math:sqrt($q) + 1)) ) return ($q, $r) }; (:~ : partition() : Make a partition (sum totalling to $sum) where no part is more than $max : Example: partition(10, 5) may be (4, 1, 2, 3) but not (6, 4) : : @param $sum: sum of values to divide : @param $max: maximum value to return : @return random sequence of integers that total to the sum :) declare %art:non-deterministic function this:partition($sum as xs:integer, $max as xs:integer) as xs:integer* { if ($sum = 0 or $max = 0) then () else if ($sum = 1) then 1 else ( let $limit := min(($sum, $max)) let $first := this:normal($limit div 2, $limit div 2)=> this:range(1, $limit)=> this:cast("integer") return ( $first, this:partition($sum - $first, $max) ) ) }; (:~ : partition() : Make a partition (sum totalling to $sum) where no part is more than $max : and no part is less than $min : Example: partition(10, 5) may be (4, 2, 2, 2) but not (4, 1, 2, 3) : : @param $sum: sum of values to divide : @param $max: maximum value to return : @param $min: minimum value to return : @return random sequence of integers that total to the sum :) declare %art:non-deterministic function this:partition($sum as xs:integer, $max as xs:integer, $min as xs:integer) as xs:integer* { if ($sum = 0 or $max = 0) then () else if ($sum = $min) then $min else ( let $tentative := this:do-partition($sum, $max, $min) let $tentative-sum := sum($tentative) let $delta := $sum - $tentative-sum return ( util:assert($delta >= 0, "Sum of tentative partition is too high"), if ($delta = 0) then $tentative else ( fold-left(1 to $delta, $tentative, function($seq as xs:integer*, $i as xs:integer) as xs:integer* { let $pick := this:shuffle( for $x at $j in $seq where $x < $max return $j )=>head() for $x at $j in $seq return ( if ($j = $pick) then $x + 1 else $x ) } ) ) ) ) }; declare %private %art:non-deterministic function this:do-partition($sum as xs:integer, $max as xs:integer, $min as xs:integer) as xs:integer* { if ($sum = 0 or $max = 0 or $min > $sum) then () else if ($sum = $min) then $min else ( let $limit := min(($sum, $max)) let $first := this:normal($limit div 2, $limit div 2)=> this:range($min, $limit)=> this:cast("integer") return ( $first, this:do-partition($sum - $first, $max, $min) ) ) }; (:~ : n-partition() : Make a partition (sum) of $sum with exactly $n parts : It is possible to get unlucky and end up with a 0 partition here: we have : no min constraint. If you care about that, you'll need to adjust the sequence : afterwards. : : @param $sum: sum of values to divide : @param $n: number of parts in division : @return random sequence of integers that total to the sum :) declare %art:non-deterministic function this:n-partition($sum as xs:integer, $n as xs:integer) as xs:integer* { if ($n = 0) then () else if ($n = 1) then $sum else if ($sum <= 1) then ( $sum, this:n-partition(0, $n - 1) ) else ( let $limit := max(($sum - 1, 1)) let $first := this:normal($limit div 2, $limit div 4)=> this:range(1, $limit)=> this:cast("integer") return ( $first, this:n-partition($sum - $first, $n - 1) ) ) };