Julia: Tutorial & Code-Collection¶
Source: https://github.com/markomlikota/CodingSoftware
MIT License, © Marko Mlikota, https://markomlikota.github.io
Random Variables¶
In [ ]:
# Let's install the new packages needed in this section:
vPkgs = ["StatsBase",
"StatisticalRethinking",
"Random"]
Pkg.add.(vPkgs)
In [2]:
# The package "Distributions" yields a function for each distribution.
# e.g. Uniform(3,9) defines a uniform distribution between 3 and 9,
# Normal(1,2) defines a Normal distribution with mean 1 and standard deviation 2,
# Exponential(3) defines an Exponential distribution with scale parameter 3,
# MvNormal(vμ,mΣ) defines a multivariate Normal distribution with mean vector vμ and covariance matrix mΣ,
# etc.
# See https://juliastats.org/Distributions.jl/stable/
# for the distributions supported and the parameters by which they are identified.
using Distributions
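For instance, one can construct some of the distributions listed above and inspect the parameters they store with params(). (A minimal sketch; all functions are from Distributions.jl, and the values noted in the comments follow directly from the constructor arguments.)
In [ ]:
# Construct some distribution objects and inspect their stored parameters:
params(Uniform(3,9))        # (3.0, 9.0): lower and upper bound
params(Normal(1,2))         # (1.0, 2.0): mean and standard deviation
params(Exponential(3))      # (3.0,): scale parameter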
In [3]:
# To draw a random value from one of these distributions: e.g.
rand(Normal(1,2))
# for several (5) draws at once:
rand(Normal(1,2),5)
# for a matrix (5x3) of draws:
rand(Normal(1,2),5,3)
# To find out mean, variance, skewness, kurtosis:
mean(Normal(1,2))
var(Normal(1,2))
skewness(Normal(1,2))
kurtosis(Normal(1,2))
# To find out quantiles, e.g. the 30th percentile:
quantile(Normal(1,2), 0.3)
# the 5th and 95th percentile at once:
quantile(Normal(1,2), [0.05, 0.95])
Out[3]:
2-element Vector{Float64}:
 -2.2897072539029457
  4.289707253902943
In [4]:
# To evaluate the pdf of such a distribution at a certain point: e.g.
pdf(Normal(1,2), 0.5)
# at several points at once:
pdf(Normal(1,2), [0.5,0.8])
# To evaluate the cdf:
cdf(Normal(1,2), 0.5)
# Logpdf and logcdf:
logpdf(Normal(1,2), 0.5)
logcdf(Normal(1,2), 0.5)
# Evaluating the log-pdf/-cdf directly is preferable to
# first evaluating the pdf/cdf and then taking logs:
# pdf-values very close to zero are rounded to zero in
# floating-point arithmetic, so their log becomes -Inf:
log(pdf(Normal(1,2), -400))
logpdf(Normal(1,2), -400)
Out[4]:
-20101.73708571376
In [5]:
# The syntax is analogous for multivariate distributions,
# except that they are evaluated at a vector: e.g.
vμ = [1,3]
mΣ = [3.4 0.8; 0.8 1.2]
pdf(MvNormal(vμ,mΣ), [0.8,-0.1])
Out[5]:
0.0008523345816514483
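Drawing from a multivariate distribution works the same way; note that several draws are returned as a matrix with one draw per column. (A minimal sketch using the vμ and mΣ defined above.)
In [ ]:
# A single draw is a vector; several draws form a matrix
# with one draw per column (here 2x5):
rand(MvNormal(vμ,mΣ))
rand(MvNormal(vμ,mΣ),5)
# The mean vector and covariance matrix can be recovered as well:
mean(MvNormal(vμ,mΣ))
cov(MvNormal(vμ,mΣ))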
In [6]:
# To draw from a Uniform distribution, can just type
rand()
# that's equivalent to
rand(Uniform(0,1))
Out[6]:
0.5543888354889855
In [7]:
# Can also add a truncation to distributions: e.g.
truncated(Normal(1,2), -2,4)
rand(truncated(Normal(1,2), -2,4))
logpdf(truncated(Normal(1,2), -2,4),0.5)
Out[7]:
-1.4999105069034409
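Truncation changes the support of the distribution and hence its moments, as the following sketch illustrates (all functions are from Distributions.jl):
In [ ]:
# The truncated distribution has bounded support ...
minimum(truncated(Normal(1,2), -2,4))
maximum(truncated(Normal(1,2), -2,4))
# ... and its moments differ from those of the untruncated Normal(1,2):
mean(truncated(Normal(1,2), -2,4))
var(truncated(Normal(1,2), -2,4))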
In [8]:
# The "categorical" distribution (also referred to as "multinomial")
# is a discrete, univariate distribution with values ranging
# from 1 to N and manual probabilities for each value.
vProbs = [0.1, 0.05, 0.3, 0.2, 0.1, 0.25]
Categorical(vProbs)
rand(Categorical(vProbs))
# Of course, any other set of N values can be supported
# by treating the draw from Categorical(vProbs) as an index
# into a vector of desired values: e.g.
vValues = collect(range(3,step=3,length=6))
vValues[rand(Categorical(vProbs))]
Out[8]:
9
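To verify that the draws indeed follow the specified probabilities, one can compare the empirical frequencies from many draws to vProbs (a minimal sketch; the number of draws is arbitrary):
In [ ]:
# Empirical frequencies from many draws should be close to vProbs:
vDraws = rand(Categorical(vProbs), 10_000)
[mean(vDraws .== k) for k in eachindex(vProbs)]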
In [9]:
# Compute quantiles of a list (vector) of values (an empirical distribution):
vValues = rand(100)
quantile(vValues,0.05)
Out[9]:
0.0898719698595056
In [10]:
# Using the package "StatsBase", we can compute quantiles of
# a weighted list of values:
using StatsBase
vWeights = rand((0.8,1.2),100)
vWeights ./= sum(vWeights)
StatsBase.quantile(vValues,ProbabilityWeights(vWeights),0.05)
# (can also ignore weights:)
StatsBase.quantile(vValues,0.05)
# Can also compute weighted mean and variance:
StatsBase.mean(vValues,ProbabilityWeights(vWeights))
StatsBase.var(vValues,ProbabilityWeights(vWeights))
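Since the weights above are normalized to sum to one, the weighted mean is simply the weighted sum of the values; a quick sketch to verify this, together with the weighted standard deviation:
In [ ]:
# With normalized weights, the weighted mean equals the weighted sum:
sum(vWeights .* vValues) ≈ StatsBase.mean(vValues,ProbabilityWeights(vWeights))
# Weighted standard deviation:
StatsBase.std(vValues,ProbabilityWeights(vWeights))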
In [11]:
# Compute bounds of Bayesian HPD set:
using StatisticalRethinking
hpdi(vValues)
Out[11]:
2-element Vector{Float64}:
 0.13428590744159596
 0.9917666181077557
In [12]:
# To set the seed:
using Random
Random.seed!(232)
Out[12]:
TaskLocalRNG()
In [13]:
# This sets the seed globally,
# which can have unintended consequences.
# To set the seed locally, use MersenneTwister():
a = MersenneTwister(1234)
rand(a)
# Consider the following example:
function fRandA()
a = MersenneTwister(123);
return rand(a)
end
function fRandB()
return rand()
end
function fRandBoth()
return fRandA() + fRandB()
end
# Of course, every time it is called,
# fRandA() gives the same random number,
# while fRandB() gives a different random number each time.
# As expected, then, fRandBoth() gives different random numbers
# every time it is called:
println(fRandBoth())
println(fRandBoth())
println(fRandBoth())
1.1690346504461588
1.2356584834853748
0.8649207286627684
In [14]:
# Suppose we re-define fRandA() to set the seed
# using Random.seed!().
function fRandA()
Random.seed!(123)
return rand()
end
# This doesn't change the behavior of either fRandA() or fRandB()
# considered in isolation.
# However, it changes the behavior of fRandBoth():
# fRandA() sets the seed globally, i.e. also for subsequent
# calls of fRandB().
# Hence, fRandBoth() always gives the same random value:
fRandBoth()
# If we changed the order of fRandA() and fRandB() in fRandBoth(),
# then fRandBoth() would give the same random value only from its second call onwards:
function fRandBoth()
return fRandB() + fRandA()
end
println(fRandBoth())
println(fRandBoth())
println(fRandBoth())
1.412092493628164
1.1080205529887315
1.1080205529887315
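To avoid such side effects altogether, a locally created RNG can also be passed as the first argument to rand(), including for draws from distributions; this makes the corresponding draws reproducible without touching the global random number stream. (A minimal sketch; the name rngLocal and the seed 42 are arbitrary.)
In [ ]:
# Passing a local RNG makes these particular draws reproducible
# without affecting the global random number stream:
rngLocal = MersenneTwister(42)
rand(rngLocal, Normal(1,2), 3)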
Try Exercises 11.a.i - 11.a.v