2021

René Staritzbichler

Describe reality by a few numbers

- mean / median / mode
- range / quantile / standard deviation
- confidence / significance
- correlation

- mean: average income
- median: income of the person in the middle of the income ranking
- median = 50th percentile
- mode: most common income
- normal distribution: mean, median and mode are the same
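The three measures can differ strongly for skewed data. A minimal sketch using Python's standard `statistics` module; the income values are hypothetical and chosen only to show the effect of one outlier:

```python
from statistics import mean, median, mode

# Hypothetical yearly incomes (in thousands); one very rich person skews the data.
incomes = [18, 22, 22, 25, 30, 34, 41, 1000]

income_mean = mean(incomes)      # pulled upwards by the single outlier
income_median = median(incomes)  # splits the sorted data into two halves
income_mode = mode(incomes)      # most frequent value

print(income_mean, income_median, income_mode)
```

Here the mean lies far above what seven of the eight people earn, while median and mode stay close to the bulk of the data.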

- range (sensitive to extreme values)
- interquartile range
- standard deviation (symmetric data)
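A short sketch of the three spread measures, again with hypothetical data that contains one extreme value. `quantiles` is called with its default method, which is an assumption; other quartile conventions give slightly different cut points:

```python
from statistics import quantiles, stdev

# Hypothetical data with one extreme value.
values = [18, 22, 22, 25, 30, 34, 41, 1000]

value_range = max(values) - min(values)  # dominated by the outlier
q1, q2, q3 = quantiles(values, n=4)      # quartile cut points
iqr = q3 - q1                            # spread of the middle 50%
spread = stdev(values)                   # sample standard deviation

print(value_range, iqr, round(spread, 1))
```

Range and standard deviation are both inflated by the single outlier; the interquartile range is not.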

Two or more variables

- Pearson correlation
- Spearman rank correlation
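The difference between the two coefficients can be sketched in plain Python. The helper names (`pearson`, `ranks`, `spearman`) are illustrative, and the rank helper assumes there are no tied values:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: strength of the linear relation between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """Rank from 1 (smallest) upwards; assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic, but not linear

print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

For a monotonic but non-linear relation the rank correlation is perfect, while the Pearson coefficient stays below 1.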

Correlation is not causation!

Significance: p-, t- and z-values

- z-test: normal distribution with known variance
- t-test: normal distribution with unknown variance

- Similarity / difference / effect
- Null hypothesis: no significant relation
- Alternative hypothesis: significant relation
- Test whether the null hypothesis can be rejected

- Select significance level, generally $ \alpha = 0.05$ or 0.01
- Perform e.g. t-test (returns p-value)
- $p < \alpha:$ reject null hypothesis $\Rightarrow$ significant
- $p \geq \alpha:$ null hypothesis not rejected $\Rightarrow$ not significant
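The decision procedure above can be sketched with a one-sample z-test, which only needs the standard library. The measurements, the reference mean `mu0` and the known standard deviation `sigma` are hypothetical, and `z_test_p_value` is an illustrative helper, not a library function:

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0, sigma):
    """Two-sided one-sample z-test; sigma is the known population stdev."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

alpha = 0.05  # significance level, chosen before looking at the data
sample = [5.4, 5.9, 6.1, 5.8, 6.3, 5.7, 6.0, 6.2, 5.9]  # hypothetical measurements
z, p = z_test_p_value(sample, mu0=5.0, sigma=1.0)

if p < alpha:
    decision = "reject null hypothesis"
else:
    decision = "null hypothesis not rejected"
print(round(z, 2), decision)
```

When the variance is unknown, a t-test takes the place of the z-test, but the comparison of $p$ with $\alpha$ stays the same.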

Compare mean values of 2 distributions

- one sample location test:
- two sample location test:

- Two sample test of independent variables
- Null hypothesis: there are no significant differences in the mean values
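A sketch of the test statistic for two independent samples. The slides do not say which variant is meant; Welch's version (unequal variances allowed) is assumed here, and the two groups are hypothetical. Only the statistic and the approximate degrees of freedom are computed; the p-value additionally needs the t-distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

group_a = [5.1, 4.9, 5.6, 5.3, 5.0]  # hypothetical measurements
group_b = [6.0, 6.4, 5.8, 6.1, 6.3]  # hypothetical measurements

t, df = welch_t(group_a, group_b)
print(round(t, 2), round(df, 1))
```

A large absolute value of `t` speaks against the null hypothesis of equal means.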

- Normal
- t-Distribution
- Bernoulli
- Binomial
- Poisson
- Exponential
- Logarithmic

Example: income of all citizens

- many people with low to moderate income
- a few exceedingly rich people
- resulting issues:
- mean value is a bad descriptor
- distribution cannot be drawn sensibly on a linear axis

- Inverse of the exponential function: $y = a^x \; \Rightarrow \; \log_a y = x$
- Can show both very small and very large values

- In x or y or both
- Can show both very small and very large values
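A small sketch of why a log scale helps: values that span nine orders of magnitude end up at evenly spaced positions. The wealth values are hypothetical:

```python
from math import log10

# Hypothetical wealth values, from beggar to billionaire.
wealth = [1, 100, 10_000, 1_000_000, 1_000_000_000]

# Position of each value on a base-10 log axis.
positions = [log10(w) for w in wealth]

for w, pos in zip(wealth, positions):
    print(f"{w:>13,d} -> {pos:.1f}")
```

On a linear axis the first four values would be indistinguishable from zero next to the last one.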

Beggars and billionaires

- too much confusion!

- Sensitivity
- Specificity
- Accuracy
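The three terms are often confused, so a sketch with the standard definitions may help. The counts describe a hypothetical test on 1000 people, 10 of whom are ill; `diagnostic_metrics` is an illustrative helper:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # share of ill people detected
    specificity = tn / (tn + fp)                # share of healthy people cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all results that are correct
    return sensitivity, specificity, accuracy

# Hypothetical counts: 10 ill (9 detected), 990 healthy (99 false alarms).
sens, spec, acc = diagnostic_metrics(tp=9, fp=99, tn=891, fn=1)
print(round(sens, 2), round(spec, 2), round(acc, 2))

# Share of positive results that are true positives.
print(round(9 / (9 + 99), 3))
```

All three measures are 90% in this example, yet most positive results are false alarms because the condition is rare.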

- framing
- priming
- rounding
- social pressure

Wording has significant influence:

- 'giving 16- and 17-year-olds the right to vote': 52%+, 41%-
- 'reducing the voting age to 16': 37%+, 56%-

Answers depend on previous questions

- 10% of young people feel lonely
- BBC, after long list of questions: 42%

In surveys people tend to use round numbers

- 98% survival versus
- 2% death rate

- BioNTech: 20% of German economic growth
- BioNTech: 5 per mille of German GNP

Both statements are equivalent (at a 2.5% growth rate)
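The equivalence is plain arithmetic: a 20% share of a 2.5% growth rate, expressed relative to the whole GNP, is

$$ 0.20 \times 2.5\,\% = 0.5\,\% = 5 \text{ per mille} $$

The same contribution sounds large or small depending on the reference quantity chosen.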

- 99% correlation: divorce rate in Maine and per capita consumption of margarine
- 95% correlation: marriage rate in Kentucky and people who drowned after falling out of a fishing boat

https://www.tylervigen.com

- "A nearby Waitrose adds £36,000 to house price"
- Moderate drinkers live longer than non-drinkers

- Does being pope help you live longer?
- Do right handers live longer?

Systems aim to return to their mean

Soccer: new coach, then return to normal

Speed cameras after accidents

- Some may be healed by the belief in something
- Some are healed by normal function of the body (return to mean)
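Regression to the mean can be made visible with a small simulation. The model is an assumption for illustration only: each unit has a stable true level, and every measurement adds independent noise. No intervention takes place between the two measurements:

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Stable true level plus independent noise per measurement (assumed model).
true_level = [random.gauss(0, 1) for _ in range(10_000)]
first = [t + random.gauss(0, 1) for t in true_level]
second = [t + random.gauss(0, 1) for t in true_level]

# Select the 10% most extreme units in the first measurement.
cutoff = sorted(first)[-1000]
selected = [i for i, value in enumerate(first) if value >= cutoff]

first_mean = mean(first[i] for i in selected)
second_mean = mean(second[i] for i in selected)
print(round(first_mean, 2), round(second_mean, 2))
```

The selected group scores noticeably lower the second time, although nothing was done to it. A new coach, a speed camera or a remedy introduced at the extreme point would appear to have worked.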

"Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say, we know there are some things we do not know. But there are also unknown unknowns — the ones we don't know we don't know,"

Donald Rumsfeld, 2002

- too few samples
- relative vs absolute changes
- deceptive representation
- confusing representations

4 random normal distributions (logscale)

mean: 0, stdev: 1

20, 200, 2000, 20000 samples
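The effect of the sample size can be reproduced with a short sketch that draws the four sample sizes from a normal distribution with mean 0 and standard deviation 1. The seed is arbitrary and only fixes the random numbers:

```python
import random
from statistics import mean, stdev

random.seed(1)  # arbitrary seed for reproducibility

results = {}
for n in (20, 200, 2000, 20000):
    sample = [random.gauss(0, 1) for _ in range(n)]
    results[n] = (mean(sample), stdev(sample))
    print(n, round(results[n][0], 3), round(results[n][1], 3))
```

With few samples the estimated mean and standard deviation can be clearly off; with many samples they settle near 0 and 1.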

__pit 1:__ IARC 2015: processed meat classified as a Group 1 carcinogen

$\Rightarrow$ Daily Record: 'Bacon, ham and sausages have the same cancer risk as cigarettes warn experts'

$\Rightarrow$ IARC: the group expresses confidence that a risk exists, not how large the risk is

__pit 2:__ 50g/day: relative risk increase: 18% (absolute: 6% $\rightarrow$ 7%)

$\Rightarrow$ Media read the 18% as an absolute increase: 6% $\rightarrow$ 24%
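The two readings of the 18% differ only in one operation, multiplying versus adding. A minimal sketch with the numbers from the slide; the variable names are illustrative:

```python
baseline = 0.06           # absolute risk without the extra 50 g/day
relative_increase = 0.18  # reported relative increase

# Relative reading: the baseline risk grows by 18% of itself.
correct = baseline * (1 + relative_increase)

# Misreading: 18 percentage points are added to the baseline.
misread = baseline + relative_increase

print(f"{correct:.1%} vs {misread:.1%}")
```

The relative reading gives roughly 7 in 100, the additive misreading 24 in 100.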

All scans were taken from:

David Spiegelhalter, 'The Art of Statistics'