Presentation of Statistical Data

A population refers to the entire set of data that we want to study. When we want to obtain reliable information about the population, it is often impossible or impractical to study the entire population, as it may be too large. Therefore, we need to select a sample, which is a representative subset of the population.

To avoid any possible bias during sampling, we select a random sample such that any observations made are independent.

Discrete variables take on a countable number of possible values and are often restricted to integer values. Examples include shoe sizes and number of students in a class. When discrete data is presented in a frequency table, the values may be listed individually or grouped into classes.

Continuous variables can take on any value in a certain range and are usually not restricted to integer values. Examples include height and weight. When continuous data is presented in a frequency distribution table, the data is grouped into ranges.

A frequency table shows the frequency of occurrence of each element in a dataset.

A bar chart is a diagram where discrete data is represented by horizontal or vertical bars. The length of each bar represents the frequency of the corresponding category.

A histogram is a vertical bar chart representing numerical information, often used for continuous data. The area of each rectangle is proportional to its frequency. By using class boundaries instead of class intervals, there will not be any gaps between the bars.

Class limits may be used to represent a certain class interval. However, to avoid any ambiguity, a class interval such as 50 to 59 will be taken as 49.5 to 59.5. In this example, the class width is 10 units.

The mean is the sum of the observations divided by the total number of observations in a set. To calculate the mean of grouped data, we represent all values in a class interval by the middle value of the interval.

Population mean, $\mu =\frac{{\displaystyle \sum _{i=1}^{k}{f}_{i}{x}_{i}}}{n}$, where $n={\displaystyle \sum _{i=1}^{k}{f}_{i}}$
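As a sketch, the grouped-data mean can be computed directly; the class intervals and frequencies below are hypothetical.

```python
# Mean of grouped data: each class interval is represented by its midpoint x_i.
# Hypothetical classes 50-59, 60-69, 70-79, 80-89 with frequencies f_i.
midpoints = [54.5, 64.5, 74.5, 84.5]   # x_i (middle value of each class)
frequencies = [4, 10, 7, 3]            # f_i

n = sum(frequencies)                   # n = sum of all f_i = 24
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
print(mean)   # 68.25
```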

The median is the middle observation (for the case of an odd number of observations) or the mean of two middle observations (for the case of an even number of observations) when the set is arranged in ascending order.

The median divides the set into two halves. The median of the set of the lower values is known as the lower quartile.

The median of the set of higher values in a set divided into two halves is known as the upper quartile.

The interquartile range refers to the difference in value between the upper quartile and lower quartile.

Similarly, a set of values can be divided into 100 equal parts. The values that separate these parts are known as percentiles. The median corresponds to the 50th percentile. The lower quartile corresponds to the 25th percentile, while the upper quartile corresponds to the 75th percentile.
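The quartile definitions above (the median of each half) can be sketched with a small hypothetical data set:

```python
# Quartiles following the "median of each half" definition.
# Hypothetical data set of 11 observations.
data = sorted([12, 5, 9, 7, 15, 11, 8, 14, 6, 10, 13])

def median(values):
    m = len(values)
    mid = m // 2
    if m % 2 == 1:
        return values[mid]                      # odd: middle observation
    return (values[mid - 1] + values[mid]) / 2  # even: mean of two middle values

n = len(data)
q2 = median(data)              # median of the whole set
lower = data[: n // 2]         # values below the median
upper = data[(n + 1) // 2 :]   # values above the median
q1, q3 = median(lower), median(upper)
iqr = q3 - q1                  # interquartile range
print(q1, q2, q3, iqr)         # 7 10 13 6
```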

The mode refers to the observation which has the highest number of occurrences in a set. In a grouped frequency distribution, the class with the highest number of occurrences is known as the modal class.

The population variance is an indicator of how spread out the observations are from the average.

Population variance, ${\sigma}^{2}=\frac{{\displaystyle \sum _{i=1}^{k}{f}_{i}{({x}_{i}-\mu )}^{2}}}{n}=\frac{{\displaystyle \sum _{i=1}^{k}{f}_{i}{x}_{i}^{2}}}{n}-{\mu}^{2}$

The sample standard deviation is the square root of the sample variance.

$\text{Standard deviation}=\sqrt{\text{Variance}}$

When data is given in the form of a frequency distribution, the standard deviation is given by

${s}_{n}=\sqrt{\frac{{\displaystyle \sum _{i=1}^{k}{f}_{i}{({x}_{i}-\overline{x})}^{2}}}{{\displaystyle \sum _{i=1}^{k}{f}_{i}}}}$.
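Both forms of the variance formula can be checked numerically; the values and frequencies below are hypothetical.

```python
import math

# Hypothetical frequency distribution: values x_i with frequencies f_i.
xs = [2, 4, 6, 8]
fs = [3, 5, 6, 2]

n = sum(fs)
mean = sum(f * x for f, x in zip(fs, xs)) / n

# Definition form: sum of f_i * (x_i - mean)^2, divided by n
var_def = sum(f * (x - mean) ** 2 for f, x in zip(fs, xs)) / n
# Computational form: sum of f_i * x_i^2 divided by n, minus mean^2
var_alt = sum(f * x * x for f, x in zip(fs, xs)) / n - mean ** 2

sd = math.sqrt(var_def)   # standard deviation = sqrt(variance)
```

Both expressions give the same value, as the algebraic identity requires.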

In a given possibility space (the set of all possible outcomes) U, each possible outcome is known as a sample point. If the possibility space has a finite number of sample points, the number of points is denoted by n(U).

For an event A with m sample points in a possibility space with n sample points, the probability $\text{P}(A)=\frac{\text{n}(A)}{\text{n}(U)}=\frac{m}{n}$. In other words, for an event A which can happen in m out of n equally likely outcomes (i.e. there is no bias), the probability of it happening is denoted by P(A).

Since A is a subset of U, we have $0\le m\le n$, which can be simplified to $0\le \frac{m}{n}\le 1$. In other words, $\text{0}\le \text{P}(A)\le 1$.

The complement of an event A refers to the event that A does not occur, and is usually denoted as ${A}^{\prime}$.

$\text{P}({A}^{\prime})=\frac{\text{n}({A}^{\prime})}{\text{n}(U)}=\frac{n-m}{n}=1-\frac{m}{n}=1-\text{P}(A)$

We may use a Venn diagram to illustrate the relationship between the probabilities of two events, A and B.

From the Venn diagram, we can observe this relationship:

$\text{P}(A\text{ or }B)=\text{P}(A)+\text{P}(B)-\text{P}(A\text{ and }B)$,

which is also written as $\text{P}(A\cup B)=\text{P}(A)+\text{P}(B)-\text{P}(A\cap B)$.
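The addition rule can be verified by direct enumeration over equally likely outcomes; the sketch below uses one roll of a fair die, with events A and B chosen purely for illustration.

```python
from fractions import Fraction

# Equally likely outcomes: one roll of a fair die.
U = {1, 2, 3, 4, 5, 6}
A = {x for x in U if x % 2 == 0}   # event A: even number
B = {x for x in U if x > 3}        # event B: greater than 3

def P(event):
    return Fraction(len(event), len(U))

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = P(A | B)                     # set union
rhs = P(A) + P(B) - P(A & B)      # set intersection
print(lhs, rhs)                    # 2/3 2/3
```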

Given two events A and B such that $\text{P}(A)\ne 0$ and $\text{P}(B)\ne 0$, the probability of A, given that B has already occurred is

$\text{P}(A|B)=\frac{\text{P}(A\cap B)}{\text{P}(B)}$.

$\text{P}(A|B)\times \text{P}(B)=\text{P}(B|A)\times \text{P}(A)=\text{P}(A\cap B)$

Two events A and B are mutually exclusive if the probability of both events occurring at the same time is zero. In other words, both events A and B cannot occur together.

$\begin{array}{c}A\cap B=\varnothing \\ \text{n}(A\cap B)=0\end{array}$

$\begin{array}{c}\text{P}(A\cap B)=0\\ \text{P}(A\cup B)=\text{P}(A)+\text{P}(B)\end{array}$

Two events A and B are exhaustive if the probability of either event occurring is 1.

$\begin{array}{c}A\cup B=U\\ \text{P}(A\cup B)=1\end{array}$

The complementary events A and ${A}^{\prime}$ are mutually exclusive and exhaustive.

Two events A and B are independent if the occurrence of A does not affect the occurrence of B.

$\begin{array}{c}\text{P}(A|B)=\text{P}(A)\\ \text{P}(A\cap B)=\text{P}(A|B)\times \text{P}(B)=\text{P}(A)\times \text{P}(B)\end{array}$

When given two independent events, we notice that the probability of each event remains constant, regardless of whether the other event has occurred.

Consider $\text{P}(A)=0.3$ and $\text{P}(B)=0.2$, which gives us $\text{P}({A}^{\prime})=0.7$ and $\text{P}({B}^{\prime})=0.8$.

$\text{P}(A\cap B)=\text{P}(A)\times \text{P}(B)=0.06$. From the given information, the remaining joint probabilities are $\text{P}(A\cap {B}^{\prime})=0.24$, $\text{P}({A}^{\prime}\cap B)=0.14$ and $\text{P}({A}^{\prime}\cap {B}^{\prime})=0.56$.

When we consider that A has occurred, the probability of B occurring and the probability of B not occurring remain the same.

$\begin{array}{l}\text{P}(B|A)=\frac{0.06}{0.24+0.06}=0.2=\text{P}(B)\\ \text{P}({B}^{\prime}|A)=\frac{0.24}{0.24+0.06}=0.8=\text{P}({B}^{\prime})\end{array}$

When we consider that B has occurred, the probability of A occurring and the probability of A not occurring remain the same.

$\begin{array}{l}\text{P}(A|B)=\frac{0.06}{0.14+0.06}=0.3=\text{P}(A)\\ \text{P}({A}^{\prime}|B)=\frac{0.14}{0.14+0.06}=0.7=\text{P}({A}^{\prime})\end{array}$

When we consider that A has not occurred, the probability of B occurring and the probability of B not occurring remain the same.

$\begin{array}{l}\text{P}(B|{A}^{\prime})=\frac{0.14}{0.14+0.56}=0.2=\text{P}(B)\\ \text{P}({B}^{\prime}|{A}^{\prime})=\frac{0.56}{0.14+0.56}=0.8=\text{P}({B}^{\prime})\end{array}$

When we consider that B has not occurred, the probability of A occurring and the probability of A not occurring remain the same.

$\begin{array}{l}\text{P}(A|{B}^{\prime})=\frac{0.24}{0.24+0.56}=0.3=\text{P}(A)\\ \text{P}({A}^{\prime}|{B}^{\prime})=\frac{0.56}{0.24+0.56}=0.7=\text{P}({A}^{\prime})\end{array}$
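The cases above can be checked numerically from the joint probabilities; this is a minimal sketch using the same figures, P(A) = 0.3 and P(B) = 0.2.

```python
# Joint probabilities for the independent events in the text.
p_a_and_b = 0.06        # P(A ∩ B)   = 0.3 * 0.2
p_a_and_not_b = 0.24    # P(A ∩ B')  = 0.3 * 0.8
p_not_a_and_b = 0.14    # P(A' ∩ B)  = 0.7 * 0.2
p_not_a_and_not_b = 0.56  # P(A' ∩ B') = 0.7 * 0.8

p_a = p_a_and_b + p_a_and_not_b    # marginal P(A) = 0.3
p_b = p_a_and_b + p_not_a_and_b    # marginal P(B) = 0.2

# Conditioning on one event leaves the other probability unchanged.
p_b_given_a = p_a_and_b / p_a                                          # = P(B)
p_a_given_b = p_a_and_b / p_b                                          # = P(A)
p_b_given_not_a = p_not_a_and_b / (p_not_a_and_b + p_not_a_and_not_b)  # = P(B)
```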

From the above cases, we observe that the probability of an event occurring or not occurring is not affected by that of the other event.

$\text{P}(B|A)=\frac{\text{P}(B)\text{P}(A|B)}{\text{P}(B)\text{P}(A|B)+\text{P}({B}^{\prime})\text{P}(A|{B}^{\prime})}$

$\text{P}({B}_{i}|A)=\frac{\text{P}({B}_{i})\text{P}(A|{B}_{i})}{\text{P}({B}_{1})\text{P}(A|{B}_{1})+\text{P}({B}_{2})\text{P}(A|{B}_{2})+\dots +\text{P}({B}_{n})\text{P}(A|{B}_{n})}$, where ${B}_{1},{B}_{2},\dots ,{B}_{n}$ are mutually exclusive and exhaustive events.
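A short sketch of this partition form of Bayes' theorem, using a hypothetical two-machine defect example (all figures are illustrative):

```python
# Hypothetical example: machine B1 makes 60% of items with a 2% defect rate,
# machine B2 makes 40% of items with a 5% defect rate. A: item is defective.
p_b1, p_b2 = 0.6, 0.4
p_a_given_b1, p_a_given_b2 = 0.02, 0.05

# Denominator: total probability of A over the partition {B1, B2}.
p_a = p_b1 * p_a_given_b1 + p_b2 * p_a_given_b2

# Bayes' theorem: probability the defective item came from B1.
p_b1_given_a = p_b1 * p_a_given_b1 / p_a
print(p_b1_given_a)   # 0.375
```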

The above results can be extended to three events.

$\begin{array}{l}\text{P}(A\cup B\cup C)=\text{P}(A)+\text{P}(B)+\text{P}(C)-\text{P}(A\cap B)-\text{P}(B\cap C)-\text{P}(C\cap A)+\text{P}(A\cap B\cap C)\\ \text{P}(A\cup B\cup C)=\text{P}(A)+\text{P}(B)+\text{P}(C)\text{ for mutually exclusive events}\\ \text{P}(A\cap B\cap C)=\text{P}(A)\times \text{P}(B)\times \text{P}(C)\text{ for independent events}\end{array}$

A tree diagram can be used to determine the probability of obtaining specific results. For example, when two balls are drawn without replacement from a bag containing 4 red and 5 blue balls, the probability of picking one red ball and one blue ball is $\text{P}(R,B)+\text{P}(B,R)=\frac{4}{9}\times \frac{5}{8}+\frac{5}{9}\times \frac{4}{8}=\frac{5}{9}$.
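The same result can be confirmed by enumerating the equally likely ordered draws; the sketch below assumes a bag of 4 red and 5 blue balls, consistent with the fractions above.

```python
from fractions import Fraction
from itertools import permutations

# 4 red and 5 blue balls; draw two without replacement.
balls = ["R"] * 4 + ["B"] * 5
draws = list(permutations(range(9), 2))   # ordered pairs of distinct balls: 72

# Count the draws containing exactly one red and one blue ball.
favourable = sum(1 for i, j in draws if {balls[i], balls[j]} == {"R", "B"})
p_one_each = Fraction(favourable, len(draws))
print(p_one_each)   # 5/9
```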

A variable describes a quantity being measured. When this variable comes from a random experiment, it is known as a random variable. Random variables are often denoted by capital letters such as X, while the possible values for the random variables are denoted by small letters such as x.

Discrete random variables take on a countable number of possible values and are often restricted to integer values.

For a discrete variable which can assume a countable number of values ${x}_{1},{x}_{2},\dots ,{x}_{n}$, the probabilities associated must be such that $0<\text{P}(X={x}_{i})\le 1$. 0 is excluded because ${x}_{i}$ will not be included in the list of values when it cannot occur.

${\displaystyle \sum _{\text{All}x}\text{P}(X=x)}=1$

The probability distribution of a discrete random variable shows the possible values of X and their associated probabilities. The probability distribution can be presented either in a table form or in a graphical form.

The probability density function (PDF) gives the relationship between the possible values of X and their associated probabilities. It is often expressed as a formula depending on the type of distribution.

The cumulative distribution function (CDF) gives the probability that X takes a value not exceeding x, i.e. the sum of the associated probabilities for all possible values of X up to and including x.

Suppose that X can assume integer values between 0 and 6 inclusive.

$\text{P}(X\le 3)=\text{P}(X=0)+\text{P}(X=1)+\text{P}(X=2)+\text{P}(X=3)$

The expectation of a discrete random variable X is defined by

$\text{E}(X)={\displaystyle \sum _{\text{All}x}x\text{P}(X=x)}=\mu $.

Numerically, the expectation is equal to the population mean, μ.

If E(X) is the expectation of X,

E(k) = k, where k is a constant,

E(kX) = kE(X), where k is a constant,

$\text{E}(aX\pm bY)=a\text{E}(X)\pm b\text{E}(Y)$, where a and b are constants.

The population variance is defined by

$\text{Var}(X)=\text{E}({X}^{2})-{\left[\text{E}(X)\right]}^{2}$

For the case of a discrete random variable X,

$\text{Var}(X)={\displaystyle \sum _{\text{All}x}{(x-\mu )}^{2}\text{P}(X=x)}={\displaystyle \sum _{\text{All}x}{x}^{2}\text{P}(X=x)}-{\mu}^{2}={\sigma}^{2}$.

The population standard deviation of the discrete random variable X is $\sigma =\sqrt{\text{Var}(X)}$.
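The expectation and variance formulas for a discrete random variable can be sketched with exact arithmetic; the fair-die distribution below is a hypothetical example.

```python
from fractions import Fraction

# Hypothetical distribution: X = score on a fair six-sided die.
dist = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum of x * P(X = x) over all x
mean = sum(x * p for x, p in dist.items())
# Var(X) = E(X^2) - [E(X)]^2
var = sum(x * x * p for x, p in dist.items()) - mean ** 2
print(mean, var)   # 7/2 35/12
```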

If Var(X) is the variance of X,

Var(k) = 0, where k is a constant,

Var(kX) = ${k}^{2}\text{Var}(X)$, where k is a constant,

$\text{Var}(aX\pm bY)={a}^{2}\text{Var}(X)+{b}^{2}\text{Var}(Y)$, where a and b are constants and X and Y are independent.

Note that the two variances are added even when the random variables are subtracted, since $\text{Var}(-bY)={(-b)}^{2}\text{Var}(Y)={b}^{2}\text{Var}(Y)$ and variance is never negative.

In an experiment with n repeated independent trials and two mutually exclusive outcomes, where either success or failure can occur, we will obtain a binomial distribution. It is necessary for the probability of success p to remain the same throughout the experiment.

When X follows a binomial distribution, we say that $X\sim \text{B}(n,p)$, where there is a total of n trials and the probability of success is p.

$\text{P}(X=x)=\left(\begin{array}{c}n\\ x\end{array}\right){p}^{x}{(1-p)}^{n-x}$, where $x=0,1,\dots ,n$

$\begin{array}{c}\text{E}(X)=np\\ \text{Var}(X)=np(1-p)\end{array}$
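The binomial probability formula, mean and variance can be checked directly; the parameters n = 10 and p = 0.3 are hypothetical.

```python
from math import comb

# Hypothetical X ~ B(10, 0.3).
n, p = 10, 0.3

def binom_pmf(x):
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

total = sum(binom_pmf(x) for x in range(n + 1))    # probabilities sum to 1
mean = sum(x * binom_pmf(x) for x in range(n + 1))  # equals n * p = 3
var = sum(x * x * binom_pmf(x) for x in range(n + 1)) - mean ** 2  # np(1-p) = 2.1
```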

The Poisson distribution is often used to model the probability of a number of events occurring in a fixed period of time, with a known mean value.

When X follows a Poisson distribution, we say that $X\sim \text{Po}(m)$, where m is the parameter such that m > 0. Observe that there is no upper limit for x.

$\text{P}(X=x)=\frac{{m}^{x}{\text{e}}^{-m}}{x!}$, where $x=0,1,2,\dots $

$\begin{array}{c}\text{E}(X)=m\\ \text{Var}(X)=m\end{array}$

A binomial distribution can be approximated using the Poisson distribution with mean m = np when

n is large (n > 50), and

p is sufficiently small such that np < 5.
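The quality of this approximation can be sketched numerically; the parameters n = 100 and p = 0.03 (so n > 50 and np = 3 < 5) are illustrative.

```python
from math import comb, exp, factorial

# Compare B(n, p) with Po(m), m = np, for small x.
n, p = 100, 0.03
m = n * p                 # Poisson mean, m = 3

diffs = []
for x in range(6):
    binom = comb(n, x) * p ** x * (1 - p) ** (n - x)   # exact binomial
    poisson = m ** x * exp(-m) / factorial(x)          # Poisson approximation
    diffs.append(abs(binom - poisson))
```

For these parameters, each probability agrees to within about 0.003.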

Continuous random variables can take on any value on the number line. A continuous probability density function (PDF) is a function where

$f(x)\ge 0$ and ${\int}_{a}^{b}f(x)\text{}dx=1$ on a given interval $[a,b]$.

The expectation for a continuous random variable is

$\text{E}(X)={\displaystyle {\int}_{-\infty}^{\infty}x\text{}f(x)\text{}dx}=\mu $,

while the variance for a continuous random variable is

$\text{Var}(X)={\displaystyle {\int}_{-\infty}^{\infty}{(x-\mu )}^{2}\text{}f(x)\text{}dx}={\displaystyle {\int}_{-\infty}^{\infty}{x}^{2}\text{}f(x)\text{}dx}-{\mu}^{2}$.
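These integrals can be checked numerically for a hypothetical PDF, $f(x)=3{x}^{2}$ on $[0,1]$, for which the exact values are $\text{E}(X)=\frac{3}{4}$ and $\text{Var}(X)=\frac{3}{80}$.

```python
# Midpoint-rule integration for a hypothetical PDF f(x) = 3x^2 on [0, 1].
def f(x):
    return 3 * x * x

def integrate(g, a=0.0, b=1.0, steps=100_000):
    # Midpoint rule: sum g at the middle of each subinterval times its width.
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

total = integrate(f)                            # should equal 1
mean = integrate(lambda x: x * f(x))            # E(X) = 3/4
var = integrate(lambda x: x * x * f(x)) - mean ** 2   # Var(X) = 3/80
```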

When a continuous random variable X follows a normal distribution, we say that $X\sim \text{N}(\mu ,{\sigma}^{2})$.

The probability density function $f(x)=\frac{1}{\sigma \sqrt{2\pi}}{\text{e}}^{-\frac{1}{2}{\left(\frac{x-\mu}{\sigma}\right)}^{2}}$, where $x\in \mathbb{R}$. The area under the probability density function gives us the probabilities. For all values of x, $f(x)\ge 0$ and ${\int}_{-\infty}^{\infty}f(x)\text{}dx=1$.

To find $\text{P}(a<X<b)$, we may use the GDC, as the above function is difficult to integrate.

The normal distribution is bell-shaped and symmetrical about the line x = μ. The maximum value of f(x) occurs at x = μ, where $f(x)=\frac{1}{\sigma \sqrt{2\pi}}$ at that point. There are two points of inflexion at $x=\mu \pm \sigma $.

To work with a single reference distribution, we standardise the variable $X\sim \text{N}(\mu ,{\sigma}^{2})$ to obtain a variable Z with $\text{E}(Z)=0$ and $\text{Var}(Z)=1$. This distribution is known as the standard normal distribution, where $Z\sim \text{N}(0,1)$. Standardising is particularly useful when $\mu$ or ${\sigma}^{2}$ is unknown.

$Z=\frac{X-\mu}{\sigma}$

$\text{P}(X\le x)=\text{P}\left(Z\le \frac{x-\mu}{\sigma}\right)$

We can also make use of the symmetrical properties of the curve when finding positive and negative values of z.

$\begin{array}{l}\text{P}(Z<a)=\text{P}(Z>-a)=1-\text{P}(Z\ge a)\\ \text{P}(Z>a)=\text{P}(Z<-a)=1-\text{P}(Z\le a)\end{array}$

To find the value of x given the value of p such that P(X < x) = p, we may use the inverse normal function.
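Standardisation, normal probabilities and the inverse normal can all be sketched with Python's `statistics.NormalDist`; the distribution $\text{N}(50,{4}^{2})$ is a hypothetical example.

```python
from statistics import NormalDist

# Hypothetical X ~ N(50, 4^2); Z ~ N(0, 1) is the standard normal.
X = NormalDist(mu=50, sigma=4)
Z = NormalDist()                 # defaults to mu=0, sigma=1

x = 56
z = (x - X.mean) / X.stdev       # standardise: z = (x - mu) / sigma = 1.5
p_x = X.cdf(x)                   # P(X <= 56)
p_z = Z.cdf(z)                   # P(Z <= 1.5), the same probability

# Inverse normal: find x such that P(X < x) = 0.9
x90 = X.inv_cdf(0.9)
```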