Descriptive Statistics

A way of summarizing large sets of numbers

Descriptive statistics provide brief, easy-to-understand summaries of data. When we have a large dataset filled with numbers, descriptive statistics give us quick summaries of it, such as the average value (mean) or the most frequent value (mode).

Descriptive statistics can be grouped into three types: measures of central tendency, measures of variability, and measures of position. Note that any measurable characteristic of a population is called a parameter, and any measurable characteristic of a sample is called a statistic.

Measures of Central Tendency

These describe the central location of a set of numerical observations.

Mean

The mean is the sum of all values divided by the number of values: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Its main disadvantage is that it is very sensitive to outliers (extreme values) in a dataset: a single outlier can change the mean significantly.
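To make the outlier sensitivity concrete, here is a minimal sketch using Python's standard `statistics` module. The salary figures are hypothetical values chosen only for illustration.

```python
from statistics import mean

# Hypothetical salaries, in thousands (illustrative values only)
salaries = [30, 32, 35, 31, 33]
with_outlier = salaries + [400]      # the same data plus one extreme value

mean_without = mean(salaries)        # 161 / 5
mean_with = mean(with_outlier)       # 561 / 6

print(f"mean without outlier: {mean_without}")
print(f"mean with outlier:    {mean_with}")
```

A single extreme value moves the mean from roughly 32 to over 90, even though five of the six values are still close to 32.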

Median

The median is the central value of the sorted data: half of the values are less than the median and the other half are greater. Unlike the mean, it is not distorted by outliers, so it is a useful alternative descriptive statistic when outliers are present.
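As a rough illustration, the sketch below compares the median with and without an extreme value. The data is hypothetical, and `statistics.median` averages the two middle values when the count is even.

```python
from statistics import median

# Hypothetical data (illustrative values only)
values = [30, 32, 35, 31, 33]
with_outlier = values + [400]

median_without = median(values)        # middle of 30, 31, 32, 33, 35
median_with = median(with_outlier)     # average of the two middle values, 32 and 33

print(f"median without outlier: {median_without}")
print(f"median with outlier:    {median_with}")
```

The outlier shifts the median by only half a unit, which is why it is often preferred for skewed data.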

Mode

The mode is the most frequent value in the data. A dataset can have multiple modes, or no mode at all if no value repeats.
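One possible way to handle all three cases (one mode, several modes, no mode) is sketched below. The helper `modes` is a hypothetical name, and treating "no value repeats" as "no mode" is a convention assumed here; the standard library's `statistics.multimode` instead returns every value in that situation.

```python
from collections import Counter

def modes(data):
    """Return all most-frequent values, or [] if no value repeats."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []                      # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]))       # one mode
print(modes([1, 2, 2, 3, 3]))    # two modes
print(modes([1, 2, 3]))          # no mode
```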

Weighted Mean

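A weighted mean assigns more weight to some data values than to others: $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$, where $w_i$ is the weight given to the value $x_i$.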
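A minimal sketch of the weighted mean follows. The function name `weighted_mean`, the scores, and the weights are all hypothetical, picked so the arithmetic is easy to check by hand.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical exam scores; the last exam counts the most
scores = [80, 90, 70]
weights = [2, 3, 5]

result = weighted_mean(scores, weights)   # (160 + 270 + 350) / 10
print(result)
```

With equal weights the result reduces to the ordinary mean, which is 80 for these scores; the heavier weight on the lowest score pulls the weighted mean below that.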

Measures of Variability

These describe the spread, or dispersion, of the data values, which is as important to know as where the data is centered. The most useful descriptions of variability are based on the distances (deviations) of the data from some measure of central tendency, typically the mean.

Range

The range is the difference between the maximum and minimum values in a dataset. It is not a good measure of variability when the data has outliers, because it depends only on the two most extreme values.
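The short sketch below shows how one outlier dominates the range. The helper name `value_range` and the data are hypothetical.

```python
def value_range(data):
    """Difference between the largest and smallest value."""
    return max(data) - min(data)

values = [30, 32, 35, 31, 33]       # hypothetical data
with_outlier = values + [400]

range_without = value_range(values)
range_with = value_range(with_outlier)

print(range_without, range_with)
```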

Variance

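Variance is the average of the squared differences between the data values and the mean. For a population of N values with mean $\mu$, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$; for a sample of n values with mean $\bar{x}$, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$.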
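As a sketch, the variance can be computed step by step and checked against the standard library. The dataset is hypothetical; `pvariance` divides by the number of values (population variance) and `variance` divides by one less (sample variance).

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data with mean 5

center = mean(data)
squared_deviations = [(x - center) ** 2 for x in data]   # these sum to 32

population_var = sum(squared_deviations) / len(data)         # divide by N
sample_var = sum(squared_deviations) / (len(data) - 1)       # divide by n - 1

print(population_var, pvariance(data))
print(sample_var, variance(data))
```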

Standard Deviation

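The standard deviation is the square root of the variance. It measures the typical distance by which the values in a dataset differ from the mean, expressed in the same units as the data: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$ for a population, and $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ for a sample.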
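A minimal sketch with the standard library, on hypothetical data: `pstdev` gives the population standard deviation and `stdev` the sample one.

```python
import math
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data, squared deviations sum to 32

population_sd = pstdev(data)         # sqrt(32 / 8)
sample_sd = stdev(data)              # sqrt(32 / 7)

print(f"population standard deviation: {population_sd}")
print(f"sample standard deviation:     {sample_sd:.4f}")
```

The sample version is slightly larger because it divides by 7 instead of 8.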

Degrees of Freedom

Degrees of freedom are the number of data elements that are free to vary. The n-1 in the denominator of the sample variance is its degrees of freedom: once the sample mean is fixed, only n-1 of the deviations can vary freely, because the deviations must sum to zero. Why do we need this? If the sum of squared deviations from the sample mean were divided by the sample size n, the result would be biased: as an estimate of the unknown population variance, it would tend to be slightly too small over many trials. This is because the sum of squared deviations is smallest when the deviations are taken from the sample mean; taking them from the population mean would give a sum at least as large. To compensate for this bias, we divide by n-1 instead of n.
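The bias described above can be seen in a small simulation. This is only a sketch under stated assumptions: the population is taken to be standard normal (true variance 1), the sample size, trial count, and seed are arbitrary, and the function names are hypothetical.

```python
import random

def sum_squared_deviations(sample):
    """Sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

def simulate(n=5, trials=20_000, seed=0):
    """Average the divide-by-n and divide-by-(n-1) estimates over many samples."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true variance is 1
        ss = sum_squared_deviations(sample)
        biased_total += ss / n
        unbiased_total += ss / (n - 1)
    return biased_total / trials, unbiased_total / trials

biased_avg, unbiased_avg = simulate()
print(f"average of divide-by-n estimates:     {biased_avg:.3f}")
print(f"average of divide-by-(n-1) estimates: {unbiased_avg:.3f}")
```

With samples of size 5, the divide-by-n average should settle near (n-1)/n = 0.8 of the true variance, while the divide-by-(n-1) average should settle near 1.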

Coefficient of Variation

It expresses the size of the standard deviation as a percentage of the mean: $CV = \frac{s}{\bar{x}} \times 100\%$.
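A short sketch of the calculation follows. The function name and data are hypothetical, and the population standard deviation is used here only to keep the numbers round; the sample standard deviation could be substituted.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """Standard deviation as a percentage of the mean."""
    center = mean(data)
    if center == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100 * pstdev(data) / center

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data: mean 5, standard deviation 2

cv = coefficient_of_variation(data)
print(f"coefficient of variation: {cv:.1f}%")
```

Because it is unit-free, the coefficient of variation can be used to compare the spread of datasets measured on different scales.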

Measures of Position

These describe the position of a single value relative to the other values in the dataset.

Percentile

The pth percentile is the data value such that p% of the observations fall below it and (100-p)% lie above it. Percentiles divide the observations into 100 parts of equal size, deciles divide them into 10 parts, and quartiles divide them into 4 parts.
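The cut points can be sketched with `statistics.quantiles` (Python 3.8+), which returns the n-1 values that split the data into n equal-sized parts. The data is hypothetical, and the default "exclusive" interpolation method is assumed; other tools may use slightly different conventions and give slightly different values.

```python
from statistics import quantiles

data = list(range(1, 12))              # hypothetical data: 1, 2, ..., 11

quartiles = quantiles(data, n=4)       # 3 cut points -> 4 parts
deciles = quantiles(data, n=10)        # 9 cut points -> 10 parts
percentiles = quantiles(data, n=100)   # 99 cut points -> 100 parts

print("quartiles:", quartiles)
print("number of decile cut points:", len(deciles))
print("number of percentile cut points:", len(percentiles))
```

The second quartile is the same value as the median and the 50th percentile.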

Standard Score

The standard score (z-score) is the number of standard deviations by which a particular value lies above or below the mean: $z = \frac{x - \mu}{\sigma}$.
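A minimal sketch of the standard score, treating the hypothetical dataset as the whole population. The function name `z_score` is an assumption made for illustration.

```python
from statistics import mean, pstdev

def z_score(x, data):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    spread = pstdev(data)
    if spread == 0:
        raise ValueError("z-score is undefined when all values are equal")
    return (x - mean(data)) / spread

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data: mean 5, standard deviation 2

print(z_score(9, data))    # positive: above the mean
print(z_score(2, data))    # negative: below the mean
```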

Chebyshev's Theorem

According to Chebyshev's theorem, irrespective of the shape of the data distribution, at least 1 - 1/k² of all values lie within k standard deviations of the mean, for any k > 1. So at least 75% of all values lie within ±2 standard deviations of the mean, and at least 88.9% lie within ±3 standard deviations.
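The bound comes from the formula 1 - 1/k² for k standard deviations. The sketch below computes it and compares it with the observed share in a deliberately skewed, hypothetical dataset; the function names are assumptions made for illustration.

```python
from statistics import mean, pstdev

def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed share of values within k standard deviations of the mean."""
    center = mean(data)
    spread = pstdev(data)
    inside = [x for x in data if abs(x - center) <= k * spread]
    return len(inside) / len(data)

skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]   # hypothetical, heavily right-skewed

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), fraction_within(skewed, k))
```

Even for this skewed data the observed shares stay above the guaranteed minimums of 75% and about 88.9%.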

Empirical Rule

In a normal distribution, about 68% of all values lie within one standard deviation of the mean, about 95% lie within two standard deviations, and about 99.7% lie within three. The rule also holds approximately for symmetric, bell-shaped distributions in which most values are close to the mean.
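These percentages can be reproduced from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` (Python 3.8+) with a standard normal distribution; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

These shares are well above Chebyshev's minimums because the normal shape is known, whereas Chebyshev's theorem assumes nothing about the distribution.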

[Figure: the empirical rule compared with Chebyshev's theorem]
