Mathematical Statistics Functions
Data errors can appear at any stage of data acquisition and pre-processing. They affect study results and lead to erroneous statistical interpretations and conclusions.
Mean
The mean is the most common measure of center. It is what most people think of when they hear the word "average." However, the mean is affected by extreme values, so it may not be the best measure of center to use in a skewed distribution.
Procedure for finding
- Add all the data values together
- Divide by the number of data values (the sample size, n)
Properties
- The mean always exists
- The mean does not have to be one of the data values
- The mean uses all the data values
- The mean is affected by extreme values
Formula
\(\bar x = \frac{{{x_1} + {x_2} + \cdots + {x_n}}}{n} = \frac{{\sum\limits_{i = 1}^n {{x_i}} }}{n}\)
Example
>> mean([12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5])
8.22
>> mean([1, 2, 3, 4, 4])
2.8
>> mean([-1.0, 2.5, 3.25, 5.75])
2.625
Standard Mean Calculation
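#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

#define NUM_SAMPLES 10

// Board-specific routines, assumed to be provided elsewhere in the project
void Setup(void);
double Read_Temperature(void);
void Delay(unsigned int ms);

// Accumulates samples. Returns true and writes the mean to *fAvg once
// NUM_SAMPLES values have been collected, then starts a new batch.
bool GetMean(double fValue, double *fAvg)
{
    static int iNumSamples = 0;
    static double fTotalValues = 0.0;

    // Add the new sample first so that no reading is discarded
    fTotalValues += fValue;
    ++iNumSamples;
    if (iNumSamples < NUM_SAMPLES)
        return false;

    *fAvg = fTotalValues / iNumSamples;
    fTotalValues = 0.0;
    iNumSamples = 0;
    return true;
}

int main()
{
    double fTemp;   // Temperature value
    double fAvg;    // Average value

    Setup();        // Initialize peripherals
    while (1) {
        fTemp = Read_Temperature();
        if (GetMean(fTemp, &fAvg)) {
            printf("The Average Temperature = %f \n", fAvg);
        }
        Delay(100); // Delay 100 ms
    }
}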
Moving Mean Calculation
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

#define NUM_SAMPLES 10

// Board-specific routines, assumed to be provided elsewhere in the project
void Setup(void);
double Read_Distance(void);
void Delay(unsigned int ms);

// Returns the mean of the most recent samples (at most NUM_SAMPLES of them).
double MovingMean(const double fNewVal)
{
    static double fSample[NUM_SAMPLES];
    static double fTotalVal = 0.0;
    static int iNumSamples = 0;
    static int iArrIdx = 0;

    if (iNumSamples < NUM_SAMPLES) {
        // Buffer not yet full: append the new sample
        fTotalVal += fNewVal;
        fSample[iNumSamples] = fNewVal;
        ++iNumSamples;
    } else {
        // Buffer full: replace the oldest sample with the new one
        if (iArrIdx >= NUM_SAMPLES)
            iArrIdx = 0;
        fTotalVal = fTotalVal - fSample[iArrIdx] + fNewVal;
        fSample[iArrIdx] = fNewVal;
        ++iArrIdx;
    }
    return fTotalVal / iNumSamples;
}

int main()
{
    double fDistanceCm; // Obstacle distance in cm
    double fNewAvg;

    Setup();            // Initialize peripherals
    while (1) {
        fDistanceCm = Read_Distance();
        fNewAvg = MovingMean(fDistanceCm);
        printf("Obstacle Distance : %f cm \n", fNewAvg);
        Delay(250);     // Delay 250 ms
    }
}
Trimmed Mean
A trimmed mean is an average that removes a small designated percentage of the largest and smallest values before calculating the mean. After the specified outlying observations are removed, the trimmed mean is found with the standard arithmetic mean formula. Because the extreme values have been discarded, the trimmed mean is more resistant to change than the regular mean.
Procedure for finding
- Rank the data from lowest to highest
- Remove the chosen percentage of the values from each end of the ranked data, for example the smallest 10% and the largest 10%
- Add the remaining values together
- Divide the total by the number of remaining values
Example
>> mean([12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5], trim = 0.2)
6.033333
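Worked by hand, the same figure follows from the procedure above. Ranking the ten values gives -21, -5, 2, 3, 4.2, 7, 8, 12, 18, 54. The result shown is consistent with trim = 0.2 dropping 20% of the values (two of them) from each end, which leaves six values to average:
\(\frac{{2 + 3 + 4.2 + 7 + 8 + 12}}{6} = \frac{{36.2}}{6} \approx 6.0333\)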
Trimmed Mean Calculation
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

#define NUM_SAMPLES 10
#define NUM_TRIMMED 2   // Values trimmed from each end: 20% (2/10 * 100%)

// Board-specific routines, assumed to be provided elsewhere in the project
void Setup(void);
double Read_Distance(void);
void Delay(unsigned int ms);

// Collects NUM_SAMPLES values, then returns true and writes their
// trimmed mean to *fAvg.
bool TrimmedMean(const double fNewVal, double *fAvg)
{
    static double fSample[NUM_SAMPLES];
    static int iNumSamples = 0;
    double fTemp, fTotalVal = 0.0;
    int iMax;
    int i, j;

    // Store the new sample first so that no reading is discarded
    fSample[iNumSamples] = fNewVal;
    ++iNumSamples;
    if (iNumSamples < NUM_SAMPLES)
        return false;

    // Selection sort, ascending
    for (i = NUM_SAMPLES - 1; i >= 1; --i) {
        iMax = 0;
        for (j = 1; j <= i; j++) {
            if (fSample[j] > fSample[iMax])
                iMax = j;
        }
        if (i != iMax) {
            // swap (fSample[i], fSample[iMax])
            fTemp = fSample[i];
            fSample[i] = fSample[iMax];
            fSample[iMax] = fTemp;
        }
    }

    // Sum everything except the NUM_TRIMMED lowest and highest values
    for (i = NUM_TRIMMED; i < NUM_SAMPLES - NUM_TRIMMED; ++i)
        fTotalVal += fSample[i];

    *fAvg = fTotalVal / (NUM_SAMPLES - NUM_TRIMMED * 2);
    iNumSamples = 0;
    return true;
}

int main()
{
    double fDistanceCm; // Obstacle distance in cm
    double fNewAvg;

    Setup();            // Initialize peripherals
    while (1) {
        fDistanceCm = Read_Distance();
        if (TrimmedMean(fDistanceCm, &fNewAvg)) {
            printf("Obstacle Distance : %f cm \n", fNewAvg);
        }
        Delay(250);     // Delay 250 ms
    }
}
Median
The median is the middle value of a data series once the values have been ranked in order.
Procedure for finding
- Rank the data so that it is in order from lowest to highest
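- Find the value(s) in the middle
- If the number of values (n) is odd, the median is the value in position (n+1)/2 of the ranked list. If n is even, the median is the average of the two adjacent values in positions n/2 and n/2+1. Counting positions from zero, as C arrays do, these are index n/2 (integer division) for odd n, and indices n/2-1 and n/2 for even n.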
Properties
- The median always exists.
- The median does not have to be one of the data values.
- The median does not use all of the data values, only the one(s) in the middle.
- The median is resistant to change, and it is not affected by extreme values.
Example
>> median([12, 7, 3, 4.2, 18 ,2, 54, -21, 8, -5])
5.6
>> median([1, 3, 5])
3
>> median([1, 3, 5, 7])
4.0
Median Calculation
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

#define NUM_SAMPLES 10

// Board-specific routines, assumed to be provided elsewhere in the project
void Setup(void);
double Read_Temperature(void);
void Delay(unsigned int ms);

// Collects NUM_SAMPLES values, then returns true and writes their
// median to *fRetVal.
bool Median(const double fNewVal, double *fRetVal)
{
    static double fSample[NUM_SAMPLES];
    static int iNumSamples = 0;
    double fTemp, fMedian;
    int iMax;
    int i, j;

    // Store the new sample first so that no reading is discarded
    fSample[iNumSamples] = fNewVal;
    ++iNumSamples;
    if (iNumSamples < NUM_SAMPLES)
        return false;

    // Selection sort, ascending
    for (i = NUM_SAMPLES - 1; i >= 1; --i) {
        iMax = 0;
        for (j = 1; j <= i; j++) {
            if (fSample[j] > fSample[iMax])
                iMax = j;
        }
        if (i != iMax) {
            // swap (fSample[i], fSample[iMax])
            fTemp = fSample[i];
            fSample[i] = fSample[iMax];
            fSample[iMax] = fTemp;
        }
    }

    // Middle element, or the average of the two middle elements
    // when the number of samples is even
    fMedian = fSample[NUM_SAMPLES / 2];
    if (NUM_SAMPLES % 2 == 0)
        fMedian = (fMedian + fSample[NUM_SAMPLES / 2 - 1]) / 2.0;

    *fRetVal = fMedian;
    iNumSamples = 0;
    return true;
}

int main()
{
    double fTemp;   // Temperature value
    double fNewVal; // Median value

    Setup();        // Initialize peripherals
    while (1) {
        fTemp = Read_Temperature();
        if (Median(fTemp, &fNewVal)) {
            printf("The Median Temperature = %f \n", fNewVal);
        }
        Delay(100); // Delay 100 ms
    }
}
Midrange
The midrange is the midpoint between the lowest and highest values.
Procedure for finding
- Add the lowest and highest values together.
- Divide by 2.
Properties
- The midrange always exists.
- The midrange does not have to be one of the values.
- The midrange does not use all of the values, only the lowest and highest.
- The midrange is greatly affected by extreme values since it uses only the extreme values.
Formula
\(\frac{{low + high}}{2}\)
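Example
As a worked illustration, take the data set used in the mean and median examples, [12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5]. Its lowest value is -21 and its highest is 54, so the midrange is
\(\frac{{ - 21 + 54}}{2} = \frac{{33}}{2} = 16.5\)
For comparison, the mean of the same data is 8.22 and the median is 5.6. Of the three, the midrange sits furthest toward the single large value 54.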
Midrange Calculation
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

#define NUM_SAMPLES 10

// Board-specific routines, assumed to be provided elsewhere in the project
void Setup(void);
double Read_Temperature(void);
void Delay(unsigned int ms);

// Collects NUM_SAMPLES values, then returns true and writes their
// midrange to *fAvg.
bool Midrange(const double fNewVal, double *fAvg)
{
    static double fSample[NUM_SAMPLES];
    static int iNumSamples = 0;
    double fTemp;
    int iMax;
    int i, j;

    // Store the new sample first so that no reading is discarded
    fSample[iNumSamples] = fNewVal;
    ++iNumSamples;
    if (iNumSamples < NUM_SAMPLES)
        return false;

    // Selection sort, ascending
    for (i = NUM_SAMPLES - 1; i >= 1; --i) {
        iMax = 0;
        for (j = 1; j <= i; j++) {
            if (fSample[j] > fSample[iMax])
                iMax = j;
        }
        if (i != iMax) {
            // swap (fSample[i], fSample[iMax])
            fTemp = fSample[i];
            fSample[i] = fSample[iMax];
            fSample[iMax] = fTemp;
        }
    }

    // Midpoint of the lowest and highest values
    *fAvg = (fSample[0] + fSample[NUM_SAMPLES - 1]) / 2.0;
    iNumSamples = 0;
    return true;
}

int main()
{
    double fTemp;   // Temperature value
    double fVal;    // Midrange value

    Setup();        // Initialize peripherals
    while (1) {
        fTemp = Read_Temperature();
        if (Midrange(fTemp, &fVal)) {
            printf("The Midrange Temperature = %f \n", fVal);
        }
        Delay(100); // Delay 100 ms
    }
}
Mode
The mode is the value that occurs most frequently in a set of data. If no value appears more often than any other, then there is no mode. If two or more values tie for the highest frequency, then the data is bimodal or multimodal.
Procedure for finding
- Rank the data in order from lowest to highest. This is not necessary, but it makes it easier to count how many times a certain value appears when they are in order.
- Find the frequency of each value.
- The most frequent value is the mode.
Properties
- The mode may or may not exist. If it does exist, there may be one or several modes.
- The mode has to be one of the data values.
- The mode does not use all the data values.
- The mode is usually not affected by extreme values, since it is unlikely that the extreme values are the most common ones.
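Mode Calculation
The procedure above can be sketched in C as follows. This is a minimal illustration rather than a definitive implementation: the function name Mode, its signature, and the choice of integer data are assumptions made for this sketch. It skips the ranking step, counts the frequency of each distinct value directly, and reports how many values tie for the highest frequency so that the caller can tell "no mode" from unimodal and multimodal data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: finds the most frequent value in iData[0..nCount-1].
 * Returns the number of distinct values that share the highest frequency:
 *   0  -> no mode (empty input, or several distinct values that all
 *         occur equally often)
 *   1  -> exactly one mode, written to *iMode
 *   >1 -> bimodal or multimodal; *iMode holds the first such value
 * *iMode is only meaningful when the return value is greater than 0. */
int Mode(const int *iData, size_t nCount, int *iMode)
{
    size_t nBestFreq = 0;   /* highest frequency seen so far */
    int nTies = 0;          /* distinct values having that frequency */
    int nDistinct = 0;      /* distinct values in the data */
    size_t i, j;

    for (i = 0; i < nCount; ++i) {
        bool bSeen = false;
        size_t nFreq = 0;

        /* Skip values that were already counted at an earlier index */
        for (j = 0; j < i; ++j) {
            if (iData[j] == iData[i]) {
                bSeen = true;
                break;
            }
        }
        if (bSeen)
            continue;

        /* Count how often this value occurs */
        ++nDistinct;
        for (j = i; j < nCount; ++j) {
            if (iData[j] == iData[i])
                ++nFreq;
        }

        if (nFreq > nBestFreq) {
            nBestFreq = nFreq;
            nTies = 1;
            *iMode = iData[i];
        } else if (nFreq == nBestFreq) {
            ++nTies;
        }
    }

    /* Every distinct value is equally frequent: there is no mode */
    if (nDistinct > 1 && nTies == nDistinct)
        return 0;
    return nTies;
}
```

For the data 1, 1, 2, 3, 3, 3, 3, 4 the function returns 1 and reports 3 as the mode. The nested loops make this quadratic in the number of samples, which is acceptable for small buffers such as the 10-sample windows used elsewhere in this article; sorting first would be the usual alternative for larger data sets.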
Example
>> mode([1, 1, 2, 3, 3, 3, 3, 4])
3
>> mode([2, 1, 2, 3, 1, 2, 3, 4, 1, 5, 5, 3, 2, 3])
2, 3
>> mode(["o", "it", "the", "it", "it"])
"it"
>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
"red"
Geometric Mean
The Geometric Mean is a special type of average where we multiply the numbers together and then take a square root (for two numbers), cube root (for three numbers), etc. The geometric mean only exists when all of the data values are positive. It is often used when finding the average rates of change, rates of growth, or ratios.
Procedure for finding
- Multiply all of the data values together.
- Take the root of the product where the index is equal to the sample size. In other words, if there are 8 numbers, take the 8th root.
Formula
\(\sqrt[n]{{{x_1}\,{x_2} \cdots {x_n}}} = \sqrt[n]{{\prod\limits_{i = 1}^n {{x_i}} }}\)
Example
>> GeometricMean([2, 18])
6
>> GeometricMean([10, 51.2, 8])
16
>> GeometricMean([1, 3, 9, 27, 81])
9
Quadratic Mean
The quadratic mean is used in some physical applications such as power distribution systems. It is also called the Root Mean Square (R.M.S.).
Procedure for finding
- Square each value.
- Total the squares of each value.
- Divide the total by the number of values.
- Take the square root.
Formula
\(\sqrt {\frac{{x_1^2 + x_2^2 + \cdots + x_n^2}}{n}} = \sqrt {\frac{{\sum\limits_{i = 1}^n {x_i^2} }}{n}} \)
Harmonic Mean
The harmonic mean only exists when all of the values are positive. It is often used when the data consists of rates of change, such as speeds.
Procedure for finding
- Take the reciprocal of each data value.
- Find the sum of all the reciprocals.
- Divide the sample size by the total of the reciprocals.
Formula
\({\left( {\frac{{x_1^{ - 1} + x_2^{ - 1} + \cdots + x_n^{ - 1}}}{n}} \right)^{ - 1}} = \frac{n}{{\sum\limits_{i = 1}^n {\frac{1}{{{x_i}}}} }}\)
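Harmonic Mean Calculation
The procedure can be sketched in C as below. The function name HarmonicMean and its signature are assumptions made for illustration, and the function rejects non-positive values in line with the restriction stated above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: writes the harmonic mean of fData[0..nCount-1]
 * to *fResult. Returns false (leaving *fResult untouched) when the input
 * is empty or contains a value that is not strictly positive. */
bool HarmonicMean(const double *fData, size_t nCount, double *fResult)
{
    double fRecipSum = 0.0;
    size_t i;

    if (nCount == 0)
        return false;

    /* Total the reciprocals of the data values */
    for (i = 0; i < nCount; ++i) {
        if (fData[i] <= 0.0)
            return false;
        fRecipSum += 1.0 / fData[i];
    }

    /* Divide the sample size by the total of the reciprocals */
    *fResult = (double)nCount / fRecipSum;
    return true;
}
```

As an illustration with speeds: a vehicle that covers one stretch at 40 km/h and an equally long stretch at 60 km/h averages 2 / (1/40 + 1/60) = 48 km/h over the whole trip, not the arithmetic mean of 50 km/h.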