How to Use Z-scores to Find Probability

A z-score measures the distance between a data point and the mean using standard deviations. Z-scores can be positive or negative. The sign tells you whether the observation is above or below the mean. For instance, a z-score of +2 indicates that the data point falls two standard deviations above the mean, while a -2 signifies it is two standard deviations below the mean. A z-score of zero equals the mean. Statisticians also refer to z-scores as standard scores, and I'll use those terms interchangeably.

Standardizing the raw data by transforming them into z-scores provides the following benefits:

Understand where a data point fits into a distribution.
Compare observations between different variables.
Identify outliers.
Calculate probabilities and percentiles using the standard normal distribution.

In this post, I cover all these uses for z-scores along with using z-tables and z-score calculators, and I show you how to do it all in Excel.

How to Find a Z-score

To calculate z-scores, take the raw measurements, subtract the mean, and divide by the standard deviation.

The formula for finding z-scores is the following:

$Z = {\displaystyle \frac {\text {X} - \mu}{\sigma}}$

X represents the data point of interest. Mu and sigma represent the mean and standard deviation for the population from which you drew your sample. Alternatively, use the sample mean and standard deviation when you do not know the population values.
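
As a sketch of the formula, assuming Python 3 and only its standard `statistics` module, the helper below subtracts the mean and divides by the standard deviation. The `z_score` name and the five sample measurements are hypothetical, chosen only to show both cases: known population parameters and sample estimates.

```python
from statistics import mean, stdev

def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Known population parameters: plug them in directly (apple example from this post).
apple_z = z_score(110, 100, 15)
print(round(apple_z, 3))

# Unknown population parameters: estimate them from the sample.
sample = [4.1, 5.0, 5.6, 4.8, 5.3]  # hypothetical measurements
mu_hat = mean(sample)
sigma_hat = stdev(sample)  # sample standard deviation (n - 1 denominator)
sample_z = [z_score(x, mu_hat, sigma_hat) for x in sample]
print([round(z, 2) for z in sample_z])
```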

Z-scores follow the distribution of the original data. Consequently, when the original data follow the normal distribution, so do the corresponding z-scores. Specifically, the z-scores follow the standard normal distribution, which has a mean of 0 and a standard deviation of 1. However, skewed data will produce z-scores that are similarly skewed.
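
To see that standardizing changes the location and scale but not the shape, the sketch below standardizes a small hypothetical right-skewed dataset and compares a simple moment-based skewness measure before and after. The helper names `standardize` and `skewness` and the data values are illustrative assumptions.

```python
from statistics import mean, stdev

def standardize(data):
    """Convert every value to a z-score using the sample mean and standard deviation."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

def skewness(data):
    """Moment-based skewness (no small-sample bias correction)."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data: a long tail of large values.
raw = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 15]
z = standardize(raw)

print(round(mean(z), 10), round(stdev(z), 10))         # centered at 0 with SD 1
print(round(skewness(raw), 6), round(skewness(z), 6))  # identical skewness
```

Because standardization only shifts and rescales the values, the z-scores have a mean of 0 and a standard deviation of 1, yet remain exactly as skewed as the raw data.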

In this post, I include graphs of z-scores using the standard normal distribution because they bring the concepts to life. Additionally, z-scores are most valuable when your data are normally distributed. However, be aware that when your data are nonnormal, the z-scores are also nonnormal, and the interpretations might not be valid.

Learn how to identify the distribution of your data!

Related posts: The Mean in Statistics and Standard Deviation

Using Z-scores to Understand How an Observation Fits into a Distribution

Z-scores help you understand where a specific observation falls within a distribution. Sometimes the raw test scores are not informative. For instance, SAT, ACT, and GRE scores do not have real-world interpretations on their own. An SAT score of 1340 is not fundamentally meaningful. Many psychological metrics are simply sums or averages of responses to a survey. For these cases, you need to know how an individual score compares to the entire distribution of scores. For example, if your standard score for any of these tests is +2, that's far above the mean. Now that's helpful!

In other cases, the measurement units are meaningful, but you want to see the relative standing. For instance, if a baby weighs five kilograms, you might wonder how her weight compares to others. For a one-month-old baby girl, that equates to a z-score of 0.74. She weighs more than average, but not by a full standard deviation. Now you understand where she fits in with her cohort!

In all these cases, you're using standard scores to compare an observation to the average. You're placing that value within an entire distribution.

When your data are normally distributed, you can graph z-scores on the standard normal distribution, which is a particular form of the normal distribution. The mean occurs at the peak with a z-score of zero. Above-average z-scores are on the right half of the distribution, and below-average values are on the left. The graph below shows where the baby's z-score of 0.74 fits in the population.

Analysts often convert standard scores to percentiles, which I cover later in this post.

Related post: Understanding the Normal Distribution

Using Standard Scores to Compare Different Types of Variables

Z-scores allow you to take data points drawn from populations with different means and standard deviations and place them on a common scale. This standard scale lets you compare observations for different types of variables that would otherwise be difficult to compare. That's why z-scores are also known as standard scores, and the process of transforming raw data to z-scores is called standardization. It lets you compare data points across variables that have different distributions.

In other words, you can compare apples to oranges. Isn't statistics grand!

Imagine we literally need to compare apples to oranges. Specifically, we'll compare their weights. We have a 110-gram apple and a 100-gram orange.

By comparing the raw values, it's easy to see the apple weighs slightly more than the orange. However, let's compare their z-scores. To do this, we need to know the means and standard deviations for the populations of apples and oranges. Assume that apples and oranges follow a normal distribution with the following properties:

	Apples	Oranges
Mean weight (grams)	100	140
Standard deviation (grams)	15	25

Let's calculate the z-scores for our apple and orange!

Apple = (110 - 100) / 15 = 0.667

Orange = (100 - 140) / 25 = -1.6
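
The two calculations above can be reproduced in a few lines of Python. This is just a sketch: the dictionary layout is an arbitrary choice, and the means and standard deviations are the assumed population values from the table.

```python
# Assumed population parameters from the table (weights in grams).
populations = {
    "apple": {"mean": 100, "sd": 15},
    "orange": {"mean": 140, "sd": 25},
}
observed = {"apple": 110, "orange": 100}

# z = (X - mu) / sigma for each fruit, using its own population.
z_scores = {
    fruit: (observed[fruit] - p["mean"]) / p["sd"]
    for fruit, p in populations.items()
}

for fruit, z in z_scores.items():
    side = "above" if z > 0 else "below"
    print(f"{fruit}: z = {z:.3f} ({side} the mean)")
```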

The apple's positive z-score (0.667) signifies that it is heavier than the average apple. It's not an extreme value, but it is above the mean. Conversely, the orange has a markedly negative z-score (-1.6). It's well below the mean weight for oranges. I've positioned these standard scores in the standard normal distribution below.

Our apple is a bit heavier than average, while the orange is puny! Using z-scores, we learned where each fruit falls within its distribution and how they compare.

Using Z-scores to Detect Outliers

Z-scores can quantify the unusualness of an observation. Raw data values that are far from the average are unusual and potential outliers. Consequently, we're looking for high absolute z-scores.

The standard cutoff values for finding outliers are z-scores of +/-3 or more extreme. The standard normal distribution plot below displays the distribution of z-scores. Z-scores beyond the cutoff are so unusual you can hardly see the shading under the curve.

In populations that follow a normal distribution, z-score values outside +/-3 have a probability of 0.0027 (2 * 0.00135), approximately 1 in 370 observations. However, if your data don't follow a normal distribution, this approach might not be correct.
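
Assuming Python 3.8+ (where `statistics.NormalDist` is available), the two-tailed probability beyond ±3 can be checked directly from the standard normal cumulative distribution function:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

upper_tail = 1 - std_normal.cdf(3)  # P(Z > 3)
both_tails = 2 * upper_tail         # P(|Z| > 3), using symmetry

print(f"P(Z > 3)   = {upper_tail:.5f}")
print(f"P(|Z| > 3) = {both_tails:.4f}  (about 1 in {1 / both_tails:.0f})")
```

Doubling the upper tail relies on the symmetry of the normal distribution; the result, about 0.0027, matches the figure quoted above.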

For the example dataset, I display the raw data points and their z-scores. I circled an observation that is a potential outlier.

Caution: Z-scores can be misleading in small datasets because the maximum z-score is limited to (n − 1) / √n, where n is the sample size.

Consequently, samples with 10 or fewer data points cannot have z-scores that exceed the cutoff value of +/-3.
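
A short sketch of where the (n − 1) / √n limit comes from: the most extreme z-score occurs when n − 1 values are identical and a single value differs. The helper names below are hypothetical, and the calculation uses the sample standard deviation (n − 1 in the denominator), as Excel's STDEV.S does.

```python
import math
from statistics import mean, stdev

def max_possible_z(n: int) -> float:
    """Largest attainable |z| in a sample of size n (sample SD, n - 1 denominator)."""
    return (n - 1) / math.sqrt(n)

def extreme_sample_z(n: int) -> float:
    """z of the lone differing value when the other n - 1 values are identical."""
    data = [0.0] * (n - 1) + [1.0]
    return (data[-1] - mean(data)) / stdev(data)

for n in (5, 10, 11, 20):
    print(n, round(max_possible_z(n), 3), round(extreme_sample_z(n), 3))
```

For n = 10 the limit is about 2.85, and n = 11 is the first size where it exceeds 3 (about 3.02), which is why 10 or fewer points can never cross the ±3 cutoff.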

Additionally, an outlier's presence throws off the z-scores because it inflates the mean and standard deviation. Notice how all z-scores are negative except the outlier's value. If we calculated z-scores without the outlier, they'd be different! When a dataset contains outliers, their z-scores tend to appear less extreme (i.e., closer to zero) than they otherwise would.
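
To illustrate the masking effect, the sketch below computes sample z-scores for a small hypothetical dataset with and without one large value. The data values are invented for illustration only.

```python
from statistics import mean, stdev

def z_scores(data):
    """Sample z-scores: (x - sample mean) / sample standard deviation."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

# Hypothetical dataset: five typical values plus one suspiciously large value.
typical = [48, 50, 51, 52, 54]
with_outlier = typical + [95]

print([round(z, 2) for z in z_scores(with_outlier)])  # typical values all negative
print([round(z, 2) for z in z_scores(typical)])       # typical values spread around 0
```

With the outlier included, every typical value gets a negative z-score, and the outlier itself lands at only about 2.03 (close to the (n − 1) / √n limit of about 2.04 for n = 6), so the ±3 rule would not flag it.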

Related post: Five Ways to Find Outliers

Using Z-tables to Calculate Probabilities and Percentiles

The standard normal distribution is a probability distribution. Consequently, if you know only the mean and standard deviation, and you can reasonably assume your data follow the normal distribution (at least approximately), you can easily use z-scores to calculate probabilities and percentiles. Typically, you'll use online calculators, Excel, or statistical software for these calculations. We'll get to that.

But first, I'll show you the old-fashioned way of doing it by hand using z-tables.

Let's go back to the z-score for our apple (0.667) from before. We'll use it to calculate its weight percentile. A percentile is the proportion of a population that falls below a value. Consequently, we need to find the area under the standard normal distribution curve corresponding to the range of z-scores less than 0.667. In the portion of the z-table below, I'll use the standard score that is closest to our apple, which is 0.65.

Click here for a full z-table and illustrated instructions for using it!

Related post: Understanding Probability Distributions and Probability Fundamentals

The Nuts and Bolts of Using Z-tables

Using these tables to calculate probabilities requires that you understand the properties of the normal distribution. While the tables provide an answer, it might not be the answer you need. However, by applying your knowledge of the normal distribution, you can find your answer!

For example, the table indicates that the area of the curve between -0.65 and +0.65 is 48.43%. Unfortunately, that's not what we want to know. We need to find the area that is less than a z-score of 0.65.

We know that the two halves of the normal distribution are symmetrical, which helps us solve our problem. The z-table tells us that the area for the range from -0.65 to +0.65 is 48.43%. Because of the symmetry, the interval from 0 to +0.65 must be half of that: 48.43/2 = 24.215%. Additionally, the area for all scores less than zero is half (50%) of the distribution.

Therefore, the area for all z-scores up to 0.65 = 50% + 24.215% = 74.215%
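
The symmetry argument above can be written out as a quick check. This sketch assumes Python 3.8+ for `statistics.NormalDist`; `central_area` is the 48.43% table value quoted above, and the exact CDF is included only for comparison.

```python
from statistics import NormalDist

# Area between -0.65 and +0.65, as read from the z-table.
central_area = 0.4843

# Symmetry: half of the central area lies between 0 and +0.65,
# and everything below 0 is another 50%.
percentile_from_table = 0.5 + central_area / 2

# Cumulative probability straight from the standard normal CDF, for comparison.
percentile_exact = NormalDist().cdf(0.65)

print(f"{percentile_from_table:.5f} vs {percentile_exact:.5f}")
```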

That's how you convert standard scores to percentiles. Our apple is at approximately the 74th percentile.

If you want to calculate the probability for values falling within a range of standard scores, calculate the percentile for each z-score and then subtract them.

For example, the probability of a z-score between 0.40 and 0.65 equals the difference between the percentiles for z = 0.65 and z = 0.40. We calculated the percentile for z = 0.65 above (74.215%). Using the same method, the percentile for z = 0.40 is 65.540%. Now we subtract the percentiles.

74.215% - 65.540% = 8.675%

The probability of an observation having a z-score between 0.40 and 0.65 is 8.675%.
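
The same subtract-two-percentiles approach, sketched with the standard normal CDF instead of a table (assuming Python 3.8+; `prob_between` is a hypothetical helper name):

```python
from statistics import NormalDist

def prob_between(z_low: float, z_high: float) -> float:
    """P(z_low < Z < z_high) for a standard normal Z: the difference of two percentiles."""
    cdf = NormalDist().cdf
    return cdf(z_high) - cdf(z_low)

print(f"{prob_between(0.40, 0.65):.4%}")
```

Because the CDF uses exact z-values rather than rounded table entries, the result is about 8.673%, which agrees with the table-based 8.675% up to rounding.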

Using just simple math and a z-table, you can easily find the probabilities that you need!

Alternatively, use the Empirical Rule to find probabilities for values in a normal distribution using ranges based on standard deviations.
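
As a quick sketch of where the Empirical Rule's familiar percentages come from, assuming Python 3.8+, the area within ±1, ±2, and ±3 standard deviations is the difference of two CDF values:

```python
from statistics import NormalDist

std_normal = NormalDist()

# Area between -k and +k standard deviations for k = 1, 2, 3.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in coverage.items():
    print(f"within ±{k} SD: {area:.4f}")
```

The output is roughly 0.6827, 0.9545, and 0.9973, the 68%, 95%, and 99.7% of the Empirical Rule.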

Related post: Percentiles: Interpretations and Calculations

Using Z-score Calculators

In this day and age, you'll probably use software and online z-score calculators for these probability calculations. Statistical software produced the probability distribution plot below. It displays the apple's percentile with a graphical representation of the area under the standard normal distribution curve. Graphing is a great way to get an intuitive feel for what you're calculating using standard scores.

The percentile is a tad different because we used the z-score of 0.65 in the table, while the software uses the more precise value of 0.667.

Alternatively, you can enter z-scores into calculators, like this one.

If you enter the z-score value of 0.667, the left-tail p-value matches the shaded region in the probability plot above (0.7476). The right-tail value (0.2524) equals the probability for all values above our z-score, which corresponds to the unshaded region in the graph. Unsurprisingly, those values add to 1 because you're covering the entire distribution.
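
If you prefer code to an online calculator, the left- and right-tail probabilities for the apple's z-score can be sketched as follows (assuming Python 3.8+ and its `statistics.NormalDist` class):

```python
from statistics import NormalDist

z = 0.667
left_tail = NormalDist().cdf(z)  # area below z (the apple's percentile)
right_tail = 1 - left_tail       # area above z

print(f"left: {left_tail:.4f}, right: {right_tail:.4f}, sum: {left_tail + right_tail:.1f}")
```

The left tail is about 0.7476 and the right tail about 0.2524, and the two always sum to 1.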

How to Find Z-scores in Excel

You can calculate z-scores and their probabilities in Excel. Let's work through an example. We'll return to our apple example and start by calculating standard scores for values in a dataset. I have all the data and formulas in this Excel file: Z-scores.

To find z-scores using Excel, you'll need to either calculate the sample mean and standard deviation or use population reference values. In this example, I use the sample estimates. If you need to use population values supplied to you, enter them into the spreadsheet rather than calculating them.

My apple weight data are in cells A2:A21.

To calculate the mean and standard deviation, I use the following Excel functions:

Mean: =AVERAGE(A2:A21)
Standard deviation (sample): =STDEV.S(A2:A21)

Then, in column B, I use the following Excel formula to calculate the z-scores:

=(A2-A$24)/A$26

Cell A24 contains the mean, and A26 contains the standard deviation. This formula takes a data value in column A, subtracts the mean, and then divides by the standard deviation.

I copied that formula to all rows from B2:B21, and Excel displays z-scores for all data points.
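
For readers who work outside Excel, here is a rough Python analogue of columns A and B. The real apple weights live in the Excel file and are not reproduced in this post, so the ten weights below are hypothetical placeholders (the spreadsheet has twenty); `stdev` uses the n − 1 denominator, matching STDEV.S.

```python
from statistics import mean, stdev

# Hypothetical apple weights in grams; the actual values are in the Excel file.
weights = [102, 95, 110, 98, 105, 91, 100, 107, 96, 103]

avg = mean(weights)  # plays the role of =AVERAGE(A2:A21)
sd = stdev(weights)  # sample SD, plays the role of =STDEV.S(A2:A21)

# Same logic as =(A2-A$24)/A$26 copied down column B.
z_column = [(w - avg) / sd for w in weights]

for w, z in zip(weights, z_column):
    print(f"{w:>5}  {z:+.3f}")
```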

Using Excel to Calculate Probabilities for Standard Scores

Next, I use Excel's NORM.S.DIST function to calculate the probabilities associated with z-scores. I work with the standard score from our apple example, 0.667.

The NORM.S.DIST(z, cumulative) function returns either the cumulative distribution function (TRUE) or the probability density function (FALSE) for the z-score you specify. The probability density function is the height value in the z-table earlier in this post, and it corresponds to the y-axis value on a probability distribution plot for the z-score. We'll use the cumulative function, which calculates the cumulative probability for all z-scores less than the value we specify.

In the function, we need to specify the z-value (0.667) and use TRUE for the cumulative argument to obtain the cumulative probability.

I'll enter the following:

=NORM.S.DIST(0.667,TRUE)

Excel displays 0.747613933, matching the output in the probability distribution plot above.

If you want to find the probability for values greater than the z-score, remember that the probabilities above and below it must sum to 1. Therefore, subtract from 1 to calculate probabilities for larger values:

=1-NORM.S.DIST(0.667,TRUE)

Excel displays 0.252386067.
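
As a sketch of the same quantities outside Excel, the hypothetical helper `norm_s_dist` below mimics the two modes of NORM.S.DIST using Python's `statistics.NormalDist` (Python 3.8+). It is not an Excel API, just a stand-in that takes the same two arguments.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def norm_s_dist(z: float, cumulative: bool) -> float:
    """Hypothetical stand-in for Excel's NORM.S.DIST: CDF if True, density if False."""
    return std_normal.cdf(z) if cumulative else std_normal.pdf(z)

below = norm_s_dist(0.667, True)    # like =NORM.S.DIST(0.667,TRUE)
above = 1 - below                   # like =1-NORM.S.DIST(0.667,TRUE)
height = norm_s_dist(0.667, False)  # curve height (density) at z = 0.667

print(f"{below:.9f} {above:.9f} {height:.6f}")
```

The cumulative values should agree with Excel's 0.747613933 and 0.252386067, and `height` is the curve's density at z = 0.667 (about 0.319).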

Here's what my spreadsheet looks like.