
Z-test using Python

Z-test using the Python package statsmodels

Conditions:

  1. Used when the sample data follow a normal distribution. To check for normality, use the skewness and kurtosis statistics.

  2. Used when the population variance is known. In most cases it is not known; however, when the sample size (n) is large enough (>= 30), the sample variance s^2 can be substituted for it, so the variance of the sample mean is estimated by s^2 / n.
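The second condition can be made concrete with a minimal sketch of a one-sample z statistic computed by hand. The helper name `z_test_one_sample`, the simulated data and the population mean of 50 are illustrative assumptions, not part of the original post.

```python
import numpy as np
from scipy import stats

def z_test_one_sample(sample, pop_mean, pop_var=None):
    """Two-sided one-sample z-test (hypothetical helper for illustration).

    If pop_var is None, the sample variance (ddof=1) stands in for the
    population variance, which is only reasonable for large n (>= 30).
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    var = pop_var if pop_var is not None else x.var(ddof=1)
    std_err = np.sqrt(var / n)            # standard error of the sample mean
    z = (x.mean() - pop_mean) / std_err   # z statistic
    p_value = 2 * stats.norm.sf(abs(z))   # two-sided p-value
    return z, p_value

# Simulated sample (assumed data): n = 50 draws from a normal distribution
rng = np.random.default_rng(0)
sample = rng.normal(loc=52.0, scale=10.0, size=50)

z, p = z_test_one_sample(sample, pop_mean=50.0)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

Passing `pop_var` uses the known population variance; leaving it out falls back on the large-sample approximation described in condition 2.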

Test of normality:

Basic statistics tells us that a histogram, skewness and kurtosis can be used to check whether a data set is approximately normal. If the skewness is between -0.5 and 0.5, the distribution is approximately symmetric. If the kurtosis is near 3 (equivalently, excess kurtosis near 0), the sample distribution is close to normal in its tails.
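As a rough illustration of these rules of thumb, the sketch below computes both statistics with scipy on simulated data. The sample, seed and variable names are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Simulated sample (assumed data): 5000 draws from a standard normal
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=5000)

skewness = stats.skew(x)

# fisher=False returns Pearson kurtosis, where a normal distribution gives 3.
# The default (fisher=True) returns excess kurtosis, where normal gives 0.
kurt = stats.kurtosis(x, fisher=False)

print(f"skewness = {skewness:.3f}")
print(f"kurtosis = {kurt:.3f}")

# Rule-of-thumb checks from the text
print("approximately symmetric:", -0.5 < skewness < 0.5)
print("kurtosis near 3:", abs(kurt - 3.0) < 0.5)
```

The tolerance of 0.5 around a kurtosis of 3 is an arbitrary choice for this sketch; the text only says "near 3".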

The scipy package provides a function to test for normality:

scipy.stats.normaltest(x) where x is the sample data

Result: a tuple (m, p-value)

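where m = s^2 + k^2, s is the z-score returned by skewtest and k is the z-score returned by kurtosistest.

The null hypothesis is that the sample comes from a normal distribution. If p <= 0.05, reject it and treat the data as not normally distributed; otherwise there is no evidence against normality.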
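A short sketch of the call described above, run on one normal and one clearly skewed sample. Both samples are simulated for the example and are not data from the post.

```python
import numpy as np
from scipy import stats

# Simulated samples (assumed data)
rng = np.random.default_rng(1)
normal_data = rng.normal(size=500)
skewed_data = rng.exponential(size=500)   # strongly right-skewed

for name, data in [("normal", normal_data), ("exponential", skewed_data)]:
    m, p_value = stats.normaltest(data)
    verdict = "not normal" if p_value <= 0.05 else "no evidence against normality"
    print(f"{name}: m = {m:.3f}, p-value = {p_value:.4g} -> {verdict}")

# The statistic is the sum of the squared skewtest and kurtosistest z-scores
s, _ = stats.skewtest(skewed_data)
k, _ = stats.kurtosistest(skewed_data)
m_skewed, p_skewed = stats.normaltest(skewed_data)
print(np.isclose(m_skewed, s**2 + k**2))
```

Note that a p-value above 0.05 does not prove normality; it only means the test found no significant departure from it.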

Once the sample data are found to be approximately normal, the Z-test can be computed using statsmodels in Python.

statsmodels.stats.weightstats.ztest(x1, value= <population Mean>, alternative='two-sided')

where x1 is the sample data and value is the hypothesized population mean

and alternative can be 'two-sided' (the default), 'larger' or 'smaller'

Result: a tuple (z statistic, p-value). If the p-value <= 0.05, the sample mean differs significantly from the hypothesized population mean.
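The call above might be used as in the following sketch. The simulated sample and the hypothesized mean of 50 are assumptions for illustration, and the manual calculation is included only as a cross-check.

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

# Simulated sample (assumed data): n = 100, well above the n >= 30 guideline
rng = np.random.default_rng(7)
sample = rng.normal(loc=52.0, scale=10.0, size=100)

# Two-sided test of H0: population mean == 50
z_stat, p_value = ztest(sample, value=50.0, alternative='two-sided')
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")

# One-sided variants: 'larger' tests mean > 50, 'smaller' tests mean < 50
_, p_larger = ztest(sample, value=50.0, alternative='larger')
_, p_smaller = ztest(sample, value=50.0, alternative='smaller')

# Cross-check: z = (sample mean - hypothesized mean) / (s / sqrt(n))
z_manual = (sample.mean() - 50.0) / (sample.std(ddof=1) / np.sqrt(sample.size))
print(np.isclose(z_stat, z_manual))

if p_value <= 0.05:
    print("Reject H0: the sample mean differs from 50")
else:
    print("Fail to reject H0")
```

With a single sample, ztest estimates the standard error from the sample variance, so this is the large-sample form of the test rather than the known-variance form.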

Summary:

If the population variance is unknown, use the t-test.

Alternatively, if the population variance is unknown but n >= 30, the sample variance can approximate it (giving s^2 / n as the variance of the sample mean), so the Z-test can be used.

If the population variance is known and n >= 30, use the Z-test.

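If the population variance is known and n < 30, the Z-test is still valid provided the data are normally distributed; the t-test is for small samples whose variance must be estimated from the data.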
