Z-test using Python
Z-test using the Python package statsmodels
Conditions:
Used when the sampled data follow (approximately) a normal distribution. To check for normality, use skewness and kurtosis statistics.
Used when the population variance is known. In most cases, however, the population variance is not known. When the sample size n is large enough (n >= 30), the sample variance s^2 can be substituted for it; the variance of the sample mean is then estimated as s^2 / n, i.e. a standard error of s / sqrt(n).
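To make the second condition concrete, the one-sample z statistic is z = (sample mean - mu0) / (sigma / sqrt(n)), with the sample standard deviation s standing in for sigma when n >= 30. The sketch below computes it with only the Python standard library; the function name one_sample_z is hypothetical and the p-value is two-sided. statsmodels' ztest has no argument for a known sigma (it estimates the standard deviation from the data), so a by-hand computation like this is one way to use a known population variance.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_z(sample, mu0, sigma=None):
    """Two-sided one-sample z-test of H0: population mean == mu0 (hypothetical helper).

    sigma is the known population standard deviation. If it is None, the sample
    standard deviation s (n - 1 denominator) is used instead, which is only
    reasonable for large samples (n >= 30).
    """
    n = len(sample)
    sd = sigma if sigma is not None else stdev(sample)
    se = sd / math.sqrt(n)                   # standard error = sqrt(variance / n)
    z = (mean(sample) - mu0) / se            # distance of the sample mean from mu0 in SE units
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value under N(0, 1)
    return z, p


# Known sigma: the sample mean is 3 and sigma / sqrt(n) = 1, so z = 1.0
print(one_sample_z([1, 2, 3, 4, 5], mu0=2, sigma=math.sqrt(5)))
```

The tiny sample here only makes the arithmetic easy to follow; with sigma unknown, the same helper should only be used for n >= 30.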
Test of normality:
Basic statistics teaches that a histogram, skewness and kurtosis can be used to judge whether a data set is approximately normal. As a rule of thumb, if the skewness is between -0.5 and 0.5, the distribution is approximately symmetric. If the kurtosis is near 3 (equivalently, the excess kurtosis is near 0), the tails of the sample distribution are close to those of a normal distribution.
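Both statistics are available as scipy.stats.skew and scipy.stats.kurtosis. One detail matters for the rule of thumb: scipy.stats.kurtosis returns excess (Fisher) kurtosis by default, which is 0 for a normal distribution, so fisher=False is passed below to get the Pearson value that is compared with 3. The helper name shape_check and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def shape_check(x):
    """Return (skewness, Pearson kurtosis, roughly_symmetric) for a sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    skewness = stats.skew(x)
    # fisher=False gives Pearson kurtosis (3 for a normal distribution);
    # the default fisher=True would give excess kurtosis (0 for a normal distribution).
    kurt = stats.kurtosis(x, fisher=False)
    roughly_symmetric = bool(-0.5 <= skewness <= 0.5)   # the -0.5..0.5 rule of thumb
    return skewness, kurt, roughly_symmetric


rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=5, size=500)   # simulated, approximately normal data
print(shape_check(sample))   # skewness should be close to 0 and kurtosis close to 3
```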
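SciPy provides a formal normality test (D'Agostino and Pearson's test, which combines skewness and kurtosis):
scipy.stats.normaltest(x), where x is the sample data
Result: a tuple (m, p-value)
where m = s^2 + k^2, s is the z-score returned by skewtest and k is the z-score returned by kurtosistest.
If p <= 0.05, reject the null hypothesis that the data come from a normal distribution; otherwise there is no evidence against normality (which does not prove that the data are normal).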
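A minimal sketch of the test on simulated data; the seed, the sample sizes and the helper name check_normality are arbitrary choices for illustration. The last lines call skewtest and kurtosistest directly to confirm that m = s^2 + k^2. The component tests rely on large-sample approximations, so results for very small samples should be treated with caution.

```python
import numpy as np
from scipy import stats


def check_normality(x, alpha=0.05):
    """D'Agostino-Pearson test; returns (m, p, no_evidence_against_normality) (hypothetical helper)."""
    m, p = stats.normaltest(x)
    return m, p, bool(p > alpha)


rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=100, scale=15, size=200)   # simulated normal data
skewed_sample = rng.exponential(scale=1.0, size=200)      # simulated right-skewed data

for name, x in [("normal", normal_sample), ("exponential", skewed_sample)]:
    m, p, ok = check_normality(x)
    verdict = "no evidence against normality" if ok else "reject normality"
    print(f"{name}: m = {m:.2f}, p = {p:.4f} -> {verdict}")

# m is the sum of the squared z-scores from skewtest and kurtosistest
s = stats.skewtest(normal_sample).statistic
k = stats.kurtosistest(normal_sample).statistic
print(np.isclose(check_normality(normal_sample)[0], s**2 + k**2))
```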
Once the sample data are found to be approximately normal, the z-test can be computed with statsmodels in Python:
statsmodels.stats.weightstats.ztest(x1, value=<hypothesized population mean>, alternative='two-sided')
where x1 is the sample data, value is the population mean under the null hypothesis (default 0),
and alternative can be 'two-sided' (the default), 'larger' (mean greater than value) or 'smaller' (mean less than value).
Result: a tuple (z statistic, p-value). If the p-value <= 0.05, reject the null hypothesis that the population mean equals value.
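A short example of the call on simulated data; the true mean of 102, standard deviation 10, sample size 50 and hypothesized mean 100 are arbitrary assumptions. The last lines recompute the statistic by hand as (mean - value) / (s / sqrt(n)), with s using an n - 1 denominator, to show what ztest returns in the one-sample case.

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

rng = np.random.default_rng(0)
sample = rng.normal(loc=102, scale=10, size=50)   # simulated data, n = 50 >= 30

# H0: population mean == 100, H1: population mean != 100
z_stat, p_value = ztest(sample, value=100, alternative='two-sided')
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
if p_value <= 0.05:
    print("Reject H0: the population mean appears to differ from 100")
else:
    print("Fail to reject H0: no evidence that the population mean differs from 100")

# The same statistic by hand: the sample sd uses ddof=1 (ztest's default ddof=1.0)
se = sample.std(ddof=1) / np.sqrt(len(sample))
z_manual = (sample.mean() - 100) / se
print(np.isclose(z_stat, z_manual))
```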
Summary:
If the population variance is known, use the Z-test (for n < 30 this also requires the population to be normally distributed).
If the population variance is unknown and n < 30, use the t-test.
If the population variance is unknown but n >= 30, the sample variance / n approximates the variance of the sample mean, so the Z-test can be used (the t-test gives nearly the same result).
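These decisions can be collected into one helper. This is a minimal sketch under stated assumptions: the function name choose_and_run_test is hypothetical, a known sigma is treated as calling for a z-test at any sample size (which assumes a normally distributed population when n < 30), the small-sample case uses scipy.stats.ttest_1samp, and all p-values are two-sided.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ztest


def choose_and_run_test(sample, mu0, sigma=None):
    """Test H0: population mean == mu0; returns (test name, statistic, two-sided p-value).

    Hypothetical helper: sigma is the known population standard deviation, or None.
    """
    x = np.asarray(sample, dtype=float)
    n = len(x)
    if sigma is not None:
        # Known population sd: z-test with sigma / sqrt(n) as the standard error
        z = (x.mean() - mu0) / (sigma / np.sqrt(n))
        return "z-test (known sigma)", z, 2 * stats.norm.sf(abs(z))
    if n >= 30:
        # Unknown sd, large sample: statsmodels' ztest substitutes the sample sd
        z, p = ztest(x, value=mu0, alternative='two-sided')
        return "z-test (sample sd)", z, p
    # Unknown sd, small sample: Student's one-sample t-test
    result = stats.ttest_1samp(x, popmean=mu0)
    return "t-test", result.statistic, result.pvalue


print(choose_and_run_test([1, 2, 3, 4, 5], mu0=2))                     # small n, sigma unknown: t-test
print(choose_and_run_test([1, 2, 3, 4, 5], mu0=2, sigma=np.sqrt(5)))   # sigma known: z-test
print(choose_and_run_test(list(range(40)), mu0=19.5))                  # n >= 30, sigma unknown: z-test
```

Passing sigma explicitly keeps the "known variance" case separate from the large-sample approximation, since the two z-tests use different standard errors.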