Neil Schemenauer's Web Log

[Archives]

April 26, 2017

Normal CDF in Python

I needed a single function provided by scipy, i.e. scipy.stats.lognorm.sf(). I think scipy is a high quality library but having to include such a huge library just for a single function bothers me. So, I did a bit of research as to how to implement it myself. I need an accurate and high performance version and my initial table-based linear interpolation version was not accurate enough.

It turns out that modern versions of Python have the ERF function included. That gets you most of the way there. Computing the normal (Gaussian) CDF is pretty simple.

def norm_cdf(x):
    '''Probability that a statistic is less than x. This equates to the
    area of the distribution below x.
    '''
    return 0.5*(1.0 + math.erf(x/math.sqrt(2)))

What my algorithm actually needs is the log-normal survival function, but getting to that from the normal CDF is also pretty simple. Traditially, the log-normal distribution takes two additional parameters, mu and sigma (i.e. location parameter, scale parameter). The survival function then is as follows.

def lognormal_sf(x, mu, sig):
    return 1 - norm_cdf((math.log(x)-mu)/sig)

Trivial test:

>>> scipy.stats.lognorm.sf(1, 2, 0, math.exp(.5))
0.5987063256829237
>>> lognormal_sf(1, .5, 2)
0.5987063256829237

Thanks to Mark Dickinson for adding math.erf() to Python. Implementing those special functions accurately is non-trivial and the versions in Python look to be high quality.

[comments] [validate]