Disabling multithreading for numpy, scikit-learn, etc. in conda
I was running RidgeCV from scikit-learn and realized that it was devouring all my cores. Moreover, I wrapped everything into a joblib parallel loop, so my poor server was hanging there, starving for more power.
It turns out that the NumPy and SciPy builds shipped with Anaconda are linked against Intel's Math Kernel Library (MKL), which uses multithreading for most vectorized operations. This is usually great, but if you use Singularity in a shared HPC environment and you’re submitting jobs to a queue, then you'd better curb your processes.
The solution comes from StackOverflow: you can either set an environment variable (MKL_NUM_THREADS), or add this to your code:
import mkl  # provided by the mkl-service package
mkl.set_num_threads(2)  # cap MKL at 2 threads for this process
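The environment-variable route can also be taken from inside Python, which is handy when you can't touch the job script. Here is a minimal sketch, assuming the variables are set before NumPy or scikit-learn is imported (MKL typically reads them when it is first loaded). The helper name `limit_blas_threads` is made up for this example, and `OMP_NUM_THREADS` is an extra I added for OpenMP-based libraries; only `MKL_NUM_THREADS` comes from the StackOverflow answer.

```python
import os


def limit_blas_threads(n_threads=2):
    """Hypothetical helper: cap thread pools via environment variables.

    Call this BEFORE importing numpy / scikit-learn, otherwise the
    libraries may already have picked up their default thread count.
    """
    # MKL_NUM_THREADS is the variable from the post; OMP_NUM_THREADS is an
    # assumption added here to also cover OpenMP-based code.
    names = ("MKL_NUM_THREADS", "OMP_NUM_THREADS")
    for name in names:
        os.environ[name] = str(n_threads)
    return {name: os.environ[name] for name in names}


limits = limit_blas_threads(2)

# Import the numerical stack only after the limits are in place, e.g.:
# import numpy as np
# from sklearn.linear_model import RidgeCV
```

With joblib on top, the total number of busy threads is roughly the number of workers times this limit, so it is worth picking both numbers together.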
For once, sysadmins won’t send me angry emails about my running jobs.