I was running RidgeCV from scikit-learn and realized that it was devouring all my cores. On top of that, I had wrapped everything in a joblib parallel loop, so my poor server was hanging there, starving for more power.

It turns out that Anaconda ships NumPy and SciPy builds linked against Intel's Math Kernel Library (MKL), which uses multithreading for most vectorized operations. This is usually great, but if you use Singularity in a shared HPC environment and you’re submitting jobs to a queue, then you had better cap the number of threads each process can spawn.
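To get a feel for how bad this can get, here is a rough back-of-the-envelope sketch. The numbers (16 cores, 8 workers) and the helper names are hypothetical, and it assumes each worker's MKL thread pool defaults to one thread per core, which may not match your setup.

```python
def oversubscription(n_workers: int, threads_per_worker: int, cores: int) -> float:
    """Ratio of runnable threads to available cores (above 1.0 means oversubscribed)."""
    return n_workers * threads_per_worker / cores


def thread_budget(n_workers: int, cores: int) -> int:
    """Threads each worker can use without exceeding the core count (at least 1)."""
    return max(1, cores // n_workers)


cores = 16     # hypothetical: cores allocated to the job
n_workers = 8  # hypothetical: n_jobs used in the joblib loop

# Uncapped case (assumption): every worker starts one MKL thread per core.
print(oversubscription(n_workers, cores, cores))  # 8.0

# A per-worker cap that keeps the total at or below the core count.
print(thread_budget(n_workers, cores))  # 2
```

In this made-up scenario the node would be asked to run eight times more threads than it has cores, which is roughly the situation described above.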

The solution comes from Stack Overflow: you can either set an environment variable (MKL_NUM_THREADS) before starting Python, or add this to your code:

import mkl