NumPy and dependency hell

Data mining, and machine learning in general, has benefited greatly from the development of the NumPy library. Many years back, when I was moving my tech stack from R to Python, the numpy and pandas packages helped me master data mining in Python. Almost every data mining and machine learning package relies heavily on these two packages. NumPy itself goes through a frequent upgrade and release cycle.

Such heavy reliance on these packages, combined with frequent numpy upgrades, has led to dependency hell. If you want to use a particular version of some package ABC, you need a specific version of numpy, but the version of package XYZ you are using relies on a different version of numpy, while the numpy you actually have installed is a newer one that came with the latest version of Python. Python itself is going through a much faster release cycle.

One issue I faced recently is related to pickle and its dependency on numpy. I pickled an object on one machine to use on another machine. Unfortunately, the numpy versions on the two machines were slightly different, and I started running into the error below.

ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'

After some research, I found that it is caused by the differing numpy versions, as also stated here: https://github.com/numpy/numpy/issues/11871
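Before moving a pickle between machines, it can help to compare the installed package versions on both sides. Here is a minimal sketch using only the standard library (Python 3.8+). The helper name `installed_version` and the list of packages are my own choices for illustration, not part of any library:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Run this on both machines and compare the output line by line.
for name in ("numpy", "pandas"):
    print(name, installed_version(name))
```

If the two machines print different numpy versions, a pickle created on one may fail to load on the other.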

One solution I am using these days is to not pickle at all, but instead save the object in a text file and read it back using the “ast” library. It may be slower, but it is more reliable.
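A rough sketch of that approach is below. It assumes the object is built only from plain Python literals (dicts, lists, tuples, strings, numbers, booleans, None), so a numpy array would have to be converted first, for example with its `tolist()` method. The function names `save_as_text` and `load_from_text` and the sample data are made up for this example:

```python
import ast
import os
import tempfile


def save_as_text(obj, path):
    """Write `obj` to `path` as a Python literal."""
    text = repr(obj)
    # Fail early: literal_eval raises ValueError or SyntaxError if the
    # text is not a plain literal (e.g. the repr of a numpy array).
    ast.literal_eval(text)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_from_text(path):
    """Read back an object written by save_as_text."""
    with open(path, "r", encoding="utf-8") as f:
        # literal_eval only parses literals; it does not execute code.
        return ast.literal_eval(f.read())


# Sample data standing in for whatever you would otherwise pickle.
params = {"weights": [0.1, 0.2, 0.3], "labels": ("a", "b"), "bias": 1.5}

path = os.path.join(tempfile.mkdtemp(), "params.txt")
save_as_text(params, path)
restored = load_from_text(path)
print(restored == params)
```

The file contains only text, so reading it does not depend on numpy's internal module layout the way unpickling does.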
