Anaconda is growing in popularity for managing Python packages and versions across different environments. Each Anaconda distribution (from different packagers) comes with a set of prepackaged libraries.
Sometimes these libraries have install-order dependencies. One such dependency I have noticed is between the matplotlib and cycler packages.
I recently installed an Anaconda distribution. Every time I tried to import matplotlib, I received the error “ImportError: No module named ‘cycler’”. I checked the versions of the installed packages to look for any dependency-based issues, but both versions were compatible. To remediate this, I reinstalled the matplotlib package, but the issue was still there. Finally, I reinstalled cycler first, followed by a fresh installation of matplotlib (with a compatible kiwisolver), which fixed the issue.
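Before reinstalling anything, it can help to confirm which of the dependencies are actually importable. Here is a minimal sketch (the helper name `check_deps` is mine, not part of any library) that checks for cycler and kiwisolver before touching matplotlib:

```python
import importlib.util

def check_deps(modules):
    """Return a dict mapping each module name to whether it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

# matplotlib pulls in cycler and kiwisolver at import time, so checking
# them first gives a clearer picture than the raw ImportError.
status = check_deps(["cycler", "kiwisolver", "matplotlib"])
for name, found in status.items():
    print(f"{name}: {'OK' if found else 'MISSING'}")
```

If cycler shows up as MISSING, install it (and kiwisolver) before reinstalling matplotlib.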
One of the major issues I have run into while working with Spark and Scala is the version dependency between the two.
Scala developers are upgrading Scala a little faster than the folks at Apache Spark can absorb. At the time of writing, Scala 2.12.8 is already out, but Spark’s latest release, 2.4.0, is still on Scala 2.11.x. I understand that the Spark team is working on making the framework compatible with Scala 2.12, but I think by the time they release that upgrade, Scala may have moved on to a higher version.
So the lesson for developers is to first get Spark and then get a compatible version of Scala. Otherwise, you will get errors (such as NoSuchMethodError) when trying to run your Scala programs against an incompatible Spark.
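The root of the mismatch is that Scala minor series (2.11, 2.12, ...) are not binary compatible with each other, and Spark artifacts encode the series they were built against in their name (e.g. spark-core_2.11). As an illustrative sketch (the helper names here are mine, not from any library), you can compare the artifact suffix against your installed Scala version:

```python
def scala_binary_version(version):
    """Return the binary-compatibility series of a Scala version,
    e.g. '2.11.12' -> '2.11'. Scala minor series are not binary
    compatible with one another."""
    major, minor, *_ = version.split(".")
    return f"{major}.{minor}"

def is_compatible(spark_artifact, scala_version):
    """Check that a Spark artifact suffix (e.g. 'spark-core_2.11')
    matches the installed Scala binary series."""
    suffix = spark_artifact.rsplit("_", 1)[1]
    return suffix == scala_binary_version(scala_version)

print(is_compatible("spark-core_2.11", "2.11.12"))  # matches: True
print(is_compatible("spark-core_2.11", "2.12.8"))   # mismatch: False
```

Running a 2.12.x Scala program against a spark-core_2.11 artifact is exactly the mismatch that surfaces at runtime as NoSuchMethodError.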
Unrelated to the compatibility issue, another very common problem I have noticed when trying to run a Spark program on a VM is a failure to bind on ports. It can be fixed by setting the SPARK_LOCAL_IP environment variable. If you are running Spark locally, set it to 127.0.0.1.
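If you drive Spark from a Python script, one way to apply this is to set the variable in the process environment before Spark starts, since Spark reads it when the JVM launches. A minimal sketch:

```python
import os

# Set before creating any SparkSession/SparkContext: Spark reads
# SPARK_LOCAL_IP at JVM startup to decide which interface to bind.
# 127.0.0.1 assumes a purely local run, as described above.
os.environ["SPARK_LOCAL_IP"] = "127.0.0.1"

print(os.environ["SPARK_LOCAL_IP"])
```

Equivalently, you can export the variable in your shell or put it in conf/spark-env.sh before launching Spark.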