One of the major issues I have noticed while working with Spark and Scala is the version dependency between the two.
Scala developers are releasing new versions a little faster than the folks at Apache Spark can absorb them. At the time of writing, Scala 2.12.8 is already out, but Spark's latest release, 2.4.0, is still stuck at Scala 2.11.x. I understand that the Spark team is working on upgrading the Spark framework to be compatible with Scala 2.12. But I think that by the time they release that upgrade, Scala may have moved on to a higher version.
So, the lesson for developers is to pick the Spark release first and then get the Scala version that release is compatible with. Otherwise, you will get runtime errors such as NoSuchMethodError when you try to run your Scala programs against an incompatible Spark build.
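The check described above can be sketched in a few lines of plain Scala. This is only an illustration: the object and method names are my own, and the value "2.11" is an assumption taken from the Spark 2.4.0 situation described in this post, so adjust it to whatever your Spark release was built against.

```scala
object ScalaCompat {
  // Binary version = first two components, e.g. "2.11.12" -> "2.11".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // True when both versions share the same binary (major.minor) version.
  def compatible(projectScala: String, sparkScala: String): Boolean =
    binaryVersion(projectScala) == binaryVersion(sparkScala)

  def main(args: Array[String]): Unit = {
    // Assumption: the Spark build in use targets Scala 2.11.
    val sparkScala = "2.11"
    // Version of the Scala library this program is actually running on.
    val running = scala.util.Properties.versionNumberString
    val verdict = if (compatible(running, sparkScala)) "OK" else "MISMATCH"
    println(s"Running Scala $running, Spark built for $sparkScala: $verdict")
  }
}
```

Only the first two version components are compared because Scala releases are binary compatible within a minor line (2.11.x) but not across lines (2.11 vs 2.12).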
Unrelated to the compatibility issue, another very common problem I noticed when running a Spark program on a VM is a failure to bind to ports. It can be fixed by setting the SPARK_LOCAL_IP environment variable. If you are running Spark locally, set it to 127.0.0.1.
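To see whether the address you plan to use is bindable at all, a small pre-flight check like the one below can help. It is a rough sketch that does not touch Spark itself: BindCheck and canBind are hypothetical names of my own, and the fallback to 127.0.0.1 when SPARK_LOCAL_IP is unset is an assumption for the local case.

```scala
import java.net.{InetAddress, ServerSocket}
import scala.util.Try

object BindCheck {
  // Tries to open a listening socket on an OS-chosen free port (port 0)
  // at the given address, and closes it again right away.
  def canBind(host: String): Boolean =
    Try {
      val socket = new ServerSocket(0, 1, InetAddress.getByName(host))
      socket.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // Assumption: fall back to the loopback address when the variable is unset.
    val host = sys.env.getOrElse("SPARK_LOCAL_IP", "127.0.0.1")
    if (canBind(host)) println(s"Can bind to $host")
    else println(s"Cannot bind to $host; check SPARK_LOCAL_IP")
  }
}
```

If the check fails for the VM's own hostname but succeeds for 127.0.0.1, that points to the address resolution problem that SPARK_LOCAL_IP works around.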