Category: Big Data/Machine Learning
-
UNNEST in BigQuery
BigQuery is a scalable and highly efficient data warehouse solution. As we move more and more existing databases into BigQuery with their existing structure, some data types, such as the “struct” data type, are a little difficult to query and manipulate. For example, we may get a data row that looks like this. In order to query […]
-
Numpy and dependencies hell.
Data mining, and machine learning in general, has greatly benefited from the development of the NumPy library. Many years back, when I was moving my tech stack from R to Python, the numpy and pandas packages helped me master data mining in Python. Almost every data mining and machine learning package relies heavily on these two packages. […]
-
matplotlib and cycler dependency.
Anaconda is growing in popularity for managing Python packages and versions across different environments. Each Anaconda version (from different packagers) comes with some prepackaged libraries. Sometimes these libraries have install-order dependencies. One such dependency I have noticed is between the matplotlib and cycler packages. I recently installed an Anaconda package. Every time I tried to […]
-
Scala and Spark compatibility issues
One of the major issues I have noticed while working with Spark and Scala is the dependency between the two. Scala developers are upgrading Scala a little faster than the folks at Apache Spark can absorb. At the time of writing, Scala 2.12.8 is already out, but Spark’s latest release, 2.4.0, is still stuck at Scala 2.11.X. I […]