How we processed over 100 GB of data with 16 GB of RAM

What I learnt using dask

If I told you that it is possible to process 100 GB of data on a personal computer with only 16 GB of RAM, would you believe it? I didn't either, but in this write-up I'll tell you how.

It makes sense to represent data as a dataframe of rows. You can manipulate this data in various ways: compute the mean, or group by a certain column to transform the sub-groups or obtain statistical information about them. Software like MySQL and MATLAB works this way. In recent times, however, pandas has become very popular in the Python community. One reason is the open source community that maintains the code base; another is how well it is optimized for vectorized operations (which is encouraging, given that Python is slow compared to C) and how many ready-made tools it provides for data analysis and exploration.

Despite all these advantages, pandas comes with a pitfall: you have to load all of your data into memory. This is a problem when your dataset is larger than the RAM on your machine.
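To give you a taste of where this is going, here is a minimal sketch of that same groupby-and-mean workflow in dask (the file name and column names are just placeholders for illustration). The key idea is that dask splits the file into many small pandas partitions and only streams them through memory when you ask for the final result, instead of loading everything up front:

```python
import dask.dataframe as dd

# Lazily point at a CSV that may be far larger than RAM; dask
# splits it into chunked pandas partitions rather than reading
# the whole file into memory at once.
df = dd.read_csv("transactions.csv", blocksize="64MB")

# Build the computation graph: group by a column and take the
# mean of another. Nothing has actually been read or computed yet.
mean_by_category = df.groupby("category")["amount"].mean()

# .compute() streams the partitions through memory a few at a
# time and returns an ordinary pandas Series with the result.
result = mean_by_category.compute()
print(result)
```

The pandas version of this would be almost identical line for line, which is a big part of dask's appeal; the difference is that pandas would need the entire file in memory before the groupby could even start.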