maratishe.github.io Deep Cleaning after H2O Deep Learning Author: maratishe@gmail.com -- created 150618

So, I needed to run some deep learning on my machine.  As usual, the first option I looked into was an R package. And, yep, there is H2O.

Now, you can find many examples of how to run H2O from R. The trick is that H2O is a platform built on a client-server model.  You can run one server and many clients, which is the distributed processing case, or you can run both the server and the client from the same code.  I went for the latter.

I needed to run several hundred training/testing sessions.  As usual, I generated Rscript files from PHP and then ran them on the command line using system() in PHP.  I quickly noticed that my hard disk was filling up very rapidly. I lost about 20 GB within an hour.  I had to abort my software and go looking for where the garbage was piling up.


Meet the scanner: a small, free utility for Windows that runs very quickly and shows you the large chunks of disk usage.  I first noticed that my cygwin folder was bloated.  Having looked inside, I quickly realized that a single folder held more stuff than the rest of cygwin.  That was cygwin/tmp, naturally, where, as I later learned, H2O had created huge temporary folders.  You can tell them apart because they have h2o in their names.
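If you prefer the command line to a GUI scanner, the same check can be sketched in a few lines of shell. This is only a rough sketch: the function name list_h2o_tmp is made up for illustration, and the "*h2o*" pattern is an assumption based on the folder names I saw, so adjust it to whatever shows up on your machine.

```shell
# Hypothetical helper: list directories directly under a temp folder whose
# names contain "h2o", with their disk usage in kilobytes, largest first.
# The "*h2o*" name pattern is an assumption, not something H2O guarantees.
list_h2o_tmp() {
    tmp_dir="${1:-/tmp}"
    # -mindepth 1 skips the temp folder itself, -maxdepth 1 stays at top level
    find "$tmp_dir" -mindepth 1 -maxdepth 1 -type d -name '*h2o*' \
        -exec du -sk {} + | sort -rn
}
```

Calling list_h2o_tmp /tmp from a cygwin shell should print one line per matching folder, size first, so the worst offenders end up at the top.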

Interestingly, it took rm -Rf a minute and a half to run, but in the end I had my 20+ GB of hard disk space back.  Take a note to organize some deep cleaning after your deep learning.
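To make the deep cleaning a habit rather than an emergency, the rm -Rf step can be wrapped in a small shell function and run after each batch of sessions. Again, this is a sketch under assumptions: clean_h2o_tmp is a made-up name, the "*h2o*" pattern is my guess at how the folders are named, and you should only run it when no H2O instance is still working.

```shell
# Hypothetical helper: remove directories directly under a temp folder whose
# names contain "h2o". Assumes no H2O instance is currently running and that
# nothing else you care about matches the "*h2o*" pattern.
clean_h2o_tmp() {
    tmp_dir="${1:-/tmp}"
    # Only top-level matches are removed; the temp folder itself is kept.
    find "$tmp_dir" -mindepth 1 -maxdepth 1 -type d -name '*h2o*' \
        -exec rm -rf {} +
}
```

In my kind of setup, the natural place for clean_h2o_tmp /tmp would be right after each Rscript run finishes, so the garbage never gets a chance to reach 20 GB.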


  


Written with my own local WYSIWYG editor.