maratishe.github.io Deep Cleaning after H2O Deep Learning
Author: maratishe@gmail.com -- created 150618
So, I needed to run some deep learning on my machine. As usual, the first place I looked was R packages. And, yep, there is H2O.
Now, you can find many examples of how to run H2O from R. The trick is that H2O is a platform built on a client-server model. You can run one server with many clients, which is the distributed-processing case, or you can run both the server and the client from the same code. I went with the latter.
I needed to run several hundred training/testing sessions. As usual, I generated Rscript files from PHP and then ran them on the command line using system() in PHP. I quickly noticed that my hard disk was filling up very, very rapidly: I lost about 20 GB within an hour. I had to abort my software and go looking for where the garbage was piling up.
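The generate-then-run loop above was written in PHP. Below is a minimal Python sketch of the same idea, assuming a script template with hypothetical `@NAME@` placeholders (the placeholder syntax, file names and function names are made up for illustration). One file is written per session, and each file is then run in turn, the way system() does it in a loop.

```python
import subprocess
from pathlib import Path


def write_session_scripts(template, sessions, out_dir):
    """Write one script per session by replacing @KEY@ placeholders.

    template: script text containing placeholders such as @EPOCHS@ (hypothetical)
    sessions: list of dicts mapping placeholder name -> value
    Returns the paths of the written files, in session order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, params in enumerate(sessions):
        text = template
        for key, value in params.items():
            text = text.replace(f"@{key}@", str(value))
        path = out / f"session_{i:04d}.R"
        path.write_text(text)
        paths.append(path)
    return paths


def run_sessions(paths, interpreter="Rscript"):
    """Run each script one after another (like system() in a loop).

    Returns the list of exit codes; a non-zero code means that session failed.
    """
    return [subprocess.run([interpreter, str(p)]).returncode for p in paths]
```

Plain text substitution is used instead of `str.format` because R code is full of `{}` braces that `str.format` would try to interpret.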
Meet Scanner: a small, free utility for Windows which runs very quickly and shows you where the large chunks of disk usage are. I first noticed that my cygwin folder was bloated. Having looked inside, I quickly realized that a single folder held more than the rest of cygwin combined. That was cygwin/tmp, naturally, where, as I later learned, H2O had created huge temporary folders. You can tell them apart because they have h2o in their names.
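If you would rather check from a script than with Scanner, here is a rough Python sketch of the same search. It assumes, as described above, that the culprits are top-level folders in the temp directory with "h2o" somewhere in their name (matched case-insensitively here). It defaults to Python's `tempfile.gettempdir()`, which under Cygwin Python is typically /tmp, but you can pass the path explicitly.

```python
import os
import tempfile
from pathlib import Path


def dir_size(path):
    """Total size in bytes of all regular files under path (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def find_h2o_dirs(tmp_dir=None):
    """Return (path, size_in_bytes) for each top-level folder in tmp_dir
    whose name contains 'h2o', largest first."""
    root = Path(tmp_dir or tempfile.gettempdir())
    found = []
    for entry in root.iterdir():
        if entry.is_dir() and "h2o" in entry.name.lower():
            found.append((entry, dir_size(entry)))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in find_h2o_dirs():
        print(f"{size / 2**30:8.2f} GiB  {path}")
```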
Interestingly, it took rm -Rf a minute and a half to run, but in the end I had my 20+ GB of hard disk space back. Make a note to organize some deep cleaning after your deep learning.
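To make the deep cleaning a habit, the rm -Rf step can be scripted. The sketch below is an assumption-laden Python version: it deletes top-level folders in the temp directory whose names contain "h2o", and by default it only lists them (`dry_run=True`) so you can check before deleting. Run it only after the H2O instance has been shut down, since a running instance may still be using its folder.

```python
import shutil
import tempfile
from pathlib import Path


def remove_h2o_dirs(tmp_dir=None, dry_run=True):
    """Delete top-level folders in tmp_dir whose name contains 'h2o'
    (the equivalent of rm -Rf on each one).

    With dry_run=True nothing is deleted; the matching folders are only returned.
    Symlinks are skipped so nothing outside tmp_dir gets removed.
    """
    root = Path(tmp_dir or tempfile.gettempdir())
    targets = sorted(
        p for p in root.iterdir()
        if p.is_dir() and not p.is_symlink() and "h2o" in p.name.lower()
    )
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)  # raises if a file is still locked or in use
    return targets
```

For example, `remove_h2o_dirs("/tmp")` first to see what would go, then `remove_h2o_dirs("/tmp", dry_run=False)` to actually free the space.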
Written with my own local WYSIWYG editor.