Parallelization of the USHCN dataset download and initial conversion
While testing the model, I noticed that downloading and parsing each file one at a time was taking around 2 hours. Since each archive appears to be downloaded and then parsed independently of the others, I parallelized the process, which in my experiments gave a clear speedup and required fewer than a hundred lines of code. I also took the opportunity to reformat the code with PyCharm to improve readability.
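For reference, here is a minimal sketch of the approach (not the exact code in this PR), using `concurrent.futures` from the standard library: each worker downloads one archive and immediately converts it, since the archives don't depend on each other. The URL list, `parse_archive()`, and the output directory are placeholders I made up for the example, not names from this repository.

```python
import urllib.request
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

# Hypothetical archive list; substitute the real USHCN archive URLs.
ARCHIVE_URLS = [
    "https://example.org/ushcn/archive_01.tar.gz",
    "https://example.org/ushcn/archive_02.tar.gz",
]

def parse_archive(path: Path) -> int:
    """Placeholder for the dataset-specific conversion step."""
    return path.stat().st_size

def download_and_parse(url: str, out_dir: Path) -> int:
    """Fetch one archive and convert it right away in the same worker."""
    target = out_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)  # blocking download
    return parse_archive(target)

def run_parallel(urls: list[str], out_dir: Path, workers: int = 4) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    # Each archive is handled end-to-end by one worker,
    # so no coordination between tasks is needed.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download_and_parse, u, out_dir): u for u in urls}
        for fut in as_completed(futures):
            print(f"done: {futures[fut]} ({fut.result()} bytes parsed)")

if __name__ == "__main__":
    run_parallel(ARCHIVE_URLS, Path("data/ushcn"))
```

A process pool is shown here so the parse step can use a full CPU core per archive; a `ThreadPoolExecutor` would work just as well if the workload is dominated by download time.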
I would also like to propose adding a few smaller packages to the requirements.txt file: some were missing when I created a conda environment following the guide in your README, and PyCharm helped me identify them.
I'm happy to answer any questions or make further changes. Let me know what you think.
Best Regards, Giacomo Guiduzzi