Add explicit HuggingFace cache dir
Some of these datasets can be fairly large, and I don't like that it's hard to figure out where HuggingFace is storing all of it.
I've set it to the default location in the code, but at least it's now explicit.
For example, I have 2 partitions and HF ends up saving to the one that has only a few GBs rather than the one with a few TBs.
Hmm, I feel like this change makes assumptions and could override pre-existing user configurations of cache dirs.
Can you maybe give me an example? Like if they were hardcoding an env variable?
I ask because this is precisely what HuggingFace does behind the scenes: it makes ~/.cache the default location. I haven't changed that, just made it explicit.
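One way to make the default explicit without overriding existing user configuration is to resolve the cache dir the same way the library does, so env vars still take precedence. Below is a minimal sketch. It assumes the lookup order used by recent `datasets` releases (`HF_DATASETS_CACHE`, then `HF_HOME/datasets`, then `XDG_CACHE_HOME/huggingface/datasets`, then `~/.cache/huggingface/datasets`). The helper name `resolve_datasets_cache_dir` is hypothetical and not part of the library.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_datasets_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: pick the datasets cache dir, honouring user env vars.

    Assumed lookup order (approximates recent `datasets` releases):
    HF_DATASETS_CACHE > $HF_HOME/datasets
    > $XDG_CACHE_HOME/huggingface/datasets > ~/.cache/huggingface/datasets
    """
    env = os.environ if env is None else env
    # Most specific override: a dedicated datasets cache location.
    if env.get("HF_DATASETS_CACHE"):
        return Path(env["HF_DATASETS_CACHE"]).expanduser()
    # Next: HF_HOME, the root shared by all HuggingFace caches.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "datasets"
    # Fall back to the XDG cache root, which itself defaults to ~/.cache.
    xdg = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(xdg).expanduser() / "huggingface" / "datasets"
```

The result could then be passed as `cache_dir=str(resolve_datasets_cache_dir())` to `datasets.load_dataset`. A user who already exports `HF_DATASETS_CACHE` or `HF_HOME` (for example, pointing at the larger partition) keeps that setting, while everyone else still gets the usual `~/.cache` default, now visible in the code.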