concerning the destination path for `download_corpora`
Unsure whether it was the right tool for the task at hand, I installed TextBlob in a virtualenv, tried to call `.sentences` on a `TextBlob`, and it failed with a message saying that I needed to run `python -m textblob.download_corpora` to download additional data needed for this feature. I did so, and was somewhat surprised and disappointed to find that it created a non-dotfile directory in my `$HOME`. I would have expected, and wanted, it to stay within the virtualenv. It looks like we're calling a download function from NLTK, which does seem to have a `download_dir` kwarg, so downloading the corpora to a less intrusive place by default (or making the destination directory prominently configurable) seems like a feasible user-experience enhancement.
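A minimal sketch of the "less intrusive default" idea, assuming the target should be a `nltk_data` directory under the active environment's `sys.prefix`. The helper names (`in_virtualenv`, `venv_data_dir`, `download_into_env`) are hypothetical and not part of TextBlob or NLTK; the only library call is `nltk.download`, passing its `download_dir` keyword.

```python
import sys
from pathlib import Path
from typing import Iterable


def in_virtualenv() -> bool:
    # PEP 405 venvs make sys.prefix differ from sys.base_prefix;
    # legacy virtualenv (< 20) set sys.real_prefix instead.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix) or hasattr(sys, "real_prefix")


def venv_data_dir() -> Path:
    """Hypothetical helper: a data directory inside the active environment."""
    return Path(sys.prefix) / "nltk_data"


def download_into_env(corpora: Iterable[str]) -> Path:
    """Hypothetical helper: fetch each corpus into venv_data_dir()."""
    import nltk  # deferred so the path helpers work without NLTK installed

    target = venv_data_dir()
    target.mkdir(parents=True, exist_ok=True)
    for corpus in corpora:
        # download_dir overrides NLTK's own choice of destination
        nltk.download(corpus, download_dir=str(target))
    return target
```

For example, `download_into_env(["punkt"])` would place the sentence tokenizer data under the environment rather than `$HOME`. Whether NLTK then finds it without further configuration depends on the installed NLTK version's search path, so treat that as something to verify.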
(Sorry, I feel guilty about filing an issue without a patch, but ...)
The documented method is to set the `NLTK_DATA` environment variable, though this suggests that the NLTK downloader doesn't actually honour that setting; the variable is only used to find the corpus at runtime, if you managed to get the data there another way.
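Since `NLTK_DATA` is what NLTK consults at runtime, one way to use a custom directory is to put it in the environment of the process before `nltk` is imported. A minimal sketch, assuming the data already lives in `data_dir`; `env_with_nltk_data` and `show_search_path` are hypothetical names. It relies on NLTK splitting `NLTK_DATA` on `os.pathsep` and reading it when `nltk.data` is first imported.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping


def env_with_nltk_data(base: Mapping[str, str], data_dir: str) -> Dict[str, str]:
    """Return a copy of `base` with data_dir prepended to NLTK_DATA."""
    env = dict(base)
    existing = env.get("NLTK_DATA", "")
    # Multiple directories are joined with os.pathsep (":" on POSIX, ";" on Windows).
    env["NLTK_DATA"] = os.pathsep.join(p for p in (data_dir, existing) if p)
    return env


def show_search_path(data_dir: str) -> None:
    # The variable must be set before `import nltk`; running a child
    # process with the modified environment guarantees that ordering.
    code = "import nltk; print(nltk.data.path)"
    subprocess.run(
        [sys.executable, "-c", code],
        env=env_with_nltk_data(os.environ, data_dir),
        check=True,
    )
```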
I'm in favor of:
- Exposing the `download_dir` kwarg, but leaving the defaults for consistency with NLTK.
- Improving the docs to clarify how to download to a custom dir (the `download_dir` kwarg) and how to then use it (set the `NLTK_DATA` environment variable).
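Both points could be sketched together as an optional flag that defaults to NLTK's behaviour. This is a hypothetical CLI, not the current `textblob.download_corpora` code; the corpus list is an assumed example and the real module defines its own.

```python
import argparse
import os
from typing import Optional, Sequence

# Assumed corpus list for illustration only.
MIN_CORPORA = ["brown", "punkt", "wordnet", "averaged_perceptron_tagger"]


def download_all(corpora: Sequence[str], download_dir: Optional[str] = None) -> None:
    """Download each corpus; download_dir=None keeps NLTK's default location."""
    import nltk  # imported lazily so argument parsing works without NLTK

    for corpus in corpora:
        nltk.download(corpus, download_dir=download_dir)


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Download corpora (hypothetical CLI)")
    parser.add_argument(
        "--download-dir",
        default=None,
        help="target directory; add it to NLTK_DATA so NLTK can find it",
    )
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = parse_args(argv)
    download_all(MIN_CORPORA, download_dir=args.download_dir)
    if args.download_dir:
        # The downloader alone doesn't make NLTK look there at runtime.
        print(f"Set NLTK_DATA={os.path.abspath(args.download_dir)} to use this data.")
```

Leaving `--download-dir` unset preserves today's behaviour, while the printed hint covers the second half of the docs change.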
It's not pretty. Open to suggestions on a better approach.