torchtext end-of-life and broken
As noted in #827 and other recent PRs with breaking CI workflows, the torchtext package seems to be breaking down. It is no longer being developed (see https://github.com/pytorch/text/issues/2250).
Some options:
- Find a workaround ourselves.
- Look for a fork that is still maintained and switch to that.
- Replace torchtext as a dependency.
The third option (replacing torchtext) naively seems the most attractive to me, but I haven't looked deeply into how unique the functionality we use is. We only use torchtext in two ways:
- `from torchtext.data import get_tokenizer` in utils/tokenizer.py
- `from torchtext.vocab import Vectors` in test/utils.py, in a couple of notebooks and in the dashboard.
Can these easily be replaced? If not, a fourth option presents itself:
- Cannibalize torchtext for these parts only.
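To get a rough feel for what replacing these two imports might involve, here is a minimal sketch using only the standard library. It is not a drop-in replacement: `simple_tokenizer` and `load_text_vectors` are hypothetical names, the tokenization rule is an assumption rather than a copy of any torchtext tokenizer, and the loader assumes a plain-text embedding file with one `token v1 v2 ...` entry per line. Whether that matches what our code actually relies on would still need checking.

```python
import re
from typing import Dict, Iterable, List

# Assumed rule: a token is a run of word characters or a single punctuation mark.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_tokenizer(text: str) -> List[str]:
    """Lowercase the text and split it into word and punctuation tokens."""
    return _TOKEN_RE.findall(text.lower())


def load_text_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse 'token v1 v2 ...' lines into a token -> vector mapping."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 3:
            # Skip blank lines and a possible 'count dim' header line.
            continue
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


if __name__ == "__main__":
    print(simple_tokenizer("A great movie!"))
    print(load_text_vectors(["good 0.1 0.2 0.3", "bad -0.1 0.0 0.5"]))
```

If something this small covers our use, option 3 looks cheap. If we depend on more of the `Vectors` behaviour (caching, tensor output, unknown-token handling), the fourth option may be the more realistic one.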