
torchtext end-of-life and broken

Open egpbos opened this issue 6 months ago • 3 comments

As noted in #827 and other recent PRs with failing CI workflows, the torchtext package seems to be breaking down. It is no longer being developed (see https://github.com/pytorch/text/issues/2250).

Some options:

  1. Find a workaround ourselves.
  2. Look for a fork that is still maintained and switch to that.
  3. Replace torchtext as a dependency.

Option 3 seems the most attractive to me, naively, but I haven't looked deeply into how unique the functionality is that we use. We only use torchtext in two ways:

  • from torchtext.data import get_tokenizer in utils/tokenizer.py
  • from torchtext.vocab import Vectors in test/utils.py, in a couple of notebooks and in the dashboard.

Can these easily be replaced? If not, a fourth option presents itself:

  4. Cannibalize torchtext for these parts only.
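
To give a sense of the scale of option 3, here is a minimal sketch of dependency-free stand-ins for the two usages listed above. Everything in it is an assumption for illustration: the names simple_tokenizer and load_text_vectors are hypothetical, the tokenizer is a plain regex splitter rather than a reimplementation of whatever get_tokenizer returns for the tokenizer name we pass, and the loader assumes a plain-text, GloVe-style vector file (one word followed by space-separated floats per line), which would need to be checked against the files we actually load with Vectors.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical stand-in for a tokenizer: words and single punctuation marks.
# This is NOT torchtext's implementation and may split text differently.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_tokenizer(text: str) -> List[str]:
    """Lowercase the text and split it into word and punctuation tokens."""
    return _TOKEN_RE.findall(text.lower())


def load_text_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse lines of the form '<word> <float> <float> ...' into a dict.

    Assumes a GloVe-style text format without a header line; a word2vec-style
    header ('<count> <dim>') would be misread as a one-dimensional vector.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


if __name__ == "__main__":
    print(simple_tokenizer("Hello, world!"))
    print(load_text_vectors(["cat 0.1 0.2", "dog 0.3 0.4"]))
```

If the tokens or vectors produced this way differ from what torchtext gives for our models, the notebooks and tests would need regenerated reference outputs, so this only indicates the amount of code involved, not a drop-in replacement.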

egpbos, Jul 29 '24 14:07