Elijah Rippeth
As a quick note: if I replace `import re` with `import regex as re`, the timeit microbenchmark is `1.62 s ± 117 ms per loop (mean ± std. dev. of...
It seems like one way of doing this will require implementing [this class](https://github.com/google/sentencepiece/blob/bc53923a9147dc8ffa54034c8ed774de78cc4d39/python/src/sentencepiece/sentencepiece.i#L101) and making the analog of [this function](https://github.com/pytorch/text/blob/e3799a6eecef451f6e66c9c20b6432c5f078697f/torchtext/csrc/sentencepiece.cpp#L54-L56) accept a `PyObject *iter` (which requires inclusion of ``...
`Multi30k` is iterable-style (as, indeed, all torchtext datasets are) and therefore does not implement `__getitem__`. You can convert it to a map-style dataset (one that implements `__getitem__`) with `torchtext.data.functional.to_map_style_dataset`: ```python...
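To illustrate the idea without pulling in torchtext: a minimal map-style wrapper (the hypothetical `ToMapStyle` class below is my own sketch, not torchtext's implementation) just materializes the iterable and exposes `__getitem__`/`__len__`, which is roughly what `to_map_style_dataset` does under the hood:

```python
class ToMapStyle:
    """Wrap an iterable-style dataset as a map-style one by
    materializing it into a list. Sketch of the idea behind
    torchtext.data.functional.to_map_style_dataset."""

    def __init__(self, iterable):
        # One full pass over the iterable; fine for datasets that fit in memory.
        self._data = list(iterable)

    def __getitem__(self, idx):
        return self._data[idx]

    def __len__(self):
        return len(self._data)


# Usage: any iterable of (src, tgt) pairs becomes randomly addressable.
pairs = ToMapStyle((src, tgt) for src, tgt in [("ein", "one"), ("zwei", "two")])
print(len(pairs))  # → 2
print(pairs[1])    # → ('zwei', 'two')
```

Once wrapped like this, the dataset can be handed to anything that expects random access, e.g. a `torch.utils.data.DataLoader` with a shuffling sampler.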
Batch size can certainly be element-dependent in NLP cases where you may want to form batches based on the lengths of examples (like max-token post-pad batching). Some datasets in...
Where can these be found? I'd be happy to take a stab at this.
Just started this. Tracking here.
- [ ] pytorch_struct
- [ ] aiayn
I think the pytorch_struct refactor is ready for review: https://github.com/pytorch/benchmark/pull/673 cc @mthrok @abhinavarora
I'd definitely be interested in contributing!
As an aside, I've also got a [max token batch sampler](https://gist.github.com/erip/81d2816f71ba2e95668095e5a1e1040e) which is a bit different, but may be of interest. Not sure if it makes sense to include it...
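For readers unfamiliar with the idea, a minimal max-token batch sampler might look like the following (a simplified sketch of the general technique, not the linked gist's exact implementation): sort indices by example length, then greedily fill a batch until padding to the longest member would push the batch over the token budget.

```python
def max_token_batches(lengths, max_tokens):
    """Group example indices into batches so that, after post-padding
    every example to the longest one in its batch, each batch holds at
    most `max_tokens` tokens. Sorting by length first keeps similarly
    sized examples together and minimizes wasted padding."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, batch, max_len = [], [], 0
    for i in order:
        new_max = max(max_len, lengths[i])
        # Padded cost of the batch if we add example i.
        if batch and new_max * (len(batch) + 1) > max_tokens:
            batches.append(batch)
            batch, max_len = [i], lengths[i]
        else:
            batch.append(i)
            max_len = new_max
    if batch:
        batches.append(batch)
    return batches


# Six sequences of the given lengths, budget of 8 padded tokens per batch.
print(max_token_batches([5, 2, 3, 1, 4, 2], max_tokens=8))
# → [[3, 1, 5], [2, 4], [0]]
```

A sampler like this yields variable batch sizes (many short examples or a few long ones), which is exactly the element-dependent batching discussed above; in PyTorch it would typically be wrapped as a `batch_sampler` for a `DataLoader`.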
@nateanl thanks for the pointer! I think it looks pretty good, but a couple of questions:
- it looks like minibatch shuffling happens unconditionally. Does it make sense to add...