Kyle Gorman
I am working on a simple wrapper class which loads the model and yields predictions, but the interface in `predict.py` is somewhat unfriendly for this...it has nice abstractions but they're...
This is a notice of my plans to move the encoding methods (which take strings and make tensors) and decoding methods (which convert tensors back into strings) into the Index...
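As a rough illustration of the encode/decode pairing described above, here is a minimal sketch that uses plain Python lists in place of tensors. The class name `ToyIndex`, its attributes, and the `<UNK>` convention are hypothetical and are not taken from the library's actual index class.

```python
from typing import Dict, List


class ToyIndex:
    """Hypothetical symbol index; not the library's actual class."""

    def __init__(self, vocabulary: List[str]) -> None:
        # Assumption: index 0 is reserved for unknown symbols.
        self.unk = "<UNK>"
        self.index2symbol: List[str] = [self.unk] + sorted(set(vocabulary))
        self.symbol2index: Dict[str, int] = {
            symbol: i for i, symbol in enumerate(self.index2symbol)
        }

    def encode(self, string: str) -> List[int]:
        # Maps each character to its index; unseen characters map to 0.
        return [self.symbol2index.get(char, 0) for char in string]

    def decode(self, indices: List[int]) -> str:
        # Maps indices back to symbols and joins them into a string.
        return "".join(self.index2symbol[i] for i in indices)


index = ToyIndex(list("abc"))
encoded = index.encode("cab")
decoded = index.decode(encoded)
```

Keeping both directions on the same object means the symbol-to-index mapping only has to be defined once, which is one plausible motivation for consolidating the methods.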
[This continues the discussion in #12.] Both the transducer and the pointer-generator treat features in architecture-specific ways; issue #12 deals with their ideal treatment in the transducer, since the Makarov/Clematide...
After #71, we now can control, for a given training batch, whether teacher or student forcing is used. [Some recent work](https://aclanthology.org/2020.coling-main.255/) suggests that for sequence-to-sequence models there is an advantage...
As a postlude to #72, I propose we make it possible to use [ByT5](https://aclanthology.org/2022.tacl-1.17/) as the source (e.g., `--source_encoder_arch byt5_base`) and/or feature encoder. ByT5 is a byte-based pretrained transformer; in...
This is a draft. Important simplifications while I get the basics right: * I assume that we are always padding to the max (for source and target); I can make...
I have a need (discussed off-thread) to make it so that every batch is the same size. This seems to me to require the following: 1. datasets know the length...
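One way to read "the same size" is that every padded sequence has the same length regardless of which batch it lands in. The sketch below assumes that reading; the helper `pad_to_length` and the padding index `PAD` are hypothetical names, and plain lists stand in for tensors.

```python
from typing import List

PAD = 0  # Hypothetical padding index.


def pad_to_length(sequence: List[int], length: int, pad: int = PAD) -> List[int]:
    """Right-pads a sequence to exactly `length` items."""
    if len(sequence) > length:
        raise ValueError(f"Sequence of length {len(sequence)} exceeds {length}")
    return sequence + [pad] * (length - len(sequence))


sequences = [[5, 6], [7, 8, 9, 10], [11]]
# The maximum is computed over the whole dataset, not per batch.
max_length = max(len(sequence) for sequence in sequences)
batch_size = 2
batches = [
    [pad_to_length(s, max_length) for s in sequences[i : i + batch_size]]
    for i in range(0, len(sequences), batch_size)
]
```

Computing the maximum once over the dataset, rather than per batch, is what makes the padded length identical across batches in this sketch.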
[copied from CUNY-CL/abstractness/issues/123] There are a lot of pure Python loops in the transducer implementation and many can be replaced with PyTorch functions.
Testing
[copied from CUNY-CL/abstractness/issues/87] We should add integration tests (I hesitate to call these unit tests), simply limiting ourselves to the model sizes and data quantities we can run on CircleCI's...
We should add a benchmarking suite. I have reserved a separate repo, CUNY-CL/yoyodyne-benchmarks, for this. Here is a list of shared tasks (and related papers) from which we can pull...