Kyle Gorman
Thanks for the doc. A couple thoughts:

* I don't love it, but I'd probably prefer a run-time if/else check (do I need to use a feature encoder or not...
> That works too. I think that was used before we separated models into `NoFeature` vs. `Feature` duals, so I thought it would ruffle feathers. More specifically, the 'aliasing' idea would...
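For concreteness, here is a minimal sketch of what the run-time if/else check suggested above could look like, in place of the `NoFeature`/`Feature` duals. The names (`Encoder`, `has_features`, etc.) are hypothetical and not the actual Yoyodyne interfaces:

```python
# Hypothetical sketch only: build the features encoder at construction time
# if and only if features are present, instead of maintaining parallel
# NoFeature/Feature model classes.
from typing import Optional

import torch
from torch import nn


class Encoder(nn.Module):
    """Source encoder with an optional features encoder."""

    def __init__(self, vocab_size: int, embedding_size: int, has_features: bool):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_size)
        # Run-time if/else check: None when the data set has no features.
        self.feature_encoder = (
            nn.Embedding(vocab_size, embedding_size) if has_features else None
        )

    def forward(
        self, source: torch.Tensor, features: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        encoded = self.embeddings(source)
        if self.feature_encoder is not None:
            # Concatenates feature embeddings along the sequence dimension.
            encoded = torch.cat([encoded, self.feature_encoder(features)], dim=1)
        return encoded
```

The same check could live wherever the duals are currently selected; nothing about it is specific to the encoder.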
Okay, thanks for that, Travis; that's what I needed. Everything looks good, but two suggestions:

* Just use the same embedding size for everything. The additional complexity is not yet...
Now that #72 is in, what remains of this bug, @bonham79?
> The only thing left to fix would be to extend `feature_encoder` to all model types. So, for example, `transformer` could then take a `feature_encoder` flag like `pointer_generator` and `transducer` do. But this seems...
Our way of doing that (e.g., in Zhang et al. 2019 and earlier papers) was way more constrained than generalized sequence-to-sequence learning, so I think we’d have to basically have...
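To frame that discussion, here is a hypothetical sketch of the "extend `feature_encoder` to all model types" idea: a shared base class owns the optional features encoder, so `transformer`, `transducer`, etc. all inherit the same flag. None of these class or argument names are the real Yoyodyne ones:

```python
# Hypothetical sketch: a shared base class holds the optional features
# encoder so every architecture supports the same flag uniformly.
from typing import Optional

from torch import nn


class BaseModel(nn.Module):
    """Base class shared by all model architectures."""

    def __init__(
        self,
        encoder: nn.Module,
        decoder: nn.Module,
        feature_encoder: Optional[nn.Module] = None,
    ):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        # Optional for every subclass, not just pointer_generator/transducer.
        self.feature_encoder = feature_encoder

    @property
    def has_features(self) -> bool:
        return self.feature_encoder is not None


class TransformerModel(BaseModel):
    """Transformer variant; inherits the optional feature_encoder."""
```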
Yeah, short answer: write it in C++, thinking about cache locality, then wrap it and expose it to Python.
[Yoyodyne test strategy.pdf](https://github.com/CUNY-CL/yoyodyne/files/10326390/Yoyodyne.test.strategy.pdf) The above describes my current thinking about the test strategy.
What specifically in that doc do you want to follow? My thoughts:

* I never want to truncate the source string.
* I never want to truncate the target string...
Yeah, I think we just want a flag that opts us into `padding='max_length'` (subject to `--max_source_length` and/or `--max_target_length` constraints) rather than the default `padding='longest'`. Truncation makes more sense in BERT-ish...
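To make the proposed flag concrete, a rough sketch of the opt-in padding behavior at collation time, with no truncation ever; `pad_batch`, `pad_idx`, and `pad_to_max_length` are illustrative names, not existing Yoyodyne options:

```python
# Hypothetical sketch of opt-in fixed-length padding: the default pads to
# the longest sequence in the batch (cf. padding='longest'); with the flag
# set, everything is padded out to at least max_source_length
# (cf. padding='max_length'). Nothing is ever truncated.
from typing import List

import torch


def pad_batch(
    batch: List[torch.Tensor],
    pad_idx: int,
    max_source_length: int,
    pad_to_max_length: bool = False,
) -> torch.Tensor:
    longest = max(seq.size(0) for seq in batch)
    length = max(longest, max_source_length) if pad_to_max_length else longest
    padded = torch.full((len(batch), length), pad_idx, dtype=torch.long)
    for i, seq in enumerate(batch):
        padded[i, : seq.size(0)] = seq
    return padded
```

The analogous check for the target side would key off `--max_target_length` instead.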