Kyle Gorman
I do acknowledge the distinction between changing LR mid-epoch. I just don't see any reason to suppose it matters yet, which is why I am slightly nonchalant about this change....
I took this in a totally different direction: I now tag the supported schedulers with the config data they need so we can support step-based stuff. (I still maintain that...
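A minimal sketch of what tagging schedulers with the config data they need might look like. Everything here is hypothetical for illustration: the registry name, field names, and the particular schedulers and their config keys are assumptions, not the actual implementation.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class SchedulerSpec:
    """Tags a scheduler with the config data it needs."""

    required_args: frozenset[str]  # config keys this scheduler consumes.
    interval: str  # "epoch" or "step": when the scheduler advances.


# Hypothetical registry: each supported scheduler is tagged with its
# required config keys and whether it steps per epoch or per batch,
# which is what makes step-based scheduling supportable.
SCHEDULER_REGISTRY = {
    "warmupinvsqrt": SchedulerSpec(frozenset({"warmup_steps"}), "step"),
    "reduceonplateau": SchedulerSpec(
        frozenset({"patience", "factor"}), "epoch"
    ),
}


def validate_config(scheduler: str, config: dict) -> None:
    """Raises if the config is missing keys the scheduler is tagged with."""
    spec = SCHEDULER_REGISTRY[scheduler]
    missing = spec.required_args - config.keys()
    if missing:
        raise ValueError(f"{scheduler} requires: {sorted(missing)}")
```

With this, `validate_config("warmupinvsqrt", {"warmup_steps": 4000})` passes, while omitting `warmup_steps` raises before training starts.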
Gentle ping on this, @Adamits: I redesigned it as described in the previous comment.
> This might be more appropriate for an email but while thinking about this, I was wondering if we could simplify the metric abstraction by porting the evaluator code into...
I thought it existed but I don't think it ever did. I think you concatenate the source and features encodings, yes.
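To make the concatenation concrete, here is a plain-Python sketch, assuming each encoding is a sequence of per-timestep vectors and that concatenation happens along the time axis (in practice this would be a tensor concat, e.g. `torch.cat` along the sequence dimension; the function name is illustrative).

```python
def concatenate_encodings(source, features):
    """Concatenates source and features encodings along the time axis.

    Each encoding is a list of per-timestep vectors (lists of floats);
    the decoder then attends over the combined sequence.
    """
    # Both encodings must share the same hidden dimension.
    assert not source or not features or len(source[0]) == len(features[0])
    return source + features
```

E.g., a 3-step source encoding plus a 2-step features encoding yields a 5-step sequence for the decoder to attend over.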
Hi @Adamits, I was able to do this with vanilla RNNs, but I'm a bit at a loss as to how to make this work for (vanilla) transformers and for pointer-generator transformers....
I have an alternative proposal to consider. Right now we have two categories: * "optional": {vanilla RNNs, vanilla transformers later?, pointer-generator transformers later?}: concatenation is the default, a separate encoder...
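A rough sketch of how the proposed categories could be dispatched. The category memberships and names below are purely illustrative (the second category and its members are assumptions, since the comment is truncated); the point is just that concatenation is the default wherever a separate features encoder is optional.

```python
# Hypothetical category memberships for illustration only.
OPTIONAL_FEATURES_ENCODER = {
    "rnn",
    "transformer",
    "pointer_generator_transformer",
}
REQUIRED_FEATURES_ENCODER = {"pointer_generator_rnn"}


def features_strategy(arch: str, separate_encoder: bool) -> str:
    """Picks how features are combined with the source for a given model."""
    if arch in REQUIRED_FEATURES_ENCODER:
        return "separate_encoder"
    if arch in OPTIONAL_FEATURES_ENCODER:
        # Concatenation is the default; a separate encoder is opt-in.
        return "separate_encoder" if separate_encoder else "concatenation"
    raise ValueError(f"unknown architecture: {arch}")
```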
+1. I was wondering about this too: do our characteristic problems have enough data for learned positional embeddings?
Suggested interface: `--transformer_positional_embedding {nope,rope,sinusoidal}` (etc.). I would be very happy to just have RoPE as a second option.
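For reference, a minimal pure-Python sketch of the two non-trivial options (`sinusoidal` and `rope`); the flag value `nope` would simply skip this step. The function names and the even/odd pairing convention are illustrative, not the actual implementation.

```python
import math


def sinusoidal_embedding(pos: int, dim: int) -> list[float]:
    """Fixed sinusoidal positional embedding, added to token embeddings."""
    out = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        out.append(math.sin(angle))
        out.append(math.cos(angle))
    return out[:dim]


def rope(x: list[float], pos: int) -> list[float]:
    """Rotary positional embedding (RoPE): rotates each (even, odd)
    pair of the query/key vector by a position-dependent angle."""
    dim = len(x)
    out = list(x)
    for i in range(0, dim - 1, 2):
        theta = pos / (10000 ** (i / dim))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Note that RoPE is applied to queries and keys inside attention rather than added to the input embeddings, and that the rotation at position 0 is the identity.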
Correct me if I'm wrong, but I think we already have a max sequence length built into our transformers anyway. If you can lift this restriction, sure, but otherwise I'd...