Kyle Gorman
Nested tensors are a prototype feature in PyTorch; when certain conditions are met, they are used in the transformer implementation. I believe what actually happens is that it uses this...
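For context, a minimal sketch of the prototype nested-tensor API (not yoyodyne code; shapes and values are made up):

```python
import torch

# Variable-length sequences are stored without padding.
a = torch.randn(3, 8)  # sequence of length 3
b = torch.randn(5, 8)  # sequence of length 5
nt = torch.nested.nested_tensor([a, b])
print(nt.is_nested)  # True
# Convert back to an ordinary padded tensor of shape (2, 5, 8).
padded = torch.nested.to_padded_tensor(nt, padding=0.0)
```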
One simple possibility is to merge #225, which just silences the scary warning but doesn't otherwise change current behavior, and then investigate #224 at lower priority as an alternative.
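If we go that route, the silencing would presumably look something like the standard warnings-filter idiom below; the message regex is just a guess at the warning's wording, not necessarily what #225 actually does:

```python
import warnings

# Suppress the prototype-feature warning by matching its message.
warnings.filterwarnings(
    "ignore",
    message=".*nested tensors.*prototype.*",
    category=UserWarning,
)
```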
I don't understand this report: what's the behavioral consequence? Is this a design flaw or a bug? Is it an issue you want to assign to yourself? Is...
It seems to me the bug here is just "hard monotonic attention is broken". We already have a shared embedding space (which is a great feature) and we are doomed...
Putting the relevant traceback here for my debugging:

```
  File "/home/user/miniconda3/envs/py310/bin/yoyodyne-train", line 8, in <module>
    sys.exit(main())
  File "/home/user/miniconda3/envs/py310/lib/python3.10/site-packages/yoyodyne/train.py", line 423, in main
    model = get_model_from_argparse_args(args, datamodule)
  File "/home/user/miniconda3/envs/py310/lib/python3.10/site-packages/yoyodyne/train.py", line 247, in...
```
> Is there any docs on this? [`nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) and ["modules as building blocks"](https://pytorch.org/docs/stable/notes/modules.html#modules-as-building-blocks). It's easy to tell when the module tracking is broken because there will be no gradients flowing...
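To make that concrete, here is a minimal sketch (hypothetical modules, not from this codebase) where tracking is broken by a plain Python list and fixed by `nn.ModuleList`:

```python
from torch import nn

class Broken(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain list hides submodules from nn.Module's tracking, so
        # their parameters never reach the optimizer and get no gradients.
        self.layers = [nn.Linear(4, 4) for _ in range(2)]

class Fixed(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each submodule properly.
        self.layers = nn.ModuleList(nn.Linear(4, 4) for _ in range(2))

print(len(list(Broken().parameters())))  # 0: tracking is broken
print(len(list(Fixed().parameters())))   # 4: weights and biases registered
```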
Is this still open?
@Adamits I think this would pose exactly the same issue as the multi-GPU case.
Yeah, I also find the transducer is better on CPU. (The same was true of the original DyNet code.)
Is this made redundant by #247? @bonham79 @Adamits