Wrap LSTM Outputs in own module class
The current LSTM outputs create ugly typing, as mentioned in https://github.com/CUNY-CL/yoyodyne/pull/195. Replace them with a dataclass wrapper to make eyes not bleed.
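For concreteness, here is a minimal sketch of the idea, assuming nothing about the eventual implementation; the class name and fields below are illustrative only:

```python
import dataclasses
from typing import Tuple

import torch


@dataclasses.dataclass
class LstmOutput:
    """Hypothetical wrapper bundling what nn.LSTM hands back."""

    output: torch.Tensor                        # B x seq_len x hidden_size
    hiddens: Tuple[torch.Tensor, torch.Tensor]  # the (h_n, c_n) state pair


lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
encoded, state = lstm(torch.randn(2, 5, 8))
wrapped = LstmOutput(output=encoded, hiddens=state)
```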
Could you link to an example in the codebase? Dataclasses are great, but I want to apply at least some skepticism to any additions that make things feel further from just PyTorch (tensors go in, tensors come out).
Could you link to an example in the codebase?
A good example is the two classes here.
Is this an example of the solution, not the problem?
Okay, the code in batches.py is an example of the solution. For an example of where this would be a problem of sorts, we have return types like Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] in the LSTM encoder here, and in that same file we have a bunch of instances of Tuple[torch.Tensor, torch.Tensor] as return types for the LSTM.
@bonham79 can jump in if he has something different in mind though, since this was his proposal.
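For readers outside the thread, the nested tuple in that annotation mirrors what torch.nn.LSTM itself returns; a hedged illustration, not the encoder's actual code:

```python
from typing import Tuple

import torch


def encode(
    lstm: torch.nn.LSTM, source: torch.Tensor
) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
    # nn.LSTM returns (output, (h_n, c_n)), so the annotation has to spell
    # out the whole nested tuple; this is the "ugly typing" at issue.
    output, (h, c) = lstm(source)
    return output, (h, c)
```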
Nah, @kylebgorman is correct about my thinking. I understand @Adamits, but I think type theory prefers defining new types for awkward typing.
Actually, I think it may be a good style idea to convert all model outputs into some general dataclass that contains tensor outputs. As long as there are some properties that can be assumed (log_probs, loss, output), it would be more user-friendly and modular.
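A rough sketch of that idea, using the field names floated above; the class name is hypothetical:

```python
import dataclasses
from typing import Optional

import torch


@dataclasses.dataclass
class GenericModelOutput:
    """Hypothetical shared return type for all models."""

    output: torch.Tensor                      # predictions or logits
    log_probs: Optional[torch.Tensor] = None  # per-symbol log-probabilities
    loss: Optional[torch.Tensor] = None       # scalar loss during training
```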
I understand @Adamits, but I think type theory prefers defining new types for awkward typing.
Sure, but I think this argument assumes we have a yoyodyne-specific type system? Maybe we do, but if we want to commit to that and start adding types in this way, then I feel like we need to think about docs for the types, a general philosophy about typing so our custom types are intuitive to use, maybe more consistent naming conventions, etc.
I'm thinking about, e.g., HuggingFace, where I use an HF function and get back 5 layers of abstraction over a Dict or a Tensor, and it takes me 2.5 hours to figure out how to actually use it (excuse the hyperbole :D). Essentially, my opinion is that adding types should favor usability over any specific developer convention for cleaning up methods.
Actually, I think it may be a good style idea to convert all model outputs into some general dataclass that contains tensor outputs. As long as there are some properties that can be assumed (log_probs, loss, output), it would be more user-friendly and modular.
Yeah maybe this is a solution to my concern.
A lightweight alternative is to just declare a particular type and not a class. That'd be better than nothing.
I agree that HF went too far.
A lightweight alternative is to just declare a particular type and not a class. That'd be better than nothing.
I am not sure I follow. Is there a convention in Python for defining types that are not classes? Does this refer JUST to the typing annotation stuff?
Actually, I think it may be a good style idea to convert all model outputs into some general dataclass that contains tensor outputs. As long as there are some properties that can be assumed (log_probs, loss, output), it would be more user-friendly and modular.
Coming back to this, I guess we have what Kyle pointed to: PaddedTensor and PaddedBatch, which put a tensor and its mask together; and put source, features, target together, respectively. We also have a ModuleOutput that we can hang whatever we need from a particular module on. Do we need anything beyond these? I guess you are distinguishing something like ModelOutput and ModuleOutput?
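Very roughly, and simplified relative to the real definitions in the repo, the existing containers look like this:

```python
import dataclasses
from typing import Optional

import torch


@dataclasses.dataclass
class PaddedTensor:
    # A padded tensor paired with its padding mask.
    padded: torch.Tensor
    mask: torch.Tensor


@dataclasses.dataclass
class PaddedBatch:
    # Source, optional features, and optional target for one batch.
    source: PaddedTensor
    features: Optional[PaddedTensor] = None
    target: Optional[PaddedTensor] = None
```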
I am not sure I follow. Is there a convention in Python for defining types that are not classes? Does this refer JUST to the typing annotation stuff?
Yes, and I think yes. Like: LstmReturnValue = Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] would work.
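That is, a plain typing alias names the nested tuple once and keeps the tensors-in/tensors-out flavor; a small illustration:

```python
from typing import Tuple

import torch

# Name the nested tuple once, then reuse the alias in annotations.
LstmReturnValue = Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]


def run_lstm(lstm: torch.nn.LSTM, x: torch.Tensor) -> LstmReturnValue:
    # Illustrative only; not the encoder's actual forward().
    return lstm(x)
```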
Coming back to this, I guess we have what Kyle pointed to:
PaddedTensor and PaddedBatch, which put a tensor and its mask together; and put source, features, target together, respectively.
PaddedTensor greatly reduced redundancy and inscrutable unpacking, IMO, so I think it's a big win. If the proposed class has even half as much DRYing effect it'd be a good idea.
we have return types like Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] in the LSTM encoder here,
Just a note: that type hint is out of date. That Module returns ModuleOutput.
PaddedTensor greatly reduced redundancy and inscrutable unpacking, IMO, so I think it's a big win. If the proposed class has even half as much DRYing effect it'd be a good idea.
Sure. So basically the top-level things in my mind are that a PaddedBatch goes in: if I am a developer and want to write a new model that needs more than src, feats, tgt, I can hang those things on the PaddedBatch basically without consequence.
Then, if we add a new type, a ModelOutput comes out. Here we get a loss, log likelihoods, and maybe a decoded output.
We also have two more internal types: ModuleOutput, where intermediate representations that get built up by the internal torch blocks go (output tensors, hidden states, embeddings, etc.), which we have so we can abstract away from the particular Encoder. Then we have EvalItem, which you need to deal with if you want to add a new evaluation or modify the validation code.
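Put together, the intended division of labor looks roughly like this (hypothetical function and attribute names, just to make the flow concrete):

```python
import torch


def training_step_sketch(model: torch.nn.Module, batch) -> torch.Tensor:
    """batch is a PaddedBatch; the returned loss is what the trainer optimizes."""
    # Inside the model, encoders and decoders hand each other ModuleOutput
    # objects; the model itself surfaces a ModelOutput carrying the loss
    # and log-likelihoods; EvalItem only shows up in the validation path.
    model_output = model(batch)  # hypothetical ModelOutput
    return model_output.loss
```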
This all seems pretty reasonable. Actually, ModuleOutput probably feels the most awkward/unintuitive to me, but I don't know a better way to get the abstraction we want over encoders.
Another proposal is that we could just hang everything we need to pass around for a single step on PaddedBatch (i.e., it accumulates a loss, log_likelihoods, etc.), but I think this is more likely to have unintended consequences than just adding another type, and anyway hanging the current ModuleOutput values on PaddedBatch feels kinda weird.
we have return types like Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] in the LSTM encoder here,
Just a note: that type hint is out of date. That Module returns ModuleOutput.
You're right, just fixed that for LSTM modules in #205.
This was completed in #205.