Wrap LSTM Outputs in own module class
The current LSTM outputs create ugly typing, as mentioned in https://github.com/CUNY-CL/yoyodyne/pull/195. Replace them with a dataclass wrapper to make eyes not bleed.
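For concreteness, here is a minimal sketch of the idea, assuming nothing about the eventual implementation; the class name and fields below are illustrative only:

```python
import dataclasses
from typing import Tuple

import torch


@dataclasses.dataclass
class LstmOutput:
    """Hypothetical wrapper bundling what nn.LSTM hands back."""

    output: torch.Tensor                        # B x seq_len x hidden_size
    hiddens: Tuple[torch.Tensor, torch.Tensor]  # the (h_n, c_n) state pair


lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
encoded, state = lstm(torch.randn(2, 5, 8))
wrapped = LstmOutput(output=encoded, hiddens=state)
```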
Could you link to an example in the codebase? Dataclasses are great, but I want to apply at least some skepticism to any additions that make things feel further from just PyTorch (tensors go in, tensors come out).
Could you link to an example in the codebase?
A good example is the two classes here.
Is this an example of the solution, not the problem?
Okay, the code in batches.py is an example of the solution. For an example of where this would be a problem of sorts, we have return types like Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] in the LSTM encoder here, and in that same file we have a bunch of instances of Tuple[torch.Tensor, torch.Tensor] as return types for the LSTM.
@bonham79 can jump in if he has something different in mind though, since this was his proposal.
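For readers outside the thread, the nested tuple in that annotation mirrors what torch.nn.LSTM itself returns; a hedged illustration, not the encoder's actual code:

```python
from typing import Tuple

import torch


def encode(
    lstm: torch.nn.LSTM, source: torch.Tensor
) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
    # nn.LSTM returns (output, (h_n, c_n)), so the annotation has to spell
    # out the whole nested tuple; this is the "ugly typing" at issue.
    output, (h, c) = lstm(source)
    return output, (h, c)
```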
Nah, @kylebgorman is correct about my thinking. I understand @Adamits, but I think type theory prefers defining new types for awkward typing.
Actually, I think it may be a good style idea to convert all model outputs into some general dataclass that contains tensor outputs. As long as there are some properties that can be assumed (log_probs, loss, output), it would be more user-friendly and modular.
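A rough sketch of that idea, using the field names floated above; the class name is hypothetical:

```python
import dataclasses
from typing import Optional

import torch


@dataclasses.dataclass
class GenericModelOutput:
    """Hypothetical shared return type for all models."""

    output: torch.Tensor                      # predictions or logits
    log_probs: Optional[torch.Tensor] = None  # per-symbol log-probabilities
    loss: Optional[torch.Tensor] = None       # scalar loss during training
```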
I understand @Adamits, but I think type theory prefers defining new types for awkward typing.
Sure, but I think this argument assumes we have a yoyodyne-specific type system? Maybe we do, but if we want to commit to that and start adding types in this way, then I feel like we need to think about docs for the types, a general philosophy about typing so our custom types are intuitive to use, maybe more consistent naming conventions, etc.
I'm thinking about, e.g., HuggingFace, where I use an HF function and get back 5 layers of abstraction over a Dict or a Tensor, and it takes me 2.5 hours to figure out how to actually use it (excuse the hyperbole :D). Essentially, my opinion is that adding types should favor usability over any specific developer convention for cleaning up methods.
Actually, I think it may be a good style idea to convert all model outputs into some general dataclass that contains tensor outputs. As long as there are some properties that can be assumed (log_probs, loss, output), it would be more user-friendly and modular.
Yeah maybe this is a solution to my concern.
A lightweight alternative is to just declare a particular type and not a class. That'd be better than nothing.
I agree that HF went too far.
A lightweight alternative is to just declare a particular type and not a class. That'd be better than nothing.
I am not sure I follow. Is there a convention in Python for defining types that are not classes? Does this refer JUST to the typing annotation stuff?
Actually, I think it may be a good style idea to convert all model outputs into some general dataclass that contains tensor outputs. As long as there are some properties that can be assumed (log_probs, loss, output), it would be more user-friendly and modular.
Coming back to this, I guess we have what Kyle pointed to: PaddedTensor and PaddedBatch, which put a tensor and its mask together; and put source, features, target together, respectively. We also have a ModuleOutput that we can hang whatever we need from a particular module on. Do we need anything beyond these? I guess you are distinguishing something like ModelOutput and ModuleOutput?
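Very roughly, and simplified relative to the real definitions in the repo, the existing containers look like this:

```python
import dataclasses
from typing import Optional

import torch


@dataclasses.dataclass
class PaddedTensor:
    # A padded tensor paired with its padding mask.
    padded: torch.Tensor
    mask: torch.Tensor


@dataclasses.dataclass
class PaddedBatch:
    # Source, optional features, and optional target for one batch.
    source: PaddedTensor
    features: Optional[PaddedTensor] = None
    target: Optional[PaddedTensor] = None
```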
I am not sure I follow. Is there a convention in Python for defining types that are not classes? Does this refer JUST to the typing annotation stuff?
Yes, and I think yes. Like: LstmReturnValue = Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] would work.
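That is, a plain typing alias names the nested tuple once and keeps the tensors-in/tensors-out flavor; a small illustration:

```python
from typing import Tuple

import torch

# Name the nested tuple once, then reuse the alias in annotations.
LstmReturnValue = Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]


def run_lstm(lstm: torch.nn.LSTM, x: torch.Tensor) -> LstmReturnValue:
    # Illustrative only; not the encoder's actual forward().
    return lstm(x)
```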
Coming back to this, I guess we have what Kyle pointed to:
PaddedTensor and PaddedBatch, which put a tensor and its mask together; and put source, features, target together, respectively.
PaddedTensor greatly reduced redundancy and inscrutable unpacking, IMO, so I think it's a big win. If the proposed class has even half as much DRYing effect it'd be a good idea.
we have return types like Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] in the LSTM encoder here,
Just a note: that type hint is out of date. That Module returns ModuleOutput.
PaddedTensor greatly reduced redundancy and inscrutable unpacking, IMO, so I think it's a big win. If the proposed class has even half as much DRYing effect it'd be a good idea.
Sure. So basically the top-level things in my mind are that a PaddedBatch goes in: if I am a developer and want to write a new model that needs more than src, feats, tgt, I can hang those things on the PaddedBatch basically without consequence.
Then, if we add a new type, a ModelOutput comes out. Here we get a loss, log likelihoods, and maybe a decoded output.
We also have two more internal types: ModuleOutput, where intermediate representations that get built up by the internal torch blocks go (output tensors, hidden states, embeddings, etc.), which we have so we can abstract away from the particular Encoder. Then we have EvalItem, which you need to deal with if you want to add a new evaluation or modify the validation code.
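Put together, the intended division of labor looks roughly like this (hypothetical function and attribute names, just to make the flow concrete):

```python
import torch


def training_step_sketch(model: torch.nn.Module, batch) -> torch.Tensor:
    """batch is a PaddedBatch; the returned loss is what the trainer optimizes."""
    # Inside the model, encoders and decoders hand each other ModuleOutput
    # objects; the model itself surfaces a ModelOutput carrying the loss
    # and log-likelihoods; EvalItem only shows up in the validation path.
    model_output = model(batch)  # hypothetical ModelOutput
    return model_output.loss
```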
This all seems pretty reasonable. Actually, ModuleOutput probably feels the most awkward/unintuitive to me, but I don't know a better way to get the abstraction we want over encoders.
Another proposal is that we could just hang everything we need to pass around for a single step on PaddedBatch (i.e., it accumulates a loss, log_likelihoods, etc.), but I think this is more likely to have unintended consequences than just adding another type, and anyway hanging the current ModuleOutput values on PaddedBatch feels kinda weird.
we have return types like Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] in the LSTM encoder here,
Just a note: that type hint is out of date. That Module returns ModuleOutput.
You're right, just fixed that for LSTM modules in #205.
This was completed in #205.