
adding fixes so transducer can work again

bonham79 opened this pull request 5 months ago

Merges: https://github.com/CUNY-CL/yoyodyne/pull/233 https://github.com/CUNY-CL/yoyodyne/pull/197

Fixes: https://github.com/CUNY-CL/yoyodyne/issues/192 https://github.com/CUNY-CL/yoyodyne/issues/191

Dependent on: https://github.com/CUNY-CL/maxwell/pull/17

Summary: I fixed maxwell so that TQDM is no longer a property of the SED parameters. This allows the expert module to be pickled again, which in turn re-enables multi-GPU training with the transducer.
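The pickling problem can be illustrated in miniature (a sketch, not the actual maxwell code: `SEDParamsBroken` and `SEDParamsFixed` are made-up names, and an open stream stands in for a stored tqdm progress bar, which also holds an unpicklable output handle):

```python
import pickle
import sys


class SEDParamsBroken:
    """The bug: the object keeps an unpicklable handle (here, an open
    stream standing in for a stored tqdm bar) as an attribute."""

    def __init__(self):
        self.weights = {"del": 0.1, "ins": 0.2, "sub": 0.7}
        self.progress = sys.stderr  # unpicklable; dooms pickle.dumps


class SEDParamsFixed:
    """The fix: the progress bar is created locally during EM and
    discarded, so the object holds only picklable state."""

    def __init__(self):
        self.weights = {"del": 0.1, "ins": 0.2, "sub": 0.7}

    def run_em(self, iterations: int = 3) -> None:
        # A tqdm bar would be constructed here, used, and dropped;
        # it never becomes an attribute of self.
        for _ in range(iterations):
            pass


def is_picklable(obj) -> bool:
    """True if obj survives pickle.dumps without raising."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False
```

Since multi-GPU strategies ship the expert module to worker processes via pickle, only the second shape of object works across devices.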

In the process, I changed how the expert module initializes: it now just copies the index vocabulary from the dataloader passed to it. So you only need to pass an index to the action vocabulary, and everything else is managed in the backend. This allows expert modules to be initialized freely from checkpoints, and thus skips the epochs of EM when resuming from a checkpoint. (I just do the same thing we do with indexes: write the SED parameters to the experiment directory and load them when initializing the model.)
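The write-then-load scheme might look something like this (a minimal sketch; the `sed_params.pkl` filename and both function names are my invention, not the PR's actual API):

```python
import pickle
from pathlib import Path


def write_sed_params(params: dict, experiment_dir: str) -> Path:
    """Writes learned SED parameters alongside the index files so a
    resumed run can reload them instead of re-running EM."""
    path = Path(experiment_dir) / "sed_params.pkl"  # hypothetical name
    with path.open("wb") as sink:
        pickle.dump(params, sink)
    return path


def read_sed_params(experiment_dir: str) -> dict:
    """Loads the SED parameters written by a previous run."""
    path = Path(experiment_dir) / "sed_params.pkl"
    with path.open("rb") as source:
        return pickle.load(source)
```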

I also added new flags so that you can skip EM training for the expert entirely. This is for an upcoming change in which the transducer no longer depends on having an oracle function. (I've found through training that the SED actually doesn't add that much.) It also allows the creation of dummy experts that just hold the action vocabulary. I added error checks to prevent unsafe use; feel free to point out more.
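A rough sketch of the idea (hypothetical `Expert` class and `skip_em` flag name, not the PR's real signatures): a skip-EM expert is just a vocabulary holder, and an error check stops an untrained, non-dummy expert from being scored.

```python
class Expert:
    """Hypothetical expert: holds the action vocabulary, and can be a
    dummy (skip_em=True) that never runs EM training."""

    def __init__(self, actions, skip_em: bool = False):
        self.actions = list(actions)
        self.skip_em = skip_em
        self.trained = False

    def fit(self, data=None) -> None:
        if self.skip_em:
            return  # dummy expert: vocabulary only, no oracle training
        if data is None:
            raise ValueError("EM training requested but no data given")
        # ... EM iterations would run here ...
        self.trained = True

    def score(self, action) -> float:
        # Guard against unsafe use: scoring before training (unless
        # EM was explicitly skipped) is an error.
        if not (self.trained or self.skip_em):
            raise RuntimeError("expert used before training")
        return 0.0  # placeholder score
```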

I also moved Adam's changes from https://github.com/CUNY-CL/yoyodyne/pull/233 into the trainer, so there's no weird attribute managing going on in the init anymore. (It turns out that checkpointing pickles the kwargs dict, so simply adding the action vocabulary there was creating too large an embedding space.)
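The general pattern can be sketched like this (made-up `ModelBad`/`ModelGood` names, using pickled-size of the kwargs dict as a stand-in for what a checkpoint callback serializes): pass only the vocabulary *size* into the constructor kwargs that get checkpointed, not the vocabulary itself.

```python
import pickle


class ModelBad:
    """Anti-pattern: the whole action vocabulary lands in the kwargs
    dict that checkpointing will pickle."""

    def __init__(self, actions):
        self.hparams = {"actions": actions}
        self.embedding_rows = len(actions)


class ModelGood:
    """Fix: only the count needed to size the embedding is kept in the
    checkpointed kwargs; the vocabulary lives elsewhere."""

    def __init__(self, num_actions: int):
        self.hparams = {"num_actions": num_actions}
        self.embedding_rows = num_actions
```

Both variants build the same-sized embedding, but only the second keeps the checkpointed kwargs small.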

I ran experiments over the Polish data and was able to write predictions fairly easily. The only major issue is that we're wasting some parameters on creating target vocabulary embeddings that will never be used, but that's low on the efficiency priority list.

bonham79 · Sep 15 '24