meshed-memory-transformer
register_state or register_buffer?
Hi, thank you for open-sourcing your code. I really enjoyed reading your paper. I am having trouble understanding two things:
- here. How does register_state work here? Does it differ from register_buffer?
- Do the enc_output and mask_enc defined in register_state have anything to do with the output here?
Thank you for your time.