multitask-learning-transformers
Shared attention
Hi @shahrukhx01,
Thank you so much for sharing this nice repo. How can we combine the attention of all task heads for the shared-encoder model with multiple prediction heads? Any lead in this direction would be very helpful.
Thanks
Hi @MehwishFatimah Thank you for your feedback. This can be done by modifying the multitask_data_collator and adding parallel batches for all tasks. Then, in the forward function of the model, aggregate the losses of all tasks into a joint loss, i.e., L = loss_task1 + loss_task2 + ... + loss_taskn. Please let me know if you have follow-up questions about this.
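To make this concrete, here is a minimal sketch (not the repo's exact code) of how a shared encoder with per-task heads could sum the task losses into a joint loss in its forward pass. The class name `SharedEncoderMultiTaskModel`, the `task_heads` dict, and the batch layout produced by the collator are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class SharedEncoderMultiTaskModel(nn.Module):
    def __init__(self, model_name: str, num_labels_per_task: dict):
        super().__init__()
        # One shared encoder for all tasks.
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # One prediction head per task (classification heads assumed here).
        self.task_heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in num_labels_per_task.items()}
        )
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, task_batches: dict):
        # `task_batches` maps task name -> a batch dict with input_ids,
        # attention_mask, and labels, produced by a collator that yields
        # parallel batches for all tasks.
        joint_loss = 0.0
        for task, batch in task_batches.items():
            outputs = self.encoder(
                input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
            )
            # Use the first-token ([CLS]) representation for classification.
            cls_repr = outputs.last_hidden_state[:, 0]
            logits = self.task_heads[task](cls_repr)
            # L = loss_task1 + loss_task2 + ... + loss_taskn
            joint_loss = joint_loss + self.loss_fn(logits, batch["labels"])
        return joint_loss
```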
Thank you, @shahrukhx01, for your quick response. We aggregated the losses; however, I think the (cross-)attention should be concatenated in a Seq2Seq task.
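One possible reading of "concatenating the (cross-)attention" in a Seq2Seq setting, purely as an assumption and not something from this repo, is to concatenate the shared encoder's hidden states (and their attention masks) along the sequence dimension before the decoder's cross-attention attends over them:

```python
import torch


def combine_encoder_states(states_per_task, masks_per_task):
    # states_per_task: list of (batch, seq_len_i, hidden) tensors from the shared encoder
    # masks_per_task: list of (batch, seq_len_i) attention masks
    # Concatenate along the sequence dimension so the decoder's cross-attention
    # can attend over all task contexts jointly.
    combined_states = torch.cat(states_per_task, dim=1)
    combined_mask = torch.cat(masks_per_task, dim=1)
    return combined_states, combined_mask
```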