multitask-learning-transformers
Shared attention
Hi @shahrukhx01,
Thank you so much for sharing this nice repo. How can we combine the attention of all task heads for the shared-encoder model with multiple prediction heads? Any lead in this direction would be very helpful.
Thanks
Hi @MehwishFatimah Thank you for your feedback. This can be done by modifying the multitask_data_collator and adding parallel batches for all tasks. Then, in the forward function of the model, aggregate the losses of all tasks into a joint loss, i.e., L = loss_task1 + loss_task2 + ... + loss_taskn. Please let me know if you have follow-up questions about this.
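To make this concrete, here is a minimal sketch (not the repo's exact code) of how a shared encoder with per-task heads could sum the task losses into a joint loss in its forward pass. The class name `SharedEncoderMultiTaskModel`, the `task_heads` dict, and the batch layout produced by the collator are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class SharedEncoderMultiTaskModel(nn.Module):
    def __init__(self, model_name: str, num_labels_per_task: dict):
        super().__init__()
        # One shared encoder for all tasks.
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # One prediction head per task (classification heads assumed here).
        self.task_heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in num_labels_per_task.items()}
        )
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, task_batches: dict):
        # `task_batches` maps task name -> a batch dict with input_ids,
        # attention_mask, and labels, produced by a collator that yields
        # parallel batches for all tasks.
        joint_loss = 0.0
        for task, batch in task_batches.items():
            outputs = self.encoder(
                input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
            )
            # Use the first-token ([CLS]) representation for classification.
            cls_repr = outputs.last_hidden_state[:, 0]
            logits = self.task_heads[task](cls_repr)
            # L = loss_task1 + loss_task2 + ... + loss_taskn
            joint_loss = joint_loss + self.loss_fn(logits, batch["labels"])
        return joint_loss
```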
Thank you, @shahrukhx01, for your quick response. We aggregated the losses; however, I think the (cross-)attention should be concatenated in a Seq2Seq task.
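One possible reading of "concatenating the (cross-)attention" in a Seq2Seq setting, purely as an assumption and not something from this repo, is to concatenate the shared encoder's hidden states (and their attention masks) along the sequence dimension before the decoder's cross-attention attends over them:

```python
import torch


def combine_encoder_states(states_per_task, masks_per_task):
    # states_per_task: list of (batch, seq_len_i, hidden) tensors from the shared encoder
    # masks_per_task: list of (batch, seq_len_i) attention masks
    # Concatenate along the sequence dimension so the decoder's cross-attention
    # can attend over all task contexts jointly.
    combined_states = torch.cat(states_per_task, dim=1)
    combined_mask = torch.cat(masks_per_task, dim=1)
    return combined_states, combined_mask
```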