
backpropagation on chunks?

Open vr25 opened this issue 3 years ago • 2 comments

Hi,

When the document chunks are fed to the data-parallel model, how is the loss backpropagated? Is it computed separately for every chunk?

Also, do you unfreeze BERT and fine-tune it for the classification task?

Thank you!

vr25 avatar Oct 02 '20 15:10 vr25

  1. Yes, the loss is backpropagated separately for every chunk.
  2. In our datasets we found it sufficient to fine-tune only the final transformer layer; see the sketch below.
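
For point 2, roughly what that looks like (a minimal sketch assuming a HuggingFace-style `BertModel`; not the exact code in this repo):

```python
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")

# Freeze every parameter first...
for param in bert.parameters():
    param.requires_grad = False

# ...then unfreeze only the last encoder layer, so the per-chunk losses
# only update that layer during fine-tuning.
for param in bert.encoder.layer[-1].parameters():
    param.requires_grad = True
```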


AndriyMulyar avatar Oct 05 '20 01:10 AndriyMulyar

Could you explain in more detail how the loss is calculated for every chunk separately? The entire document has a single target label, so as far as I understand, the loss would be calculated against that one target, right? Please let me know if I am missing something.
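
To make my question concrete, I'm imagining something like this (a hypothetical sketch with made-up shapes, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

# Hypothetical example: one document split into 3 chunks, 2 classes.
num_chunks, num_classes = 3, 2
chunk_logits = torch.randn(num_chunks, num_classes, requires_grad=True)  # one prediction per chunk
doc_label = torch.tensor([1])  # the single document-level target

# Is the document label simply broadcast to every chunk, so that each
# chunk contributes its own loss term against the same target?
loss = F.cross_entropy(chunk_logits, doc_label.expand(num_chunks))
loss.backward()
```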

Also, what is the maximum number of chunks in the entire dataset?

The default config has bert_batch_size=7, but some of my documents have as many as 125 chunks. In such cases, if I set bert_batch_size to 125, I run into a CUDA OOM error.

Any suggestions for this?
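
One workaround I'm considering is to keep bert_batch_size small and push a document's chunks through in sub-batches, accumulating gradients, instead of raising it to 125 (a hypothetical sketch; the stand-in `classifier` and tensor shapes are made up, not this repo's API):

```python
import torch
import torch.nn.functional as F

bert_batch_size = 7
num_chunks, hidden, num_classes = 125, 768, 2

chunk_embeddings = torch.randn(num_chunks, hidden)  # stand-in for encoded chunks
classifier = torch.nn.Linear(hidden, num_classes)   # stand-in for BERT + head
doc_label = torch.tensor([1])

classifier.zero_grad()
for start in range(0, num_chunks, bert_batch_size):
    sub_batch = chunk_embeddings[start:start + bert_batch_size]
    logits = classifier(sub_batch)
    # Scale each sub-batch loss so the accumulated gradient matches one
    # full pass over all 125 chunks.
    loss = F.cross_entropy(logits, doc_label.expand(len(sub_batch)))
    (loss * len(sub_batch) / num_chunks).backward()
# optimizer.step() would run here, after all sub-batches have accumulated.
```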

Thanks!

vr25 avatar Oct 05 '20 02:10 vr25