BERT4doc-Classification

Dealing with multiple sentences

Open LivC193 opened this issue 5 years ago • 2 comments

Hi, sorry to bother you, but I have a question.

Documents have multiple sentences, so how do you deal with that? Do you split the text into sentences and then concatenate the final embeddings of each sentence, or do you remove all punctuation marks so the text won't have any [SEP] tokens?

LivC193, Feb 27 '20 23:02

Thank you for your issue. For document classification, we do not split the text into sentences (except in the hierarchical methods), and we do not remove punctuation marks. We treat the whole document as one long sentence.
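To make "one long sentence" concrete, here is a minimal sketch (not code from the repository) of packing a whole document into a single BERT-style input. Assumptions: `str.split()` stands in for BERT's WordPiece tokenizer, the function name `encode_as_one_sequence` is hypothetical, and documents longer than `max_seq_length` keep only their first tokens (head truncation).

```python
from typing import List, Tuple


def encode_as_one_sequence(text: str, max_seq_length: int = 512) -> Tuple[List[str], List[int], List[int]]:
    """Pack a whole document into one input: [CLS] doc tokens [SEP] [PAD]...

    No sentence splitting and no punctuation removal: the document is a single segment.
    """
    # Hypothetical stand-in for WordPiece: whitespace tokenization, punctuation kept.
    tokens = text.split()
    # Reserve two positions for [CLS] and [SEP]; keep only the head of long documents.
    tokens = ["[CLS]"] + tokens[: max_seq_length - 2] + ["[SEP]"]
    # Single segment, so every segment id is 0; real tokens get mask 1.
    segment_ids = [0] * len(tokens)
    input_mask = [1] * len(tokens)
    # Pad up to the fixed length; padded positions get mask 0.
    pad = max_seq_length - len(tokens)
    tokens += ["[PAD]"] * pad
    segment_ids += [0] * pad
    input_mask += [0] * pad
    return tokens, segment_ids, input_mask


tokens, segment_ids, input_mask = encode_as_one_sequence("First sentence . Second sentence .", max_seq_length=10)
print(tokens)
# → ['[CLS]', 'First', 'sentence', '.', 'Second', 'sentence', '.', '[SEP]', '[PAD]', '[PAD]']
```

Because the document is one segment, there is exactly one [SEP] (at the end) and all segment ids are 0; the periods inside the text are just ordinary tokens.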

xuyige, Mar 10 '20 14:03

Hi, could you tell me how to handle different numbers of sentences in code for the hierarchical methods (i.e., variable-length inputs)?
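One common way to handle a variable number of sentences per document (a general technique, not necessarily what BERT4doc-Classification's hierarchical methods do) is to truncate or pad every document to a fixed `max_sentences`, keep a sentence mask, and exclude the padded slots when pooling sentence vectors. In this sketch, `encode_sentence` is a hypothetical placeholder for whatever produces a per-sentence embedding (for example a BERT [CLS] vector), and mean pooling is an assumed choice of aggregation.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def pad_sentences(sentences: List[str], max_sentences: int) -> Tuple[List[str], List[int]]:
    """Truncate/pad to exactly max_sentences; mask is 1 for real sentences, 0 for padding."""
    kept = sentences[:max_sentences]
    n_pad = max_sentences - len(kept)
    return kept + [""] * n_pad, [1] * len(kept) + [0] * n_pad


def masked_mean(vectors: List[Vector], mask: List[int]) -> Vector:
    """Average only the vectors whose mask entry is 1 (padding is ignored)."""
    total = [0.0] * len(vectors[0])
    count = 0
    for vec, m in zip(vectors, mask):
        if m:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    # Guard against an all-padding document to avoid division by zero.
    return [t / max(count, 1) for t in total]


def encode_hierarchical(
    sentences: List[str],
    encode_sentence: Callable[[str], Vector],  # hypothetical sentence encoder
    max_sentences: int,
) -> Vector:
    if max_sentences < 1:
        raise ValueError("max_sentences must be at least 1")
    padded, mask = pad_sentences(sentences, max_sentences)
    vectors = [encode_sentence(s) for s in padded]
    return masked_mean(vectors, mask)


# Toy encoder for illustration only: [number of words, 1.0].
toy = lambda s: [float(len(s.split())), 1.0]
print(encode_hierarchical(["a b", "c d e"], toy, max_sentences=4))
# → [2.5, 1.0]
```

The two padded slots are encoded but masked out, so the result is the mean of the two real sentences rather than being diluted by padding. With batched tensors the same `[batch, max_sentences]` mask can instead feed a sentence-level model, e.g. as `src_key_padding_mask` to PyTorch's `nn.TransformerEncoder` (where `True` marks padding, so the mask above would be inverted).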

AnastasiaMaugham, Dec 01 '20 08:12