BERT4doc-Classification
Dealing with multiple sentences
Hi, sorry to bother you, but I have one question.
Documents have multiple sentences, so how do you deal with that? Do you split the text into sentences and then concatenate the final embeddings for each sentence, or do you remove all punctuation marks so the text won't have any [SEP] tokens?
Thank you for raising this issue. For document classification, we do not split the text into sentences (except in the hierarchical methods), and we do not remove punctuation marks. We treat the whole document as one long sentence.
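To illustrate the answer above, here is a minimal sketch of encoding a whole document as one sequence. It is not the repository's code: `toy_tokenize` is a hypothetical stand-in for a real WordPiece tokenizer, and the 512-token limit with simple head truncation is an assumption made for the example.

```python
import re

MAX_LEN = 512  # assumed maximum sequence length for this sketch


def toy_tokenize(text):
    """Hypothetical stand-in for WordPiece: words and punctuation become tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def encode_document(text, max_len=MAX_LEN):
    """Encode the whole document as a single sequence: [CLS] tokens... [SEP]."""
    tokens = toy_tokenize(text)      # punctuation is kept, not stripped
    tokens = tokens[: max_len - 2]   # leave room for [CLS] and [SEP] (assumed head truncation)
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    segment_ids = [0] * len(tokens)  # one segment: the document is treated as one "sentence"
    return tokens, segment_ids


doc = "BERT is a language model. It reads text! Does it split sentences?"
tokens, segment_ids = encode_document(doc)
print(tokens)
```

Sentence boundaries inside the document do not produce extra [SEP] tokens here; the punctuation marks stay in the input as ordinary tokens, and only one [SEP] closes the sequence.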
Hi, could you tell me how to code the hierarchical methods when documents have different numbers of sentences (variable-length inputs)?