pragmatic_segmenter

doc_type

Open: djstrong opened this issue 6 years ago • 1 comment

Which doc_type values are supported? I tried html, but it does not work.

djstrong, Jul 18 '18 14:07

Hi @djstrong, thanks for the question. Currently pdf is the only supported doc_type. It handles text extracted from a PDF, which often has a line break in the middle of a sentence. An html doc_type could definitely be worth supporting. Please feel free to submit a pull request if you are willing and able.

diasks2, Jul 18 '18 22:07
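
For readers unfamiliar with the problem the pdf doc_type addresses, here is a minimal Ruby sketch of the idea: joining lines that were broken mid-sentence during PDF text extraction. The helper `join_broken_lines` is hypothetical and written only for illustration. It is not part of pragmatic_segmenter and is not how the gem implements its pdf handling. The sample text is also made up.

```ruby
# Hypothetical helper (an assumption for illustration, NOT the gem's code):
# joins lines that were split in the middle of a sentence.
def join_broken_lines(text)
  # Replace a newline with a space unless the character right before it
  # is sentence-ending punctuation (., ! or ?).
  text.gsub(/(?<![.!?])\n/, ' ')
end

# Made-up sample resembling text extracted from a PDF.
extracted = "This is a sentence\ncut off in the middle because pdf.\nThis one is whole."

puts join_broken_lines(extracted)
```

The first line break falls inside a sentence and is replaced by a space, while the break after the full stop is left alone. A real implementation would need to handle more cases (abbreviations, hyphenated words, list items), so treat this only as a sketch of the motivation described above.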