pragmatic_segmenter
doc_type
What kinds of `doc_type` are supported? I have tried `html`, but it is not working.
Hi @djstrong, thanks for the question. Currently `pdf` is the only supported `doc_type`. It exists to handle text extracted from a PDF, which often has a line break in the middle of a sentence. An `html` `doc_type` could definitely be worth supporting. Please feel free to submit a pull request if you are willing and able.
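
To make the PDF case concrete, here is a minimal sketch of the kind of cleanup such a mode needs: joining line breaks that fall in the middle of a sentence. The helper name `join_soft_line_breaks` and its rule are assumptions made for illustration. This is not the gem's actual implementation, which is likely more involved.

```ruby
# Hypothetical helper, NOT part of pragmatic_segmenter. It illustrates the
# problem a PDF-oriented doc_type addresses: line breaks inside sentences.
def join_soft_line_breaks(text)
  # Replace a newline with a space unless the character just before it
  # is sentence-ending punctuation (., ! or ?).
  text.gsub(/(?<![.!?])\n/, ' ')
end

extracted = "This sentence was broken\nin the middle by the PDF.\nThis one is intact."
puts join_soft_line_breaks(extracted)
```

In the gem itself, as far as I recall from the README, the option is passed to the constructor, along the lines of `PragmaticSegmenter::Segmenter.new(text: text, doc_type: 'pdf').segment`. An `html` mode would presumably need a comparable preprocessing step that strips tags before segmentation.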