Use with plain text input
Hi Guys,
Not sure this is the right place for this...
I would like to try out your library. It looks great! The only issue is that my dataset is plain text (sometimes HTML). Is there an input workflow for data other than .pdf?
Thanks in advance for your advice!
Plain text and HTML should clearly work. It would just need a simple input module to open and parse the file. For HTML it could remove the tags and keep just the text.
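A rough sketch of what such an input module might look like, using only the Python standard library. The names `strip_tags` and `to_text` are hypothetical and not part of this project's API; the file-extension check and the whitespace collapsing are assumptions made for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so the result reads like plain text.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def to_text(path):
    """Read a file; strip tags for .html/.htm, pass anything else through."""
    raw = Path(path).read_text(encoding="utf-8", errors="replace")
    if Path(path).suffix.lower() in (".html", ".htm"):
        return strip_tags(raw)
    return raw
```

Note that collapsing whitespace also drops line breaks, so if the downstream matching relies on line structure, the `text()` step would need to preserve newlines instead.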
For HTML you will be better off using something like CSS selectors or XPath to match your data, since you have the HTML structure to work with. This project is for unstructured input.
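To illustrate the structured approach, here is a minimal sketch using the limited XPath support in Python's standard `xml.etree.ElementTree`. It assumes the markup is well-formed (XHTML-like); real-world HTML usually needs a more lenient parser. The sample document and the `find_value` helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup.
SAMPLE = """
<html><body>
  <table id="summary">
    <tr><td class="label">Total</td><td class="value">42.00</td></tr>
  </table>
</body></html>
"""


def find_value(markup, css_class):
    """Return the text of the first <td> with the given class, or None."""
    root = ET.fromstring(markup)
    node = root.find(f".//td[@class='{css_class}']")
    return node.text if node is not None else None


print(find_value(SAMPLE, "value"))
```

Because the lookup is anchored to the element and its class, it keeps working even when the surrounding text changes, which is the main advantage over matching against flattened text.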
Of course, HTML input could also be unstructured, e.g. a plain text file or very badly formatted HTML. In that case we could add a plain text input module that just passes the input on. But I don't see much demand for this.
Implemented in f59d6838db62dd6c322f99cf274e7f5687adb00d ("input: add support for plain text").