Support the TEI format
I recently learned that the TEI XML format is becoming popular in the linguistics community. In this format, texts are saved in small chunks with associated meta information (e.g. speaker) and, sometimes, POS tags.
See: https://tei-c.org/ https://tei-c.org/activities/projects/ https://dracor.org/
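For anyone unfamiliar with the format, here is a minimal sketch of what "chunks with meta information and POS tags" can look like and how they might be flattened into one record per speech. It is written in Python with only the standard library, purely as an illustration and not as a proposal for the actual implementation. The sample fragment and the `read_speeches` helper are made up for this example; real TEI files vary a lot in which elements and attributes they use, so the `sp`/`w`/`pos` layout below is an assumption.

```python
import xml.etree.ElementTree as ET

# TEI documents live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written fragment: two speeches, each with a speaker
# reference and word-level POS tags. Real corpora may be encoded differently.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <sp who="#hamlet">
      <speaker>Hamlet</speaker>
      <p><w pos="VERB">Speak</w> <w pos="DET">the</w> <w pos="NOUN">speech</w></p>
    </sp>
    <sp who="#player">
      <speaker>Player</speaker>
      <p><w pos="PRON">I</w> <w pos="VERB">warrant</w></p>
    </sp>
  </body></text>
</TEI>"""


def read_speeches(xml_text):
    """Return one dict per <sp> chunk: speaker id, plain text, POS tags."""
    root = ET.fromstring(xml_text)
    rows = []
    for sp in root.iterfind(".//tei:sp", TEI_NS):
        # Collect (token, pos) pairs from the word elements in this speech.
        tokens = [(w.text, w.get("pos")) for w in sp.iterfind(".//tei:w", TEI_NS)]
        rows.append({
            "speaker": sp.get("who", "").lstrip("#"),
            "text": " ".join(tok for tok, _ in tokens),
            "pos": [pos for _, pos in tokens],
        })
    return rows


speeches = read_speeches(SAMPLE)
for row in speeches:
    print(row["speaker"], "|", row["text"], "|", row["pos"])
```

Each record maps naturally onto a document with document-level variables (speaker) plus token-level annotations, which is roughly the shape a reader for this format would need to produce.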
Great idea. There is a package called tei2r (https://github.com/michaelgavin/tei2r/tree/master/R), but it looks pretty inactive.
This would be cool, not least because tools like GROBID let you parse out things like references, headers, and footers and save them as TEI XML. [I'm just starting to look into quanteda, so sorry if quanteda can already do this natively.]