OpenConvert
Text conversion tool from e.g. Word, HTML, or txt to the corpus formats TEI or FoLiA
The OpenConvert tools convert documents from a number of input formats to TEI, plain text, or FoLiA.
Using the command line
The OpenConvert distribution can be accessed at https://github.com/INL/OpenConvert.
The command line can be used as follows:
java -jar OpenConvert.jar -from <input_format> -to <output_format> <input> <output>
Options:
- -from: input format, one of text, TEI, alto, doc, docx, HTML
- -to: output format, one of TEI, text or folia
Arguments:
- input: file name, directory name or zip archive name (ending with .zip)
- output: file name, directory name or zip archive name (ending with .zip)
If the -from and -to flags are omitted, the conversion to apply is guessed from the file name extensions.
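As an illustration of the usage above, the following sketch composes a few concrete command lines and prints them so they can be inspected before running. The file names (report.docx, texts.zip, page.html and the output names) and the helper functions are hypothetical. Which extensions the guessing mode recognizes is not documented here, so the last example makes no claim about the resulting format.

```shell
#!/bin/sh
# Sketch only: prints OpenConvert command lines built from the flags
# described above. Nothing is executed. File names are placeholders.
JAR="OpenConvert.jar"

# Explicit conversion: from-format, to-format, input, output.
openconvert_cmd() {
    printf 'java -jar %s -from %s -to %s %s %s\n' "$JAR" "$1" "$2" "$3" "$4"
}

# No -from/-to: the conversion is left to extension guessing.
openconvert_cmd_guess() {
    printf 'java -jar %s %s %s\n' "$JAR" "$1" "$2"
}

# A single Word file to TEI.
openconvert_cmd docx TEI report.docx report.xml
# A zip archive of plain-text files to a zip archive of FoLiA files.
openconvert_cmd text folia texts.zip folia.zip
# Formats guessed from the file name extensions.
openconvert_cmd_guess page.html page.xml
```

Copy a printed line into the shell to run it. File names containing spaces would need quoting first.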
NOTE: the default server setting unfortunately points to an INT-internal address. You should be able to do your own test run with the following command line:
java -jar openconvert.client.jar -f text -t tei -a chn-tagger -s https://openconvert.ivdnt.org/openconvert/file test.txt