Can OpenConvert convert a plain text file to TEI?

Open vanabel opened this issue 10 years ago • 5 comments

I just cloned the repository into a directory and ran this command:

    java -jar OpenConvert.jar -from text -to TEI test/test.txt test

where test.txt contains a single sentence: "Just a test". But it outputs an error:

    Could not find conversion from text to TEI

Did I do something wrong?

vanabel avatar Oct 22 '15 15:10 vanabel

Or could you provide some example text files that convert to TEI properly?

vanabel avatar Oct 23 '15 09:10 vanabel

You can do the conversion to TEI online here: http://openconvert.clarin.inl.nl/openconvert/tagger/ui#file

(you need a CLARIN account, which you should be able to get here: https://user.clarin.eu/user/register)

I didn't develop this code, so I'm not sure about the command-line tool, sorry.

jan-niestadt avatar Oct 23 '15 10:10 jan-niestadt

@jan-niestadt Thanks. If I want to build my own corpus, how can I combine multiple TEI files into one? In practice, I would like to add one plain-text sentence containing a keyword at a time (converted to TEI with the tool you mentioned above), then upload the TEI to my BlackLab Server so that users can query it. This would be useful for scientific writing, since I could then search by keyword.

vanabel avatar Oct 23 '15 10:10 vanabel

Hello all, sorry for only catching up today.

  • The right command line for conversion from txt to TEI uses txt, not text: java -jar OpenConvert.jar -from txt -to TEI test/test.txt test/test.tei
  • For use with BlackLab, it is best to enable the tokenizer in OpenConvertWeb (only available in the online version).
  • To combine TEI files, there is no special tool. The teiCorpus element (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teiCorpus.html) may contain an arbitrary number of TEI elements, each containing a document. It also requires a corpus header, but for BlackLab indexing it should be sufficient to start with <teiCorpus>, then cat all the individual files, and then close the teiCorpus element.
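
The wrapping step described in the last point could be sketched in Python roughly as follows. This is only an illustration, not part of OpenConvert or BlackLab: the function names, the *.tei file extension, and the directory layout are assumptions. It also strips a leading XML declaration from each file, since a declaration is only allowed at the very start of an XML document and a plain cat would leave one in the middle of the result. As in the suggestion above, no corpus header is written, so the output is meant for indexing rather than as a fully valid TEI corpus.

```python
import re
from pathlib import Path

# Matches a leading XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(r"^\s*<\?xml\s[^>]*\?>\s*")


def build_tei_corpus(tei_documents):
    """Wrap TEI document strings in a single <teiCorpus> element.

    No corpus header is added (assumption: the indexer does not need one).
    """
    parts = ["<teiCorpus>"]
    for doc in tei_documents:
        # Drop the per-file XML declaration before concatenating.
        parts.append(XML_DECL.sub("", doc, count=1).strip())
    parts.append("</teiCorpus>")
    return "\n".join(parts) + "\n"


def combine_files(input_dir, output_path):
    """Combine every *.tei file in input_dir (hypothetical layout) into one file.

    Returns the number of documents that were combined.
    """
    paths = sorted(Path(input_dir).glob("*.tei"))
    docs = [p.read_text(encoding="utf-8") for p in paths]
    Path(output_path).write_text(build_tei_corpus(docs), encoding="utf-8")
    return len(paths)
```

For example, combine_files("tei", "corpus.xml") would collect the files in a hypothetical tei directory into corpus.xml. Writing the output outside the input directory, or with a different extension, keeps it from being picked up as an input on the next run.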

JessedeDoes avatar Oct 24 '15 10:10 JessedeDoes

Currently, I grab the data (submit text, get TEI back) from the OpenConvert site. Since the site may change, I want a local version, which means I need a similar function for converting plain text to TEI. I noticed that you provide openconvert.client.jar; is it designed for this? (In fact, I can't execute it on my server. Does it need this openconvert git project?)

vanabel avatar Oct 25 '15 12:10 vanabel