Francis Tyers
@reuben so in this case I should collect them in `processBatch` and store them in `StreamingState`? The problem there is that the Metadata is returned in:
```cpp
Metadata* ModelState::decode_metadata(const DecoderState& state,...
```
Ok, I moved it to `StreamingState`. One thing I don't like is having to reallocate the memory for the metadata, because apparently we can't just update the struct:...
@Bachstelze it looks like the first step might be to annotate a corpus in Universal Dependencies. I'd be interested in working on that, please feel free to contact me if...
Maybe, but it would take you longer and you would end up with a worse end result. It's easier to just annotate from scratch. If there is glossed or tagged...
Some treebanks, like the upcoming [Chukchi treebank](https://github.com/UniversalDependencies/UD_Chukchi-HSE/), also have enhanced dependencies in the annotation. It would be great to be able to train on those too.
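To make concrete what training on those would have to read: in CoNLL-U, enhanced dependencies live in the ninth column (DEPS) as `head:relation` pairs separated by `|`. Here is a minimal Python sketch that pulls them out. The function names `parse_deps` and `enhanced_edges` are hypothetical, and the example sentence is made up for illustration, it is not taken from the Chukchi treebank.

```python
from typing import List, Tuple


def parse_deps(deps_field: str) -> List[Tuple[str, str]]:
    """Split a CoNLL-U DEPS value into (head, relation) pairs."""
    if deps_field == "_":
        return []
    pairs = []
    for item in deps_field.split("|"):
        # Split on the first colon only: relations such as "conj:and"
        # contain colons themselves. Heads stay strings because empty
        # nodes use decimal IDs like "8.1".
        head, _, relation = item.partition(":")
        pairs.append((head, relation))
    return pairs


def enhanced_edges(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (dependent, head, relation) triples from the DEPS column."""
    edges = []
    for line in sentence.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or "-" in cols[0]:
            continue  # skip malformed lines and multiword token ranges
        for head, relation in parse_deps(cols[8]):
            edges.append((cols[0], head, relation))
    return edges


# Made-up example: "She" is the subject of both coordinated verbs.
ROWS = [
    ["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "2:nsubj|4:nsubj", "_"],
    ["2", "reads", "read", "VERB", "_", "_", "0", "root", "0:root", "_"],
    ["3", "and", "and", "CCONJ", "_", "_", "4", "cc", "4:cc", "_"],
    ["4", "writes", "write", "VERB", "_", "_", "2", "conj", "2:conj:and", "_"],
]
SENTENCE = "\n".join("\t".join(row) for row in ROWS)
EDGES = enhanced_edges(SENTENCE)
```

The extra `4:nsubj` edge on token 1 is the kind of information that exists only in the enhanced layer, so a parser trained on the basic HEAD/DEPREL columns alone never sees it.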
Fixed by installing `python-future` and running with python2 in Debian. But the Ts'eltal download still doesn't work:
```
$ ./corpuscrawler --language tzh --output output-tzh/
Cache-Hit: http://listen.bible.is/robots.txt
Cache-Hit: http://listen.bible.is/TZHSBM/Matt/1
$
```
Can you give the output of `hfst-fst2strings -W` and `hfst-fst2strings -W -X print-space` for `min3.hfst`?
```
$ hfst-fst2txt min3.hfst
0 1 c d 1.000000
1 2 a o 0.000000
2 3 t g 0.000000
3 0.000000
--
0 1 c d 2.000000
1 2 a...
```
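For reading dumps like the one above, here is a minimal Python sketch that splits `hfst-fst2txt` output into separate transducers. It assumes whitespace-separated fields, arcs of the form `source target input output [weight]`, final states of the form `state [weight]`, and a `--` line between transducers. `Arc` and `parse_att` are hypothetical names, not part of HFST, and the sample contains only the first transducer from the output, since the second one is cut off.

```python
from typing import Dict, List, NamedTuple, Tuple


class Arc(NamedTuple):
    src: int
    dst: int
    upper: str   # input-side symbol
    lower: str   # output-side symbol
    weight: float


Transducer = Tuple[List[Arc], Dict[int, float]]


def parse_att(text: str) -> List[Transducer]:
    """Parse AT&T-style text into a list of (arcs, final_states) pairs."""
    transducers: List[Transducer] = []
    arcs: List[Arc] = []
    finals: Dict[int, float] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields == ["--"]:
            # Separator line: close the current transducer, start a new one.
            transducers.append((arcs, finals))
            arcs, finals = [], {}
        elif len(fields) in (4, 5):
            # Arc line; a missing weight is assumed to mean 0.0.
            weight = float(fields[4]) if len(fields) == 5 else 0.0
            arcs.append(Arc(int(fields[0]), int(fields[1]),
                            fields[2], fields[3], weight))
        elif len(fields) in (1, 2):
            # Final-state line; a missing weight is assumed to mean 0.0.
            finals[int(fields[0])] = float(fields[1]) if len(fields) == 2 else 0.0
        else:
            raise ValueError(f"unrecognised line: {line!r}")
    if arcs or finals:
        transducers.append((arcs, finals))
    return transducers


SAMPLE = """\
0 1 c d 1.000000
1 2 a o 0.000000
2 3 t g 0.000000
3 0.000000
--
"""
TRANSDUCERS = parse_att(SAMPLE)
```

Summing the arc weights along the single path plus the final weight gives the weight of the `cat:dog` pair in each transducer, which makes it easy to compare the archive members side by side.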
You can potentially use `hfst-split` to split them and then `hfst-union` to union them. Having a -j option to `hfst-txt2fst` sounds like a nice idea, you should file a separate...
@TinoDidriksen could you check this out?