alvations
@mjpost Good news on chaining the commands for pipelining https://click.palletsprojects.com/en/7.x/commands/#multi-command-pipelines =) Gonna be a fun Tuesday tomorrow, implementing this!!
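For reference, here's a minimal sketch of the chained-pipeline pattern from that click page (click 7.x style, with `resultcallback`); the `lowercase`/`tokenize` processors are just placeholders, not the real sacremoses ones:

```python
import click

@click.group(chain=True)
def cli():
    """Sketch of a chainable CLI, following the click multi-command pipeline docs."""

@cli.command("lowercase")
def lowercase():
    # Each chained subcommand returns a "processor": a function that maps an
    # iterable of lines to an iterable of lines.
    def processor(lines):
        for line in lines:
            yield line.lower()
    return processor

@cli.command("tokenize")
def tokenize():
    def processor(lines):
        for line in lines:
            yield " ".join(line.split()) + "\n"  # placeholder, not Moses tokenization
    return processor

@cli.resultcallback()
def run_pipeline(processors):
    # click collects the return value of every chained subcommand; thread stdin
    # through them in the order the user chained them and write to stdout.
    stream = click.get_text_stream("stdin")
    for processor in processors:
        stream = processor(stream)
    for line in stream:
        click.echo(line, nl=False)

if __name__ == "__main__":
    cli()
```

Chained as e.g. `python pipeline_sketch.py lowercase tokenize < input.txt`.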
Here are some updates on a POC of the pipeline: it seems like doing any simplistic stdin pipelining with `click` requires storing the full data in memory first....
Maybe there's some usefulness in loading the whole dataset into memory instead of processing one sentence at a time. Empirically it seems to be a few seconds faster on a dataset...
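Roughly, the two approaches being compared look like this (sketch only; `processor` stands in for whatever per-line transform the subcommand applies):

```python
import sys

def run_streaming(processor):
    # One sentence at a time: constant memory, but per-line overhead adds up.
    for line in sys.stdin:
        sys.stdout.write(processor(line))

def run_in_memory(processor):
    # Slurp the whole dataset first; in the POC this was a few seconds faster
    # on a full dataset, at the cost of holding everything in memory.
    lines = sys.stdin.readlines()
    sys.stdout.writelines(processor(line) for line in lines)

if __name__ == "__main__":
    run_in_memory(str.lower)  # e.g. pipe a corpus through lowercasing
```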
With the `pipeline` feature out of the way, coming back to `lowercase`: any ideas/suggestions on what options one would need for `sacremoses lowercase`? I guess with the pipeline global, the lowercase...
There's something better coming up: upper, lower, and a surprise. But it'll take a couple of days to free myself up for some more coding and to finish up the feature...
Yes, actually this is the same as the Hindi problem in #42. There's a way to resolve this, but it requires a little more digging and understanding of Indian languages...
Same outputs from default `mosesdecoder` (commit https://github.com/moses-smt/mosesdecoder/commit/05788925812f0d3265e355565cbb1701a0ad7510):

```
$ echo "Ру́сский язы́к ([ˈruskʲɪi̯ jɪˈzɨk] Информация о файле слушать)[~ 3][⇨] — один из восточнославянских языков, национальный язык русского народа." | ...
```
Thanks @goodmami for the analysis!! Hmmm, seems like considering things like em-space and ideographic space might be problematic because I think it affects other regexes in Moses. Have to take...
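For example (a quick illustrative sketch, not Moses' actual regexes): Python's Unicode `\s` already treats U+2003 EM SPACE and U+3000 IDEOGRAPHIC SPACE as whitespace, while an ASCII-only class does not, so widening the class changes what the downstream regexes see:

```python
import re

text = "foo\u2003bar\u3000baz"  # em space and ideographic space as separators

ascii_ws = re.compile(r"[ \t\n]+")   # ASCII-only whitespace class
unicode_ws = re.compile(r"\s+")      # Unicode \s also matches U+2003 and U+3000

print(ascii_ws.split(text))    # ['foo\u2003bar\u3000baz']  -- no split
print(unicode_ws.split(text))  # ['foo', 'bar', 'baz']
```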
Actually this is already fixed in https://github.com/nltk/wordnet; we're still looking at how to integrate the stand-alone library without breaking existing usage in the main nltk library.
```python
from pygpt4all.models.gpt4all import GPT4All

# Load the quantized GPT4All model with a 2048-token context window.
model = GPT4All('ggml-gpt4all-l13b-snoozy.bin', n_ctx=2048)

# Generate up to 500 tokens, streaming each new chunk of text as it arrives.
model.generate("Hello world", n_predict=500, new_text_callback=lambda x: print(x, end=""))
```