alvations
@mjpost Good news on chaining the commands for pipelining https://click.palletsprojects.com/en/7.x/commands/#multi-command-pipelines =) Gonna be a fun Tuesday tomorrow, implementing this!!
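For reference, here's a minimal sketch of the chained-pipeline pattern from that click page (click 7.x style, with `resultcallback`); the `lowercase`/`tokenize` processors are just placeholders, not the real sacremoses ones:

```python
import click

@click.group(chain=True)
def cli():
    """Sketch of a chainable CLI, following the click multi-command pipeline docs."""

@cli.command("lowercase")
def lowercase():
    # Each chained subcommand returns a "processor": a function that maps an
    # iterable of lines to an iterable of lines.
    def processor(lines):
        for line in lines:
            yield line.lower()
    return processor

@cli.command("tokenize")
def tokenize():
    def processor(lines):
        for line in lines:
            yield " ".join(line.split()) + "\n"  # placeholder, not Moses tokenization
    return processor

@cli.resultcallback()
def run_pipeline(processors):
    # click collects the return value of every chained subcommand; thread stdin
    # through them in the order the user chained them and write to stdout.
    stream = click.get_text_stream("stdin")
    for processor in processors:
        stream = processor(stream)
    for line in stream:
        click.echo(line, nl=False)

if __name__ == "__main__":
    cli()
```

Chained as e.g. `python pipeline_sketch.py lowercase tokenize < input.txt`.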
Here are some updates on a POC of the pipeline: it seems like doing any simplistic stdin pipelining with `click` requires storing the full data in memory first....
Maybe there's some usefulness in loading the whole dataset into memory instead of processing one sentence at a time. Empirically it seems to be a few seconds faster on a dataset...
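Roughly, the two approaches being compared look like this (sketch only; `processor` stands in for whatever per-line transform the subcommand applies):

```python
import sys

def run_streaming(processor):
    # One sentence at a time: constant memory, but per-line overhead adds up.
    for line in sys.stdin:
        sys.stdout.write(processor(line))

def run_in_memory(processor):
    # Slurp the whole dataset first; in the POC this was a few seconds faster
    # on a full dataset, at the cost of holding everything in memory.
    lines = sys.stdin.readlines()
    sys.stdout.writelines(processor(line) for line in lines)

if __name__ == "__main__":
    run_in_memory(str.lower)  # e.g. pipe a corpus through lowercasing
```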
With the `pipeline` feature out of the way, coming back to `lowercase`: any ideas/suggestions on what options one would need for `sacremoses lowercase`? I guess with the pipeline global, the lowercase...
There's something better coming up: upper, lower, and a surprise. But it'll take a couple of days to free myself up for some more coding and to finish up the feature...
Yes, actually this is the same as the Hindi problem in #42. There's a way to resolve this, but it requires a little more digging and understanding of Indian languages...
Same outputs from default `mosesdecoder` (commit https://github.com/moses-smt/mosesdecoder/commit/05788925812f0d3265e355565cbb1701a0ad7510):

```
$ echo "Ру́сский язы́к ([ˈruskʲɪi̯ jɪˈzɨk] Информация о файле слушать)[~ 3][⇨] — один из восточнославянских языков, национальный язык русского народа." | ...
```
Thanks @goodmami for the analysis!! Hmmm, seems like considering things like em-space and ideographic space might be problematic because I think it affects other regexes in Moses. Have to take...
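For example (a quick illustrative sketch, not Moses' actual regexes): Python's Unicode `\s` already treats U+2003 EM SPACE and U+3000 IDEOGRAPHIC SPACE as whitespace, while an ASCII-only class does not, so widening the class changes what the downstream regexes see:

```python
import re

text = "foo\u2003bar\u3000baz"  # em space and ideographic space as separators

ascii_ws = re.compile(r"[ \t\n]+")   # ASCII-only whitespace class
unicode_ws = re.compile(r"\s+")      # Unicode \s also matches U+2003 and U+3000

print(ascii_ws.split(text))    # ['foo\u2003bar\u3000baz']  -- no split
print(unicode_ws.split(text))  # ['foo', 'bar', 'baz']
```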
Actually this is already fixed in https://github.com/nltk/wordnet; we're still looking at how to integrate the stand-alone library without breaking existing usage in the main nltk library.
```python
from pygpt4all.models.gpt4all import GPT4All

# Load the quantized GPT4All model with a 2048-token context window.
model = GPT4All('ggml-gpt4all-l13b-snoozy.bin', n_ctx=2048)

# Generate up to 500 tokens, streaming each new chunk of text as it arrives.
model.generate("Hello world", n_predict=500, new_text_callback=lambda x: print(x, end=""))
```