Richard Ulmer
It's unfortunate that the natrium node is so unreliable, and I can't recommend a better node, but after thinking about it, I don't think this is...
@c0bra There has not been a new release of huggingface/transformers since the fix was merged: https://github.com/huggingface/transformers/releases. I assume we still need to wait for one. The already existing...
When I compare the three tokenizers, they seem to be the same: ``` $ curl -L https://huggingface.co/openlm-research/open_llama_3b/resolve/main/tokenizer.model -o tokenizer.model.3b $ curl -L https://huggingface.co/openlm-research/open_llama_7b/resolve/main/tokenizer.model -o tokenizer.model.7b $ curl -L https://huggingface.co/openlm-research/open_llama_13b_600bt/resolve/main/tokenizer.model -o...
FYI: People are already finetuning OpenLLaMA to follow instructions using the [databricks-dolly-15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k): https://github.com/yxuansu/OpenAlpaca
Duplicate of #46. I would suggest taking a look at https://github.com/yxuansu/OpenAlpaca.
I've just encountered a similar issue and, after some investigation, I think there is a more general problem. There is special logic in the `addTextToSegment` function, which adds spaces between...
@andydotxyz FYI: I've started looking into fixing this (including the "split link"). Just a heads up, so that we don't do the same work in parallel by accident.
Sorry, I didn't explain it well enough. `String()` should not be affected. No superfluous spaces are inserted. Besides the swap from trailing to leading spaces, there are these two new special...
I have added an explicit test for the last segment of a paragraph. Is this OK? I can also adapt the existing test functions, if you'd prefer that.
Thanks for accepting this PR! I kinda feared it would be rejected because it changes too much and really appreciate you taking the time to consider it.