Richard Ulmer

Results: 43 comments by Richard Ulmer

It's unfortunate that the natrium node is that unreliable and I can give you no recommendation for a better node, but after thinking about it, I don't think this is...

@c0bra There has not been a new release of huggingface/transformers since the fix was merged: https://github.com/huggingface/transformers/releases. I assume we still need to wait for one. The already existing...

When I compare the three tokenizers, they seem to be the same:

```
$ curl -L https://huggingface.co/openlm-research/open_llama_3b/resolve/main/tokenizer.model -o tokenizer.model.3b
$ curl -L https://huggingface.co/openlm-research/open_llama_7b/resolve/main/tokenizer.model -o tokenizer.model.7b
$ curl -L https://huggingface.co/openlm-research/open_llama_13b_600bt/resolve/main/tokenizer.model -o...
```
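
Once the files are downloaded, one way to check whether they really are the same is a byte-for-byte comparison. The following is only a minimal sketch, not necessarily what the original (truncated) comment did: `all_identical` is a hypothetical helper name, and it assumes the POSIX `cmp` utility is available.

```shell
# Hypothetical helper (not from the original comment): succeeds only if
# every file given is byte-identical to the first one.
all_identical() {
  first="$1"
  shift
  for f in "$@"; do
    # cmp -s prints nothing and returns non-zero on any difference
    # (or if a file cannot be read).
    cmp -s "$first" "$f" || return 1
  done
  return 0
}

# Example usage with the file names from the curl commands above
# (replace <third-file> with the name the third download was saved under):
#   all_identical tokenizer.model.3b tokenizer.model.7b <third-file> \
#     && echo "identical" || echo "different"
```

Comparing against the first file is enough here, since byte equality is transitive: if every file matches the first, they all match each other.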

FYI: People are already fine-tuning OpenLLaMA to follow instructions using the [databricks-dolly-15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k): https://github.com/yxuansu/OpenAlpaca

Duplicate of #46. I would suggest taking a look at https://github.com/yxuansu/OpenAlpaca .

I've just encountered a similar issue, and after some investigation I think there is a more general problem. There is special logic in the `addTextToSegment` function, which adds spaces between...

@andydotxyz FYI: I've started looking into fixing this (including the "split link"). Just a heads up, so that we don't do the same work in parallel by accident.

Sorry, I didn't explain it enough. `String()` should not be affected. No superfluous spaces are inserted. Besides the swap from trailing to leading spaces, there are these two new special...

I have added an explicit test for the last segment of a paragraph. Is this OK? I can also adapt the existing test functions, if you'd prefer that.

Thanks for accepting this PR! I kinda feared it would be rejected because it changes too much, and I really appreciate you taking the time to consider it.