J S
J S
Thanks for the PR! We need to release the StreamCallbacks version first. Also, would it be possible to add some tests for behavior of `convert_openai_to_gemini`? Also, is that really that...
yes, that's what I'm comparing against
As per the test shown above, I attempt to tokenize text containing mask token. If I use vanilla BPE from your library, it splits it fully (separate "[" etc) If...
> @svilupp The [hardcoded result in the test](https://github.com/svilupp/ModernBert.jl/blob/01819a6a762eb0d6ff8ca0c63e3d0418b9a48ce9/examples/verify.jl#L35) has an extra space. Is that expected? Yes, in Python, the space is folded into the special token. ```python from transformers import...
Thanks! I then had a problem with getting the encoding working (not getting the unknown token with `-1`, but I just added the variation in the `DictBackedLookupDict`. It's not glamorous,...
I know, but I was referring to matching " [MASK]" as unknown!
That is my understanding: ```python text = "[MASK][MASK] is a [MASK]" inputs = tokenizer(text, return_tensors="pt") >>> print("Tokens: ", tokenizer.tokenize(text)) Tokens: ['[MASK]', '[MASK]', 'Ġis', 'Ġa', ' [MASK]'] >>> print(inputs["input_ids"]) tensor([[50281, 50284,...
Thank you for opening the PR! Deleting files is easy, it's the imports/references and getting the RAGTools package off the ground. I don't think we should delete anything until RAGTools.jl...
@pabvald that should already be possible. Have you seen this FAQ: https://siml.earth/PromptingTools.jl/dev/frequently_asked_questions#Using-Custom-API-Providers-like-Azure-or-Databricks ? It’s like Databricks and everyone has a little bit different URL, so you just need to provide...
Thanks for the tip! Yes, I can see the slowdown, on the first load it's quite significant but I had no choice as it was happening too frequently. I'll explore...