goerch
That is a nice test. I made some modifications to get more detailed test output, and I now see differences such as 1. a problem with `endoftext` 2. non-greediness ...
Intermediate debugging results: `bpe_gpt2_preprocess` seems to do the right thing, but `llm_tokenizer_bpe::tokenize` seems to be subtly broken, although it looks very similar to `examples/gptneox-wip`. Paging @cmp-nct for help,...
> `llm_tokenizer_bpe::tokenize` seems to be subtly broken

I implemented an independent port of the [gpt2-tokenizer](https://github.com/openai/gpt-2/blob/master/src/encoder.py#L55-L101) (will share the code if someone is interested) and it shows the same behavior as the...
I could imagine this to be a hairy problem, because I'd assume a couple of models have been trained with the fast tokenizers?
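To make the "non-greediness" point concrete, here is a minimal sketch of a rank-driven merge loop in the style of the linked `encoder.py`, next to a naive greedy longest-match tokenizer. The merge table `TOY_RANKS`, the vocabulary `TOY_VOCAB`, and the helper `greedy_longest_match` are made up for illustration and are not taken from any real model or from llama.cpp; this only shows one way the two strategies can disagree, not necessarily the cause of the differences seen here.

```python
def get_pairs(word):
    """Return the set of adjacent symbol pairs in a word (a tuple of symbols)."""
    return set(zip(word, word[1:]))


def bpe(token, ranks):
    """Merge symbols of `token` in rank order until no known pair remains."""
    word = tuple(token)
    while len(word) > 1:
        pairs = get_pairs(word)
        # Pick the pair with the lowest rank, i.e. the earliest learned merge.
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break
        first, second = best
        merged = []
        i = 0
        while i < len(word):
            if i < len(word) - 1 and word[i] == first and word[i + 1] == second:
                merged.append(first + second)
                i += 2
            else:
                merged.append(word[i])
                i += 1
        word = tuple(merged)
    return list(word)


def greedy_longest_match(token, vocab):
    """Naive alternative: always take the longest vocabulary entry at position i."""
    out = []
    i = 0
    while i < len(token):
        for j in range(len(token), i, -1):
            # Fall back to a single character if nothing longer matches.
            if token[i:j] in vocab or j == i + 1:
                out.append(token[i:j])
                i = j
                break
    return out


# Hypothetical toy data, chosen so that the two strategies disagree.
TOY_RANKS = {("b", "c"): 0, ("a", "b"): 1}
TOY_VOCAB = {"a", "b", "c", "ab", "bc"}

print(bpe("abc", TOY_RANKS))                   # ['a', 'bc']
print(greedy_longest_match("abc", TOY_VOCAB))  # ['ab', 'c']
```

With this toy table the rank-driven loop merges `b`+`c` first because it has the lower rank, while the greedy variant grabs `ab` as the longest prefix, so the two produce different token sequences for the same input.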
Modifying `OnceDifferentiable` to
```
mutable struct OnceDifferentiable{F, DF, FDF, TF, TDF, TX}
```
From the looks of it, #316 fixes this?
If you are testing these, please [lock](https://discourse.julialang.org/t/improving-performance-on-nested-for-loops-sparsearrays-libgeos/74038/4) access to the shared resource.
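The locking idea is language-independent, so here is a rough sketch in Python rather than Julia. `SharedCounter` and `run_workers` are hypothetical stand-ins for the shared resource and the test harness; they are not part of any library discussed here.

```python
import threading


class SharedCounter:
    """Hypothetical stand-in for a shared resource that is not thread safe."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1


def run_workers(n_threads=8, n_increments=10_000):
    """Hammer the counter from several threads and return the final value."""
    counter = SharedCounter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because every access goes through the lock, `run_workers(n, m)` should always return `n * m`, regardless of how the threads are scheduled.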
> From https://trac.osgeo.org/geos/wiki/RFC3
>
> > Function names in the new API will be updated with an _r, as is the familiar C standard for reentrant/thread safe versions.
>
> ...
At least we are aware of the problem. We currently have this [assertion](https://github.com/ggerganov/llama.cpp/blob/48edda30ee545fdac2e7a33d505382888f748bbf/llama.cpp#L2065C12-L2065).
@Malte0621: not sure what you are trying to achieve here; ggml builds successfully for me with CMake and the Visual Studio 2022 toolkit.