Tiktoken icon indicating copy to clipboard operation
Tiktoken copied to clipboard

This project implements token calculation for OpenAI's gpt-4 and gpt-3.5-turbo model, specifically using `cl100k_base` encoding.

Results 9 Tiktoken issues
Sort by recently updated
recently updated
newest added

Hi guys, I love your library—great job. However, I have some suggestions about the API you used in the last package. Designing the API like this one is not really...

Thanks for creating and maintaining this great library! ## What would you like to be added: Today, `ModelToEncoder` calls `ModelToEncoding` which statically initializes a dictionary of 7 encodings. 6/7 are...

## What would you like to be added: It would be great to generate/load encoder from tokenizer.json file like https://huggingface.co/CohereForAI/aya-101/resolve/main/tokenizer.json or https://huggingface.co/openai-community/gpt2/raw/main/tokenizer.json ## Why is this needed: Easy use of...

### Discussed in https://github.com/tryAGI/Tiktoken/discussions/28 Originally posted by **CodePlacer** May 10, 2024 When calling below method to get the encoding for model gpt-4, am getting out of memory exception. Exception is...

Not sure if this would fit into the scope of this project, but could be a real killer feature, since none of the others do it. If not, please feel...

### Describe the bug ``` Stack overflow. at Tiktoken.Core.BytePairEncoding.FindParts(System.ReadOnlyMemory`1, Int32*, System.Collections.Generic.IReadOnlyDictionary`2) at Tiktoken.Core.BytePairEncoding.BytePairEncodeCountTokens(System.ReadOnlyMemory`1, System.Collections.Generic.IReadOnlyDictionary`2) at Tiktoken.CoreBpe.CountTokensNative(System.String) at CodeAlive.Domain.TokensCounter+c__DisplayClass4_0+d.MoveNext() at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=8.0.0.0,...

bug

.NET 9 adds support for [collection lookups with spans](https://github.com/dotnet/core/blob/main/release-notes/9.0/preview/preview6/libraries.md#collection-lookups-with-spans). This permits zero-allocation lookups in the `FastEncoder` dictionary using segments of an input string. This PR is a proof-of-concept to start...

https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.enumeratesplits?view=net-9.0