Anthropic (claude) support

Open omri-suissa-clearmash opened this issue 1 year ago • 6 comments

Can we use SharpToken with Anthropic models? I could not find out whether Claude uses "cl100k_base" or some other encoding.

omri-suissa-clearmash avatar Mar 14 '24 09:03 omri-suissa-clearmash

Hello @omri-suissa-clearmash !

Could you share a bit more about what Claude or Anthropic is? What do they offer, and how does it work?

Thanks

dmitry-brazhenko avatar Mar 14 '24 09:03 dmitry-brazhenko

@dmitry-brazhenko Claude is Anthropic's LLM (https://www.anthropic.com/). This is what I could find: https://github.com/anthropics/anthropic-sdk-python/blob/e84645b07ca5267066700a104b4d8d6a8da1383d/src/anthropic/_tokenizers.py

omri-suissa-clearmash avatar Mar 14 '24 10:03 omri-suissa-clearmash

Thanks for sharing.

I will check that. They probably use either an existing encoding (cl100k_base) or a custom one.

dmitry-brazhenko avatar Mar 14 '24 10:03 dmitry-brazhenko

@dmitry-brazhenko I also found this: https://github.com/19h/claude_tokenizer (Rust)

omri-suissa-clearmash avatar Mar 14 '24 12:03 omri-suissa-clearmash

Hello @omri-suissa-clearmash !

I checked the algorithm. It seems there is a difference, but it could potentially be implemented in the SharpToken library. I will try to do that within a few days.

dmitry-brazhenko avatar Mar 15 '24 09:03 dmitry-brazhenko