
Please replace the Tiktoken dependency with a library in pure python (not Rust)

Open Emasoft opened this issue 2 years ago • 3 comments

Please replace the Tiktoken dependency with a tokenizer library written in pure Python (no Rust dependencies) for people like me who cannot compile or run Rust binaries on their systems, for various reasons: package manager support, company policy, intranet or local-machine security, Docker container limitations, VM restrictions, environment virtualization, lack of Rust support on remote Jupyter notebook hosts, etc.

Emasoft avatar Feb 27 '23 18:02 Emasoft

I would not say "replace", but I endorse providing an alternate pure-Python path. In some environments it's very difficult to install anything unfamiliar.

fredzannarbor avatar Mar 01 '23 22:03 fredzannarbor

Yes, you can add the pure python version as a fallback. @jerryjliu ?

Emasoft avatar Mar 02 '23 10:03 Emasoft

@Emasoft thanks for flagging, hopefully will take a look today or tomorrow

jerryjliu avatar Mar 06 '23 21:03 jerryjliu

@jerryjliu Any update on this?

Emasoft avatar Apr 27 '23 20:04 Emasoft

If you are running Python 3.8, the tokenizer from transformers will be used instead of tiktoken.

logan-markewich avatar Jun 05 '23 15:06 logan-markewich
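[Editor's note] The fallback being discussed could follow the usual optional-dependency pattern: try to import tiktoken, and degrade gracefully when the compiled (Rust) wheel is unavailable. The sketch below is illustrative only, not llama_index's actual implementation; the whitespace counter is a hypothetical stand-in, and a real fallback would use a pure-Python tokenizer such as transformers' slow `GPT2Tokenizer`.

```python
def get_token_counter():
    """Return a callable that counts tokens in a string.

    Prefers tiktoken (a compiled Rust extension) when it is
    installed; otherwise falls back to a pure-Python counter.
    """
    try:
        import tiktoken  # may be unavailable where Rust wheels can't run
        enc = tiktoken.get_encoding("gpt2")
        return lambda text: len(enc.encode(text))
    except ImportError:
        # Hypothetical pure-Python fallback: a rough whitespace
        # split. Token counts will only approximate BPE counts.
        return lambda text: len(text.split())


counter = get_token_counter()
print(counter("hello world"))  # 2 with either backend
```

The same try/except-import shape is how many libraries expose an optional fast path while keeping a pure-Python environment working.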

Hi, @Emasoft. I'm Dosu, and I'm helping the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you opened this issue requesting the replacement of the Tiktoken dependency with a pure Python tokenizer library. Some users were unable to compile and run Rust binaries on their systems, so there was a need for an alternate solution. Fredzannarbor supported the idea of providing a pure-python path, and Jerryjliu has been asked to look into it. Logan-markewich mentioned that if running python3.8, the tokenizer from transformers will be used instead of tiktoken.

I wanted to check with you if this issue is still relevant to the latest version of the LlamaIndex repository. If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LlamaIndex project! Let us know if you have any further questions or concerns.

dosubot[bot] avatar Sep 04 '23 16:09 dosubot[bot]

Wow, this is way better than a human would have produced, if at all.


fredzannarbor avatar Sep 04 '23 19:09 fredzannarbor

Thank you for your response, @fredzannarbor! We appreciate your kind words. We will be closing this issue now. If you have any further questions or concerns, please let us know.

dosubot[bot] avatar Sep 04 '23 19:09 dosubot[bot]