Support Microsoft Guidance
I am trying to use a custom tokenizer but I am unable to see how I can invoke it. Also, can we use a standard tokenizer from HF by pulling it or loading it from a local path?
Never mind, I can just load it from transformers...
Yes, a custom/HF tokenizer can be used with the generate() method:
from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('gpt2')
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
tokens = tokenizer.encode('AI is going to')
for token in llm.generate(tokens):
    print(tokenizer.decode(token))
Please let me know if you were able to use this library with your custom/HF tokenizer.
I am actually trying to use ctransformers with Microsoft guidance, but I am encountering an error with protobuf. I posted a bug report there, but I am unsure what is happening.
EDIT: changed the code, as it was a massive brainfart...
import guidance
from ctransformers import AutoModelForCausalLM
from transformers import LlamaTokenizer
from transformers import AutoTokenizer
# we will use LLaMA for most of the examples in this tutorial
path = '/home/vmajor/models/gpt4-alpaca-lora_mlp-65B'
llm = AutoModelForCausalLM.from_pretrained('/home/vmajor/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin', model_type='llama')
tokenizer = LlamaTokenizer.from_pretrained('/home/vmajor/models/llama-tokenizer-65b/tokenizer.model')
#print(llm('AI is going to'))
guidance.llm = guidance.llms.transformers.LLaMA(llm, tokenizer, device="cpu")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[16], line 11
9 tokenizer = LlamaTokenizer.from_pretrained('/home/vmajor/models/llama-tokenizer-65b/tokenizer.model')
10 #print(llm('AI is going to'))
---> 11 guidance.llm = guidance.llms.transformers.LLaMA(llm, tokenizer, device="cpu")
File ~/anaconda3/envs/guidance/lib/python3.10/site-packages/guidance/llms/_transformers.py:42, in Transformers.__init__(self, model, tokenizer, caching, token_healing, acceleration, temperature, device, **kwargs)
40 self.acceleration = acceleration
41 if device is not None: # set the device if requested
---> 42 self.model_obj = self.model_obj.to(device)
43 self.device = self.model_obj.device # otherwise note the current device
45 self._prefix_ids = [self._tokenizer.bos_token_id, 100] # token ids that we use to decode tokens after a prefix
File ~/anaconda3/envs/guidance/lib/python3.10/site-packages/ctransformers/llm.py:197, in LLM.__getattr__(self, name)
195 if name.startswith('ctransformers_llm_') and hasattr(lib, name):
196 return partial(getattr(lib, name), llm)
--> 197 raise AttributeError(f"'LLM' object has no attribute '{name}'")
AttributeError: 'LLM' object has no attribute 'to'
So, now the error is with ctransformers, but again I am uncertain whether it is a real error or just due to me trying something rather strange: mixing a model with a tokenizer that may not be the correct one, and trying to plug ctransformers and transformers implementations into guidance().
I haven't used the guidance library, but the guidance.llms.transformers.LLaMA class expects an HF transformers object and you are passing a ctransformers object, so it won't work.
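The mismatch is easy to see directly (a minimal sketch; what exactly guidance needs beyond .to() I haven't checked):
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")

# guidance's Transformers wrapper treats the model as a torch nn.Module and calls
# model.to(device); ctransformers' LLM is a binding around the GGML library and has
# no such method, which is why the AttributeError above is raised.
print(hasattr(llm, "to"))  # False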
It looks like there is already an issue open to support ggml models: https://github.com/microsoft/guidance/issues/58
Yes, I saw that thread, but progress slowed down. I would really like to be able to combine open-source community efforts with what comes out of commercial or well-funded (e.g. HF) groups. ctransformers sounds to me like a great way to achieve that, but perhaps I am misunderstanding the drivers and philosophy behind it. Ideally I would love to be able to do what the other people in the thread were suggesting: just drop a local quantized model in place of an HF-hosted model. That way, independent users, developers and tinkerers can plug into much better resourced projects.
I would also like to add support for it but it doesn't seem to have documentation on how to add new models. I will try to follow this example and see if I can make it work. I will look into it next weekend.
@marella Any update on this? I'm looking forward to using StarChat-ggml weights in guidance via ctransformers~
I will take a stab at this later this week, but I don't want to repeat work, especially since you may have already spent time on this. Were there any gotchas or difficulties that I could maybe help with?
Hey, I implemented a 🤗 Transformers-compatible model and tokenizer using ctransformers and was able to run one of the examples, but I think it has some bugs. I will push the code to GitHub later this week and let you know. I'm trying to make it work like this:
model = # ctransformers model
tokenizer = # ctransformers tokenizer
llm = guidance.llms.Transformers(model=model, tokenizer=tokenizer)
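Roughly, the tokenizer side would be a thin adapter over the ctransformers LLM's own tokenize()/detokenize(); something along these lines (illustrative only, with hypothetical names, not the code I will push):
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")

class CTTokenizer:
    # Hypothetical adapter exposing an HF-tokenizer-like surface on top of
    # ctransformers' LLM.tokenize()/LLM.detokenize().
    def __init__(self, llm):
        self._llm = llm

    def encode(self, text):
        return self._llm.tokenize(text)      # text -> list of token ids

    def decode(self, token_ids):
        return self._llm.detokenize(token_ids)  # token ids -> text

tokenizer = CTTokenizer(llm)
The model side is the trickier part: guidance drives generation through the HF forward interface (logits, past key values), so the adapter has to emulate enough of that surface on top of ctransformers' eval()/logits.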
Hey @marella, any update on this? I like the idea of having a transformers-compatible model and tokenizer object for using ctransformers. I'd love to try it out; if you make a public branch with your work (even with the bugs), I can pick up from your starting point, see where it fails for me, and offer some fixes if that's helpful.
Hi, I pushed the changes to guidance branch. You can install using:
git clone https://github.com/marella/ctransformers
cd ctransformers
git checkout guidance
pip install -e .
and use it as:
import guidance
from ctransformers import AutoModelForCausalLM
from ctransformers.transformers import Model, Tokenizer

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
tokenizer = Tokenizer(llm)  # HF-compatible tokenizer wrapper around the ctransformers LLM
model = Model(llm)          # HF-compatible model wrapper around the ctransformers LLM
llm = guidance.llms.Transformers(model=model, tokenizer=tokenizer)
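A quick smoke test along these lines should exercise it (a rough sketch assuming guidance's 0.0.x Program API; I haven't verified these exact arguments):
# hypothetical test prompt using guidance's handlebars-style template syntax
program = guidance("AI is going to {{gen 'completion' max_tokens=16}}", llm=llm)
result = program()
print(result["completion"])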
I fixed some of the bugs. It is working (finishing without errors) with guidance 0.0.61 but getting stuck in the latest version.
Thanks very much for this @marella - for the life of me, though, I can't figure out why it hangs at around 450 tokens?
Ok, never mind, it appears to be an issue with ctransformers_llm_context_length returning an incorrect 512 for a llama model. I've overridden it and everything is working now.
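For reference, what I did was roughly the following; ctransformers accepts config options such as context_length in from_pretrained, although whether that feeds the same code path the guidance branch reads is my assumption (path is a placeholder):
from ctransformers import AutoModelForCausalLM

# context_length is a ctransformers config option; llama models usually support 2048
llm = AutoModelForCausalLM.from_pretrained(
    "/path/to/llama-model.ggmlv3.q5_1.bin",
    model_type="llama",
    context_length=2048,
)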
In doing some benchmarking, llm.eval is much slower than llama.cpp.
@marella thanks for putting a branch together. I quickly tried to put together a prototype with it, using your HF model above (marella/gpt-2-ggml) and guidance==0.0.61, but it's hanging around _stream_then_save. Any thoughts / tweaks I can make?
Here's a colab to reproduce: https://colab.research.google.com/drive/1YzBvp97pLwAdfl7tlKwCtYH2DZhigXiI?usp=sharing
@marella seconded! This would be killer
Hello, I would like to address an issue where Japanese characters are decoded incorrectly when decoding in streaming. Can anybody help with how to fix it? (This happens because the llama tokenizer vocab size is small, so Japanese characters need more than one token to decode correctly.)
for example: ��当时年少,��
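One workaround I am considering is to buffer token ids and only emit text once it decodes to complete UTF-8, i.e. no trailing replacement character; a rough sketch (assuming detokenize replaces incomplete byte sequences with U+FFFD, which may differ between versions):
def stream_text(llm, prompt_tokens):
    # Buffer generated tokens and only yield once they decode cleanly, so characters
    # that span multiple tokens (e.g. Japanese) are not split across yields.
    buffer = []
    for token in llm.generate(prompt_tokens):
        buffer.append(token)
        text = llm.detokenize(buffer)
        if not text.endswith("\ufffd"):  # U+FFFD marks an incomplete character
            yield text
            buffer = []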
Mistral doesn't work with guidance: Model type 'mistral' is not supported.