Support Microsoft Guidance
I am trying to use a custom tokenizer but I am unable to see how I can invoke it. Also, can we use a standard tokenizer from HF by pulling it or loading it from a local path?
Never mind, I can just load it from transformers...
Yes, a custom/HF tokenizer can be used with the generate() method:
from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('gpt2')
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
tokens = tokenizer.encode('AI is going to')
for token in llm.generate(tokens):
    print(tokenizer.decode(token))
Please let me know if you were able to use this library with your custom/HF tokenizer.
I am actually trying to use ctransformers with Microsoft guidance, but I am encountering an error with protobuf. I posted a bug report there, but I am unsure what is happening.
EDIT: changed the code, as it was a massive brainfart...
import guidance
from ctransformers import AutoModelForCausalLM
from transformers import LlamaTokenizer
from transformers import AutoTokenizer
# we will use LLaMA for most of the examples in this tutorial
path = '/home/vmajor/models/gpt4-alpaca-lora_mlp-65B'
llm = AutoModelForCausalLM.from_pretrained('/home/vmajor/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin', model_type='llama')
tokenizer = LlamaTokenizer.from_pretrained('/home/vmajor/models/llama-tokenizer-65b/tokenizer.model')
#print(llm('AI is going to'))
guidance.llm = guidance.llms.transformers.LLaMA(llm, tokenizer, device="cpu")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[16], line 11
9 tokenizer = LlamaTokenizer.from_pretrained('/home/vmajor/models/llama-tokenizer-65b/tokenizer.model')
10 #print(llm('AI is going to'))
---> 11 guidance.llm = guidance.llms.transformers.LLaMA(llm, tokenizer, device="cpu")
File ~/anaconda3/envs/guidance/lib/python3.10/site-packages/guidance/llms/_transformers.py:42, in Transformers.__init__(self, model, tokenizer, caching, token_healing, acceleration, temperature, device, **kwargs)
40 self.acceleration = acceleration
41 if device is not None: # set the device if requested
---> 42 self.model_obj = self.model_obj.to(device)
43 self.device = self.model_obj.device # otherwise note the current device
45 self._prefix_ids = [self._tokenizer.bos_token_id, 100] # token ids that we use to decode tokens after a prefix
File ~/anaconda3/envs/guidance/lib/python3.10/site-packages/ctransformers/llm.py:197, in LLM.__getattr__(self, name)
195 if name.startswith('ctransformers_llm_') and hasattr(lib, name):
196 return partial(getattr(lib, name), llm)
--> 197 raise AttributeError(f"'LLM' object has no attribute '{name}'")
AttributeError: 'LLM' object has no attribute 'to'
So, now the error is with ctransformers, but again I am uncertain whether it is a real error or just due to me trying something rather strange: mixing a model with a tokenizer that may not be the correct one, and trying to plug ctransformers and transformers implementations into guidance().
I haven't used the guidance library, but the guidance.llms.transformers.LLaMA class expects an HF transformers object and you are passing a ctransformers object, so it won't work.
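The mismatch is easy to see directly (a minimal sketch; what exactly guidance needs beyond .to() I haven't checked):
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")

# guidance's Transformers wrapper treats the model as a torch nn.Module and calls
# model.to(device); ctransformers' LLM is a binding around the GGML library and has
# no such method, which is why the AttributeError above is raised.
print(hasattr(llm, "to"))  # False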
It looks like there is already an issue open to support ggml models: https://github.com/microsoft/guidance/issues/58
Yes, I saw that thread, but progress slowed down. I would really like to be able to combine open-source community efforts with what comes out of commercial or well-funded (e.g. HF) groups. ctransformers sounds to me like a great way to achieve that, but perhaps I am misunderstanding the drivers and philosophy behind it. Ideally I would love to be able to do what the other people in the thread were suggesting: just drop a local quantized model in place of an HF-hosted model. That way, independent users, developers and tinkerers can plug into much better resourced projects.
I would also like to add support for it but it doesn't seem to have documentation on how to add new models. I will try to follow this example and see if I can make it work. I will look into it next weekend.
@marella Any update on this? I'm looking forward to using StarChat-ggml weights in guidance via ctransformers~
I will take a stab at this later this week, but I don't want to repeat work, especially since you may have already spent time on this. Were there any gotchas or difficulties that I could maybe help with?
Hey, I implemented a 🤗 Transformers-compatible model and tokenizer using ctransformers and was able to run one of the examples, but I think it has some bugs. I will push the code to GitHub later this week and let you know. I'm trying to make it work like this:
model = # ctransformers model
tokenizer = # ctransformers tokenizer
llm = guidance.llms.Transformers(model=model, tokenizer=tokenizer)
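Roughly, the tokenizer side would be a thin adapter over the ctransformers LLM's own tokenize()/detokenize(); something along these lines (illustrative only, with hypothetical names, not the code I will push):
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")

class CTTokenizer:
    # Hypothetical adapter exposing an HF-tokenizer-like surface on top of
    # ctransformers' LLM.tokenize()/LLM.detokenize().
    def __init__(self, llm):
        self._llm = llm

    def encode(self, text):
        return self._llm.tokenize(text)      # text -> list of token ids

    def decode(self, token_ids):
        return self._llm.detokenize(token_ids)  # token ids -> text

tokenizer = CTTokenizer(llm)
The model side is the trickier part: guidance drives generation through the HF forward interface (logits, past key values), so the adapter has to emulate enough of that surface on top of ctransformers' eval()/logits.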
Hey @marella, any update on this? I like the idea of having a transformers-compatible model and tokenizer object for using ctransformers. I'd love to try it out; if you make a public branch with your work (even with the bugs), I can pick up from your starting point, see where it fails for me, and offer some fixes if that's helpful.
Hi, I pushed the changes to guidance branch. You can install using:
git clone https://github.com/marella/ctransformers
cd ctransformers
git checkout guidance
pip install -e .
and use it as:
import guidance
from ctransformers import AutoModelForCausalLM
from ctransformers.transformers import Model, Tokenizer

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
tokenizer = Tokenizer(llm)  # HF-compatible tokenizer wrapper around the ctransformers LLM
model = Model(llm)          # HF-compatible model wrapper around the ctransformers LLM
llm = guidance.llms.Transformers(model=model, tokenizer=tokenizer)
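A quick smoke test along these lines should exercise it (a rough sketch assuming guidance's 0.0.x Program API; I haven't verified these exact arguments):
# hypothetical test prompt using guidance's handlebars-style template syntax
program = guidance("AI is going to {{gen 'completion' max_tokens=16}}", llm=llm)
result = program()
print(result["completion"])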
I fixed some of the bugs. It is working (finishing without errors) with guidance 0.0.61 but getting stuck in the latest version.
Thanks very much for this @marella - for the life of me, though, I can't figure out why it hangs at around 450 tokens?
Ok, never mind, it appears to be an issue with ctransformers_llm_context_length returning an incorrect 512 for a llama model. I've overridden it and everything is working now.
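For reference, what I did was roughly the following; ctransformers accepts config options such as context_length in from_pretrained, although whether that feeds the same code path the guidance branch reads is my assumption (path is a placeholder):
from ctransformers import AutoModelForCausalLM

# context_length is a ctransformers config option; llama models usually support 2048
llm = AutoModelForCausalLM.from_pretrained(
    "/path/to/llama-model.ggmlv3.q5_1.bin",
    model_type="llama",
    context_length=2048,
)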
In doing some benchmarking, llm.eval is much slower than llama.cpp.
@marella thanks for putting a branch together. I quickly tried to put together a prototype with it, using your HF model above (marella/gpt-2-ggml) and guidance==0.0.61, but it's hanging around _stream_then_save. Any thoughts / tweaks I can make?
Here's a colab to reproduce: https://colab.research.google.com/drive/1YzBvp97pLwAdfl7tlKwCtYH2DZhigXiI?usp=sharing
@marella seconded! This would be killer
Hello, I would like to address an issue where Japanese characters are decoded incorrectly when decoding in streaming. Can anybody help with how to fix it? (This happens because the llama tokenizer vocab size is small, so Japanese characters need more than one token to decode correctly.)
for example: ��当时年少,��
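One workaround I am considering is to buffer token ids and only emit text once it decodes to complete UTF-8, i.e. no trailing replacement character; a rough sketch (assuming detokenize replaces incomplete byte sequences with U+FFFD, which may differ between versions):
def stream_text(llm, prompt_tokens):
    # Buffer generated tokens and only yield once they decode cleanly, so characters
    # that span multiple tokens (e.g. Japanese) are not split across yields.
    buffer = []
    for token in llm.generate(prompt_tokens):
        buffer.append(token)
        text = llm.detokenize(buffer)
        if not text.endswith("\ufffd"):  # U+FFFD marks an incomplete character
            yield text
            buffer = []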
Mistral doesn't work with guidance: Model type 'mistral' is not supported.