quiet-star
Code for Quiet-STaR
First of all, I would like to express my gratitude for your excellent research. I have a question about using your code for inference or evaluation. Initially, looking at your...
Huggingface -> Hugging Face
This PR adds a file that contains the minimal code to infer the model with a consistent output. This seems very slow to infer 100 tokens but outputs a consistent...
Could you please show a simple inference example with the thought tokens masked, as suggested in the README.md?
Thanks so much for this. Would love simple starter code with `transformers`!
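A minimal, untested sketch of what such starter code might look like, assuming the `ezelikman/quietstar-8-ahead` checkpoint can be loaded through the standard `transformers` Auto classes (per the warning below, the talk-head weights are ignored unless the repo's patched `modeling_mistral.py` is used). The thought-token filtering here is an illustration of the masking idea, not the authors' evaluation code:

```python
# Sketch: basic greedy generation from the released checkpoint.
# Assumptions: the checkpoint loads via AutoModelForCausalLM, and any
# added start/end-of-thought markers sit above the base vocabulary.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ezelikman/quietstar-8-ahead"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

prompt = "Q: What is 12 * 7? A:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Drop tokens outside the base vocabulary (e.g. added thought markers) so only
# the "spoken" tokens are decoded -- a rough stand-in for the thought masking
# described in the README.
base_vocab = tokenizer.vocab_size
visible = [t for t in output_ids[0].tolist() if t < base_vocab]
print(tokenizer.decode(visible, skip_special_tokens=True))
```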
warning
Some weights of the model checkpoint at ezelikman/quietstar-8-ahead were not used when initializing MistralForCausalLM: ['end_embedding', 'start_embedding', 'talk_head.0.0.bias', 'talk_head.0.0.weight', 'talk_head.0.2.bias', 'talk_head.0.2.weight', 'talk_head.0.4.weight'] - This IS expected if you are initializing MistralForCausalLM...
Hi, thanks for releasing this! If I fine-tune this on conversational tasks, do you know if it will lose the ability to reason? Thanks!
Trying to replicate your results in training, but I'm running into VRAM issues; I believe the issue lies with my accelerate settings. Please advise.
Please make a Jupyter notebook ... I have tried to get it working unsuccessfully ... problem when loading the model ... the first parameter is: merged_talk_heads=merged_talk_heads, perhaps I have...
The version `transformers-4.37.0.dev0` no longer exists. I don't know how to patch your repo to the latest `transformers` version. Do I have to copy `modeling_mistral.py` and `configuration_mistral.py` to the...
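One untested workaround, assuming `modeling_mistral.py` and `configuration_mistral.py` from this repo are copied next to your script, is to import the patched class directly instead of patching an installed `transformers`. Whether `from_pretrained` accepts `merged_talk_heads` this way depends on the patched code; the value shown is an assumption, not confirmed API:

```python
# Sketch only: use local copies of the repo's patched files rather than
# modifying the installed transformers package. modeling_mistral.py is
# expected to import its own configuration_mistral.py.
import torch
from transformers import AutoTokenizer
from modeling_mistral import MistralForCausalLM  # local copy from this repo

model_id = "ezelikman/quietstar-8-ahead"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MistralForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    merged_talk_heads=True,  # assumption: mirrors the training-time setting
)
```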