Code for Quiet-STaR

Results 10 quiet-star issues

First of all, I would like to express my gratitude for your excellent research. I have a question about using your code for inference or evaluation. Initially, looking at your...

Huggingface -> Hugging Face

This PR adds a file containing the minimal code needed to run inference with the model and get consistent output. Generating 100 tokens seems very slow, but it outputs a consistent...

Could you please show a simple inference example with the thought tokens masked, as you suggested in the README.md?

Thanks so much for this. Would love some simple starter code with `transformers`!

Some weights of the model checkpoint at ezelikman/quietstar-8-ahead were not used when initializing MistralForCausalLM: ['end_embedding', 'start_embedding', 'talk_head.0.0.bias', 'talk_head.0.0.weight', 'talk_head.0.2.bias', 'talk_head.0.2.weight', 'talk_head.0.4.weight'] - This IS expected if you are initializing MistralForCausalLM...

Hi, thanks for releasing this! If I fine-tune this on conversational tasks, do you know if it will lose the ability to reason? Thanks!

I'm trying to replicate your training results, but I'm running into VRAM issues. I believe the problem lies with my accelerate settings. Please advise.

Please make a Jupyter notebook ... I have tried unsuccessfully to get it working ... there is a problem when loading the model .... the first parameter is: merged_talk_heads=merged_talk_heads, perhaps I have...

The version `transformers-4.37.0.dev0` no longer exists. I don't know how to patch your repo to work with the latest version of `transformers`. Do I have to copy `modeling_mistral.py` and `configuration_mistral.py` to the...