mistral.rs
The grammar example fails
Describe the bug
I got an error when running the grammar example.
directory: /Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs/examples
My machine environment:

```
ProductName: macOS
ProductVersion: 14.4.1
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro18,4
Model Number: Z15H0016ZJ/A
Chip: Apple M1 Max
Total Number of Cores: 10 (8 performance and 2 efficiency)
Memory: 64 GB
```
```
❯ cargo run --example grammar --release
Compiling mistralrs-quant v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs-quant)
Compiling mistralrs-core v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs-core)
Compiling mistralrs-vision v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs-vision)
Compiling mistralrs v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs)
Finished release profile [optimized] target(s) in 30.75s
Running /Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/target/release/examples/grammar
2024-09-27T05:38:45.324420Z INFO hf_hub: Token file not found "/Users/yuta/.cache/huggingface/token"
2024-09-27T05:38:45.324590Z INFO mistralrs_core::utils::tokens: Could not load token at "/Users/yuta/.cache/huggingface/token", using no HF token.
2024-09-27T05:38:45.325083Z INFO mistralrs_core::pipeline::normal: Loading tokenizer.json at microsoft/Phi-3.5-mini-instruct
2024-09-27T05:38:45.325540Z INFO mistralrs_core::pipeline::normal: Loading config.json at microsoft/Phi-3.5-mini-instruct
2024-09-27T05:38:45.993416Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
2024-09-27T05:38:46.198383Z INFO mistralrs_core::pipeline::normal: Loading generation_config.json at microsoft/Phi-3.5-mini-instruct
2024-09-27T05:38:46.933785Z INFO mistralrs_core::pipeline::normal: Loading tokenizer_config.json at microsoft/Phi-3.5-mini-instruct
2024-09-27T05:38:46.935057Z INFO mistralrs_core::pipeline::normal: Loading model microsoft/Phi-3.5-mini-instruct on cpu.
2024-09-27T05:38:46.935316Z INFO mistralrs_core::utils::log: Automatic loader type determined to be phi3
2024-09-27T05:38:46.935866Z INFO mistralrs_core::utils::normal: DType selected is F16.
2024-09-27T05:38:46.935898Z INFO mistralrs_core::pipeline::normal: Model config: Config { vocab_size: 32064, hidden_act: Silu, hidden_size: 3072, intermediate_size: 8192, num_hidden_layers: 32, num_attention_heads: 32, num_key_value_heads: 32, rms_norm_eps: 1e-5, rope_theta: 10000.0, bos_token_id: Some(1), eos_token_id: Some(32000), rope_scaling: Some(Classic { short_factor: [1.0, 1.0199999809265137, 1.0299999713897705, 1.0299999713897705, 1.0499999523162842, 1.0499999523162842, 1.0499999523162842, 1.0499999523162842, 1.0499999523162842, 1.069999933242798, 1.0999999046325684, 1.1099998950958252, 1.1599998474121094, 1.1599998474121094, 1.1699998378753662, 1.2899998426437378, 1.339999794960022, 1.679999828338623, 1.7899998426437378, 1.8199998140335083, 1.8499997854232788, 1.879999756813049, 1.90999972820282, 1.9399996995925903, 1.9899996519088743, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0799996852874756, 2.0899996757507324, 2.189999580383301, 2.2199995517730713, 2.5899994373321533, 2.729999542236328, 2.749999523162842, 2.8399994373321533], long_factor: [1.0800000429153442, 1.1100000143051147, 1.1399999856948853, 1.340000033378601, 1.5899999141693115, 1.600000023841858, 1.6200000047683716, 2.620000123977661, 3.2300000190734863, 3.2300000190734863, 4.789999961853027, 7.400000095367432, 7.700000286102295, 9.09000015258789, 12.199999809265137, 17.670000076293945, 24.46000099182129, 28.57000160217285, 30.420001983642575, 30.840002059936523, 32.590003967285156, 32.93000411987305, 42.32000350952149, 44.96000289916992, 50.34000396728515, 50.45000457763672, 57.55000305175781, 57.93000411987305, 58.21000289916992, 60.1400032043457, 62.61000442504883, 62.62000274658203, 62.71000289916992, 63.1400032043457, 63.1400032043457, 63.77000427246094, 63.93000411987305, 63.96000289916992, 63.970001220703125, 64.02999877929688, 64.06999969482422, 64.08000183105469, 64.12000274658203, 64.41000366210938, 64.4800033569336, 64.51000213623047, 64.52999877929688, 64.83999633789063], scaling_type: Su }), max_position_embeddings: 131072, use_flash_attn: false, sliding_window: Some(262144), original_max_position_embeddings: 4096, quantization_config: None }
100%|████████████████████| 67/67 [00:04<00:00, 11.71it/s]
100%|████████████████████| 128/128 [00:10<00:00, 7.58it/s]
2024-09-27T05:39:05.613379Z INFO mistralrs_core::pipeline::isq: Applying in-situ quantization into Some(Q4K) to 129 tensors.
2024-09-27T05:39:05.613615Z INFO mistralrs_core::pipeline::isq: Applying ISQ on 10 threads.
2024-09-27T05:39:12.267294Z INFO mistralrs_core::pipeline::isq: Applied in-situ quantization into Some(Q4K) to 129 tensors out of 129 total tensors. Took 6.65s
2024-09-27T05:39:12.311255Z INFO mistralrs_core::pipeline::chat_template: bos_toks = "", eos_toks = "<|endoftext|>", "<|end|>", "<|assistant|>", unk_tok =
```
## Latest commit or version
1eb9cae2a4ec89d7cf8a5fc8d9f57b82f2f747fa
Mac OS 18, rev = "86f37fa803c40e9ee14c43e0028ad32f841ceb07"
The error only occurs with 1) "microsoft/Phi-3-mini-128k-instruct" and 2) a constrained grammar. There is no error when using "meta-llama/Llama-3.1-8B-Instruct" with a constrained grammar.
I believe the error arises because vocab_size is set to 32064, but "https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/tokenizer.json" only has 32,000 tokens, and "https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/added_tokens.json" has 11 tokens. I don't know where the missing 53 tokens are.
Related: microsoft/phi-2/discussions/97 and epfl-dlab/transformers-CFG/pull/83.
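The token arithmetic above can be double-checked with a small sketch (the helper function here is hypothetical, not part of mistral.rs):

```rust
// Hypothetical helper illustrating the mismatch described above:
// config vocab_size of 32064 vs. 32,000 entries in tokenizer.json
// plus 11 entries in added_tokens.json.
fn missing_token_count(
    cfg_vocab_size: usize,
    tokenizer_tokens: usize,
    added_tokens: usize,
) -> usize {
    // saturating_sub avoids underflow if the tokenizer is larger.
    cfg_vocab_size.saturating_sub(tokenizer_tokens + added_tokens)
}
```

With the numbers above, `missing_token_count(32064, 32000, 11)` gives the 53 unaccounted-for token ids.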
I suggest something like:

```rust
pub(crate) fn build_tok_trie(tokenizer: Tokenizer, cfg_vocab_size: usize) -> TokTrie {
    let bt = ByteTokenizer::from_tokenizer(tokenizer, cfg_vocab_size).unwrap();
    TokTrie::from(&bt.tokrx_info(), &bt.token_bytes())
}

impl ByteTokenizer {
    pub fn from_tokenizer(mut hft: Tokenizer, cfg_vocab_size: usize) -> Result<ByteTokenizer> {
        ...
        for tok_id in 0..vocab_size {
            ...
        }
        // Pad with empty byte sequences when the config declares a larger
        // vocabulary than the tokenizer actually provides.
        if cfg_vocab_size > res.vocab_size {
            let vocab_size_diff = cfg_vocab_size - res.vocab_size;
            res.vocab_size = cfg_vocab_size;
            res.token_bytes
                .extend((0..vocab_size_diff).map(|_| Vec::new()));
        }
    }
}
```
@haricot please feel free to contribute the change. We have a draft PR for reworking the entire grammar system to use llguidance, though, which should be much cleaner.
@EricLBuehler Thanks for the info; this looks great. The draft PR with llguidance fixes this issue when the embedding size differs from the vocabulary size, so it seems the resolution of this issue is tied to the upgrade of the toktrie_hf_tokenizers crate.