mistral.rs
dolphin-2.9-mixtral-8x22b.Q8_0.gguf "Error: cannot find tensor info for blk.0.ffn_gate.0.weight"?
I attempted to run mistralrs-server to serve my local copy of dolphin-2.9-mixtral-8x22b.Q8_0.gguf. This file isn't available on Hugging Face as a single file, because it's only published broken into four parts here. Ideally, I'd like to serve from completely offline files, but that's not critical at the moment.
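(For context on the local file: I reassembled it by concatenating the four parts. As far as I can tell they're raw byte splits rather than gguf-split shards, so a plain cat suffices; the part names below are illustrative, from memory.)

$ cat dolphin-2.9-mixtral-8x22b.Q8_0.gguf.part1of4 \
      dolphin-2.9-mixtral-8x22b.Q8_0.gguf.part2of4 \
      dolphin-2.9-mixtral-8x22b.Q8_0.gguf.part3of4 \
      dolphin-2.9-mixtral-8x22b.Q8_0.gguf.part4of4 \
      > dolphin-2.9-mixtral-8x22b.Q8_0.gguf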
$ git show --oneline
fc02ccebd8b4 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #348 from EricLBuehler/expose_api
Built with
$ cargo build --release --features metal
Attempting to run all flavors of the GGUF resulted in:
$ ./target/release/mistralrs-server --serve-ip 127.0.0.1 -p 8888 gguf -t cognitivecomputations/dolphin-2.9-mixtral-8x22b -m cognitivecomputations/dolphin-2.9-mixtral-8x22b -f ~/models/dolphin-2.9-mixtral-8x22b.Q8_0.gguf
2024-05-28T00:09:39.318968Z INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2024-05-28T00:09:39.319009Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-05-28T00:09:39.319029Z INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-05-28T00:09:39.319066Z INFO hf_hub: Token file not found "/Users/psyv/.cache/huggingface/token"
2024-05-28T00:09:39.319080Z INFO mistralrs_core::utils::tokens: Could not load token at "/Users/psyv/.cache/huggingface/token", using no HF token.
2024-05-28T00:09:39.319221Z INFO hf_hub: Token file not found "/Users/psyv/.cache/huggingface/token"
2024-05-28T00:09:39.319228Z INFO mistralrs_core::utils::tokens: Could not load token at "/Users/psyv/.cache/huggingface/token", using no HF token.
2024-05-28T00:09:39.321434Z DEBUG ureq::stream: connecting to huggingface.co:443 at 18.154.227.67:443
2024-05-28T00:09:39.341684Z DEBUG rustls::client::hs: No cached session for DnsName("huggingface.co")
2024-05-28T00:09:39.341758Z DEBUG rustls::client::hs: Not resuming any session
2024-05-28T00:09:39.365238Z DEBUG rustls::client::hs: Using ciphersuite TLS13_AES_128_GCM_SHA256
2024-05-28T00:09:39.365255Z DEBUG rustls::client::tls13: Not resuming
2024-05-28T00:09:39.365339Z DEBUG rustls::client::tls13: TLS1.3 encrypted extensions: [ServerNameAck]
2024-05-28T00:09:39.365345Z DEBUG rustls::client::hs: ALPN protocol is None
2024-05-28T00:09:39.365548Z DEBUG ureq::stream: created stream: Stream(RustlsStream)
2024-05-28T00:09:39.365553Z DEBUG ureq::unit: sending request GET https://huggingface.co/api/models/cognitivecomputations/dolphin-2.9-mixtral-8x22b/revision/main
2024-05-28T00:09:39.365559Z DEBUG ureq::unit: writing prelude: GET /api/models/cognitivecomputations/dolphin-2.9-mixtral-8x22b/revision/main HTTP/1.1
Host: huggingface.co
Accept: */*
User-Agent: unkown/None; hf-hub/0.3.2; rust/unknown
accept-encoding: gzip
2024-05-28T00:09:39.408596Z DEBUG ureq::response: Body entirely buffered (length: 6027)
2024-05-28T00:09:39.408620Z DEBUG ureq::pool: adding stream to pool: https|huggingface.co|443 -> Stream(RustlsStream)
2024-05-28T00:09:39.408627Z DEBUG ureq::unit: response 200 to GET https://huggingface.co/api/models/cognitivecomputations/dolphin-2.9-mixtral-8x22b/revision/main
2024-05-28T00:09:39.408840Z DEBUG ureq::stream: dropping stream: Stream(RustlsStream)
2024-05-28T00:09:39.408861Z INFO mistralrs_core::pipeline::gguf: Loading model `cognitivecomputations/dolphin-2.9-mixtral-8x22b` on Metal(MetalDevice(DeviceId(1)))...
2024-05-28T00:09:39.472560Z INFO mistralrs_core::pipeline::gguf: Model config:
general.architecture: llama
general.file_type: 7
general.name: .
general.quantization_version: 2
general.source.url: https://huggingface.co/cognitivecomputations/dolphin-2.9-mixtral-8x22b
general.url: https://huggingface.co/mradermacher/dolphin-2.9-mixtral-8x22b-GGUF
llama.attention.head_count: 48
llama.attention.head_count_kv: 8
llama.attention.layer_norm_rms_epsilon: 0.00001
llama.block_count: 56
llama.context_length: 65536
llama.embedding_length: 6144
llama.expert_count: 8
llama.expert_used_count: 2
llama.feed_forward_length: 16384
llama.rope.dimension_count: 128
llama.rope.freq_base: 1000000
llama.vocab_size: 32002
mradermacher.quantize_version: 2
mradermacher.quantized_at: 2024-05-03T03:00:02+02:00
mradermacher.quantized_by: mradermacher
mradermacher.quantized_on: backup1
mradermacher.vocab_type: spm
Error: cannot find tensor info for blk.0.ffn_gate.0.weight
Am I doing something wrong? Or is Mixtral-8x22B not supported yet? (Seems unlikely for a project named mistral.rs ;-) )
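In case it helps triage: if I understand the GGUF MoE layouts correctly, older llama.cpp quantizations stored one tensor per expert (blk.N.ffn_gate.{e}.weight and friends), while quantizations made after llama.cpp's MoE rework stack all experts into a single blk.N.ffn_gate_exps.weight, so the loader may simply be expecting a different naming scheme than the one in this file. Here's a quick diagnostic sketch (my own throwaway code, not part of mistral.rs) that lists which names the file actually contains, using the candle GGUF reader that mistral.rs builds on:

// probe.rs: diagnostic sketch that lists the expert-FFN tensor names in a
// GGUF file. Assumes only the candle_core crate mistral.rs already depends on.
use candle_core::quantized::gguf_file;
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical invocation: probe ~/models/dolphin-2.9-mixtral-8x22b.Q8_0.gguf
    let path = std::env::args().nth(1).expect("usage: probe <model.gguf>");
    let mut file = File::open(&path)?;
    let content = gguf_file::Content::read(&mut file)?;
    let mut names: Vec<_> = content
        .tensor_infos
        .keys()
        .filter(|name| name.contains("ffn_gate"))
        .collect();
    names.sort();
    for name in names {
        // Per-expert layout: blk.0.ffn_gate.0.weight ... blk.0.ffn_gate.7.weight
        // Stacked layout:    blk.0.ffn_gate_exps.weight (one 3-D tensor)
        println!("{name}");
    }
    Ok(())
}

Whichever layout that prints is presumably the one the quantized-llama loader would need to handle for this file.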
Here's my diff against HEAD, since there isn't a verbose flag yet:
diff --git a/mistralrs-server/src/main.rs b/mistralrs-server/src/main.rs
index 361a556b53a4..a0e81a14daba 100644
--- a/mistralrs-server/src/main.rs
+++ b/mistralrs-server/src/main.rs
@@ -254,7 +254,7 @@ async fn main() -> Result<()> {
let device = Device::cuda_if_available(0)?;
let filter = EnvFilter::builder()
- .with_default_directive(LevelFilter::INFO.into())
+ .with_default_directive(LevelFilter::DEBUG.into())
.from_env_lossy();
tracing_subscriber::fmt().with_env_filter(filter).init();
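(Re-reading that patch: since the filter is built with from_env_lossy, I believe setting RUST_LOG=debug in the environment would have produced the same output without a rebuild; noting it here in case it saves someone the trouble.)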