
Improve getting started docs

Open igo opened this issue 9 months ago • 13 comments

Describe the bug

Trying to follow the getting started commands on a Mac, but they don't seem to work out of the box:

% cp ./target/release/mistralrs-server .
cp: cannot overwrite directory ./mistralrs-server with non-directory ./target/release/mistralrs-server
# there is already a dir with that name
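
A workaround (as later commands in this thread do) is to run the binary in place, or copy it under a different name to avoid the clash with the mistralrs-server source directory:

% ./target/release/mistralrs-server --help
% cp ./target/release/mistralrs-server ./mistralrs_server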

or

% ./target/release/mistralrs-server --port 1234 --log output.log plain -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF 
error: the following required arguments were not provided:
  --arch <ARCH>

Usage: mistralrs-server plain --model-id <MODEL_ID> --arch <ARCH>

# what is arch? 

% ./target/release/mistralrs-server plain --help
Select a plain model

Usage: mistralrs-server plain [OPTIONS] --model-id <MODEL_ID> --arch <ARCH>

Options:
  -m, --model-id <MODEL_ID>
          Model ID to load from. This may be a HF hub repo or a local path
  -t, --tokenizer-json <TOKENIZER_JSON>
          Path to local tokenizer.json file. If this is specified it is used over any remote file
      --repeat-last-n <REPEAT_LAST_N>
          Control the application of repeat penalty for the last n tokens [default: 64]
  -a, --arch <ARCH>
          
  -h, --help
          Print help

or

% ./target/release/mistralrs-server --port 1234 ggml -t meta-llama/Llama-2-13b-chat-hf -m TheBloke/Llama-2-13B-chat-GGML -f llama-2-13b-chat.ggmlv3.q4_K_M.bin
2024-04-27T07:09:48.321090Z  INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2024-04-27T07:09:48.321126Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-27T07:09:48.321330Z  INFO mistralrs_server: Loading model `meta-llama/Llama-2-13b-chat-hf` on Metal(MetalDevice(DeviceId(1)))...
2024-04-27T07:09:48.321575Z  INFO mistralrs_server: Model kind is: quantized from ggml (no adapters)
thread 'main' panicked at mistralrs-core/src/pipeline/ggml.rs:269:9:
File "tokenizer.json" not found at model id "meta-llama/Llama-2-13b-chat-hf"

Latest commit 0825227

igo avatar Apr 27 '24 07:04 igo

@igo, thank you for raising this. I have improved the docs in #221, so they should be clearer.

there is already dir with that name

Fixed in #221

what is arch?

--arch specifies the model architecture; see this section.
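
For example, using the mistral architecture identifier (as in a later command in this thread):

% ./target/release/mistralrs-server --port 1234 plain -m mistralai/Mistral-7B-Instruct-v0.1 -a mistral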


File "tokenizer.json" not found at model id "meta-llama/Llama-2-13b-chat-hf"

This works on my machine when I run it after 82e3ebf:

./target/release/mistralrs-server --port 1234 ggml -t meta-llama/Llama-2-13b-chat-hf -m TheBloke/Llama-2-13B-chat-GGML -f llama-2-13b-chat.ggmlv3.q4_K_M.bin

Does the error persist on your machine?

EricLBuehler avatar Apr 27 '24 09:04 EricLBuehler

@igo, I think you may get this error if the token source is not set up correctly. For a gated model such as Llama 3, that would cause the error.
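
One way to set this up (a sketch, assuming the default Hugging Face token cache location mentioned later in this thread) is:

% huggingface-cli login
# or write the token to the cache file directly (hf_... is a placeholder):
% echo "hf_..." > ~/.cache/huggingface/token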

EricLBuehler avatar Apr 27 '24 11:04 EricLBuehler

Refs #222.

EricLBuehler avatar Apr 27 '24 11:04 EricLBuehler

None of these commands to start the server work:

mistral.rs % ./mistralrs_server --port 1234 --token-source none plain -m mistralai/Mistral-7B-Instruct-v0.1 -a mistral
2024-04-27T12:15:40.393964Z  INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2024-04-27T12:15:40.394006Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-27T12:15:40.394033Z  INFO mistralrs_server: Loading model `mistralai/Mistral-7B-Instruct-v0.1` on Metal(MetalDevice(DeviceId(1)))...
2024-04-27T12:15:40.394097Z  INFO mistralrs_server: Model kind is: normal (no quant, no adapters)
thread 'main' panicked at mistralrs-core/src/pipeline/normal.rs:232:9:
File "tokenizer.json" not found at model id "mistralai/Mistral-7B-Instruct-v0.1"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

mistral.rs % ./mistralrs_server --port 1234 --log output.log gguf -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -t mistralai/Mistral-7B-Instruct-v0.1 -f mistral-7b-instruct-v0.1.Q4_K_M.gguf
2024-04-27T12:16:23.925638Z  INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2024-04-27T12:16:23.925665Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-27T12:16:23.925683Z  INFO mistralrs_server: Loading model `mistralai/Mistral-7B-Instruct-v0.1` on Metal(MetalDevice(DeviceId(1)))...
2024-04-27T12:16:23.925690Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
thread 'main' panicked at mistralrs-core/src/pipeline/gguf.rs:303:9:
File "tokenizer.json" not found at model id "mistralai/Mistral-7B-Instruct-v0.1"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

mistral.rs % ./mistralrs_server -i gguf -t mistralai/Mistral-7B-Instruct-v0.1 -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -f mistral-7b-instruct-v0.1.Q4_K_M.gguf
2024-04-27T12:16:33.769826Z  INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2024-04-27T12:16:33.769846Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-27T12:16:33.769850Z  INFO mistralrs_server: Loading model `mistralai/Mistral-7B-Instruct-v0.1` on Metal(MetalDevice(DeviceId(1)))...
2024-04-27T12:16:33.769856Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
thread 'main' panicked at mistralrs-core/src/pipeline/gguf.rs:303:9:
File "tokenizer.json" not found at model id "mistralai/Mistral-7B-Instruct-v0.1"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

mistral.rs % ./mistralrs_server --port 1234 x-lora-plain -o orderings/xlora-paper-ordering.json -x lamm-mit/x-lora
error: unrecognized subcommand 'x-lora-plain'

  tip: some similar subcommands exist: 'x-lora-gguf', 'lora', 'x-lora-ggml', 'x-lora'

Usage: mistralrs_server [OPTIONS] <COMMAND>

For more information, try '--help'.

mistral.rs % ./mistralrs_server --port 1234 lora-gguf -o orderings/xlora-paper-ordering.json -m TheBloke/zephyr-7B-beta-GGUF -f zephyr-7b-beta.Q8_0.gguf -x lamm-mit/x-lora
error: unexpected argument '-x' found

Usage: mistralrs_server lora-gguf [OPTIONS] --quantized-model-id <QUANTIZED_MODEL_ID> --quantized-filename <QUANTIZED_FILENAME> --adapters-model-id <ADAPTERS_MODEL_ID> --order <ORDER>

For more information, try '--help'.
mistral.rs % ./mistralrs_server --port 1234 gguf -t mistralai/Mistral-7B-Instruct-v0.1 -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -f mistral-7b-instruct-v0.1.Q4_K_M.gguf
2024-04-27T12:17:09.522733Z  INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2024-04-27T12:17:09.522757Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-27T12:17:09.522761Z  INFO mistralrs_server: Loading model `mistralai/Mistral-7B-Instruct-v0.1` on Metal(MetalDevice(DeviceId(1)))...
2024-04-27T12:17:09.522772Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
thread 'main' panicked at mistralrs-core/src/pipeline/gguf.rs:303:9:
File "tokenizer.json" not found at model id "mistralai/Mistral-7B-Instruct-v0.1"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

mistral.rs % ./mistralrs_server --port 1234 gguf -m mistralai/Mistral-7B-Instruct-v0.1
error: the following required arguments were not provided:
  --tok-model-id <TOK_MODEL_ID>
  --quantized-filename <QUANTIZED_FILENAME>

Usage: mistralrs_server gguf --tok-model-id <TOK_MODEL_ID> --quantized-model-id <QUANTIZED_MODEL_ID> --quantized-filename <QUANTIZED_FILENAME>

For more information, try '--help'.

I guess there should be a clear explanation of what tokenizer.json is and where to get it, or something like that. There are also some unrecognized arguments.

commit 092deeec5ed9c45b36df280d6eba2b0632d4f415

igo avatar Apr 27 '24 12:04 igo

@igo, do you have a HF token set in your cache? Mistral requires an HF token, so if you set the token source to 'none' it will not work.

EricLBuehler avatar Apr 27 '24 12:04 EricLBuehler

Yes, in ~/.cache/huggingface/token, from here: [image]

igo avatar Apr 27 '24 12:04 igo

@igo, #225 fixed #223, which looks similar. The examples you gave work on my machine. Do they work for you?

Regarding this command:

./mistralrs_server --port 1234 lora-gguf -o orderings/xlora-paper-ordering.json -m TheBloke/zephyr-7B-beta-GGUF -f zephyr-7b-beta.Q8_0.gguf -x lamm-mit/x-lora
error: unexpected argument '-x' found

You should specify the adapter repository with -a instead of -x because you are trying to load a LoRA model. -x means --x-lora-model-id and -a means --adapter-model-id.
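
With -a in place of -x, the corrected command (as run later in this thread) would be:

% ./mistralrs_server --port 1234 lora-gguf -o orderings/xlora-paper-ordering.json -m TheBloke/zephyr-7B-beta-GGUF -f zephyr-7b-beta.Q8_0.gguf -a lamm-mit/x-lora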

EricLBuehler avatar Apr 27 '24 14:04 EricLBuehler

It does not compile anymore:

% cargo build --release --features metal
   Compiling mistralrs-lora v0.1.1 (/Users/igo/playground/mistral.rs/mistralrs-lora)
   Compiling http-body-util v0.1.1
   Compiling tokio-rayon v2.1.0
   Compiling minijinja v1.0.21
   Compiling rand_isaac v0.3.0
   Compiling pin-project-internal v1.1.5
   Compiling cpufeatures v0.2.12
   Compiling range-checked v0.1.0 (https://github.com/EricLBuehler/range-checked.git#655349cc)
   Compiling same-file v1.0.6
   Compiling mime v0.3.17
   Compiling overload v0.1.1
   Compiling byteorder v1.5.0
   Compiling rustc-hash v1.1.0
   Compiling httpdate v1.0.3
error[E0599]: no variant or associated item named `TensorF16` found for enum `QMatMul` in the current scope
   --> mistralrs-lora/src/loralinear.rs:152:54
    |
152 |             QMatMul::Tensor(w_base_layer) | QMatMul::TensorF16(w_base_layer) => {
    |                                                      ^^^^^^^^^
    |                                                      |
    |                                                      variant or associated item not found in `QMatMul`
    |                                                      help: there is a variant with a similar name: `Tensor`

error[E0599]: no variant or associated item named `TensorF16` found for enum `QMatMul` in the current scope
   --> mistralrs-lora/src/qloralinear.rs:175:43
    |
175 |             QMatMul::Tensor(_) | QMatMul::TensorF16(_) => unreachable!(),
    |                                           ^^^^^^^^^
    |                                           |
    |                                           variant or associated item not found in `QMatMul`
    |                                           help: there is a variant with a similar name: `Tensor`

   Compiling nu-ansi-term v0.46.0
For more information about this error, try `rustc --explain E0599`.
error: could not compile `mistralrs-lora` (lib) due to 2 previous errors
warning: build failed, waiting for other jobs to finish...

You should specify the adapter repository with -a instead of -x because you are trying to load a LoRA model. -x means --x-lora-model-id and -a means --adapter-model-id.

Regarding that command, I copied it from the README, so I would expect it to work.

igo avatar Apr 29 '24 06:04 igo

Regarding that command, I copied it from the README, so I would expect it to work.

Ah, sorry. I will fix it.

It does not compile anymore:

It should compile now. Can you please try it?

EricLBuehler avatar Apr 29 '24 08:04 EricLBuehler

Still failing

% cargo clean                           
     Removed 2821 files, 623.8MiB total
% cargo build --release --features metal
   Compiling proc-macro2 v1.0.81
   Compiling unicode-ident v1.0.12
   Compiling libc v0.2.153
   Compiling autocfg v1.2.0
   Compiling cfg-if v1.0.0
   Compiling serde v1.0.199
   Compiling version_check v0.9.4
   Compiling once_cell v1.19.0
   Compiling libm v0.2.8
   Compiling cc v1.0.95
   Compiling log v0.4.21
   Compiling bitflags v1.3.2
   Compiling memchr v2.7.2
   Compiling itoa v1.0.11
   Compiling crossbeam-utils v0.8.19
   Compiling pin-project-lite v0.2.14
   Compiling num-traits v0.2.18
   Compiling bitflags v2.5.0
   Compiling rayon-core v1.12.1
   Compiling thiserror v1.0.59
   Compiling core-foundation-sys v0.8.6
   Compiling paste v1.0.14
   Compiling ppv-lite86 v0.2.17
   Compiling crc32fast v1.4.0
   Compiling heck v0.4.1
   Compiling syn v1.0.109
   Compiling ryu v1.0.17
   Compiling typenum v1.17.0
   Compiling quote v1.0.36
   Compiling crossbeam-epoch v0.9.18
   Compiling syn v2.0.60
   Compiling crossbeam-deque v0.8.5
   Compiling smallvec v1.13.2
   Compiling generic-array v0.14.7
   Compiling target-lexicon v0.12.14
   Compiling serde_json v1.0.116
   Compiling reborrow v0.5.5
   Compiling aho-corasick v1.1.3
   Compiling regex-syntax v0.8.3
   Compiling byteorder v1.5.0
   Compiling tracing-core v0.1.32
   Compiling getrandom v0.2.14
   Compiling rand_core v0.6.4
   Compiling futures-core v0.3.30
   Compiling num_cpus v1.16.0
   Compiling core-foundation v0.9.4
   Compiling slab v0.4.9
   Compiling rand_chacha v0.3.1
   Compiling raw-cpuid v10.7.0
   Compiling seq-macro v0.3.5
   Compiling rand v0.8.5
   Compiling regex-automata v0.4.6
   Compiling pyo3-build-config v0.21.2
   Compiling mio v0.8.11
   Compiling lazy_static v1.4.0
   Compiling fnv v1.0.7
   Compiling futures-sink v0.3.30
   Compiling futures-channel v0.3.30
   Compiling rand_distr v0.4.3
   Compiling objc_exception v0.1.2
   Compiling ahash v0.8.11
   Compiling siphasher v0.3.11
   Compiling futures-task v0.3.30
   Compiling pin-utils v0.1.0
   Compiling futures-io v0.3.30
   Compiling semver v1.0.22
   Compiling phf_shared v0.11.2
   Compiling synstructure v0.13.1
   Compiling zerocopy v0.7.32
   Compiling regex v1.10.4
   Compiling socket2 v0.5.6
   Compiling malloc_buf v0.0.6
   Compiling ring v0.17.8
   Compiling allocator-api2 v0.2.18
   Compiling strsim v0.10.0
   Compiling ident_case v1.0.1
   Compiling foreign-types-shared v0.3.1
   Compiling serde_derive v1.0.199
   Compiling bytemuck_derive v1.6.0
   Compiling thiserror-impl v1.0.59
   Compiling enum-as-inner v0.6.0
   Compiling tracing-attributes v0.1.27
   Compiling futures-macro v0.3.30
   Compiling foreign-types-macros v0.2.3
   Compiling tokio-macros v2.2.0
   Compiling percent-encoding v2.3.1
   Compiling foreign-types v0.5.0
   Compiling form_urlencoded v1.2.1
   Compiling tokio v1.37.0
   Compiling futures-util v0.3.30
   Compiling bytemuck v1.15.0
   Compiling tracing v0.1.40
   Compiling num-complex v0.4.5
   Compiling sysctl v0.5.5
   Compiling pulp v0.18.10
   Compiling dyn-stack v0.10.0
   Compiling half v2.4.1
   Compiling zerofrom-derive v0.1.3
   Compiling darling_core v0.14.4
   Compiling hashbrown v0.14.3
   Compiling phf_generator v0.11.2
   Compiling objc v0.2.7
   Compiling rustc_version v0.4.0
   Compiling core-graphics-types v0.1.3
   Compiling lock_api v0.4.12
   Compiling rustix v0.38.34
   Compiling tinyvec_macros v0.1.1
   Compiling parking_lot_core v0.9.10
   Compiling portable-atomic v1.6.0
   Compiling equivalent v1.0.1
   Compiling stable_deref_trait v1.2.0
   Compiling block v0.1.6
   Compiling metal v0.27.0
   Compiling darling_macro v0.14.4
   Compiling either v1.11.0
   Compiling zerofrom v0.1.3
   Compiling indexmap v2.2.6
   Compiling rayon v1.10.0
   Compiling tinyvec v1.6.0
   Compiling vob v3.0.3
   Compiling phf_codegen v0.11.2
   Compiling yoke-derive v0.7.3
   Compiling parse-zoneinfo v0.3.0
   Compiling phf v0.11.2
   Compiling errno v0.3.8
   Compiling unicode-width v0.1.12
   Compiling untrusted v0.9.0
   Compiling scopeguard v1.2.0
   Compiling bytes v1.6.0
   Compiling utf8parse v0.2.1
   Compiling signal-hook v0.3.17
   Compiling spin v0.9.8
   Compiling rustversion v1.0.15
   Compiling pkg-config v0.3.30
   Compiling rustls-pki-types v1.5.0
   Compiling gemm-common v0.17.1
   Compiling onig_sys v69.8.1
   Compiling http v1.1.0
   Compiling gemm-f32 v0.17.1
   Compiling gemm-f64 v0.17.1
   Compiling gemm-c32 v0.17.1
   Compiling gemm-c64 v0.17.1
   Compiling gemm-f16 v0.17.1
   Compiling gemm v0.17.1
   Compiling candle-metal-kernels v0.5.0 (https://github.com/EricLBuehler/candle.git#e385e2b8)
   Compiling yoke v0.7.3
   Compiling anstyle-parse v0.2.3
   Compiling unicode-normalization v0.1.23
   Compiling chrono-tz-build v0.2.1
   Compiling safetensors v0.4.3
   Compiling darling v0.14.4
   Compiling memmap2 v0.9.4
   Compiling pyo3-ffi v0.21.2
   Compiling zip v0.6.6
   Compiling signal-hook-registry v1.4.2
   Compiling security-framework-sys v2.10.0
   Compiling unicase v2.7.0
   Compiling rustls v0.22.4
   Compiling anyhow v1.0.82
   Compiling native-tls v0.2.11
   Compiling anstyle v1.0.6
   Compiling adler v1.0.2
   Compiling unicode-bidi v0.3.15
   Compiling fastrand v2.0.2
   Compiling colorchoice v1.0.0
   Compiling anstyle-query v1.0.2
   Compiling anstream v0.6.13
   Compiling idna v0.5.0
   Compiling tempfile v3.10.1
   Compiling miniz_oxide v0.7.2
   Compiling candle-core v0.5.0 (https://github.com/EricLBuehler/candle.git#e385e2b8)
   Compiling security-framework v2.10.0
   Compiling chrono-tz v0.8.6
   Compiling derive_builder_core v0.12.0
   Compiling parking_lot v0.12.2
   Compiling rustls-webpki v0.102.3
   Compiling http-body v1.0.0
   Compiling console v0.15.8
   Compiling esaxx-rs v0.1.10
   Compiling iana-time-zone v0.1.60
   Compiling memoffset v0.9.1
   Compiling num-bigint v0.4.4
   Compiling rust_decimal v1.35.0
   Compiling subtle v2.5.0
   Compiling minimal-lexical v0.2.1
   Compiling number_prefix v0.4.0
   Compiling eyre v0.6.12
   Compiling bit-vec v0.6.3
   Compiling strsim v0.11.1
   Compiling zeroize v1.7.0
   Compiling heck v0.5.0
   Compiling clap_lex v0.7.0
   Compiling option-ext v0.2.0
   Compiling dirs-sys v0.4.1
   Compiling clap_derive v4.5.4
   Compiling clap_builder v4.5.2
   Compiling bit-set v0.5.3
   Compiling nom v7.1.3
   Compiling indicatif v0.17.8
   Compiling chrono v0.4.38
   Compiling derive_builder_macro v0.12.0
   Compiling candle-nn v0.5.0 (https://github.com/EricLBuehler/candle.git#e385e2b8)
   Compiling signal-hook-mio v0.2.3
   Compiling flate2 v1.0.29
   Compiling url v2.5.0
   Compiling crypto-common v0.1.6
   Compiling block-buffer v0.10.4
   Compiling webpki-roots v0.26.1
   Compiling itertools v0.11.0
   Compiling packedvec v1.2.4
   Compiling monostate-impl v0.1.12
   Compiling pyo3 v0.21.2
   Compiling pyo3-macros-backend v0.21.2
   Compiling num-integer v0.1.46
   Compiling proc-macro-error-attr v1.0.4
   Compiling macro_rules_attribute-proc_macro v0.2.0
   Compiling base64 v0.13.1
   Compiling indenter v0.3.3
   Compiling httparse v1.8.0
   Compiling unicode-segmentation v1.11.0
   Compiling base64 v0.22.0
   Compiling arrayvec v0.7.4
   Compiling spm_precompiled v0.1.4
   Compiling ureq v2.9.7
   Compiling macro_rules_attribute v0.2.0
   Compiling onig v6.4.0
   Compiling rayon-cond v0.3.0
   Compiling monostate v0.1.12
   Compiling pyo3-macros v0.21.2
   Compiling sparsevec v0.2.0
   Compiling digest v0.10.7
   Compiling cpufeatures v0.2.12
   Compiling crossterm v0.25.0
   Compiling derive_builder v0.12.0
   Compiling clap v4.5.4
   Compiling fancy-regex v0.13.0
   Compiling dirs v5.0.1
   Compiling cfgrammar v0.13.4
   Compiling itertools v0.12.1
   Compiling serde_plain v1.0.2
   Compiling futures-executor v0.3.30
   Compiling unicode-normalization-alignments v0.1.12
   Compiling nibble_vec v0.1.0
   Compiling proc-macro-error v1.0.4
   Compiling tower-layer v0.3.2
   Compiling defmac v0.1.3
   Compiling indoc v2.0.5
   Compiling unindent v0.2.3
   Compiling same-file v1.0.6
   Compiling tower-service v0.3.2
   Compiling endian-type v0.1.2
   Compiling regex-syntax v0.6.29
   Compiling unchecked-index v0.2.2
   Compiling unicode_categories v0.1.1
   Compiling tokenizers v0.15.2
   Compiling galil-seiferas v0.1.5
   Compiling radix_trie v0.2.1
   Compiling walkdir v2.5.0
   Compiling futures v0.3.30
   Compiling candle-transformers v0.5.0 (https://github.com/EricLBuehler/candle.git#e385e2b8)
   Compiling lrtable v0.13.4
   Compiling hf-hub v0.3.2
   Compiling regex-automata v0.1.10
   Compiling tqdm v0.7.0
   Compiling sha2 v0.10.8
   Compiling mistralrs-lora v0.1.1 (/Users/igo/playground/mistral.rs/mistralrs-lora)
error[E0599]: no variant or associated item named `TensorF16` found for enum `QMatMul` in the current scope
   --> mistralrs-lora/src/loralinear.rs:152:54
    |
152 |             QMatMul::Tensor(w_base_layer) | QMatMul::TensorF16(w_base_layer) => {
    |                                                      ^^^^^^^^^
    |                                                      |
    |                                                      variant or associated item not found in `QMatMul`
    |                                                      help: there is a variant with a similar name: `Tensor`

error[E0599]: no variant or associated item named `TensorF16` found for enum `QMatMul` in the current scope
   --> mistralrs-lora/src/qloralinear.rs:175:43
    |
175 |             QMatMul::Tensor(_) | QMatMul::TensorF16(_) => unreachable!(),
    |                                           ^^^^^^^^^
    |                                           |
    |                                           variant or associated item not found in `QMatMul`
    |                                           help: there is a variant with a similar name: `Tensor`

   Compiling axum-core v0.4.3
For more information about this error, try `rustc --explain E0599`.
error: could not compile `mistralrs-lora` (lib) due to 2 previous errors
warning: build failed, waiting for other jobs to finish...

igo avatar Apr 30 '24 07:04 igo

Can you please run git pull and cargo update? A new variant, TensorF16, was introduced on our Candle fork a few days ago.
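
That is (assuming a checkout of the repository's default branch):

% git pull
% cargo update
% cargo build --release --features metal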

EricLBuehler avatar Apr 30 '24 09:04 EricLBuehler

Great! Finally a command that works:

./mistralrs_server --token-source none -i plain -m microsoft/Phi-3-mini-128k-instruct -a phi3

Although it's much, much slower than Ollama's phi3 (maybe because of different quantization).

Some commands still don't work, but I don't know why, given that I was able to download phi3:

% ./mistralrs_server --port 1234 --log output.log gguf -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -t mistralai/Mistral-7B-Instruct-v0.1 -f mistral-7b-instruct-v0.1.Q4_K_M.gguf
2024-04-30T11:24:22.051716Z  INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2024-04-30T11:24:22.051737Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-30T11:24:22.051741Z  INFO mistralrs_server: Loading model `mistralai/Mistral-7B-Instruct-v0.1` on Metal(MetalDevice(DeviceId(1)))...
2024-04-30T11:24:22.051748Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
thread 'main' panicked at mistralrs-core/src/pipeline/gguf.rs:261:9:
RequestError(Status(403, Response[status: 403, status_text: Forbidden, url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/resolve/main/tokenizer.json]))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

or

% ./mistralrs_server --port 1234 lora-gguf -o orderings/xlora-paper-ordering.json -m TheBloke/zephyr-7B-beta-GGUF -f zephyr-7b-beta.Q8_0.gguf -a lamm-mit/x-lora
2024-04-30T11:30:23.733548Z  INFO mistralrs_server: avx: false, neon: true, simd128: false, f16c: false
2024-04-30T11:30:23.733568Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-30T11:30:23.733572Z  INFO mistralrs_server: Loading model `lamm-mit/x-lora` on Metal(MetalDevice(DeviceId(1)))...
2024-04-30T11:30:23.733579Z  INFO mistralrs_server: Model kind is: lora, quantized from gguf
  0%|▊                                  | 1/322 [00:00<?, ?it/s]
-[AGXG13XFamilyCommandBuffer tryCoalescingPreviousComputeCommandEncoderWithConfig:nextEncoderClass:]:1015: failed assertion `A command encoder is already encoding to this command buffer'
(the assertion repeats several times, interleaved with the progress bar)
zsh: abort      ./mistralrs_server --port 1234 lora-gguf -o orderings/xlora-paper-ordering.json -m TheBloke/zephyr-7B-beta-GGUF -f zephyr-7b-beta.Q8_0.gguf -a lamm-mit/x-lora

igo avatar Apr 30 '24 11:04 igo

Great! I'm glad that it works.

Although it's much, much slower than Ollama's phi3 (maybe because of different quantization).

Are you comparing against Ollama's quantized phi3? If so, this will be slower. Can you try with ISQ? That should make it faster while I work on adding quantized Phi3 support.
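
A sketch of an ISQ run, assuming the --isq flag and a Q4K level as described in the ISQ docs (treat both as assumptions and check --help):

% ./mistralrs_server --token-source none -i --isq Q4K plain -m microsoft/Phi-3-mini-128k-instruct -a phi3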

Some commands still don't work, but I don't know why, given that I was able to download phi3:

The Mistral example you gave responded with a 403 HTTP error when trying to download the tokenizer.json. Can you please confirm that you have access to the Mistral model? It is gated, and all you need to do is accept the terms.
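
One quick check (a sketch, reusing the token file path from earlier in this thread) is to request the file directly and look for a 200 status:

% curl -s -o /dev/null -w "%{http_code}\n" \
    -H "Authorization: Bearer $(cat ~/.cache/huggingface/token)" \
    https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/resolve/main/tokenizer.json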

EricLBuehler avatar Apr 30 '24 11:04 EricLBuehler

@igo, I'm just closing this issue as I think the problems are resolved. However, please feel free to reopen!

EricLBuehler avatar May 07 '24 23:05 EricLBuehler