
Official inference library for Mistral models

172 mistral-inference issues

Dear Team, you've done a tremendous job! Thank you for creating a real alternative for French and other European languages. I would like to know whether it is possible to add...

As I understand the current MoeLayer, a gate calculates the weight to be applied to the output of each expert; the top k experts are selected and run on the data,...
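The routing described in this question can be sketched in a few lines. This is a minimal NumPy illustration of top-k mixture-of-experts gating, not the actual `MoeLayer` implementation; the function name, shapes, and the way experts are represented are all assumptions for illustration.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Hypothetical sketch of top-k MoE routing.

    x:       (d,) input vector
    gate_w:  (n_experts, d) gating weight matrix
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = gate_w @ x                        # one gating score per expert
    topk = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                   # softmax over the selected experts only
    # Only the chosen experts run on the input; their outputs are weight-averaged.
    return sum(w * experts[i](x) for w, i in zip(weights, topk))
```

The key point the question is getting at: the softmax is renormalized over just the selected top-k scores, and the unselected experts never run.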

Fixed a dead link; it now points to the current documentation.

Actual behavior of https://docs.mistral.ai/usage/guardrailing: ![Screenshot 2023-12-31 at 6 34 14 PM](https://github.com/mistralai/mistral-src/assets/1118615/9805f684-36e3-49f4-9c3f-23278538dee9) It should be updated to https://docs.mistral.ai/platform/guardrailing/: ![image](https://github.com/mistralai/mistral-src/assets/1118615/48754d80-02df-4884-ac62-512630835193)

Hi, I'd like to know whether Mistral is planning to support more languages?

Minor typos in `readme.md` and tutorials

Here is the `SFTTrainer` setup I used for fine-tuning Mistral:

```python
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=data,
    peft_config=peft_config,
    dataset_text_field=" column name",
    max_seq_length=3000,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
```

I found different...

Hi, I have used the source code here and downloaded the instruct-v0.2 weights from https://docs.mistral.ai/models/. In the source code, I have set `instruct: bool = True` in main.py. I...

I am fine-tuning the Mistral model using the following configuration:

```python
training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_strategy="steps",
    logging_steps=10,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=13000,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type...
```