Arno Candel


https://github.com/huggingface/text-generation-inference/blob/70f485bf9f601b5450b00894f56e20b973d1c2e4/server/text_generation_server/utils/gptq/quantize.py#L818-L819 — these need to be hacked into the GPTQ model state first.
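A minimal sketch of what such a hack could look like: patch the loaded state dict before saving so the serving code can discover the quantization parameters. The key names `gptq_bits` and `gptq_groupsize` are assumptions for illustration, not confirmed from TGI's code.

```python
def add_gptq_metadata(state_dict, bits=4, groupsize=128):
    """Hypothetical sketch: inject GPTQ metadata entries into a model
    state dict so downstream loading code can find the quantization
    parameters alongside the quantized weights."""
    patched = dict(state_dict)  # shallow copy; tensors stay shared
    patched["gptq_bits"] = bits          # assumed key name
    patched["gptq_groupsize"] = groupsize  # assumed key name
    return patched
```

The patched dict would then be re-saved (e.g. via `safetensors`) before handing the checkpoint to the server.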

Can you try directly, without TGI: `CUDA_VISIBLE_DEVICES=0,1,2,3 SAVE_DIR=./save40b python generate.py --base_model=$MODEL --height=500 --debug --langchain_mode=ChatLLM --visible_langchain_modes="['ChatLLM', 'UserData', 'MyData']" --score_model=None --max_max_new_tokens=2048 --max_new_tokens=512 --infer_devices=False &>> logs.$MODEL_NAME.gradio_chat.txt &` — updated above to skip the text-generation-inference server,...

### Fine-tuning

`CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model=mosaicml/mpt-30b --num_epochs=0.01 --lora_target_modules=['Wqkv'] --train_4bit=True --micro_batch_size=1 --batch_size=1` fails with: `ValueError: MPTForCausalLM does not support gradient checkpointing.`
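One hedged workaround sketch: test for gradient-checkpointing support before enabling it, instead of letting Transformers raise. `DummyModel` below is a stand-in for `MPTForCausalLM` (whose custom modeling code does not implement checkpointing); the helper name is hypothetical.

```python
def enable_checkpointing_if_supported(model):
    # Transformers raises ValueError from gradient_checkpointing_enable()
    # when the model class does not implement checkpointing; guard instead
    # and fall back to training without it.
    if getattr(model, "supports_gradient_checkpointing", False):
        model.gradient_checkpointing_enable()
        return True
    return False


class DummyModel:
    # stand-in for MPTForCausalLM: declares no checkpointing support
    supports_gradient_checkpointing = False
```

With this guard, `finetune.py` could skip checkpointing for MPT models rather than aborting, at the cost of higher activation memory.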

TGI support: https://github.com/huggingface/text-generation-inference/issues/290

This works out of the box: `python generate.py --base_model=mosaicml/mpt-30b-instruct`

TGI itself seems to be the issue: it works with 0.8.2 but fails with 0.9.1.

From `modelling_RW.py`:

```python
class RWPreTrainedModel(PreTrainedModel):
    _keys_to_ignore_on_load_missing = [r"h.*.self_attention.scale_mask_softmax.causal_mask", r"lm_head.weight"]
    """
    An abstract class to handle weights initialization and a simple interface for downloading
    and loading pretrained models.
    """
```

so...
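For context, Transformers uses these regex patterns to suppress warnings about keys that are missing from the checkpoint at load time. Roughly (a sketch of the filtering behavior, not the exact library code):

```python
import re

# patterns quoted from modelling_RW.py above
IGNORE_PATTERNS = [
    r"h.*.self_attention.scale_mask_softmax.causal_mask",
    r"lm_head.weight",
]


def filter_missing_keys(missing_keys, patterns=IGNORE_PATTERNS):
    # keys matching any ignore regex are dropped, so loading does not
    # complain about them even though they are absent from the checkpoint
    return [k for k in missing_keys
            if not any(re.search(p, k) for p in patterns)]
```

So a missing `lm_head.weight` is silently ignored, which matters when the export script decides which tensors to write out.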

Created by the export script at commit 8b127fae2bcdfc69266ce35f917a096fe5824a48.

Same thing for a locally created 7B export:

```
export MODEL=h2ogpt-oasst1-2048-falcon-7b
export HF_PORT=5000
export CUDA_VISIBLE_DEVICES=0
```

`docker run --gpus all --shm-size 1g -e CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES -e TRANSFORMERS_CACHE="/.cache/" -p $HF_PORT:80 -v $HOME/.cache:/.cache/ -v...

Compare with https://github.com/h2oai/h2o-llmstudio/blob/fd8d879ac9e56203394afca76079c383dcf8ddc0/app_utils/sections/chat.py#L290-L340