sgsdxzy

117 comments by sgsdxzy

Here are my results on a 3080 Ti, image size 512x512, Euler a, 30 steps, batch size 8:

| settings | it/s |
| --- | --- |
| default | 2.10 |
...

I ran into similar problems. I think this is probably caused by the tokenizer config adding extra tokens that are not handled correctly.
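A quick way to check this (a minimal sketch on my side, not webui code; the model path is just a placeholder) is to compare the tokenizer length, which includes any added tokens, against the model's embedding size:

```
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "facebook/opt-1.3b"  # placeholder: use the checkpoint that misbehaves

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# added_tokens.json / tokenizer_config.json can push len(tokenizer) past the
# number of embedding rows the checkpoint was trained with.
print("tokenizer length:", len(tokenizer))
print("embedding rows:  ", model.get_input_embeddings().weight.shape[0])

# If they disagree, either drop the extra tokens from the tokenizer config or
# resize the embeddings so out-of-range token ids stop producing garbage.
if len(tokenizer) != model.get_input_embeddings().weight.shape[0]:
    model.resize_token_embeddings(len(tokenizer))
```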

@oobabooga Update: It seems I have to load the whole model in every process and let it chunk (previously I split-loaded the model across multiple GPUs so each process...
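Roughly what I mean by loading the whole model in every process; a sketch assuming DeepSpeed inference does the sharding afterwards (the model path and mp_size come from my own setup, not from any particular repo):

```
import os

import deepspeed
import torch
from transformers import AutoModelForCausalLM

model_name = "facebook/opt-6.7b"  # placeholder

# Every rank first loads the full fp16 checkpoint on CPU...
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# ...and init_inference then splits it across the GPUs; the tensor-parallel
# degree matches the number of processes started by the `deepspeed` launcher.
model = deepspeed.init_inference(
    model,
    mp_size=int(os.getenv("WORLD_SIZE", "1")),
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```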

@oobabooga Here is some mixed but still very interesting news: first, I managed to get GPT-Neo and OPT to work. In fact, the kernel support list includes most types textui...

@huangjiaheng You can write in Chinese, I can read it. It seems your translation software is cutting off sentences. If you struggle with English you can use Chinese.

Update: I can get split loading to work following the example at https://github.com/huggingface/transformers-bloom-inference/blob/e970be1027afc43c147d06153635f4285c517081/bloom-inference-scripts/bloom-ds-inference.py, but int8 and llama are still not working yet.
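The pattern in that script, as far as I understand it, is to build the model on the meta device and let init_inference stream the checkpoint shards straight to the GPUs; a rough sketch, with the model name and checkpoints json as placeholders:

```
import deepspeed
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "bigscience/bloom-7b1"    # placeholder
checkpoints_json = "checkpoints.json"  # placeholder: lists the shard files

config = AutoConfig.from_pretrained(model_name)

# Build an empty meta-device model so no single rank holds the full weights.
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)
model = model.eval()

# init_inference reads the shards listed in checkpoints_json and places each
# slice on the right GPU, so the weights are split instead of replicated.
model = deepspeed.init_inference(
    model,
    mp_size=2,                  # number of GPUs / processes
    dtype=torch.float16,
    checkpoint=checkpoints_json,
    replace_with_kernel_inject=True,
)
```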

With help from https://github.com/microsoft/DeepSpeed/issues/3099, I managed to get tensor-parallel inference working for Llama! However, I noticed that without a custom optimized kernel the performance does not scale: 2080Ti...
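Something along these lines (a sketch; the injection-policy entries are my reading of the Llama module names in transformers at the time, not necessarily what the issue used verbatim, and the model path is a placeholder):

```
import deepspeed
import torch
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-7b-hf",  # placeholder path
    torch_dtype=torch.float16,
)

# Without a fused Llama kernel, the injection policy tells DeepSpeed which
# output projections to all-reduce across the tensor-parallel ranks.
model = deepspeed.init_inference(
    model,
    mp_size=2,  # e.g. two 2080Ti cards
    dtype=torch.float16,
    injection_policy={LlamaDecoderLayer: ("self_attn.o_proj", "mlp.down_proj")},
)
```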

I think you could make groupsize a parameter that defaults to 128 rather than a hard-coded value. It could also accept -1 to load old 4-bit models.
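As a sketch of what I mean (the flag name and the load_quant call are hypothetical stand-ins for however the webui actually wires this up):

```
import argparse

parser = argparse.ArgumentParser()
# Default to the common 128 group size; -1 keeps compatibility with old
# 4-bit checkpoints that were quantized without grouping.
parser.add_argument(
    "--groupsize", type=int, default=128,
    help="GPTQ group size; pass -1 for old 4-bit models without grouping",
)
args = parser.parse_args()

# Hypothetical loader call: pass -1 through unchanged so the quantized
# loader treats it as "no grouping".
# model = load_quant(model_path, checkpoint_path, wbits=4, groupsize=args.groupsize)
```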

@oobabooga Now that @ortegaalfredo has pinned down the problem, this is easy to fix by replicating the original device map:
```
params['device_map'] = {"base_model.model."+k: v for k, v in shared.model.hf_device_map.items()}
```
...

I am wondering whether the model.half() call is still necessary, as it can take several minutes for large models.
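If the cast itself is the slow part, one alternative (my assumption, not necessarily how the webui loads models today) is to load directly in fp16 so no separate .half() pass is needed:

```
import torch
from transformers import AutoModelForCausalLM

# torch_dtype=torch.float16 casts the weights as they are read from disk,
# so there is no second full-model .half() conversion afterwards.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",  # placeholder model path
    torch_dtype=torch.float16,
)
```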