Mayank Mishra

Results 187 comments of Mayank Mishra

But I am not sure why this is happening only with SwiGLU.

`from transformers.utils import WEIGHTS_NAME, WEIGHTS_INDEX_NAME, cached_path, hf_bucket_url` fails with transformers 4.23.1 :(

```
ImportError: cannot import name 'cached_path' from 'transformers.utils'
```

```python
import os

from huggingface_hub import snapshot_download
from transformers.utils import is_offline_mode

path = snapshot_download(
    repo_id=model_name,
    allow_patterns=["*"],
    local_files_only=is_offline_mode(),
    cache_dir=os.getenv("TRANSFORMERS_CACHE", None),
)
```
...

I have tested this @mrwyattii and it works fine. One thing to note: earlier I had to pass the path as TRANSFORMERS_CACHE/models-bigscience-bloom, and now it is just TRANSFORMERS_CACHE....

I would say that after a few versions we could maybe drop support for older transformers? I don't really think it's needed, since I think there are only a handful of...

I converted my server to Flask and ran it with gunicorn with 1 worker. However, this serializes all requests.
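
A minimal sketch of that setup, assuming a hypothetical `app.py` with a single `/generate` endpoint (the route and payload shape are placeholders, not the actual server code):

```python
# app.py -- tiny Flask wrapper; the model call is stubbed out to keep the sketch self-contained
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate():
    payload = request.get_json()
    text = payload.get("text", "")
    # real model inference would go here
    return jsonify({"output": text})

# launch with a single worker so requests are processed one at a time:
#   gunicorn -w 1 -b 0.0.0.0:5000 app:app
```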

This is not possible. But you might want to take a look at the QLoRA paper: https://github.com/artidoro/qlora
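
For reference, a minimal sketch of a QLoRA-style setup with `transformers` + `peft` + `bitsandbytes`; the model name and LoRA hyperparameters are illustrative assumptions, not taken from the thread:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "bigscience/bloom-7b1"  # placeholder base model

# load the frozen base model in 4-bit NF4, as in the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# train only small low-rank adapters on top of the quantized weights
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```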

Hey, ds-inference is also doing world_size streams. However, accelerate is only doing 1 stream since we are just using the naive pipeline parallelism capability from accelerate. A more efficient approach for...
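
For context, the naive pipeline parallelism from accelerate is roughly the `device_map="auto"` style of loading; a minimal sketch, assuming a placeholder checkpoint name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# accelerate shards the layers across the visible GPUs; a request then flows
# through them sequentially, so there is effectively a single execution stream
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",  # placeholder checkpoint, not the one from this thread
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
```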

Try running in bf16 instead of fp32. Also, you can look at ONNX/TensorRT.
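
A minimal sketch of the bf16 suggestion, assuming a placeholder model name:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"  # placeholder, not the model from this thread

# bf16 halves the weight memory relative to fp32 and is typically faster on Ampere GPUs
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```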

8x 40GB A100s should be enough for PEFT training of FLAN. Can you tell me what backend you are using? Are you not using DeepSpeed?