Mayank Mishra

Results 187 comments of Mayank Mishra

But I am not sure why this is happening only with SwiGLU.

`from transformers.utils import WEIGHTS_NAME, WEIGHTS_INDEX_NAME, cached_path, hf_bucket_url` fails with transformers 4.23.1 :(

```
ImportError: cannot import name 'cached_path' from 'transformers.utils'
```

```python
import os

from huggingface_hub import snapshot_download
from transformers.utils import is_offline_mode

path = snapshot_download(
    repo_id=model_name,
    allow_patterns=["*"],
    local_files_only=is_offline_mode(),
    cache_dir=os.getenv("TRANSFORMERS_CACHE", None),
)
```
...

I have tested this @mrwyattii and it works fine. One thing to note: earlier I had to pass the path as TRANSFORMERS_CACHE/models-bigscience-bloom, and now it is just TRANSFORMERS_CACHE....

I would say that after a few versions we could maybe drop support for older transformers? I don't really think it's needed, since I think there are only a handful of...

I converted my server to Flask and ran it with gunicorn with 1 worker. However, this serializes all requests.
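
A minimal sketch of that setup, assuming a hypothetical `app.py` with a single `/generate` endpoint (the route and payload shape are placeholders, not the actual server code):

```python
# app.py -- tiny Flask wrapper; the model call is stubbed out to keep the sketch self-contained
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate():
    payload = request.get_json()
    text = payload.get("text", "")
    # real model inference would go here
    return jsonify({"output": text})

# launch with a single worker so requests are processed one at a time:
#   gunicorn -w 1 -b 0.0.0.0:5000 app:app
```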

This is not possible. But you might want to take a look at the QLoRA paper: https://github.com/artidoro/qlora
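
For reference, a minimal sketch of a QLoRA-style setup with `transformers` + `peft` + `bitsandbytes`; the model name and LoRA hyperparameters are illustrative assumptions, not taken from the thread:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "bigscience/bloom-7b1"  # placeholder base model

# load the frozen base model in 4-bit NF4, as in the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# train only small low-rank adapters on top of the quantized weights
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```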

Hey, ds-inference is also doing world_size streams. However, accelerate is only doing 1 stream since we are just using the naive pipeline parallelism capability from accelerate. A more efficient approach for...
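
For context, the naive pipeline parallelism from accelerate is roughly the `device_map="auto"` style of loading; a minimal sketch, assuming a placeholder checkpoint name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# accelerate shards the layers across the visible GPUs; a request then flows
# through them sequentially, so there is effectively a single execution stream
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",  # placeholder checkpoint, not the one from this thread
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
```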

Try running in bf16 instead of fp32. Also, you can look at ONNX/TensorRT.
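
A minimal sketch of the bf16 suggestion, assuming a placeholder model name:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"  # placeholder, not the model from this thread

# bf16 halves the weight memory relative to fp32 and is typically faster on Ampere GPUs
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```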

8x 40GB A100s should be enough for PEFT training of FLAN. Can you tell me what backend you are using? Are you not using DeepSpeed?