parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
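For context, a minimal usage sketch. The model name is a placeholder, and the `parallelize` call and its arguments mirror the reproduction snippet further down this page:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

# Load the model on CPU first; parallelformers moves the shards
# onto the GPUs itself during parallelize().
model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model name
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Split the model across 2 GPUs and cast the weights to fp16.
parallelize(model, num_gpus=2, fp16=True, verbose='detail')

inputs = tokenizer("Parallelformers makes deployment easier.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```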
## Describe a requested feature

Thanks for releasing this great library! I am working on deploying [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B), which parallelformers does not currently support. [POLICY.md](https://github.com/tunib-ai/parallelformers/blob/main/POLICY.md) provides a comprehensive guide...
## Describe a requested feature

Can you please add support for gpt_neox? Its official documentation is here: https://huggingface.co/docs/transformers/model_doc/gpt_neox
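For requests like the two above, POLICY.md describes adding a `Policy` class per architecture. Below is a rough, untested skeleton of what a GPT-NeoX policy might look like, following the pattern of the existing policies. The attribute paths (`attention.query_key_value`, `mlp.dense_h_to_4h`, ...) are read off the GPT-NeoX module names and are assumptions, and details such as swapping the output projections for the library's all-reduce linear layers are omitted:

```python
from parallelformers.policies.base import Layer, Policy
from transformers.models.gpt_neox.modeling_gpt_neox import GPTNeoXLayer


class GPTNeoXPolicy(Policy):
    """Untested sketch following POLICY.md; attribute paths are assumptions."""

    @staticmethod
    def replace_arguments(config, world_size):
        # Each rank keeps only its slice of the attention heads.
        return {
            "attention.num_attention_heads": config.num_attention_heads // world_size,
            "attention.hidden_size": config.hidden_size // world_size,
        }

    @staticmethod
    def attn_qkv():
        # GPT-NeoX fuses Q, K and V into a single linear layer.
        return [Layer(weight="attention.query_key_value.weight",
                      bias="attention.query_key_value.bias",
                      n_fused=3)]

    @staticmethod
    def attn_out():
        return [Layer(weight="attention.dense.weight",
                      bias="attention.dense.bias")]

    @staticmethod
    def mlp_in():
        return [Layer(weight="mlp.dense_h_to_4h.weight",
                      bias="mlp.dense_h_to_4h.bias")]

    @staticmethod
    def mlp_out():
        return [Layer(weight="mlp.dense_4h_to_h.weight",
                      bias="mlp.dense_4h_to_h.bias")]

    @staticmethod
    def original_layer_class():
        return GPTNeoXLayer
```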
## How to reproduce

First of all, thanks for this great project! I'm facing an issue running the test code provided [here](https://github.com/tunib-ai/parallelformers/blob/main/tests/seq2seq_lm.py) on Kubernetes. This is what I'm running inside...
I am trying to use RoBERTa NER and BERT NER (uncased), but I am getting the following issues for both models. Is it something which is still under...
Hi, I'm very interested in this work; it looks super interesting and useful. Unfortunately, one of my models is an EncoderDecoder model, and I have no idea how to get it...
Hi there! Thanks for the awesome work on this lib! Just wanted to ask what the recommended way is to clean up a loaded model that has been `parallelize`d using...
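On the cleanup question: whether parallelformers exposes an official teardown call is exactly what the question asks, so the sketch below is only the generic PyTorch pattern one might try first; releasing the library's GPU worker processes may require more than this:

```python
import gc
import torch

# `model` is assumed to be the model previously passed to parallelize().
# Generic PyTorch cleanup, not a parallelformers-specific API:
del model                  # drop the Python reference to the shards
gc.collect()               # let Python free the module tree
torch.cuda.empty_cache()   # return cached GPU blocks to the driver
```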
GPU hang issue
## How to reproduce

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import parallelformers

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]',
    pad_token='[PAD]', mask_token='[MASK]',
)
model = AutoModelForCausalLM.from_pretrained(model_name)  # .to(device='cuda', non_blocking=True)
_ = model.eval()
parallelformers.parallelize(model, num_gpus=4, fp16=True, verbose='detail')
tok = tokenizer("My name is Kevin."*10, ...
```
I am using a 3060 and a 3090 to split GPT models two ways, including GPT-J and GPT-Neo 2.7B. When generating many tokens, say 500, the model hangs and...
Is it possible to use this library for CNN networks implemented in PyTorch? Can you show me an example?