parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
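For context, a minimal usage sketch. The model name is a placeholder, and the `parallelize` call and its arguments mirror the reproduction snippet further down this page:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

# Load the model on CPU first; parallelformers moves the shards
# onto the GPUs itself during parallelize().
model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model name
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Split the model across 2 GPUs and cast the weights to fp16.
parallelize(model, num_gpus=2, fp16=True, verbose='detail')

inputs = tokenizer("Parallelformers makes deployment easier.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```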
## Describe a requested feature

Thanks for releasing this great library! I am working on deploying [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B), which parallelformers does not currently support. [POLICY.md](https://github.com/tunib-ai/parallelformers/blob/main/POLICY.md) provides a comprehensive guide...
## Describe a requested feature

Can you please add support for gpt_neox? Its official documentation is here: https://huggingface.co/docs/transformers/model_doc/gpt_neox
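For requests like the two above, POLICY.md describes adding a `Policy` class per architecture. Below is a rough, untested skeleton of what a GPT-NeoX policy might look like, following the pattern of the existing policies. The attribute paths (`attention.query_key_value`, `mlp.dense_h_to_4h`, ...) are read off the GPT-NeoX module names and are assumptions, and details such as swapping the output projections for the library's all-reduce linear layers are omitted:

```python
from parallelformers.policies.base import Layer, Policy
from transformers.models.gpt_neox.modeling_gpt_neox import GPTNeoXLayer


class GPTNeoXPolicy(Policy):
    """Untested sketch following POLICY.md; attribute paths are assumptions."""

    @staticmethod
    def replace_arguments(config, world_size):
        # Each rank keeps only its slice of the attention heads.
        return {
            "attention.num_attention_heads": config.num_attention_heads // world_size,
            "attention.hidden_size": config.hidden_size // world_size,
        }

    @staticmethod
    def attn_qkv():
        # GPT-NeoX fuses Q, K and V into a single linear layer.
        return [Layer(weight="attention.query_key_value.weight",
                      bias="attention.query_key_value.bias",
                      n_fused=3)]

    @staticmethod
    def attn_out():
        return [Layer(weight="attention.dense.weight",
                      bias="attention.dense.bias")]

    @staticmethod
    def mlp_in():
        return [Layer(weight="mlp.dense_h_to_4h.weight",
                      bias="mlp.dense_h_to_4h.bias")]

    @staticmethod
    def mlp_out():
        return [Layer(weight="mlp.dense_4h_to_h.weight",
                      bias="mlp.dense_4h_to_h.bias")]

    @staticmethod
    def original_layer_class():
        return GPTNeoXLayer
```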
## How to reproduce

First of all, thanks for this great project! I'm facing an issue running the test code provided [here](https://github.com/tunib-ai/parallelformers/blob/main/tests/seq2seq_lm.py) on Kubernetes. This is what I'm running inside...
I am trying to use RoBERTa NER and BERT NER (uncased), but I am getting the following issues for both models. Is it something which is still under...
Hi, I'm very interested in this work; it looks super interesting and useful. Unfortunately, one of my models is an EncoderDecoder model, and I have no idea how to get it...
Hi there! Thanks for the awesome work on this lib! Just wanted to ask what the recommended way is to clean up a loaded model that has been `parallelize`d using...
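On the cleanup question: whether parallelformers exposes an official teardown call is exactly what the question asks, so the sketch below is only the generic PyTorch pattern one might try first; releasing the library's GPU worker processes may require more than this:

```python
import gc
import torch

# `model` is assumed to be the model previously passed to parallelize().
# Generic PyTorch cleanup, not a parallelformers-specific API:
del model                  # drop the Python reference to the shards
gc.collect()               # let Python free the module tree
torch.cuda.empty_cache()   # return cached GPU blocks to the driver
```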
GPU hang issue
## How to reproduce

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import parallelformers

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]',
    pad_token='[PAD]', mask_token='[MASK]',
)
model = AutoModelForCausalLM.from_pretrained(model_name)  # .to(device='cuda', non_blocking=True)
_ = model.eval()
parallelformers.parallelize(model, num_gpus=4, fp16=True, verbose='detail')
tok = tokenizer("My name is Kevin."*10, ...
```
I am using a 3060 and a 3090 to split GPT models two ways, including GPT-J and GPT-Neo 2.7B. When generating many tokens, say 500, the model hangs and...
Is it possible to use this library for CNN networks implemented in PyTorch? Can you show me an example?