djl-serving
[Not DJLServing]HFPipeline error - GPT Neox
Description
This is not actually an error in DJLServing; tracking it here, and will raise an issue in HF as well. The HF pipeline tries to generate outputs on the CPU even though device_map="auto" is set in the configuration for the GPT-NeoX 20B model.
The workaround is to call the model.generate method directly, manually moving the input_ids to the GPU.
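A minimal sketch of the workaround described above, assuming a model already loaded with device_map="auto"; the helper name `generate_on_gpu` and the generation parameters are illustrative, not part of the handler:

```python
import torch


def generate_on_gpu(model, tokenizer, prompt, **gen_kwargs):
    """Workaround sketch: skip pipeline() and call model.generate()
    directly, moving input_ids to the model's device first.

    Under device_map="auto", the pipeline may leave input tensors on
    the CPU, which triggers the topk_cpu/Half error during sampling.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    # Move input_ids to the device of the first model shard (e.g. cuda:0).
    input_ids = inputs.input_ids.to(model.device)
    output_ids = model.generate(input_ids, **gen_kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (assumes `model` and `tokenizer` are loaded elsewhere,
# e.g. AutoModelForCausalLM.from_pretrained(..., device_map="auto")):
# text = generate_on_gpu(model, tokenizer, "Hello,", do_sample=True,
#                        top_k=50, max_new_tokens=20)
```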
Error Message
Bug: RuntimeError: "topk_cpu" not implemented for 'Half'
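The error can be demonstrated in isolation: on the PyTorch versions current when the linked issues were filed, top-k over a half-precision tensor on CPU is not implemented. This is an illustrative reproduction, not the handler code, and newer PyTorch releases may have since implemented the CPU kernel:

```python
import torch

# fp16 logits left on the CPU, as the pipeline does under device_map="auto"
logits = torch.randn(1, 50257, dtype=torch.float16)

try:
    # top-k sampling step; on older PyTorch this raises:
    # RuntimeError: "topk_cpu" not implemented for 'Half'
    torch.topk(logits, k=50)
    print("topk on CPU fp16 succeeded (this PyTorch implements the kernel)")
except RuntimeError as err:
    print(f"RuntimeError: {err}")
```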
How to reproduce?
Try the GPT-NeoX 20B model with our huggingface.py handler.
This was previously reported as issues in transformers:
https://github.com/huggingface/transformers/issues/18703
https://github.com/huggingface/transformers/issues/19445