Inference of chatglm3-6b with int4 and an 8k input prompt fails
bigdl-llm: 2.5.0b20240321, all-in-one benchmark tool. The 8k prompt is taken from https://github.com/intel/xFasterTransformer/blob/main/benchmark/prompt.json

```
2024-03-22 20:38:03,260 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|██████████| 7/7 [00:04<00:00, 1.52it/s]
2024-03-22 20:38:08,095 - INFO - Converting the current model to sym_int4 format......
loading of model costs 8.393956548999995s and 3.583984375GB
<class 'transformers_modules.chatglm3-6b.modeling_chatglm.ChatGLMForConditionalGeneration'>
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524544,0,0], local id: [256,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed
/build/intel-pytorch-extension/csrc/gpu/aten/operators/Indexing.h:670: operator(): global id: [524545,0,0], local id: [257,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed
... (the same assertion repeats for global ids [524546,0,0] through [524549,0,0], local ids [258,0,0] through [261,0,0])
```
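For reference, here is a minimal reproduction sketch of the load-and-generate path that triggers the error. It mirrors the all-in-one benchmark's int4 setup rather than copying its exact code; the local model path and the `"8192"` key in prompt.json are assumptions:

```python
# Minimal repro sketch (assumptions: a local chatglm3-6b checkout, and a
# prompt.json from xFasterTransformer with an "8192" entry for the 8k prompt;
# this is not the benchmark tool's exact code).
import json
import torch
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = "./chatglm3-6b"  # hypothetical local path

# load_in_4bit=True converts the weights to sym_int4,
# matching the "Converting the current model to sym_int4" log line above
model = AutoModel.from_pretrained(model_path, load_in_4bit=True,
                                  trust_remote_code=True)
model = model.to("xpu")  # intel_extension_for_pytorch is auto-imported by bigdl-llm
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with open("prompt.json") as f:
    prompt = json.load(f)["8192"]  # assumed key for the 8k prompt

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")
with torch.inference_mode():
    # fails with the device-side "index out of bounds" assertions shown above
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```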
This should be the same issue as https://github.com/intel-analytics/ipex-llm/issues/10513.
In our tests, when the input length is larger than 8166 tokens, it produces the same error as above. When the input length is smaller than or equal to 8166 tokens, it instead produces an IPEX memory-allocation error, similar to the llama2 8k issue (a boundary probe is sketched below).
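The 8166-token boundary can be located by truncating the tokenized 8k prompt to different lengths. A sketch, reusing `model`, `tokenizer`, and `prompt` from the repro above (the helper name is illustrative):

```python
# Probe the failure boundary by truncating the tokenized prompt
# (illustrative sketch; reuses model/tokenizer/prompt from the repro above).
import torch

def run_with_length(n_tokens):
    ids = tokenizer(prompt, return_tensors="pt").input_ids[:, :n_tokens].to("xpu")
    with torch.inference_mode():
        model.generate(ids, max_new_tokens=1)

run_with_length(8166)  # IPEX allocation error (like the llama2 8k issue)
run_with_length(8167)  # device-side "index out of bounds" assertion
```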
The root cause is likely the same. Further investigation is needed.