WebGLM
RuntimeError: CUDA driver error: device-side assert triggered
os: ubuntu 18.04
python: 3.9
cuda: 11.8
Launch command: python web_demo.py -w THUDM/WebGLM-2B --searcher bing
question: 大连在中国什么位置 ("Where in China is Dalian located?")
error:
```
WebGLM Initializing...
WebGLM Loaded
Running on local URL: http://0.0.0.0:8032
[System] Searching ...
[System] Count of available urls: 15
[System] Fetching ...
[System] Count of available fetch results: 2147719
[System] Extracting ...
[System] Count of paragraphs: 136
[System] Filtering ...
Input length of input_ids is 1068, but `max_length` is set to 1024. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
Traceback (most recent call last):
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/routes.py", line 427, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/blocks.py", line 1067, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/utils.py", line 336, in async_iteration
    return await iterator.__anext__()
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/utils.py", line 329, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/utils.py", line 312, in run_sync_iterator_async
    return next(iterator)
  File "/data/ssd_workspace/lh/WebGLM/web_demo.py", line 52, in query
    for resp in webglm.stream_query(query):
  File "/data/ssd_workspace/lh/WebGLM/model/modeling_webglm.py", line 49, in stream_query
    outputs = self.model.generate(**inputs, max_length=1024, eos_token_id=self.tokenizer.eop_token_id, pad_token_id=self.tokenizer.eop_token_id)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 1515, in generate
    return self.greedy_search(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 2385, in greedy_search
    next_tokens.tile(eos_token_id_tensor.shape[0], 1).ne(eos_token_id_tensor.unsqueeze(1)).prod(dim=0)
RuntimeError: CUDA driver error: device-side assert triggered
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [87,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [87,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [87,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
```
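For context, an `srcIndex < srcSelectDimSize` assert in `Indexing.cu` generally means an embedding lookup on the GPU received an out-of-range index, for example a position index beyond the model's maximum sequence length, which would match the "input length 1068 > max_length 1024" warning in the log. A minimal sketch of a host-side sanity check that surfaces the problem before it reaches the GPU (the `vocab_size` and `max_positions` values are illustrative, not WebGLM's actual config):

```python
def check_indices(input_ids: list, vocab_size: int, max_positions: int) -> None:
    """Fail with a clear Python error on the host instead of letting an
    out-of-range index reach a GPU embedding lookup, where it surfaces
    only as an opaque device-side assert."""
    biggest = max(input_ids)
    if biggest >= vocab_size:
        raise ValueError(
            f"token id {biggest} is out of range for vocab size {vocab_size}"
        )
    if len(input_ids) > max_positions:
        raise ValueError(
            f"sequence length {len(input_ids)} exceeds max positions {max_positions}"
        )

# Illustrative values: a 3-token prompt against a 100-token vocab passes,
# while a 1068-token prompt against a 1024-position limit raises.
check_indices([5, 10, 20], vocab_size=100, max_positions=1024)
```

Running with the environment variable `CUDA_LAUNCH_BLOCKING=1` also makes CUDA report the failing kernel synchronously, which usually points the traceback at the real call site.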
I ran into this problem too; it feels like the project's support for Chinese is not good enough.
My guess is that the scoring model was trained on an English dataset, so it may not handle Chinese well. I'm not sure whether that guess is right.
The input string is too long. Try changing max_length=1024 to 2048 in modeling_webglm.py, or try truncating the search results before generation.
```python
inputs = self.tokenizer.build_inputs_for_generation(inputs, max_gen_length=1024)
if self.device:
    inputs = inputs.to(self.device)
outputs = self.model.generate(**inputs, max_length=1024, eos_token_id=self.tokenizer.eop_token_id, pad_token_id=self.tokenizer.eop_token_id)
```
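A hedged sketch of the truncation idea: clip the prompt so its length plus some generation headroom stays within the total budget. The 1024 budget mirrors the snippet above; the headroom value and the choice to keep the tail of the prompt (on the assumption that the question follows the retrieved references) are assumptions, not WebGLM's actual behavior:

```python
def fit_prompt(input_ids: list, max_length: int, reserve_for_gen: int) -> list:
    """Truncate prompt token ids so prompt length plus generation headroom
    fits within max_length. Keeps the tail of the prompt, on the assumption
    that the most recent context (e.g. the question) matters most."""
    budget = max_length - reserve_for_gen
    if budget <= 0:
        raise ValueError("reserve_for_gen must be smaller than max_length")
    return input_ids[-budget:] if len(input_ids) > budget else input_ids

# The failing case from the log: a 1068-token prompt against a 1024 budget.
# Reserving 200 tokens for generation leaves room for an 824-token prompt.
prompt = list(range(1068))
clipped = fit_prompt(prompt, max_length=1024, reserve_for_gen=200)
```

This avoids the out-of-range warning without touching the model config, at the cost of dropping the earliest retrieved paragraphs.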
Same problem here. Changing max_length=1024 to 2048 in modeling_webglm.py didn't help either, but running English queries seems to work fine.
The cause is that Chinese is not currently supported; see issue #1.