WebGLM
RuntimeError: CUDA driver error: device-side assert triggered
os: ubuntu 18.04
python: 3.9
cuda: 11.8
Launch command: python web_demo.py -w THUDM/WebGLM-2B --searcher bing
question: 大连在中国什么位置 ("Where in China is Dalian located?")
error:
```
WebGLM Initializing...
WebGLM Loaded
Running on local URL: http://0.0.0.0:8032
[System] Searching ...
[System] Count of available urls: 15
[System] Fetching ...
[System] Count of available fetch results: 2147719
[System] Extracting ...
[System] Count of paragraphs: 136
[System] Filtering ...
Input length of input_ids is 1068, but `max_length` is set to 1024. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
Traceback (most recent call last):
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/routes.py", line 427, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/blocks.py", line 1067, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/utils.py", line 336, in async_iteration
    return await iterator.__anext__()
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/utils.py", line 329, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/utils.py", line 312, in run_sync_iterator_async
    return next(iterator)
  File "/data/ssd_workspace/lh/WebGLM/web_demo.py", line 52, in query
    for resp in webglm.stream_query(query):
  File "/data/ssd_workspace/lh/WebGLM/model/modeling_webglm.py", line 49, in stream_query
    outputs = self.model.generate(**inputs, max_length=1024, eos_token_id=self.tokenizer.eop_token_id, pad_token_id=self.tokenizer.eop_token_id)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 1515, in generate
    return self.greedy_search(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 2385, in greedy_search
    next_tokens.tile(eos_token_id_tensor.shape[0], 1).ne(eos_token_id_tensor.unsqueeze(1)).prod(dim=0)
RuntimeError: CUDA driver error: device-side assert triggered
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [87,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [87,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [87,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
```
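For context, an `srcIndex < srcSelectDimSize` assert in `Indexing.cu` generally means an embedding lookup on the GPU received an out-of-range index, for example a position index beyond the model's maximum sequence length, which would match the "input length 1068 > max_length 1024" warning in the log. A minimal sketch of a host-side sanity check that surfaces the problem before it reaches the GPU (the `vocab_size` and `max_positions` values are illustrative, not WebGLM's actual config):

```python
def check_indices(input_ids: list, vocab_size: int, max_positions: int) -> None:
    """Fail with a clear Python error on the host instead of letting an
    out-of-range index reach a GPU embedding lookup, where it surfaces
    only as an opaque device-side assert."""
    biggest = max(input_ids)
    if biggest >= vocab_size:
        raise ValueError(
            f"token id {biggest} is out of range for vocab size {vocab_size}"
        )
    if len(input_ids) > max_positions:
        raise ValueError(
            f"sequence length {len(input_ids)} exceeds max positions {max_positions}"
        )

# Illustrative values: a 3-token prompt against a 100-token vocab passes,
# while a 1068-token prompt against a 1024-position limit raises.
check_indices([5, 10, 20], vocab_size=100, max_positions=1024)
```

Running with the environment variable `CUDA_LAUNCH_BLOCKING=1` also makes CUDA report the failing kernel synchronously, which usually points the traceback at the real call site.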
I ran into this problem too; it feels like the project's support for Chinese is not good enough.
My guess is that the scoring model was trained on an English dataset, so it may not handle Chinese well. I'm not sure whether that guess is right.
The input string is too long. Try changing max_length=1024 to 2048 in modeling_webglm.py, or try truncating the search results before generation.
```python
inputs = self.tokenizer.build_inputs_for_generation(inputs, max_gen_length=1024)
if self.device:
    inputs = inputs.to(self.device)
outputs = self.model.generate(**inputs, max_length=1024, eos_token_id=self.tokenizer.eop_token_id, pad_token_id=self.tokenizer.eop_token_id)
```
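A hedged sketch of the truncation idea: clip the prompt so its length plus some generation headroom stays within the total budget. The 1024 budget mirrors the snippet above; the headroom value and the choice to keep the tail of the prompt (on the assumption that the question follows the retrieved references) are assumptions, not WebGLM's actual behavior:

```python
def fit_prompt(input_ids: list, max_length: int, reserve_for_gen: int) -> list:
    """Truncate prompt token ids so prompt length plus generation headroom
    fits within max_length. Keeps the tail of the prompt, on the assumption
    that the most recent context (e.g. the question) matters most."""
    budget = max_length - reserve_for_gen
    if budget <= 0:
        raise ValueError("reserve_for_gen must be smaller than max_length")
    return input_ids[-budget:] if len(input_ids) > budget else input_ids

# The failing case from the log: a 1068-token prompt against a 1024 budget.
# Reserving 200 tokens for generation leaves room for an 824-token prompt.
prompt = list(range(1068))
clipped = fit_prompt(prompt, max_length=1024, reserve_for_gen=200)
```

This avoids the out-of-range warning without touching the model config, at the cost of dropping the earliest retrieved paragraphs.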
Same problem here. Changing max_length=1024 to 2048 in modeling_webglm.py didn't help either, but running English queries seems to work fine.
The cause is that Chinese is not currently supported; see issue #1.