[Question]: aquila-7B OOM
Description
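Running the aquila-7B inference example on a 32 GB GPU fails with an out-of-memory error. How much GPU memory does it need? Other 7B models run fine on this card; does Aquila use more GPU memory than they do?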
Alternatives
No response
Same issue here:
- Loading model aquila-7b/aquilachat-7b takes at most 107G memory.
- After moving the model to CUDA, the program still uses ~65G memory.
- Inference on a 3090 24G always triggers the CUDA OOM error.
My system information:
minerva@worker
OS: Ubuntu 20.04.3 LTS x86_64
Host: Super Server 0123456789
Kernel: 5.4.0-125-generic
Uptime: 145 days, 29 mins
Packages: 756 (dpkg), 5 (snap)
Shell: bash 5.0.17
Resolution: 1024x768
Terminal: /dev/pts/22
CPU: Intel Xeon E5-2690 v4 (56) @ 3.500GHz
GPU: NVIDIA 83:00.0 NVIDIA Corporation Device 2204
GPU: NVIDIA 82:00.0 NVIDIA Corporation Device 2204
GPU: NVIDIA 02:00.0 NVIDIA Corporation Device 2204
GPU: NVIDIA 03:00.0 NVIDIA Corporation Device 2204
Memory: 1940MiB / 257821MiB
Our engineers are investigating this issue.
Fixed.
We will publish a release with the fix; please update once it is out.
Where did you download the model files from?
I used the code from here.
Version 1.7.1 still has this problem: the process is killed after exhausting 32 GB of RAM.
I ran it on a 24 GB A5000 and it also exits for no apparent reason, without even reporting an OOM error.
Could you share the script you ran?
Was that also on 1.7.1?
I did not check the version; it is the zip of the whole flagai repository that I downloaded from GitHub the day before yesterday.
The code was copied from here: https://github.com/FlagAI-Open/FlagAI/tree/master/examples/Aquila#3-%E6%8E%A8%E7%90%86inference
import os
import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
from flagai.data.tokenizer import Tokenizer
import bminf

state_dict = "./checkpoints_in/"
model_name = 'aquila-7b'  # 'aquila-33b'

loader = AutoLoader(
    "lm",
    model_dir=state_dict,
    model_name=model_name,
    use_cache=True)
model = loader.get_model()
tokenizer = loader.get_tokenizer()

model.eval()
model.half()
model.cuda()

predictor = Predictor(model, tokenizer)

text = "北京在哪儿?"  # "Where is Beijing?"
text = f'{text}'
print(f"text is {text}")
with torch.no_grad():
    out = predictor.predict_generate_randomsample(text, out_max_length=200, temperature=0)
    print(f"pred is {out}")
Versions:
torch 2.0.1+cu118
flagai 1.7.1
bminf 2.0.1
In addition, I replaced "from torch._six import inf" with "from torch import inf".
It is CPU RAM that gets exhausted, not GPU RAM.
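Since the process is killed for exhausting host RAM rather than GPU memory, it can help to log peak host memory around the model-loading step. The following is a minimal sketch that uses only the Python standard library. The helper name peak_rss_gib is an illustrative assumption, not part of flagai, and the resource module is available on Unix-like systems only.

```python
import resource
import sys


def peak_rss_gib() -> float:
    """Return the peak resident set size of this process in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    bytes_used = peak if sys.platform == "darwin" else peak * 1024
    return bytes_used / 2**30


print(f"peak host RAM before loading: {peak_rss_gib():.2f} GiB")
# ... build the AutoLoader and call loader.get_model() here ...
print(f"peak host RAM after loading:  {peak_rss_gib():.2f} GiB")
```

Comparing the two numbers shows how much host memory the load step needs at its peak, which is the figure that decides whether a 32 GB machine gets killed.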
What?! How much CPU memory does it need, then?
Fixed.
We will publish a release with the fix; please update once it is out.
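I followed the inference example in step 3 here and still get OOM, with 40 GB of RAM and a V100 GPU.
https://github.com/FlagAI-Open/FlagAI/tree/master/examples/Aquila#3-%E6%8E%A8%E7%90%86inference
On WSL2 with 50 GB of RAM, 64 GB of swap, and 24 GB of GPU memory, I get an error saying GPU memory is insufficient.
AquilaChat appears to have the same problem. Code used: https://github.com/FlagAI-Open/FlagAI/tree/master/examples/Aquila/Aquila-chat#1-%E6%8E%A8%E7%90%86inference
Environment to reproduce: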
python3 -m venv .env
source .env/bin/activate
pip install -i https://mirrors.cloud.tencent.com/pypi/simple flagai
pip install -i https://mirrors.cloud.tencent.com/pypi/simple bminf
# Fix the missing _six module: replace "from torch._six import inf" with "from torch import inf".
vim /home/robin/aquila-7b/.env/lib/python3.8/site-packages/flagai/mpu/grads.py
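The manual edit above can also be scripted. This is a minimal sketch, assuming the only change needed is the one-line import swap described in the thread. The function name patch_six_import is hypothetical, and the file to patch is whichever copy of flagai/mpu/grads.py lives in your environment.

```python
from pathlib import Path

OLD_IMPORT = "from torch._six import inf"
NEW_IMPORT = "from torch import inf"


def patch_six_import(path: str) -> bool:
    """Rewrite the torch._six import in the given file; return True if the file changed."""
    source_file = Path(path)
    text = source_file.read_text(encoding="utf-8")
    if OLD_IMPORT not in text:
        return False  # already patched, or the import is not present
    source_file.write_text(text.replace(OLD_IMPORT, NEW_IMPORT), encoding="utf-8")
    return True
```

Calling patch_six_import on the grads.py path shown above would apply the same fix as the vim edit. Running it a second time is a no-op and returns False.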
Could this be a dependency version problem? Could the maintainers provide a requirements.txt?
$ pip freeze
absl-py==1.4.0
aiohttp==3.8.4
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
async-timeout==4.0.2
attrs==23.1.0
bminf==2.0.1
boto3==1.21.42
botocore==1.24.46
cachetools==5.3.1
certifi==2023.5.7
charset-normalizer==3.1.0
click==8.1.3
cmake==3.26.4
colorama==0.4.6
cpm-kernels==1.0.11
datasets==2.0.0
diffusers==0.7.2
dill==0.3.6
einops==0.3.0
filelock==3.12.1
flagai==1.7.1
frozenlist==1.3.3
fsspec==2023.6.0
ftfy==6.1.1
google-auth==2.19.1
google-auth-oauthlib==0.4.6
grpcio==1.54.2
huggingface-hub==0.15.1
idna==3.4
importlib-metadata==6.6.0
jieba==0.42.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
lit==16.0.5.post0
lxml==4.9.2
Markdown==3.4.3
MarkupSafe==2.1.3
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.14
networkx==3.1
nltk==3.6.7
numpy==1.24.3
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
oauthlib==3.2.2
omegaconf==2.3.0
packaging==23.1
pandas==1.3.5
Pillow==9.5.0
portalocker==2.7.0
protobuf==3.19.6
pyarrow==12.0.0
pyasn1==0.5.0
pyasn1-modules==0.3.0
pyDeprecate==0.3.2
python-dateutil==2.8.2
pytorch-lightning==1.6.5
pytz==2023.3
PyYAML==6.0
regex==2023.6.3
requests==2.31.0
requests-oauthlib==1.3.1
responses==0.18.0
rouge-score==0.1.2
rsa==4.9
s3transfer==0.5.2
sacrebleu==2.3.1
scikit-learn==1.0.2
scipy==1.10.1
sentencepiece==0.1.96
six==1.16.0
sympy==1.12
tabulate==0.9.0
taming-transformers-rom1504==0.0.6
tensorboard==2.9.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
threadpoolctl==3.1.0
tokenizers==0.12.1
torch==2.0.1
torchmetrics==0.11.4
torchvision==0.15.2
tqdm==4.65.0
transformers==4.20.1
triton==2.0.0
typing-extensions==4.6.3
urllib3==1.26.16
wcwidth==0.2.6
Werkzeug==2.3.6
xxhash==3.2.0
yarl==1.9.2
zipp==3.15.0
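I got it working: add a device="cuda" argument in the inference code and the model is loaded directly onto the GPU (previously it was loaded to the CPU first; I do not know why). After loading, GPU memory usage is 28 GB, and 16 GB after clearing the cache. This is the 7B model.
Thanks. After adding device="cuda" to AutoLoader, I now get an error that 24 GB of GPU memory is not enough.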
OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 23.68 GiB total capacity; 22.89 GiB already
allocated; 21.31 MiB free; 22.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting
max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
LLaMA/Alpaca-family 7B models run without problems.
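The numbers in the error message already say whether the max_split_size_mb hint is worth trying. Below is a rough sketch that parses the message quoted above. It assumes the message wording used by the PyTorch version in this thread (other versions may phrase it differently), and parse_oom is a hypothetical helper name.

```python
import re

OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 23.68 GiB total capacity; "
    "22.89 GiB already allocated; 21.31 MiB free; 22.89 GiB reserved in total by PyTorch)"
)

UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def parse_oom(message: str) -> dict:
    """Extract the memory figures of a PyTorch CUDA OOM message, in GiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    figures = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            figures[name] = float(match.group(1)) * UNIT_TO_GIB[match.group(2)]
    return figures


figures = parse_oom(OOM_MESSAGE)
slack = figures["reserved"] - figures["allocated"]
print(f"reserved but unallocated: {slack:.2f} GiB")
```

Here reserved and allocated memory are equal, so there is no cached-but-unused memory for the allocator to reclaim. That suggests the card is simply full of live tensors, and tuning max_split_size_mb is unlikely to help in this case.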
You could clear the CUDA cache first.
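Clearing the cache could look like the sketch below. It uses torch.cuda.empty_cache() together with a garbage-collection pass. The wrapper name release_cuda_cache is an assumption for illustration, and the function returns None when PyTorch or a CUDA device is not available.

```python
import gc


def release_cuda_cache():
    """Free unreferenced tensors and return cached CUDA blocks to the driver.

    Returns (allocated_bytes, reserved_bytes) afterwards, or None when
    PyTorch or a CUDA device is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    gc.collect()              # drop Python objects that still hold tensors
    torch.cuda.empty_cache()  # release cached blocks that no tensor is using
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()
```

Note that this only releases memory that no live tensor is using. It cannot shrink the model weights themselves, so it helps with the 28 GB versus 16 GB gap reported above but not when the weights alone exceed the card.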
I deployed the test script as a service. GPU memory grows with every call, and OOM occurs after a few calls.
Which version of flagai are you using? Could you share the service code?
@ftgreat I ran it directly from the repository root, on this branch:
- master 0634ab4 Merge pull request #341 from Anhforth/master
Service code:
import asyncio
import websockets
import json
import numpy as np
import os
import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor_web import Predictor
from flagai.data.tokenizer import Tokenizer
import bminf

os.environ['CUDA_VISIBLE_DEVICES'] = '1'

state_dict = "./checkpoints_in"
model_name = 'aquila-7b'  # 'aquila-33b'

loader = AutoLoader(
    "lm",
    model_dir=state_dict,
    model_name=model_name,
    use_cache=True)
model = loader.get_model()
tokenizer = loader.get_tokenizer()

model.eval()
model.half()
model.cuda()

predictor = Predictor(model, tokenizer)


def default_dump(obj):
    """Convert numpy classes to JSON serializable objects."""
    if isinstance(obj, (np.integer, np.floating, np.bool_)):
        return obj.item()
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    else:
        return obj


async def main_logic(websocket, path):
    data = await websocket.recv()
    request_json = json.loads(data)
    print(request_json)
    query = request_json["prompt"]
    use_stream = request_json["stream"] if "stream" in request_json else False
    max_length = request_json["maxTokens"] if "maxTokens" in request_json else 320
    top_k = request_json["topK"] if "topK" in request_json else 50
    temperature = request_json["temperature"] if "temperature" in request_json else 0.95
    top_p = request_json["topP"] if "topP" in request_json else 0.7
    do_sample = request_json["useRandom"] if "useRandom" in request_json else False
    logprobs = request_json["logprobs"] if "logprobs" in request_json else 0
    with torch.autocast("cuda"):
        g_index = 0
        for re_data in predictor.predict_generate_randomsample(query,
                                                              total_max_length=max_length,
                                                              top_k=top_k,
                                                              top_p=top_p,
                                                              temperature=temperature,
                                                              prompts_tokens=None):
            print(re_data)
            # await websocket.send(json.dumps(re_data, ensure_ascii=False, default=default_dump))
            if "result" in re_data:
                re_data["result"]["index"] = g_index
            # await websocket.send(re_data.lstrip("").rstrip(""))
            if re_data["finish"]:
                await websocket.send(json.dumps(re_data, ensure_ascii=False, default=default_dump))
                break
            else:
                if use_stream and re_data["usage"]["totalTokens"] % 5 == 0 and re_data["usage"]["totalTokens"] >= 20:
                    await websocket.send(json.dumps(re_data, ensure_ascii=False, default=default_dump))
            g_index += 1
    await websocket.send("close")


async def start_server():
    server = await websockets.serve(main_logic, '0.0.0.0', 17862)
    await server.wait_closed()


if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(start_server())
    asyncio.get_event_loop().run_forever()
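The method it calls was modified to use yield instead of return.
GPU memory grows by about 1 GB on each call.
I think the predict part needs to be wrapped in no_grad; otherwise GPU memory will keep growing.
OK, thanks. I added the @torch.no_grad() decorator to the method and memory no longer grows.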
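For a method that yields partial results, the decorator form is a natural fit, because PyTorch wraps generator functions so that gradient tracking is disabled while the generator body runs. The toy sketch below illustrates the idea. stream_outputs is a hypothetical stand-in for the modified predict method, not the flagai implementation, and the model is a small linear layer on the CPU.

```python
def demo_streaming_no_grad():
    """Run a toy streaming generator under torch.no_grad and report grad tracking."""
    import torch  # imported here so the sketch only needs PyTorch when it is called

    @torch.no_grad()
    def stream_outputs(model, batches):
        # Hypothetical stand-in for a predict method that yields partial results.
        for batch in batches:
            yield model(batch)

    model = torch.nn.Linear(4, 2)
    batches = [torch.randn(1, 4) for _ in range(3)]
    # Each entry is True if the yielded tensor still tracks gradients.
    return [out.requires_grad for out in stream_outputs(model, batches)]
```

Every yielded tensor should report requires_grad as False, meaning no autograd graph is kept alive between calls. Retained graphs are the likely source of the roughly 1 GB growth per request described above.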
You could try flagai 1.7.2, which needs 32 GB of RAM and 16 GB of GPU memory (the model plus one 2048-token sequence).
Thanks for the reply!
After upgrading to 1.7.2, the RTX 3090 still reports a GPU OOM error.
[2023-06-14 00:46:31,934] [INFO] [logger.py:85:log_dist] [Rank -1] Unsupported bmtrain
******************** lm aquilachat-7b
Traceback (most recent call last):
File "chat.py", line 10, in <module>
loader = AutoLoader(
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/flagai/auto_model/auto_loader.py", line 216, in __init__
self.model = getattr(LazyImport(self.model_name[0]),
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/flagai/model/base_model.py", line 184, in from_pretrain
return load_local(checkpoint_path, only_download_config=only_download_config)
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/flagai/model/base_model.py", line 116, in load_local
model.to(device)
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 23.68 GiB total capacity; 23.22 GiB already allocated; 169.31 MiB free; 23.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Code used:
import os
import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
from flagai.model.predictor.aquila import aquila_generate

state_dict = "./checkpoints_in"
model_name = 'aquilachat-7b'

loader = AutoLoader(
    "lm",
    model_dir=state_dict,
    model_name=model_name,
    use_cache=True,
    device='cuda')
model = loader.get_model()
tokenizer = loader.get_tokenizer()
cache_dir = os.path.join(state_dict, model_name)

model.eval()
model.half()
model.cuda()

predictor = Predictor(model, tokenizer)

text = "北京为什么是中国的首都?"  # "Why is Beijing the capital of China?"


def pack_obj(text):
    obj = dict()
    obj['id'] = 'demo'
    obj['conversations'] = []
    human = dict()
    human['from'] = 'human'
    human['value'] = text
    obj['conversations'].append(human)
    # dummy bot
    bot = dict()
    bot['from'] = 'gpt'
    bot['value'] = ''
    obj['conversations'].append(bot)
    obj['instruction'] = ''
    return obj


def delete_last_bot_end_singal(convo_obj):
    conversations = convo_obj['conversations']
    assert len(conversations) > 0 and len(conversations) % 2 == 0
    assert conversations[0]['from'] == 'human'
    last_bot = conversations[len(conversations)-1]
    assert last_bot['from'] == 'gpt'
    ## from _add_speaker_and_signal
    END_SIGNAL = "\n"
    len_end_singal = len(END_SIGNAL)
    len_last_bot_value = len(last_bot['value'])
    last_bot['value'] = last_bot['value'][:len_last_bot_value-len_end_singal]
    return


def convo_tokenize(convo_obj, tokenizer):
    chat_desc = convo_obj['chat_desc']
    instruction = convo_obj['instruction']
    conversations = convo_obj['conversations']
    # chat_desc
    example = tokenizer.encode_plus(f"{chat_desc}", None, max_length=None)['input_ids']
    EOS_TOKEN = example[-1]
    example = example[:-1]  # remove eos
    # instruction
    instruction = tokenizer.encode_plus(f"{instruction}", None, max_length=None)['input_ids']
    instruction = instruction[1:-1]  # remove bos & eos
    example += instruction
    for conversation in conversations:
        role = conversation['from']
        content = conversation['value']
        print(f"role {role}, raw content {content}")
        content = tokenizer.encode_plus(f"{content}", None, max_length=None)['input_ids']
        content = content[1:-1]  # remove bos & eos
        print(f"role {role}, content {content}")
        example += content
    return example


print('-'*80)
print(f"text is {text}")

from cyg_conversation import default_conversation

conv = default_conversation.copy()
conv.append_message(conv.roles[0], text)
conv.append_message(conv.roles[1], None)

tokens = tokenizer.encode_plus(f"{conv.get_prompt()}", None, max_length=None)['input_ids']
tokens = tokens[1:-1]

with torch.no_grad():
    out = aquila_generate(tokenizer, model, [text], max_gen_len=200, top_p=0.95, prompts_tokens=[tokens])
    print(f"pred is {out}")
Also, the 1.7.2 package uploaded to PyPI differs from 1.7.2 on GitHub. The PyPI package raises an error:
Traceback (most recent call last):
File "chat.py", line 4, in <module>
from flagai.model.predictor.predictor import Predictor
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/flagai/model/predictor/predictor.py", line 22, in <module>
from .aquila import aquila_generate
File "/home/robin/aquila-7b/.env/lib/python3.8/site-packages/flagai/model/predictor/aquila.py", line 6
def aquila_generate(
^
SyntaxError: duplicate argument 'top_k' in function definition
Line 14 of flagai/model/predictor/aquila.py repeats a parameter:
def aquila_generate(
        tokenizer,
        model,
        prompts: List[str],
        max_gen_len: int,
        temperature: float = 0.8,
        top_k: int = 30,
        top_p: float = 0.95,
        top_k: int = 30,  # duplicate parameter
        prompts_tokens: List[List[int]] = None,
) -> List[str]:
    ...
A release with the fix will be published today.
After updating to 1.7.3 and using FP16 precision, it runs successfully on an RTX 3090.
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 Off | N/A |
| 0% 34C P8 32W / 350W| 15283MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1955360 C python3 15280MiB |
+---------------------------------------------------------------------------------------+
Using FP16 precision:
loader = AutoLoader(
    "lm",
    model_dir=state_dict,
    model_name=model_name,
    use_cache=True,
    fp16=True)
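A back-of-the-envelope estimate helps explain why FP16 makes the difference on a 24 GB card. The sketch below counts weight memory only and assumes a nominal 7 billion parameters. The exact parameter count of Aquila-7B is not stated in this thread, and activations and the generation cache come on top.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory taken by model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NUM_PARAMS = 7e9  # assumption: a nominal 7B-parameter model

for dtype in ("fp32", "fp16"):
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Under this assumption, FP32 weights alone come to about 26 GiB, more than the 23.68 GiB the RTX 3090 reports. FP16 weights come to about 13 GiB, which is in line with the 15283MiB shown by nvidia-smi above once runtime overhead is included.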
Closing this issue for now; please reopen it if problems remain. Thanks.