CogVLM
load_in_4bit works but load_in_8bit gives error: self and mat2 must have the same dtype, but got Half and Char
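For reference, Half and Char are PyTorch's names for torch.float16 and torch.int8; the Char operand is the int8 weight tensor that bitsandbytes stores when a model is loaded in 8-bit. The error can be reproduced in isolation whenever a matmul bypasses bitsandbytes' 8-bit linear layer; a minimal sketch, assuming a CUDA device and the torch 2.2.0 listed below:

import torch

# float16 ("Half") activations multiplied directly against int8 ("Char")
# weights, as happens when a matmul bypasses bnb's Linear8bitLt layer.
a = torch.randn(2, 4, dtype=torch.float16, device='cuda')
w = torch.randint(-128, 127, (4, 8), dtype=torch.int8, device='cuda')
torch.mm(a, w)  # RuntimeError: self and mat2 must have the same dtype, but got Half and Char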
System Info / 系統信息
Microsoft Windows [Version 10.0.19045.3996]
(c) Microsoft Corporation. All rights reserved.
G:\temp Local install\CogVLM\venv\Scripts>activate
(venv) G:\temp Local install\CogVLM\venv\Scripts>pip freeze
accelerate==0.26.1
aiofiles==23.2.1
aiohttp==3.9.3
aiosignal==1.3.1
altair==5.2.0
annotated-types==0.6.0
anyio==4.2.0
anykeystore==0.2
apex==0.9.10.dev0
async-timeout==4.0.3
attrs==23.2.0
bitsandbytes @ https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl
blinker==1.7.0
blis==0.7.11
boto3==1.34.34
botocore==1.34.34
braceexpand==0.1.7
cachetools==5.3.2
catalogue==2.0.10
certifi==2022.12.7
charset-normalizer==2.1.1
click==8.1.7
cloudpathlib==0.16.0
colorama==0.4.6
confection==0.1.4
contourpy==1.2.0
cpm-kernels==1.0.11
cryptacular==1.6.2
cycler==0.12.1
cymem==2.0.8
datasets==2.16.1
deepspeed @ https://huggingface.co/MonsterMMORPG/SECourses/resolve/main/deepspeed-0.11.2_cuda121-cp310-cp310-win_amd64.whl
defusedxml==0.7.1
dill==0.3.7
einops==0.7.0
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
exceptiongroup==1.2.0
fastapi==0.109.1
ffmpy==0.3.1
filelock==3.9.0
fonttools==4.47.2
frozenlist==1.4.1
fsspec==2023.10.0
gitdb==4.0.11
GitPython==3.1.41
gradio==4.16.0
gradio_client==0.8.1
greenlet==3.0.3
h11==0.14.0
hjson==3.1.0
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.20.3
hupper==1.12.1
idna==3.4
importlib-metadata==7.0.1
importlib-resources==6.1.1
Jinja2==3.1.2
jmespath==1.0.1
jsonlines==4.0.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
langcodes==3.3.0
loguru==0.7.2
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.8.2
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.15
murmurhash==1.0.10
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.3
oauthlib==3.2.2
orjson==3.9.13
packaging==23.2
pandas==2.2.0
PasteDeploy==3.1.0
pbkdf2==1.3
pillow==10.2.0
plaster==1.1.2
plaster-pastedeploy==1.0.1
preshed==3.0.9
protobuf==4.25.2
psutil==5.9.8
py-cpuinfo==9.0.0
pyarrow==15.0.0
pyarrow-hotfix==0.6
pydantic==2.6.0
pydantic_core==2.16.1
pydeck==0.8.1b0
pydub==0.25.1
Pygments==2.17.2
pynvml==11.5.0
pyparsing==3.1.1
pyramid==2.0.2
pyramid-mailer==0.15.1
python-dateutil==2.8.2
python-multipart==0.0.7
python3-openid==3.2.0
pytz==2024.1
PyYAML==6.0.1
referencing==0.33.0
regex==2023.12.25
repoze.sendmail==4.4.1
requests==2.28.1
requests-oauthlib==1.3.1
rich==13.7.0
rpds-py==0.17.1
ruff==0.2.0
s3transfer==0.10.0
safetensors==0.4.2
scipy==1.12.0
seaborn==0.13.2
semantic-version==2.10.0
sentencepiece==0.1.99
shellingham==1.5.4
six==1.16.0
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.0
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
SQLAlchemy==2.0.25
srsly==2.4.8
starlette==0.35.1
streamlit==1.31.0
SwissArmyTransformer==0.4.11
sympy==1.12
tenacity==8.2.3
tensorboardX==2.6.2.2
thinc==8.2.2
timm==0.9.12
tokenizers==0.15.1
toml==0.10.2
tomlkit==0.12.0
toolz==0.12.1
torch==2.2.0+cu121
torchaudio==2.2.0+cu121
torchvision==0.17.0+cu121
tornado==6.4
tqdm==4.66.1
transaction==4.0
transformers==4.37.2
translationstring==1.4
triton @ https://huggingface.co/MonsterMMORPG/SECourses/resolve/main/triton-2.1.0-cp310-cp310-win_amd64.whl
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.4
tzlocal==5.2
urllib3==1.26.13
uvicorn==0.27.0.post1
validators==0.22.0
velruse==1.1.1
venusian==3.1.0
wasabi==1.1.2
watchdog==3.0.0
weasel==0.3.4
webdataset==0.2.86
WebOb==1.8.7
websockets==11.0.3
win32-setctime==1.1.0
WTForms==3.1.2
wtforms-recaptcha==0.3.2
xformers==0.0.24
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0
zope.deprecation==5.0
zope.interface==6.1
zope.sqlalchemy==3.1
(venv) G:\temp Local install\CogVLM\venv\Scripts>
Who can help? / 谁可以帮助到您?
@1049451037 @zr
Information / 问题信息
- [X] The official example scripts / 官方的示例脚本
- [X] My own modified scripts / 我自己修改的脚本和任务
Reproduction / 复现过程
The same code works with load_in_4bit=True and finishes with no errors; only the 8-bit load below fails (a possible workaround sketch follows the code):
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
MODEL_PATH = "THUDM/cogagent-vqa-hf"
tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
torch_type = torch.float16

# Load the model in 8-bit; this is the only change relative to the
# working 4-bit run (load_in_4bit=True).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    low_cpu_mem_usage=True,
    load_in_8bit=True,
    trust_remote_code=True
).eval()

# Wrapper function added so the trailing `return response` is valid;
# the name and signature are illustrative.
def generate(image_prompt, input_text, temperature, do_sample, top_p, top_k):
    with torch.no_grad():
        image = Image.open(image_prompt).convert('RGB') if image_prompt is not None else None
        input_by_model = model.build_conversation_input_ids(
            tokenizer,
            query=input_text,
            history=[],
            images=([image] if image else None),
            template_version='base'
        )
        inputs = {
            'input_ids': input_by_model['input_ids'].unsqueeze(0).to(DEVICE),
            'token_type_ids': input_by_model['token_type_ids'].unsqueeze(0).to(DEVICE),
            'attention_mask': input_by_model['attention_mask'].unsqueeze(0).to(DEVICE),
            'images': [[input_by_model['images'][0].to(DEVICE).to(torch_type)]],
        }
        # CogAgent also returns high-resolution cross images when available.
        if 'cross_images' in input_by_model and input_by_model['cross_images']:
            inputs['cross_images'] = [[input_by_model['cross_images'][0].to(DEVICE).to(torch_type)]]
        gen_kwargs = {
            "max_length": 2048,
            "temperature": temperature,
            "do_sample": do_sample,
            "top_p": top_p,
            "top_k": top_k
        }
        outputs = model.generate(**inputs, **gen_kwargs)
        # Keep only the newly generated tokens, dropping the prompt.
        outputs = outputs[:, inputs['input_ids'].shape[1]:]
        response = tokenizer.decode(outputs[0])
        response = response.split("</s>")[0]
        return response
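A possible workaround to try (unverified; the module names below are placeholders, not taken from the CogAgent code): transformers can exclude specific submodules from int8 conversion via BitsAndBytesConfig.llm_int8_skip_modules, so any layer whose weights are being multiplied while still stored as int8 could be kept in float16. A minimal sketch:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Keep the suspect submodules in fp16 instead of int8. The names below are
# hypothetical; replace them with the module names from the failing traceback.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["vision", "cross_visual"],  # placeholder names
)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogagent-vqa-hf",
    low_cpu_mem_usage=True,
    quantization_config=bnb_config,
    trust_remote_code=True,
).eval()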
Expected behavior / 期待表现
The model should load with load_in_8bit=True and generate without raising the dtype error, just as it does with load_in_4bit=True.