Paper-provided example cannot be reproduced!!
I took the example from the README.md.
-
The load-8bit response is acceptable, but it didn't give me any explanation.
-
I thought load-8bit might decrease performance, so I ran in fp16 mode only (no aggressive quantization). But I got even worse results: it still gives no explanation, and a bunch of out-of-control [SEG] tokens pop out.
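To quantify the runaway [SEG] behaviour, I count how many [SEG] tokens show up in the generated ids. This is just a minimal sketch: `tokenizer` and `output_ids` are assumed to be the ones produced inside the repo's chat script, and the variable names are mine, not the repo's.

```python
# Minimal debugging sketch (assumed variables: `tokenizer`, and `output_ids`
# as a torch tensor of generated token ids from the chat script).
seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]
num_seg = int((output_ids == seg_token_idx).sum())
print(f"[SEG] tokens in the generated sequence: {num_seg}")  # expect 1 per mask
```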
Versions and packages:
accelerate 1.0.1
aiofiles 23.2.1
aiohappyeyeballs 2.4.3
aiohttp 3.10.10
aiosignal 1.3.1
altair 5.4.1
annotated-types 0.7.0
anyio 4.6.2
async-timeout 4.0.3
attrs 24.2.0
autocommand 2.2.2
backports.tarfile 1.2.0
bitsandbytes 0.41.1
certifi 2024.8.30
charset-normalizer 3.4.0
click 8.1.7
contourpy 1.3.0
cycler 0.12.1
deepspeed 0.15.2
einops 0.4.1
exceptiongroup 1.2.2
fastapi 0.100.1
ffmpy 0.4.0
filelock 3.16.1
flash_attn 2.6.3
fonttools 4.54.1
frozenlist 1.4.1
fsspec 2024.9.0
gradio 3.39.0
gradio_client 1.3.0
grpcio 1.66.2
h11 0.14.0
hjson 3.1.0
httpcore 1.0.6
httpx 0.27.2
huggingface-hub 0.25.2
idna 3.10
importlib_metadata 8.0.0
importlib_resources 6.4.5
inflect 7.3.1
jaraco.collections 5.1.0
jaraco.context 5.3.0
jaraco.functools 4.0.1
jaraco.text 3.12.1
Jinja2 3.1.4
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
kiwisolver 1.4.7
linkify-it-py 2.0.3
markdown-it-py 2.2.0
markdown2 2.4.10
MarkupSafe 2.1.5
matplotlib 3.9.2
mdit-py-plugins 0.3.3
mdurl 0.1.2
more-itertools 10.3.0
mpmath 1.3.0
msgpack 1.1.0
multidict 6.1.0
narwhals 1.9.3
networkx 3.2.1
ninja 1.11.1.1
numpy 1.24.2
nvidia-ml-py 12.560.30
openai 0.27.8
opencv-python 4.8.0.74
orjson 3.10.7
packaging 24.1
pandas 2.2.3
peft 0.4.0
Pillow 9.4.0
pip 24.2
platformdirs 4.2.2
propcache 0.2.0
protobuf 5.28.2
psutil 6.0.0
py-cpuinfo 9.0.0
pycocotools 2.0.6
pydantic 2.9.2
pydantic_core 2.23.4
pydub 0.25.1
pyparsing 3.2.0
python-dateutil 2.9.0.post0
python-multipart 0.0.12
pytz 2024.2
PyYAML 6.0.2
ray 2.6.1
referencing 0.35.1
regex 2024.9.11
requests 2.31.0
rpds-py 0.20.0
sacremoses 0.1.1
safetensors 0.4.5
scipy 1.11.2
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 75.1.0
shortuuid 1.0.11
six 1.16.0
sniffio 1.3.1
starlette 0.27.0
sympy 1.12
tokenizers 0.15.2
tomli 2.0.1
torch 2.1.2+cu121
torchaudio 2.1.2+cu121
torchvision 0.16.2+cu121
tqdm 4.64.1
transformers 4.35.2
triton 2.1.0
typeguard 4.3.0
typing_extensions 4.12.2
tzdata 2024.2
uc-micro-py 1.0.3
urllib3 2.2.3
uvicorn 0.23.2
websockets 11.0.3
wheel 0.44.0
yarl 1.15.2
zipp 3.20.2
I have encountered several issues, so I matched the transformers version to LLaVA's and modified the code according to this issue:
https://github.com/haotian-liu/LLaVA/issues/968
The real problem that I'm afraid affects the decoding strategy is https://github.com/salesforce/LAVIS/issues/571.
So I replaced every reference to a private function (i.e. _expand_mask) with the corresponding object, to pass Python's static check.
Moreover, I placed a RuntimeError at the beginning of every function that uses them (but I never hit any RuntimeError), which means those private functions are not used at all during inference.
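For reference, the shim I ended up with looks roughly like this. It is only a sketch under my assumption that transformers >= 4.35 moved the private helper onto AttentionMaskConverter, while older releases still export _expand_mask from the model module directly:

```python
# Compatibility sketch for the removed private helper (assumption: in
# transformers >= 4.35 it lives on AttentionMaskConverter; older releases
# still expose it as a module-level function).
try:
    from transformers.modeling_attn_mask_utils import AttentionMaskConverter
    _expand_mask = AttentionMaskConverter._expand_mask
except ImportError:
    from transformers.models.llama.modeling_llama import _expand_mask
```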
Any suggestions would be appreciated!!
Didn't you see the training data? The GT that LLaVA is trained to output is "Sure, it's the <seg>." It was trained to respond like that, so you cannot force it to output an explanation. Although the author's demo does seem to be wrong about this.
Yeah, I agree with your point: the model can hardly generate phrases it has never seen in the training set.
However, the demo gives more than one such example.
So I wonder how to reproduce such inference results by adjusting the prompt (we can only see that it is triggered by putting 'explain why' in the prompt)?
On the other hand, the reproduced results also show error-prone output; for example, the model generates massive numbers of [SEG] tokens in the console. That's my other question.
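In case it helps, this is how I currently try to trigger the explanation by prompt. The query is the one from the README demo as far as I remember; the "<image>" placeholder and the "explain why" suffix are my assumptions about how chat.py builds the prompt, and the exact training templates may differ:

```python
# Sketch of the prompt I feed to the explanatory checkpoint (assumptions:
# LLaVA-style "<image>" placeholder; "explain why" suffix taken from the demo).
DEFAULT_IMAGE_TOKEN = "<image>"

query = "Who was the president of the US in this image? Please output segmentation mask and explain why."
prompt = DEFAULT_IMAGE_TOKEN + "\n" + query
# `prompt` then goes through the usual LLaVA conversation template before generate().
```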
I don't think you need to dwell on this. If you want the results shown in the demo, you could look into LISA++; it handles the demo's tasks well and the dialogue is more natural.
I did find the paper you mentioned, but I can't find its GitHub repo. You said LISA++ handles the demo's tasks well; do you know where to get the LISA++ GitHub repo (source code and weights for reproduction)?
It seems there really is no code or weights; the LISA++ paper doesn't mention them either.
Then let's wait and see whether the LISA authors will give any comment ~
Additional note (11/06):
We have also tested a different checkpoint, "xinlai/LISA-13B-llama2-v1-explanatory"; I hadn't noticed that the term "explanatory" may denote exactly the feature I'm asking for in this thread.
Note that to fully unlock the model's capability, we also turned off load-in-8bit; fp16 is the only efficiency flag we kept.
However, we got even worse results.
Text output (buggy output):
Segmentation output (does not segment correctly):
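For completeness, here is what the two precision settings amount to in plain transformers terms. This is only a generic illustration of --load_in_8bit vs. fp16 loading, not the repo's actual model-loading code (the LISA checkpoint itself has to be loaded through the repo's own model class):

```python
# Generic illustration of the two precision settings (NOT LISA's loading code).
import torch
from transformers import AutoModelForCausalLM

ckpt = "some-llama2-checkpoint"  # placeholder; the LISA checkpoint needs the repo's class

# fp16 only (what we used for this test):
model_fp16 = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16)

# 8-bit quantization via bitsandbytes (what --load_in_8bit toggles):
model_int8 = AutoModelForCausalLM.from_pretrained(ckpt, load_in_8bit=True)
```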
@HuangChiEn Hi~ I have been working on reproducing LISA recently. Have you noticed Issue #162? I see that you are all paying attention to reproducing reason seg, but rarely mention referring seg (e.g. validation on the refCOCO dataset). Have you successfully replicated the results on the refCOCO dataset?
Hello, I haven't fully tested it (running through the whole validation set), but the segmentation results are correct on the examples provided in the paper.
On the other hand, I have successfully reproduced reason seg, though only the quantized versions of xinlai/LISA-13B-llama2-v1-explanatory work (--precision='fp16' with --load_in_8bit or --load_in_4bit).