gill
Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".
Hi! Thank you for your great work! After preparing the datasets and the pretrained model, I trained the model with this command: `randport=$(shuf -i8000-9999 -n1)  # Generate a random port number` followed by `python...`
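If `shuf` is not available (e.g. on macOS), the random port can be picked in Python instead; this is a small generic sketch, independent of the repo:

```python
import socket

def find_free_port() -> int:
    """Return an unused TCP port chosen by the OS (a free-port analogue of `shuf -i8000-9999 -n1`)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
        return s.getsockname()[1]

if __name__ == "__main__":
    # Print the port so it can be passed to whatever distributed-init
    # argument the training script expects.
    print(find_free_port())
```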
RuntimeError: CUDA error: no kernel image is available for execution on the device. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be...
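This error usually means the installed PyTorch wheel was not built for the GPU's compute capability. A quick diagnostic using only standard torch calls (not repo code):

```python
import torch

# "no kernel image is available" typically means the wheel's compiled CUDA
# architectures do not cover this GPU's compute capability.
print("torch:", torch.__version__, "| built against CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))
    print("compiled arch list:", torch.cuda.get_arch_list())
```

If the device's compute capability (e.g. sm_86) is missing from the compiled arch list, reinstalling a PyTorch wheel built for a matching CUDA version usually fixes it.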
After training both GILL and the decision model, load_model failed:
```txt
╭─────────────────── Traceback (most recent call last) ───────────────────╮
│ in :2
│
│ /content/gill/gill/models.py:873 in load_gill
│ ...
```
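For reference, loading the trained model should only require pointing `load_gill` at the checkpoint directory; a minimal sketch, where the directory name is an assumption rather than the repo's documented layout:

```python
from gill import models

# Hypothetical checkpoint directory: it should contain the trained GILL weights
# and the decision model produced by the two training steps above.
model_dir = "checkpoints/gill_opt"
model = models.load_gill(model_dir)
```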
1. It uses the training set (split=val) and does not take the image as input. Shouldn't it be dialogs+image -> image? 2. How does this differ from the way VIST is used? VIST uses dialogs+image -> image.
Thank you for the good code. However, in the inference code, the value of the first dimension of the actual raw_emb tensor is 0, not 8.
I am curious why you don't use a universal representation in a single task, e.g. input: [image] + caption, output: caption + [IMG1]...[IMGn]?
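For context, [IMG1]...[IMGn] are ordinary special tokens added to the language model's vocabulary; a generic sketch with Hugging Face `transformers` (the base model name is a placeholder, not the repo's code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

n_img_tokens = 8  # the paper uses 8 learned [IMG] tokens
base_model = "facebook/opt-125m"  # placeholder; GILL builds on a larger OPT model

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Register [IMG1]...[IMG8] as special tokens and grow the embedding matrix accordingly.
img_tokens = [f"[IMG{i}]" for i in range(1, n_img_tokens + 1)]
tokenizer.add_special_tokens({"additional_special_tokens": img_tokens})
model.resize_token_embeddings(len(tokenizer))

print(tokenizer.convert_tokens_to_ids(img_tokens))
```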
Hi! Congratulations on the great work! Could you please point me to the code to reproduce the results in Table 3 and Table 4, particularly the FID scores on the CC3M and VIST datasets?...
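While waiting for the official evaluation script, a rough FID number can be computed with `torchmetrics` (which needs the `torch-fidelity` package installed); this is a generic sketch, not the paper's exact protocol, so scores may not match Table 3/4:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# Stand-in tensors: in practice these would be thousands of CC3M/VIST ground-truth
# images and the corresponding generated images, as uint8 (N, 3, 299, 299) in [0, 255].
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
generated_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(generated_images, real=False)
print("FID:", fid.compute().item())
```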
I ran into an issue saying that the torch version (1.13.1) is incompatible with the torchvision and torchaudio versions. How can I fix this during environment setup?
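For torch 1.13.1, the matching releases are torchvision 0.14.1 and torchaudio 0.13.1; a quick Python check of the installed versions (generic, not from the repo):

```python
import torch
import torchaudio
import torchvision

# Known-compatible pairing for this setup.
expected = {"torch": "1.13.1", "torchvision": "0.14.1", "torchaudio": "0.13.1"}
installed = {
    "torch": torch.__version__,
    "torchvision": torchvision.__version__,
    "torchaudio": torchaudio.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have.startswith(want) else f"MISMATCH (expected {want})"
    print(f"{name} {have}: {status}")
```

If any of these mismatch, reinstalling torchvision==0.14.1 and torchaudio==0.13.1 alongside torch==1.13.1 should resolve the conflict.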
I have some questions about the paper. 1. As mentioned in this issue: https://github.com/kohjingyu/gill/issues/5#issuecomment-1619006482, it is said that "So the model will never produce [IMG2]...[IMG8] organically, but their representations are still helpful...