🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".

Results 10 gill issues

Hi! Thank you for your great work! After preparing datasets and pretrained model, I trained the model using this command: randport=$(shuf -i8000-9999 -n1) # Generate a random port number python...

RuntimeError: CUDA error: no kernel image is available for execution on the device CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be...

After training both the gill and decision models, load_model failed: ```txt ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮ │ in :2 │ │ │ │ /content/gill/gill/models.py:873 in load_gill │ │ │...

1. The training set is used with split=val, and the image is not passed as input. Shouldn't it be dialogs+image -> image? 2. This differs considerably from how vist is used; vist uses dialogs+image -> image

Thank you for the good code. However, the inference code appears as follows. The value of the first dimension of the actual raw_emb tensor is 0, not 8. ![image](https://github.com/kohjingyu/gill/assets/96530685/57f36c74-bd7e-4a2d-8ce8-aa40cb692901)

I am curious why you don't use a universal representation in one task, for example input: [image] + caption, output: caption + [IMG1]...[IMGn]?

Hi! Congratulations on great work! Could you please point me to the code to reproduce results in Table 3 and Table 4, particularly FID scores on CC3M and VIST dataset?...

I ran into an issue saying that the torch version (1.13.1) is incompatible with the torchvision and torchaudio versions. How can I fix this in the environment setup?
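A mismatch like this usually means that pip resolved torchvision or torchaudio builds made for a different torch release. The sketch below is a minimal, hedged way to check an environment: it assumes the standard 1.13-series pairings from the PyTorch release notes (torch 1.13.1 with torchvision 0.14.1 and torchaudio 0.13.1). The helper names (`expected_companions`, `find_mismatches`, `installed_versions`) are hypothetical and not part of the gill repository, whose actual requirements file is not shown in this listing.

```python
from importlib import metadata

# Assumed pairings for the torch 1.13 series (from the PyTorch release notes).
# This table is illustrative and deliberately not exhaustive.
KNOWN_PAIRINGS = {
    "1.13.0": {"torchvision": "0.14.0", "torchaudio": "0.13.0"},
    "1.13.1": {"torchvision": "0.14.1", "torchaudio": "0.13.1"},
}


def base_version(version):
    """Drop a local build suffix such as '+cu117'."""
    return version.split("+", 1)[0]


def expected_companions(torch_version):
    """Return the companion versions for a torch release, or None if unknown."""
    return KNOWN_PAIRINGS.get(base_version(torch_version))


def find_mismatches(installed):
    """Map each mismatched package to a (found, expected) version pair."""
    expected = expected_companions(installed.get("torch", ""))
    if expected is None:
        return {}  # torch missing or not in the table: nothing to compare
    mismatches = {}
    for pkg, want in expected.items():
        have = installed.get(pkg)
        if have is not None and base_version(have) != want:
            mismatches[pkg] = (base_version(have), want)
    return mismatches


def installed_versions():
    """Read the installed versions of the three packages, skipping absent ones."""
    found = {}
    for pkg in ("torch", "torchvision", "torchaudio"):
        try:
            found[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    print(find_mismatches(installed_versions()))
```

If the script reports a mismatch, pinning all three packages in a single install command, for example `pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1`, is one common way to keep them aligned; the appropriate CUDA build still depends on the local driver.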

I have some questions about the paper. 1. As mentioned in this issue: https://github.com/kohjingyu/gill/issues/5#issuecomment-1619006482, it is said that "So the model will never produce [IMG2]...[IMG8] organically, but their representations are still helpful...