gill
Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".
Hi! Thank you for your great work! After preparing the datasets and the pretrained model, I trained the model with this command: `randport=$(shuf -i8000-9999 -n1)  # Generate a random port number` followed by `python...`
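If `shuf` is not available (e.g. on macOS), the random port can be picked in Python instead; this is a small generic sketch, independent of the repo:

```python
import socket

def find_free_port() -> int:
    """Return an unused TCP port chosen by the OS (a free-port analogue of `shuf -i8000-9999 -n1`)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
        return s.getsockname()[1]

if __name__ == "__main__":
    # Print the port so it can be passed to whatever distributed-init
    # argument the training script expects.
    print(find_free_port())
```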
RuntimeError: CUDA error: no kernel image is available for execution on the device. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be...
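This error usually means the installed PyTorch wheel was not built for the GPU's compute capability. A quick diagnostic using only standard torch calls (not repo code):

```python
import torch

# "no kernel image is available" typically means the wheel's compiled CUDA
# architectures do not cover this GPU's compute capability.
print("torch:", torch.__version__, "| built against CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))
    print("compiled arch list:", torch.cuda.get_arch_list())
```

If the device's compute capability (e.g. sm_86) is missing from the compiled arch list, reinstalling a PyTorch wheel built for a matching CUDA version usually fixes it.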
After training both GILL and the decision model, load_model failed:
```txt
╭─────────────────── Traceback (most recent call last) ───────────────────╮
│ in :2
│
│ /content/gill/gill/models.py:873 in load_gill
│ ...
```
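For reference, loading the trained model should only require pointing `load_gill` at the checkpoint directory; a minimal sketch, where the directory name is an assumption rather than the repo's documented layout:

```python
from gill import models

# Hypothetical checkpoint directory: it should contain the trained GILL weights
# and the decision model produced by the two training steps above.
model_dir = "checkpoints/gill_opt"
model = models.load_gill(model_dir)
```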
1. It uses the training set (split=val) and does not take the image as input. Shouldn't it be dialogs+image -> image? 2. How does this differ from the way VIST is used? VIST uses dialogs+image -> image.
Thank you for the good code. However, in the inference code, the value of the first dimension of the actual raw_emb tensor is 0, not 8.
I am curious why you don't use a universal representation in a single task, e.g. input: [image] + caption, output: caption + [IMG1]...[IMGn]?
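For context, [IMG1]...[IMGn] are ordinary special tokens added to the language model's vocabulary; a generic sketch with Hugging Face `transformers` (the base model name is a placeholder, not the repo's code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

n_img_tokens = 8  # the paper uses 8 learned [IMG] tokens
base_model = "facebook/opt-125m"  # placeholder; GILL builds on a larger OPT model

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Register [IMG1]...[IMG8] as special tokens and grow the embedding matrix accordingly.
img_tokens = [f"[IMG{i}]" for i in range(1, n_img_tokens + 1)]
tokenizer.add_special_tokens({"additional_special_tokens": img_tokens})
model.resize_token_embeddings(len(tokenizer))

print(tokenizer.convert_tokens_to_ids(img_tokens))
```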
Hi! Congratulations on the great work! Could you please point me to the code to reproduce the results in Table 3 and Table 4, particularly the FID scores on the CC3M and VIST datasets?...
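While waiting for the official evaluation script, a rough FID number can be computed with `torchmetrics` (which needs the `torch-fidelity` package installed); this is a generic sketch, not the paper's exact protocol, so scores may not match Table 3/4:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# Stand-in tensors: in practice these would be thousands of CC3M/VIST ground-truth
# images and the corresponding generated images, as uint8 (N, 3, 299, 299) in [0, 255].
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
generated_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(generated_images, real=False)
print("FID:", fid.compute().item())
```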
I ran into an issue saying that the torch version (1.13.1) is incompatible with the torchvision and torchaudio versions. How can I fix this during environment setup?
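For torch 1.13.1, the matching releases are torchvision 0.14.1 and torchaudio 0.13.1; a quick Python check of the installed versions (generic, not from the repo):

```python
import torch
import torchaudio
import torchvision

# Known-compatible pairing for this setup.
expected = {"torch": "1.13.1", "torchvision": "0.14.1", "torchaudio": "0.13.1"}
installed = {
    "torch": torch.__version__,
    "torchvision": torchvision.__version__,
    "torchaudio": torchaudio.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have.startswith(want) else f"MISMATCH (expected {want})"
    print(f"{name} {have}: {status}")
```

If any of these mismatch, reinstalling torchvision==0.14.1 and torchaudio==0.13.1 alongside torch==1.13.1 should resolve the conflict.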
I have some questions about the paper. 1. As mentioned in this issue: https://github.com/kohjingyu/gill/issues/5#issuecomment-1619006482, it is said that "So the model will never produce [IMG2]...[IMG8] organically, but their representations are still helpful...