Feat/package and device compatibility

Open paulasquin opened this issue 1 year ago • 7 comments

Refactoring, Packaging & Apple Silicon compatibility

  • Add Poetry-style packaging
  • Refactor the code into an object-oriented design
  • Add type hints
  • Add tests
  • Add MPS compatibility (tested on an M3 Max with 64 GB)
  • Add Gradio app
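
For readers unfamiliar with MPS support, device selection could look roughly like the sketch below. This is not the code from this PR: `pick_device` and `detect_device` are hypothetical helper names, and the CUDA-then-MPS-then-CPU preference order is an assumption.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; report "cpu" if torch is absent."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps does not exist on older PyTorch builds.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

The returned string can be passed to `torch.device(...)` or `.to(...)` when moving the model and inputs.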

To squash before merge

Solved issues

Nonsense inner thoughts

On Apple Silicon, we were getting nonsense from the model.generate method.

Payload

  • Instruction: make the frame red
  • Image: glasses

Expected:

  • Out:
If the frame of the glasses in the image were made red, the overall appearance of the scene would change significantly.The red frame would draw more attention to the glass and create a stronger contrast with the black frame.
  • Res: glasses

Obtained

  • Out
Pres flash togful calledgot At commitilli split sent supports fir card projects course bunch mixture enc halery racc developed curves enjoydog memory seek Inside Wh sam closure served supports fir tripifest towardinn household finishing exact meaning ordinary treat drop whose invert Rem follow til Otherwise stal frames sequence lifted accomp entire variation government carriage uses eratrim condition Wild throne phys mutong B woods racc developed Le rename Ada laugh applying dess squ cit reference rad type refresh spr rud embedded agricult foot ax steps God close These
  • Res: ~same as input

Fix

The latest LLaVA weights, which you can get from Hugging Face with git clone https://huggingface.co/liuhaotian/LLaVA-Lightning-7B-delta-v1-1, do not work here. I solved this by using the weights saved by tsujuifu, stored on Google Drive. A lot of time was lost on this. Could this be due to delta vs. full LLaVA weights?

  • Out
The image would feature a close-up view of a pair of black eyeglasses with a gold or metallic frame, placed on a gray background.The frame would be red, drawing attention to the glasses and making them the focal point of the image.
  • Res glasses
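
On the delta-vs-full question: a "delta" checkpoint stores differences from a base model, so it is meant to be merged with the base weights before use. The toy sketch below only illustrates that idea on plain Python lists. It is not LLaVA's actual conversion script, `apply_delta` here is a hypothetical function, and whether this explains the failure above is unconfirmed.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Rebuild full weights as base + delta, tensor by tensor (toy version)."""
    full: Weights = {}
    for name, delta_values in delta.items():
        if name not in base:
            # Assumption: tensors that exist only in the delta are kept as-is.
            full[name] = list(delta_values)
            continue
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name!r}")
        full[name] = [b + d for b, d in zip(base_values, delta_values)]
    return full


if __name__ == "__main__":
    base = {"layer.weight": [1.0, 2.0]}
    delta = {"layer.weight": [0.5, -1.0], "projector.weight": [3.0]}
    print(apply_delta(base, delta))
```

Loading a delta checkpoint directly, without this merge step, would leave the model with weights that are only the differences, which is one plausible way to end up with garbled generations.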

paulasquin avatar Feb 08 '24 11:02 paulasquin

I also faced issues when trying to reproduce the results. Although no errors were displayed, the editing quality was not as good as in the paper. Could you please share the environment file so I can verify the versions of the critical packages?

xiaoqian-shen avatar Feb 13 '24 15:02 xiaoqian-shen

I fixed the problem by using the checkpoint you provided on Google Drive. Thanks!

xiaoqian-shen avatar Feb 13 '24 17:02 xiaoqian-shen

Hello @xiaoqian-shen

Indeed, I suggest using the models from my Hugging Face account, which come from Tsu-Jui Fu's Google Drive link. I do not have a clear understanding of why the original package weights aren't working.

Even if you no longer need this, here are the package versions in case they help others. I'm sharing the output of poetry run python -m pip freeze instead of the poetry.lock file, for readability.

freeze.txt

paulasquin avatar Feb 13 '24 22:02 paulasquin

Thanks for your reply! May I ask whether you were able to reproduce the MagicBrush results in Table 2?

xiaoqian-shen avatar Feb 15 '24 14:02 xiaoqian-shen

My trained mgie_7b is also not working. I was able to train and export mllm.pt and unet.pt, but when running the demo, the ckpt has no 'emb' entry and my ckpt's 'model.embed_tokens.weight' has a different tensor size. So training ran, but the resulting model does not work. With tsujuifu's weights, the demo works.
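
One way to narrow this kind of mismatch down is to diff parameter names and shapes between a working checkpoint and the self-trained one. The sketch below is a suggestion, not part of the repository: `compare_shapes` is a hypothetical helper, and the shapes in the example are made up for illustration.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def compare_shapes(
    reference: Dict[str, Shape], candidate: Dict[str, Shape]
) -> Tuple[List[str], List[str], Dict[str, Tuple[Shape, Shape]]]:
    """Return (missing keys, unexpected keys, shape mismatches) for candidate."""
    missing = sorted(set(reference) - set(candidate))
    unexpected = sorted(set(candidate) - set(reference))
    mismatched = {
        name: (reference[name], candidate[name])
        for name in sorted(set(reference) & set(candidate))
        if reference[name] != candidate[name]
    }
    return missing, unexpected, mismatched


if __name__ == "__main__":
    # Made-up shapes; with PyTorch, and assuming the file holds a flat dict
    # of tensors, real ones could be collected with:
    #   {k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}
    working = {"emb": (8, 4096), "model.embed_tokens.weight": (32008, 4096)}
    trained = {"model.embed_tokens.weight": (32000, 4096)}
    print(compare_shapes(working, trained))
```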

GitHub1712 avatar Feb 17 '24 12:02 GitHub1712

Thanks for your reply! May I ask whether you were able to reproduce the MagicBrush results in Table 2?

Hello @xiaoqian-shen, I sometimes see slight differences, but I mostly get the same level of quality. A few times I got ugly results (mainly on the phone and beach photos).

Here are my before/after on the demo images

[Images: input/output pairs 0-in/0-out through 19-in/19-out for the demo images]

paulasquin avatar Feb 21 '24 17:02 paulasquin

Thank you for your contribution. Where can I find the ipr2ipr.pkl/tsv data referenced in the code, that is, the summarized image-text pairs? Or do I need to construct it myself?

lzw-lzw avatar Mar 18 '24 03:03 lzw-lzw