Img2txt result is pretty bad on 16bit

Open billzhao9 opened this issue 2 years ago • 4 comments

Don't know if I did anything wrong, but the result was not like the example.

import os
from core.models.model_module_infer import model_module

model_load_paths = ['CoDi_encoders.pth', 'CoDi_text_diffuser.pth', 'CoDi_audio_diffuser_m.pth', 'CoDi_video_diffuser_8frames.pth']
inference_tester = model_module(data_dir='checkpoints/', pth=model_load_paths, fp16=True)  # turn on fp16=True if loading fp16 weights
inference_tester = inference_tester.cuda()
inference_tester = inference_tester.eval()

from PIL import Image
im = Image.open('./assets/demo_files/house.jpeg').resize((224,224))
im
text = inference_tester.inference(
    xtype = ['text'],
    condition = [im],
    condition_types = ['image'],
    n_samples = 4,
    ddim_steps = 50,
    scale = 7.5,)
text[0]

Data shape for DDIM sampling is [[4, 768]], eta 0.0
DDIM Sampler: 100%|██████████| 50/50 [00:01<00:00, 29.62it/s]
['oriental book examines a bag with bright spots.', 'a street view of kitchen toys and tv.', 'a is also carrying a ton of blue and green spandex around and pedals.', 'woman and a white bird crossing in the field.']

The sample image is a house, but none of the generated captions describe it.

billzhao9 avatar Sep 25 '23 19:09 billzhao9

What is your transformers version? Did you install from requirements.txt?

zinengtang avatar Sep 25 '23 21:09 zinengtang

Yes, I have all the modules listed in requirements.txt installed. However, my transformers version is 4.33.2. Could that affect the result?

billzhao9 avatar Sep 25 '23 22:09 billzhao9

I think so. The transformers version in requirements.txt is 4.26.0. A newer version can cause mismatches with the code.

zinengtang avatar Sep 26 '23 03:09 zinengtang

@billzhao9 I might have a similar issue with the image encoder. Were you able to fix it by pinning the transformers version?

jacklishufan avatar Oct 26 '23 06:10 jacklishufan
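
For anyone hitting the same thing: a quick way to confirm whether the environment matches the pin mentioned above is to compare the installed transformers version against 4.26.0 before loading the model. This is only a minimal sketch using the standard library; the helper names `installed_version` and `matches_pin` are made up for illustration and are not part of the CoDi code, and a matching version does not by itself guarantee correct captions.

```python
from importlib import metadata

# Version pinned in requirements.txt, according to this thread.
PINNED_TRANSFORMERS = "4.26.0"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned=PINNED_TRANSFORMERS):
    """True only when a version is installed and equals the pinned one exactly."""
    return installed is not None and installed == pinned


# Report the state of the current environment (hypothetical helper usage).
found = installed_version("transformers")
if matches_pin(found):
    print(f"transformers {found} matches the pinned version")
else:
    print(f"transformers is {found!r}, but the pinned version is {PINNED_TRANSFORMERS}")
```

If the versions differ, reinstalling with `pip install transformers==4.26.0` in a fresh environment is the obvious next step to try.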