i-Code
Img2txt result is pretty bad on 16bit
Don't know if I did anything wrong, but the result was not like the example.
    import os
    from core.models.model_module_infer import model_module

    model_load_paths = ['CoDi_encoders.pth', 'CoDi_text_diffuser.pth', 'CoDi_audio_diffuser_m.pth', 'CoDi_video_diffuser_8frames.pth']
    inference_tester = model_module(data_dir='checkpoints/', pth=model_load_paths, fp16=True)  # turn on fp16=True if loading fp16 weights
    inference_tester = inference_tester.cuda()
    inference_tester = inference_tester.eval()

    from PIL import Image
    im = Image.open('./assets/demo_files/house.jpeg').resize((224,224))
    im

    text = inference_tester.inference(
        xtype = ['text'],
        condition = [im],
        condition_types = ['image'],
        n_samples = 4,
        ddim_steps = 50,
        scale = 7.5,)
    text[0]

    Data shape for DDIM sampling is [[4, 768]], eta 0.0
    DDIM Sampler: 100%|██████████| 50/50 [00:01<00:00, 29.62it/s]
    ['oriental book examines a bag with bright spots.',
     'a street view of kitchen toys and tv.',
     'a is also carrying a ton of blue and green spandex around and pedals.',
     'woman and a white bird crossing in the field.']
The sample image is a house, but none of the four generated captions describe it.
What is your transformers version? Did you install the packages from requirements.txt?
Yes, I have all the modules listed in requirements.txt installed. However, my transformers version is 4.33.2. Could that affect the result?
I think so. The transformers version in requirements.txt is 4.26.0. A newer version can be incompatible with the code.
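A quick way to confirm which transformers version is installed and compare it with the 4.26.0 pin mentioned above is sketched here. This is only an illustrative check, not part of the i-Code repository: the helper names `installed_version`, `compare`, and `check_pin` are made up for this example, and it relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Version quoted in this thread as the pin in requirements.txt.
REQUIRED_TRANSFORMERS = "4.26.0"


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare(package, found, required):
    """Describe how the installed version relates to the pinned one (hypothetical helper)."""
    if found is None:
        return f"{package} is not installed (expected {required})"
    if found != required:
        return f"{package} {found} is installed, but {required} is pinned"
    return f"{package} {found} matches the pin"


def check_pin(package, required):
    """Look up the installed version and compare it with the pin."""
    return compare(package, installed_version(package), required)


if __name__ == "__main__":
    print(check_pin("transformers", REQUIRED_TRANSFORMERS))
```

If the check reports a mismatch, `pip install transformers==4.26.0` installs the pinned version. Whether that alone fixes the captions is not confirmed in this thread.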
@billzhao9 I might have a similar issue with the image encoder. Were you able to fix it by changing the transformers version?