remove xformers and replace with torch native memory efficient attention
This will help with benchmarking since we won't have to include xformers in it :P
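For reference, the change boils down to swapping xformers' `memory_efficient_attention` for PyTorch's native `scaled_dot_product_attention`. Here's a minimal sketch of the swap (not the actual diff; the wrapper functions are illustrative), assuming torch >= 2.0. One gotcha: xformers takes `(batch, seq, heads, dim)` while torch's SDPA takes `(batch, heads, seq, dim)`:

```python
import torch
import torch.nn.functional as F

def attn_xformers(q, k, v):
    # xformers expects (batch, seq_len, num_heads, head_dim)
    import xformers.ops
    return xformers.ops.memory_efficient_attention(q, k, v)

def attn_torch_native(q, k, v):
    # F.scaled_dot_product_attention expects (batch, num_heads, seq_len, head_dim),
    # so transpose in and out to keep the same interface as above.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    # Dispatches to a memory-efficient / flash kernel when one is
    # available for the current device and dtype.
    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2)
```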
Testing the MOVQ:
```python
from muse.modeling_movq import MOVQ
from PIL import Image
import numpy as np
import torch

torch.set_grad_enabled(False)
torch.manual_seed(0)

# Load the image as a (1, 3, H, W) float tensor in [0, 1]
image = Image.open('input.png')
image = image.convert("RGB")
image = np.array(image)
image = image.astype(np.float32)
image = image / 255
image = image[None, :, :, :]
image = image.transpose(0, 3, 1, 2)
image = torch.from_numpy(image).to('cuda')

vae = MOVQ.from_pretrained("openMUSE/movq-lion-high-res-f8-16384")
vae.to('cuda')
vae.set_use_memory_efficient_attention_xformers(True)  # comment out if running on this branch

out = vae(image)
out = out[0]
print(out.abs().sum())

# Convert the reconstruction back to a uint8 HWC image and save it
out = out * 255
out = out.clamp(0, 255)
out = out.permute(0, 2, 3, 1)
out = out.cpu()
out = out.numpy()
out = out.astype(np.uint8)  # uint8, not int8: pixel values are in [0, 255]
out = out[0, :, :, :]
out = Image.fromarray(out, mode='RGB')
out.save('out.png')
```
torch native: `tensor(63837.2969, device='cuda:0')`
xformers: `tensor(63861.8047, device='cuda:0')`
input: (image)
torch native out: (image)
xformers out: (image)
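The two backends aren't expected to be bit-identical: different attention kernels reduce in different orders, so a gap like the one above (roughly 0.04% on the summed magnitudes) is normal. A tolerance check is the right comparison; below is a minimal sketch, where `out_native` and `out_xformers` are hypothetical names for the `out` tensor from a run on this branch and a run with xformers enabled:

```python
import torch

def check_close(out_native: torch.Tensor, out_xformers: torch.Tensor) -> None:
    # Report the largest per-element deviation, then assert the two outputs
    # agree within a loose tolerance (exact equality is not expected).
    print((out_native - out_xformers).abs().max())
    assert torch.allclose(out_native, out_xformers, rtol=1e-3, atol=1e-3)
```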
Testing the transformer:
```python
from muse import PipelineMuse
import torch

torch.manual_seed(0)

model = "openMUSE/muse-cc12m-uvit-clip-130k"
pipe = PipelineMuse.from_pretrained(model).to("cuda")
pipe.transformer.set_use_memory_efficient_attention_xformers(True)  # comment out if running on this branch

# Generate a single image and save it
pipe("a person in the forest", timesteps=12)[0].save("out.png")
```
torch native: (image)
xformers: (image)
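And since the point is easier benchmarking, here's a minimal timing sketch for comparing the two backends (assumes the `pipe` object from the snippet above; CUDA events are used so the GPU work is actually measured):

```python
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

# Warmup run so one-time setup costs don't skew the timing
pipe("a person in the forest", timesteps=12)

torch.cuda.synchronize()
start.record()
for _ in range(5):
    pipe("a person in the forest", timesteps=12)
end.record()
torch.cuda.synchronize()
print(f"avg per call: {start.elapsed_time(end) / 5:.1f} ms")
```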