remove xformers and replace with torch native memory efficient attention
This will help with benchmarking since we won't have to include xformers in it :P
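For reference, the change boils down to swapping xformers' `memory_efficient_attention` for PyTorch's native `scaled_dot_product_attention`. Here's a minimal sketch of the swap (not the actual diff; the wrapper functions are illustrative), assuming torch >= 2.0. One gotcha: xformers takes `(batch, seq, heads, dim)` while torch's SDPA takes `(batch, heads, seq, dim)`:

```python
import torch
import torch.nn.functional as F

def attn_xformers(q, k, v):
    # xformers expects (batch, seq_len, num_heads, head_dim)
    import xformers.ops
    return xformers.ops.memory_efficient_attention(q, k, v)

def attn_torch_native(q, k, v):
    # F.scaled_dot_product_attention expects (batch, num_heads, seq_len, head_dim),
    # so transpose in and out to keep the same interface as above.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    # Dispatches to a memory-efficient / flash kernel when one is
    # available for the current device and dtype.
    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2)
```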
Testing the MOVQ:
```python
from muse.modeling_movq import MOVQ
from PIL import Image
import numpy as np
import torch

torch.set_grad_enabled(False)
torch.manual_seed(0)

# Load the image as a (1, 3, H, W) float tensor in [0, 1]
image = Image.open('input.png')
image = image.convert("RGB")
image = np.array(image)
image = image.astype(np.float32)
image = image / 255
image = image[None, :, :, :]
image = image.transpose(0, 3, 1, 2)
image = torch.from_numpy(image).to('cuda')

vae = MOVQ.from_pretrained("openMUSE/movq-lion-high-res-f8-16384")
vae.to('cuda')
vae.set_use_memory_efficient_attention_xformers(True)  # comment out if running on this branch

out = vae(image)
out = out[0]
print(out.abs().sum())

# Convert the reconstruction back to a uint8 HWC image and save it
out = out * 255
out = out.clamp(0, 255)
out = out.permute(0, 2, 3, 1)
out = out.cpu()
out = out.numpy()
out = out.astype(np.uint8)  # uint8, not int8: pixel values are in [0, 255]
out = out[0, :, :, :]
out = Image.fromarray(out, mode='RGB')
out.save('out.png')
```
torch native: `tensor(63837.2969, device='cuda:0')`
xformers: `tensor(63861.8047, device='cuda:0')`
input: (image)
torch native out: (image)
xformers out: (image)
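The two backends aren't expected to be bit-identical: different attention kernels reduce in different orders, so a gap like the one above (roughly 0.04% on the summed magnitudes) is normal. A tolerance check is the right comparison; below is a minimal sketch, where `out_native` and `out_xformers` are hypothetical names for the `out` tensor from a run on this branch and a run with xformers enabled:

```python
import torch

def check_close(out_native: torch.Tensor, out_xformers: torch.Tensor) -> None:
    # Report the largest per-element deviation, then assert the two outputs
    # agree within a loose tolerance (exact equality is not expected).
    print((out_native - out_xformers).abs().max())
    assert torch.allclose(out_native, out_xformers, rtol=1e-3, atol=1e-3)
```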
Testing the transformer:
```python
from muse import PipelineMuse
import torch

torch.manual_seed(0)

model = "openMUSE/muse-cc12m-uvit-clip-130k"
pipe = PipelineMuse.from_pretrained(model).to("cuda")
pipe.transformer.set_use_memory_efficient_attention_xformers(True)  # comment out if running on this branch

# Generate a single image and save it
pipe("a person in the forest", timesteps=12)[0].save("out.png")
```
torch native: (image)
xformers: (image)
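And since the point is easier benchmarking, here's a minimal timing sketch for comparing the two backends (assumes the `pipe` object from the snippet above; CUDA events are used so the GPU work is actually measured):

```python
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

# Warmup run so one-time setup costs don't skew the timing
pipe("a person in the forest", timesteps=12)

torch.cuda.synchronize()
start.record()
for _ in range(5):
    pipe("a person in the forest", timesteps=12)
end.record()
torch.cuda.synchronize()
print(f"avg per call: {start.elapsed_time(end) / 5:.1f} ms")
```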