imagen-pytorch
loss value
Has anyone trained a text-conditional model? What loss value did you get? I trained the model on the laion-art dataset, and the loss eventually decreased to around 0.1. Is that normal? Here are the sampled pictures.
I've trained a model on about 13k pairs, with 10k steps for each Unet, reaching a final loss of about 0.009, and I get quite good text-to-image alignment.
What cond_scale are you producing your samples with?
Try lower values... That said, text conditioning has seemed finicky in my experience.
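For context, cond_scale is the classifier-free guidance scale and is passed at sampling time, not at training. A minimal sampling sketch (assuming a trained imagen instance such as the one configured later in this thread; the prompt is made up):

# `cond_scale` > 1 amplifies the text conditioning via classifier-free
# guidance; "lower values" here means something like 1.5 - 3.
images = imagen.sample(
    texts = ['a mountain lake at sunrise'],  # hypothetical prompt
    cond_scale = 2.
)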
Ok, thanks for your suggestion.
Which dataset did you use? I guess laion-art is too large, so I want to train on a smaller dataset.
I'm training on danbooru-figures, which is almost 900k images.
Do you have a trained model now? I would like to ask you about the specific values of the Unet parameters. The results I get so far are not good; the images are rather blurry. @TheFusion21
# Imports needed to run this config (not shown in the original post).
from imagen_pytorch import ImagenTrainer
from imagen_pytorch.configs import ImagenConfig

# Base 64px text-to-image Unet.
unet1 = dict(
    dim = 384,
    cond_dim = 384,
    dim_mults = (1, 2, 3, 4),
    num_resnet_blocks = 3,
    attn_dim_head = 64,
    attn_heads = 8,
    layer_attns = (False, True, True, True),
    layer_cross_attns = (False, True, True, True),
    memory_efficient = False,
)

# First super-resolution Unet (64 -> 256).
unet2 = dict(
    dim = 128,
    cond_dim = 128,
    dim_mults = (1, 2, 3, 4),
    num_resnet_blocks = (2, 4, 8, 8),
    attn_dim_head = 64,
    attn_heads = 8,
    layer_attns = (False, False, False, True),
    layer_cross_attns = (False, False, False, True),
    memory_efficient = True,
)

# Second super-resolution Unet (256 -> 1024).
unet3 = dict(
    dim = 128,
    cond_dim = 128,
    dim_mults = (1, 2, 3, 4),
    num_resnet_blocks = (2, 4, 8, 8),
    attn_dim_head = 64,
    attn_heads = 8,
    layer_attns = False,
    layer_cross_attns = (False, False, False, True),
    memory_efficient = True,
)

imagen = ImagenConfig(
    unets = [unet1, unet2, unet3],
    image_sizes = (64, 256, 1024),
    timesteps = 256,
    condition_on_text = True,
    cond_drop_prob = 0.1,  # text dropout for classifier-free guidance
    random_crop_sizes = (None, 64, 256)
).create()

trainer = ImagenTrainer(
    imagen = imagen,
    lr = 1e-4,
    cosine_decay_max_steps = 1500000,
    warmup_steps = 7500
)
This is my configuration. Unet 1 should optimally have dim = 512, but I had to reduce it for memory reasons.
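As a rough sanity check of that memory trade-off, here is a sketch in plain PyTorch (not an imagen-pytorch API; it assumes the imagen object created above exposes its Unets as imagen.unets, as current versions do):

# Count parameters per Unet to see what reducing `dim` saves.
for i, unet in enumerate(imagen.unets, start = 1):
    n_params = sum(p.numel() for p in unet.parameters())
    print(f'unet{i}: {n_params / 1e6:.1f}M parameters')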
Thanks!
@TheFusion21 I still have a question: should I train Unet1, Unet2, and Unet3 separately, or update the parameters of all three networks in one step?
You can't train them together in one step.
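Concretely, a minimal sketch of training each Unet separately, following the trainer pattern from the imagen-pytorch README (dataloader and the step count are hypothetical):

for unet_number in (1, 2, 3):  # one pass per Unet
    for step in range(100000):  # hypothetical step count
        # `dataloader` is a hypothetical iterator yielding (images, text_embeds).
        images, text_embeds = next(dataloader)
        loss = trainer(
            images,
            text_embeds = text_embeds,
            unet_number = unet_number,  # only this Unet's weights are updated
            max_batch_size = 4          # split the batch to fit in memory
        )
        trainer.update(unet_number = unet_number)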
All right. Thanks!
@TheFusion21 Sorry to bother you again, but could you send me the checkpoint.pt from your training? I have sent an email to your email address.
Can't do that, it is trained on private data.
All right, thanks.
@TheFusion21 When I use the command "imagen --model" to generate an image, it gives me the error "Command 'imagen' not found". Have you encountered the same problem?