imagen-pytorch
loss value
Has anyone trained a text-conditional model? What loss value did you get? I trained the model on the laion-art dataset, and the loss eventually decreased to around 0.1. Is that normal? Here are the sampled pictures.
I've trained a model on about 13k pairs, with 10k steps for each Unet, reaching a final loss of about 0.009, and I get quite good text-to-image alignment.
What cond_scale are you producing your samples with?
Try lower values... That said, text conditioning has seemed finicky in my experience.
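For context, cond_scale is the classifier-free guidance scale and is passed at sampling time, not at training. A minimal sampling sketch (assuming a trained imagen instance such as the one configured later in this thread; the prompt is made up):

# `cond_scale` > 1 amplifies the text conditioning via classifier-free
# guidance; "lower values" here means something like 1.5 - 3.
images = imagen.sample(
    texts = ['a mountain lake at sunrise'],  # hypothetical prompt
    cond_scale = 2.
)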
Ok, thanks for your suggestion.
Which dataset did you use? I guess laion-art is too large, so I want to train on a smaller dataset.
I'm training on danbooru-figures, which is almost 900k images.
Do you have a trained model now? I would like to ask you about the specific values of the Unet parameters. The results I get so far are not good; the images are rather blurry. @TheFusion21
# Imports needed to run this config (not shown in the original post).
from imagen_pytorch import ImagenTrainer
from imagen_pytorch.configs import ImagenConfig

# Base 64px text-to-image Unet.
unet1 = dict(
    dim = 384,
    cond_dim = 384,
    dim_mults = (1, 2, 3, 4),
    num_resnet_blocks = 3,
    attn_dim_head = 64,
    attn_heads = 8,
    layer_attns = (False, True, True, True),
    layer_cross_attns = (False, True, True, True),
    memory_efficient = False,
)

# First super-resolution Unet (64 -> 256).
unet2 = dict(
    dim = 128,
    cond_dim = 128,
    dim_mults = (1, 2, 3, 4),
    num_resnet_blocks = (2, 4, 8, 8),
    attn_dim_head = 64,
    attn_heads = 8,
    layer_attns = (False, False, False, True),
    layer_cross_attns = (False, False, False, True),
    memory_efficient = True,
)

# Second super-resolution Unet (256 -> 1024).
unet3 = dict(
    dim = 128,
    cond_dim = 128,
    dim_mults = (1, 2, 3, 4),
    num_resnet_blocks = (2, 4, 8, 8),
    attn_dim_head = 64,
    attn_heads = 8,
    layer_attns = False,
    layer_cross_attns = (False, False, False, True),
    memory_efficient = True,
)

imagen = ImagenConfig(
    unets = [unet1, unet2, unet3],
    image_sizes = (64, 256, 1024),
    timesteps = 256,
    condition_on_text = True,
    cond_drop_prob = 0.1,  # text dropout for classifier-free guidance
    random_crop_sizes = (None, 64, 256)
).create()

trainer = ImagenTrainer(
    imagen = imagen,
    lr = 1e-4,
    cosine_decay_max_steps = 1500000,
    warmup_steps = 7500
)
This is my configuration. Unet 1 should optimally have dim = 512, but I had to reduce it for memory reasons.
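As a rough sanity check of that memory trade-off, here is a sketch in plain PyTorch (not an imagen-pytorch API; it assumes the imagen object created above exposes its Unets as imagen.unets, as current versions do):

# Count parameters per Unet to see what reducing `dim` saves.
for i, unet in enumerate(imagen.unets, start = 1):
    n_params = sum(p.numel() for p in unet.parameters())
    print(f'unet{i}: {n_params / 1e6:.1f}M parameters')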
Thanks!
@TheFusion21 I still have a question: should I train Unet1, Unet2, and Unet3 separately, or update the parameters of all three networks in one step?
You can't train them together in one step.
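Concretely, a minimal sketch of training each Unet separately, following the trainer pattern from the imagen-pytorch README (dataloader and the step count are hypothetical):

for unet_number in (1, 2, 3):  # one pass per Unet
    for step in range(100000):  # hypothetical step count
        # `dataloader` is a hypothetical iterator yielding (images, text_embeds).
        images, text_embeds = next(dataloader)
        loss = trainer(
            images,
            text_embeds = text_embeds,
            unet_number = unet_number,  # only this Unet's weights are updated
            max_batch_size = 4          # split the batch to fit in memory
        )
        trainer.update(unet_number = unet_number)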
All right. Thanks!
@TheFusion21 Sorry to bother you again, but could you send me the checkpoint.pt from your training? I have sent an email to your email address.
Can't do that, it is trained on private data.
All right, thanks.
@TheFusion21 When I use the command "imagen --model" to generate an image, it gives me the error "Command 'imagen' not found". Have you encountered the same problem?