ak01user

Results: 18 comments by ak01user

You could input [512,768]; the resolution should be divisible by 32, I think.
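If it helps, here is a minimal sketch for snapping an arbitrary resolution to the nearest multiple of 32 before passing it in. The divisor 32 is only the guess from the comment above, not something confirmed by the repo, and the helper names `snap_to_multiple` and `snap_resolution` are made up for illustration.

```python
from typing import List, Sequence


def snap_to_multiple(value: int, base: int = 32) -> int:
    """Round `value` to the nearest positive multiple of `base` (ties round up)."""
    if value <= 0 or base <= 0:
        raise ValueError("value and base must be positive")
    # Integer arithmetic avoids the round-half-to-even behaviour of round().
    snapped = (value + base // 2) // base * base
    # Never return 0: fall back to the smallest valid size.
    return max(base, snapped)


def snap_resolution(resolution: Sequence[int], base: int = 32) -> List[int]:
    """Snap every side of a resolution such as [512, 768] to a multiple of `base`."""
    return [snap_to_multiple(side, base) for side in resolution]


if __name__ == "__main__":
    print(snap_resolution([512, 768]))  # already divisible by 32, unchanged
    print(snap_resolution([500, 770]))  # snapped to the nearest multiples of 32
```

Rounding to the nearest multiple changes the aspect ratio slightly, so resize or crop the input frames to the snapped size rather than only changing the config value.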

Hi, the model_kwargs look like the model_kwargs used for inference. Can you tell me what x0 is? Is it the VAE-encoded features of gt_frame, e.g. with shape 1,sq,4,96,64?

It does not matter, you are so kind. There is good news: I fine-tuned on my own dataset, one person's dancing video with 340 frames. I am just training the ['local_image_embedding','local_image_embedding_after'] two blocks'...

Hi, I tried to fine-tune on a specific person's data, but the result was not very good. How should I modify it? I notice that the loss_type is 'mse' and var_type is 'fixed_small', is...

Hi, after testing, the model does not seem to perform well on side faces and turns. I think it is because the model cannot obtain the information about the clothes on...

Hi, can you show your results? Maybe it is a preprocessing issue.

Sorry, I notice that in your code both the random_ref and local_image modes go from a selected reference image to a dance, and I did not find much difference between them. @wangxiang1230

> > Sorry, I notice that in your code both the random_ref and local_image modes go from a selected reference image to a dance, and I did not find much difference between them. @wangxiang1230 > > Hi,...

Sorry, what does v-prediction mean? Could you share some more details about training/fine-tuning?