
Training Clipart style: teeth issues

Open aLohrer opened this issue 2 years ago • 6 comments

Hi, congrats on the great paper!

I want to try to port this nice work to mobile, but before getting started on performance I tried to reproduce the results.

As suggested, I went with one of the SD styles since it seemed easy to generate data. I tried the clipart style.

Here is an example of a generated clipart image image

They all look pretty good. Afterwards I went on to generate samples via StyleGAN2. image

I realized that the generated cartoon samples from StyleGAN2 are only 256x256; is that an issue?
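
In case the resolution does matter, the preprocessing I would try first is simply batch-upscaling the 256x256 StyleGAN2 outputs before feeding them to the texture-translator training. A minimal sketch; the folder names and the 512x512 target are my assumptions, not anything from the repo.

```python
# Minimal sketch: upscale the 256x256 StyleGAN2 samples before training.
# Folder names and the 512x512 target size are placeholder assumptions.
from pathlib import Path
from PIL import Image

src_dir = Path("stylegan2_samples")      # 256x256 generated cartoons
dst_dir = Path("stylegan2_samples_512")  # upscaled copies
dst_dir.mkdir(exist_ok=True)

for img_path in src_dir.glob("*.png"):
    img = Image.open(img_path).convert("RGB")
    img = img.resize((512, 512), Image.LANCZOS)  # Lanczos keeps edges reasonably crisp
    img.save(dst_dir / img_path.name)
```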

Anyway, the next step is training the texture translator, starting with the anime model as initial weights.

Iteration 0 (basically anime style) image

Iteration 1000 image

Iteration 10000 image

Iteration 30000 image

Iteration 100000 image image

Here are the loss curves image

From the images I've seen so far, it really is catching the style nicely, but it has a major problem with teeth. Unfortunately that's quite an important facial part.

My questions are:

  • Did I mess something up in the training procedure? E.g. in your paper, Fig. 11 shows a similar effect that is countered by the facial perception loss. Is changing the weight of the facial perception loss a good idea to get better teeth (less content-faithful, but better looking)? See the small sketch after this list for what I mean by the weight.

  • Is the style just not usable with the framework, and should I go for some other style instead?

  • Or is it just an issue with SD-generated data?
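
To make the weighting question concrete, here is roughly the knob I mean. A sketch only: the function and argument names are placeholders, not variables from the released training script.

```python
def generator_objective(adv_loss, content_loss, facial_perception_loss,
                        lambda_fp=1.0):
    """Sketch: combined translator loss with the facial perception term
    exposed as a tunable weight; I'd sweep lambda_fp and watch how the
    teeth regions change relative to content faithfulness."""
    return adv_loss + content_loss + lambda_fp * facial_perception_loss
```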

I am happy to test a different style to validate the training process if you can point me to a dataset I should use and to some intermediate results that are expected along the way.

Bonus question - this is just thinking out loud:

As my final goal is to get something really performant, I would like to swap out the U-Net for a MobileNetV3. I am currently not sure whether a MobileNet can pick up the unsupervised training signal, or whether it would be better to train the U-Net first and use a teacher/student approach to transfer the training results to a MobileNet in a supervised fashion (sketched below). Did you test different architectures for the texture translation block?
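
To sketch the teacher/student idea: freeze the trained U-Net translator and use its outputs as supervised targets for a lighter student generator (e.g. something MobileNetV3-backed). All module names here are placeholders, not anything from the repo.

```python
# Sketch of the distillation step: the frozen U-Net teacher labels each
# batch and the lightweight student regresses to it with a plain L1 loss.
import torch
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, real_faces):
    """One distillation step on a batch of (N, 3, H, W) face crops."""
    with torch.no_grad():
        targets = teacher(real_faces)   # frozen U-Net translator output
    preds = student(real_faces)         # lightweight student generator
    loss = F.l1_loss(preds, targets)    # perceptual/GAN terms could be added
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```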

Sorry for the many questions, but it's such an interesting work that I could ask 100 more (but I won't, promise :crossed_fingers:).

aLohrer avatar Apr 12 '23 18:04 aLohrer

OK, I went through the training code once more and realized that the facial perception loss is not implemented.

Might this cause the above-mentioned issues? Will you release the source code for the facial perception loss?
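
In case it is useful to others, here is the rough stand-in I am considering while the official version is unreleased. It is only my approximation of a facial perception loss, not the authors' implementation: it compares embeddings of the input and stylized face crops under a frozen face-recognition network (facenet-pytorch's InceptionResnetV1), and the 160x160 resize plus L1 on embeddings are my assumptions.

```python
# Sketch of a stand-in facial perception loss (not the authors' version).
# Assumes aligned face crops are provided as (N, 3, H, W) tensors in [-1, 1].
import torch.nn.functional as F
from facenet_pytorch import InceptionResnetV1

face_encoder = InceptionResnetV1(pretrained="vggface2").eval()
for p in face_encoder.parameters():
    p.requires_grad = False  # keep the face encoder frozen

def facial_perception_loss(real_face, stylized_face):
    """L1 distance between face embeddings of the input and stylized crops."""
    real = F.interpolate(real_face, size=160, mode="bilinear", align_corners=False)
    fake = F.interpolate(stylized_face, size=160, mode="bilinear", align_corners=False)
    return F.l1_loss(face_encoder(fake), face_encoder(real))
```

This term would then go into the generator objective with the lambda_fp weight from the earlier sketch.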

aLohrer avatar Apr 14 '23 10:04 aLohrer

@aLohrer May I ask whether you trained on a single GPU or multiple GPUs? I'm currently training on a single GPU and the results are very poor.

huimlight avatar Jul 14 '23 08:07 huimlight

Single GPU.

Can you share your results for comparison?

I think it's mostly a problem caused by the missing facial perception loss in my case.

aLohrer avatar Jul 20 '23 12:07 aLohrer

I trained on a single Nvidia 4090 for 71 hours (300k steps). The results are very poor.

  299999_face_result   296999_face_result   295999_face_result

image

h3clikejava avatar Dec 15 '23 01:12 h3clikejava

Hi @aLohrer, I ran into the same problem, and I also found that there is no facial perception loss in the code. Did you manage to address it? If so, could you share what you found? Thanks, hope you have a good day.

9527-csroad avatar Mar 05 '24 02:03 9527-csroad