
Training Clipart style: teeth issues

Open aLohrer opened this issue 2 years ago • 6 comments

Hi, congrats on the great paper!

I want to try to port this nice work to mobile, but before getting started on performance I tried to reproduce the results.

As suggested, I went with one of the SD styles since it seemed easy to generate data. I tried the clipart style.

Here is an example of a generated clipart image image

They all look pretty good. Afterwards I went on to generate samples via StyleGAN2. image

I realized that the generated cartoon samples from StyleGAN2 are only 256x256; is that an issue?
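
In case the resolution does matter, the preprocessing I would try first is simply batch-upscaling the 256x256 StyleGAN2 outputs before feeding them to the texture-translator training. A minimal sketch; the folder names and the 512x512 target are my assumptions, not anything from the repo.

```python
# Minimal sketch: upscale the 256x256 StyleGAN2 samples before training.
# Folder names and the 512x512 target size are placeholder assumptions.
from pathlib import Path
from PIL import Image

src_dir = Path("stylegan2_samples")      # 256x256 generated cartoons
dst_dir = Path("stylegan2_samples_512")  # upscaled copies
dst_dir.mkdir(exist_ok=True)

for img_path in src_dir.glob("*.png"):
    img = Image.open(img_path).convert("RGB")
    img = img.resize((512, 512), Image.LANCZOS)  # Lanczos keeps edges reasonably crisp
    img.save(dst_dir / img_path.name)
```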

Anyway, the next step is training the texture translator, starting with the anime model as initial weights.

Iteration 0 (basically anime style) image

Iteration 1000 image

Iteration 10000 image

Iteration 30000 image

Iteration 100000 image image

Here are the loss curves image

From the images I've seen so far, it really is catching the style nicely, but it has a major problem with teeth. Unfortunately that's quite an important facial part.

My questions are:

  • Did I mess something up in the training procedure? E.g. in your paper, Fig. 11 shows a similar effect that is countered by the facial perception loss. Is changing the weight of the facial perception loss a good idea to get better teeth (less content-faithful, but better looking)? See the small sketch after this list for what I mean by the weight.

  • Is the style just not usable with the framework, and should I go for some other style instead?

  • Or is it just an issue with SD-generated data?
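
To make the weighting question concrete, here is roughly the knob I mean. A sketch only: the function and argument names are placeholders, not variables from the released training script.

```python
def generator_objective(adv_loss, content_loss, facial_perception_loss,
                        lambda_fp=1.0):
    """Sketch: combined translator loss with the facial perception term
    exposed as a tunable weight; I'd sweep lambda_fp and watch how the
    teeth regions change relative to content faithfulness."""
    return adv_loss + content_loss + lambda_fp * facial_perception_loss
```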

I am happy to test a different style to validate the training process if you can point me to a dataset I should use and to some intermediate results that are expected along the way.

Bonus question - this is just thinking out loud:

As my final goal is to get something really performant, I would like to swap out the U-Net for a MobileNetV3. I am currently not sure whether a MobileNet can pick up the unsupervised training signal, or whether it would be better to train the U-Net first and use a teacher/student approach to transfer the training results to a MobileNet in a supervised fashion (sketched below). Did you test different architectures for the texture translation block?
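
To sketch the teacher/student idea: freeze the trained U-Net translator and use its outputs as supervised targets for a lighter student generator (e.g. something MobileNetV3-backed). All module names here are placeholders, not anything from the repo.

```python
# Sketch of the distillation step: the frozen U-Net teacher labels each
# batch and the lightweight student regresses to it with a plain L1 loss.
import torch
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, real_faces):
    """One distillation step on a batch of (N, 3, H, W) face crops."""
    with torch.no_grad():
        targets = teacher(real_faces)   # frozen U-Net translator output
    preds = student(real_faces)         # lightweight student generator
    loss = F.l1_loss(preds, targets)    # perceptual/GAN terms could be added
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```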

Sorry for the many questions, but it's such an interesting work that I could ask 100 more (but I won't, promise :crossed_fingers:).

aLohrer avatar Apr 12 '23 18:04 aLohrer

OK, I went through the training code once more and realized that the facial perception loss is not implemented.

Might this cause the above-mentioned issues? Will you release the source code for the facial perception loss?
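
In case it is useful to others, here is the rough stand-in I am considering while the official version is unreleased. It is only my approximation of a facial perception loss, not the authors' implementation: it compares embeddings of the input and stylized face crops under a frozen face-recognition network (facenet-pytorch's InceptionResnetV1), and the 160x160 resize plus L1 on embeddings are my assumptions.

```python
# Sketch of a stand-in facial perception loss (not the authors' version).
# Assumes aligned face crops are provided as (N, 3, H, W) tensors in [-1, 1].
import torch.nn.functional as F
from facenet_pytorch import InceptionResnetV1

face_encoder = InceptionResnetV1(pretrained="vggface2").eval()
for p in face_encoder.parameters():
    p.requires_grad = False  # keep the face encoder frozen

def facial_perception_loss(real_face, stylized_face):
    """L1 distance between face embeddings of the input and stylized crops."""
    real = F.interpolate(real_face, size=160, mode="bilinear", align_corners=False)
    fake = F.interpolate(stylized_face, size=160, mode="bilinear", align_corners=False)
    return F.l1_loss(face_encoder(fake), face_encoder(real))
```

This term would then go into the generator objective with the lambda_fp weight from the earlier sketch.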

aLohrer avatar Apr 14 '23 10:04 aLohrer

@aLohrer May I ask whether you trained on a single GPU or multiple GPUs? I'm currently training on a single GPU and the results are very poor.

huimlight avatar Jul 14 '23 08:07 huimlight

Single GPU.

Can you share your results for comparison?

I think it's mostly a problem caused by the missing facial perception loss in my case.

aLohrer avatar Jul 20 '23 12:07 aLohrer

I trained on a single Nvidia 4090 for 71 hours (300k steps). The results are very poor.

  299999_face_result   296999_face_result   295999_face_result

image

h3clikejava avatar Dec 15 '23 01:12 h3clikejava

Hi @aLohrer, I ran into the same problem, and I also found that there is no facial perception loss in the code. Did you manage to address it? If so, could you share what you found? Thanks, hope you have a good day.

9527-csroad avatar Mar 05 '24 02:03 9527-csroad