
Thanks for your persistence

Open · JK737353 opened this issue 3 years ago · 1 comment

I'm also working on DFL recently, but I didn't expect to see grayscale (black-and-white) photos used as a dataset. I watched your experimental process from start to finish. I think binarized images can only reduce the complexity of the computation, not improve computing speed; if you want to simplify the inference process, you can only optimize the neural network itself: remove some unnecessary computations and add some tools to help accelerate inference, so that 100K iterations can reach the performance of 1000K. But why do we have to train a separate model for each face? Couldn't there be some simpler way to achieve this result? For example: https://github.com/cleardusk/3DDFA_V2. I've found many easier ways, but can't experiment with them well due to my limited expertise. If you need some auxiliary tools for DFL, I can provide them, such as tools to automatically draw faces and tools to repair faces in batches; if you need them, please send me your email.

JK737353 · Oct 14 '22 17:10

Thanks for watching, I'm glad it was interesting!

It's still not finished, though. :)

I'm also working on DFL recently,

Do you have particular goals for your project, or is it an exercise/exploration? (Mine is both.)

I think binarized images can only reduce the complexity of the computation, not improve computing speed; if you want to simplify the inference process, you can only optimize the neural network itself: remove some unnecessary computations and add some tools to help accelerate inference, so that 100K iterations can reach the performance of 1000K.

You mean grayscale images? (Although "binarized" is also an idea, for working with sketches etc.: drawing the expressions, e.g. from the facial landmarks (smoother), and generating well-shaded faces from that.)

Computing speed? The change to the neural model here is just reducing the input to a single channel instead of three.

Right, it doesn't improve raw computing speed, but it shortens the training time for a model at the same resolution compared with training in color. The inference time is reduced as well (compared to color), due to the simpler models for the same resolution. It also makes it possible to train such high-quality faces on weak GPUs.

That's what I mean by high performance: achieving more with the same hardware.
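For a rough sense of the magnitude, here is a back-of-the-envelope sketch (not DFL code): only the layers that touch the image, e.g. the first convolution, shrink when going from three channels to one. The kernel size, filter count and resolution below are assumed values, purely for illustration.

```python
# Back-of-the-envelope sketch (not DFL code): how much the first convolution
# shrinks when the input goes from 3 channels (color) to 1 (grayscale).
# Kernel size, filter count and resolution are assumed values for illustration.

def conv_cost(in_ch, out_ch=64, k=5, h=192, w=192):
    weights = in_ch * out_ch * k * k      # convolution kernel weights
    params = weights + out_ch             # plus biases
    macs = weights * h * w                # multiply-adds per image at stride 1
    return params, macs

for in_ch in (3, 1):
    params, macs = conv_cost(in_ch)
    print(f"in_ch={in_ch}: {params:,} params, {macs / 1e6:.1f}M MACs")

# Only the layers touching the image shrink like this; the savings for the
# whole model are smaller, but every iteration also pushes 3x less pixel data.
```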

Initially I tried to reduce/remove some of the layers (with a single channel, part of the "mixing" of the three adjacent color channels could perhaps be avoided while preserving quality; or maybe, as it is, it allows achieving higher sharpness and should stay), but I didn't wire it up correctly at the time. That's in DeepFakeArchi.py. I may make other attempts when I switch to coding here again; I haven't coded on the project since the colorization POC, which I haven't pushed here yet.
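For what it's worth, a minimal sketch in plain Keras (not the project's leras code, and not the actual SAEHDBW layer sizes) of the kind of change involved: the encoder is parameterized on the channel count, so the grayscale variant just passes channels=1, and removing a downscale stage is a matter of changing `downscales`.

```python
import tensorflow as tf

def make_encoder(resolution=192, channels=1, base_filters=64, downscales=4):
    """Toy encoder: a stack of strided convolutions, parameterized on channels."""
    inp = tf.keras.Input(shape=(resolution, resolution, channels))
    x = inp
    filters = base_filters
    for _ in range(downscales):
        # 5x5 stride-2 convolution halves the spatial size at each stage
        x = tf.keras.layers.Conv2D(filters, 5, strides=2, padding="same",
                                   activation=tf.nn.leaky_relu)(x)
        filters *= 2
    return tf.keras.Model(inp, x)

enc_bw = make_encoder(channels=1)   # grayscale variant
enc_rgb = make_encoder(channels=3)  # color variant, for comparison
print(enc_bw.count_params(), enc_rgb.count_params())
```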

But why do we have to train a separate model for each face? Couldn't there be some simpler way to achieve this result? For example: https://github.com/cleardusk/3DDFA_V2. I've found many easier ways, but can't experiment with them well due to my limited expertise.

Thanks for the link, I'll try it later. BTW, after you've trained one face pair, training another is quicker; it's like "finetuning".

In my tests I train different sizes and architectures; if I were producing clips with many faces, I'd take some of the already trained models and finetune them. I already did that a bit for the Stoltenberg clip, which was finetuned from the Arnold model. (In these self-to-self cases it's faster, because the task turns into a simpler autoencoder / image-to-image repair.)
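As a rough sketch of that workflow (the paths and workspace layout below are hypothetical, adjust to your own setup): copy the trained model files into the new workspace and let the usual training script resume from them instead of from random initialization.

```python
import shutil
from pathlib import Path

old_model = Path("workspace_arnold/model")        # already trained pair (hypothetical path)
new_model = Path("workspace_stoltenberg/model")   # new pair to finetune (hypothetical path)

new_model.mkdir(parents=True, exist_ok=True)
for f in old_model.glob("*"):
    if f.is_file():
        # the copied weights become the initialization point for the new training run
        shutil.copy2(f, new_model / f.name)
# then run the normal training script on the new workspace
```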

If you need some auxiliary tools for DFL, I can provide them, such as tools to automatically draw faces and tools to repair faces in batches; if you need them, please send me your email.

Automatically draw faces: you mean drawing the facial landmarks, right? (I think there are some such tools already, as a command-line option for debug landmarks and also interactively in some modes.) OK, send them; there's an email in my profile.

I think I mentioned somewhere in the project log ideas about drawing/painting the parts of the face explicitly, i.e. like an artist, rather than as convolutions from the NN. The facial landmarks give exact coordinates, and there are samples of the eyes etc., both from the dataset images and from the generated probes.
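For instance (a small sketch, assuming the common 68-point landmark convention used by the extractor): draw the landmark groups as polylines on a canvas with OpenCV; the same coordinates could drive a more artist-like rendering.

```python
import cv2
import numpy as np

def draw_landmarks(landmarks, size=256):
    """landmarks: (68, 2) array of (x, y) points in pixel coordinates."""
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    # (start, end, closed) index ranges of the usual 68-point groups
    groups = [(0, 17, False),   # jaw
              (17, 22, False),  # right eyebrow
              (22, 27, False),  # left eyebrow
              (27, 36, False),  # nose
              (36, 42, True),   # right eye
              (42, 48, True),   # left eye
              (48, 60, True),   # outer lips
              (60, 68, True)]   # inner lips
    for start, end, closed in groups:
        pts = landmarks[start:end].astype(np.int32).reshape(-1, 1, 2)
        cv2.polylines(canvas, [pts], closed, (255, 255, 255), 1, cv2.LINE_AA)
    return canvas

# toy call with random points, just to show the usage
lm = np.random.randint(20, 236, size=(68, 2))
cv2.imwrite("landmarks_sketch.png", draw_landmarks(lm))
```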

The whole process of training an NN on the same (or approximately the same) dataset/"distribution" has something "wrong" about it, a redundancy. The final result is actually visible from the early iterations, 10000, 20000, 50000, depending on the criterion: the final optimized result is predictable, it's "the same", but sharper, "upsampled", "enhanced", or maybe more precisely and logically: "deconvolved". Indeed, there is such an operation/filter.
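To illustrate the remark (a toy example of the filter itself, not something DFL does): blur an image with a known kernel and recover a sharper version with Richardson-Lucy deconvolution from scikit-image.

```python
import numpy as np
from scipy.signal import convolve2d
from skimage import data, img_as_float
from skimage.restoration import richardson_lucy

image = img_as_float(data.camera())           # any grayscale test image
psf = np.ones((5, 5)) / 25.0                  # simple box-blur point spread function
blurred = convolve2d(image, psf, mode="same", boundary="symm")
restored = richardson_lucy(blurred, psf, 30)  # iterative deconvolution
```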

IMO that itself suggests that the process can be shortened.

Twenkid · Oct 14 '22 19:10