
Have you achieved results similar to those reported in the original paper?

Open yzhang2016 opened this issue 5 years ago • 15 comments

I followed the paper for re-implementation. I got good results on DF and FS, but poor results on F2F and NT. Did you encounter a similar situation?

yzhang2016 avatar Apr 09 '20 09:04 yzhang2016

No, I did not dive deep into this.

neverUseThisName avatar May 15 '20 06:05 neverUseThisName

My implementation in my repo fails to generalize. Can I ask you for some advice?

jerry4h avatar May 18 '20 03:05 jerry4h

My implementation did not generalize well on F2F and NT when training on the constructed BI database.

yzhang2016 avatar May 18 '20 06:05 yzhang2016

When training on my constructed BI database, the loss decreases quickly from 1000 to 10 in the first 2000 iterations without freezing the pretrained HRNet-W18 parameters. During evaluation, the model seems to have learned only my blending fingerprint, so training overfits heavily. I'm puzzled by that. Did you apply data augmentation or any trick not mentioned in the paper? I think my implementation strictly follows the original paper. Thanks.

jerry4h avatar May 18 '20 07:05 jerry4h
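
For comparison of setups, here is a minimal PyTorch sketch of the training step being discussed: summed per-pixel BCE on the face X-ray map plus a weighted classification loss, with an option to freeze the pretrained backbone. The loss weight `lam`, the freezing trick, and the `model.backbone` attribute are illustrative assumptions, not details taken from the paper or this thread.

```python
import torch.nn.functional as F

def train_step(model, batch, optimizer, freeze_backbone=False, lam=100.0):
    """One step: X-ray map loss + weighted real/fake classification loss.
    `lam` and `freeze_backbone` are illustrative assumptions."""
    images, xray_gt, labels = batch   # xray_gt assumed already resized to the prediction resolution
    # Optionally keep the pretrained HRNet weights fixed during early iterations.
    for p in model.backbone.parameters():
        p.requires_grad = not freeze_backbone

    xray_pred, logits = model(images)
    # Summed per-pixel BCE would explain raw loss values in the hundreds or thousands.
    loss_xray = F.binary_cross_entropy(xray_pred, xray_gt, reduction='sum') / images.size(0)
    loss_cls = F.cross_entropy(logits, labels)
    loss = loss_xray + lam * loss_cls

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```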

For data augmentation, random noise and blurring are used.

yzhang2016 avatar May 18 '20 08:05 yzhang2016
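
For anyone comparing augmentation pipelines, a rough sketch of random noise plus blurring with OpenCV and NumPy is below. The probabilities, noise sigma, kernel sizes, and the choice to augment the whole blended image rather than only the foreground face are guesses on my part, not values given in this thread or the paper.

```python
import cv2
import numpy as np

def augment(img, rng=np.random):
    """Add random Gaussian noise and random Gaussian blur to a BGR uint8 image.
    Probabilities, noise sigma, and kernel sizes are guesses, not from the paper."""
    out = img.astype(np.float32)
    if rng.rand() < 0.5:
        sigma = rng.uniform(2.0, 10.0)
        out = out + rng.normal(0.0, sigma, out.shape).astype(np.float32)
    if rng.rand() < 0.5:
        k = int(rng.choice([3, 5, 7]))   # cv2.GaussianBlur needs odd kernel sizes
        out = cv2.GaussianBlur(out, (k, k), 0)
    return np.clip(out, 0, 255).astype(np.uint8)
```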

Were you able to match the results reported in the paper with your implementation?

yzhang2016 avatar May 31 '20 03:05 yzhang2016

Hi, @yzhang2016 @jerry4h. My implementation also heavily overfits the BI dataset. I also notice that, in my experiments, the noise difference in fake images from the BI dataset is much more obvious than in those from DF, which may explain the overfitting. The tool I used is the one introduced in Fig. 2 of the original paper. My frames are all extracted from c23 videos. Could the reason be that the videos are compressed, so details are lost? I have not downloaded the raw videos because they take much more disk space.

wshenx avatar Jun 02 '20 06:06 wshenx
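
For context, the "tool introduced in Fig. 2" is the blended image (BI) synthesis step. Below is a simplified sketch of that pipeline; it assumes the donor face is already aligned to the target and that facial landmarks are available, and it reduces the random mask deformation to a random blur, so it is an illustration rather than the paper's exact procedure.

```python
import cv2
import numpy as np

def make_blended_image(target, source, landmarks, rng=np.random):
    """Simplified blended-image (BI) synthesis in the spirit of Fig. 2:
    hull mask from landmarks -> random feathering -> blend source onto target.
    Assumes `source` is a donor face already aligned to `target` (H x W x 3 uint8)
    and `landmarks` is an (N, 2) array of facial landmark coordinates."""
    h, w = target.shape[:2]

    # 1. Binary mask from the convex hull of the landmarks.
    mask = np.zeros((h, w), np.uint8)
    hull = cv2.convexHull(landmarks.astype(np.int32))
    cv2.fillConvexPoly(mask, hull, 255)
    mask = mask.astype(np.float32) / 255.0

    # 2. Random feathering so the blending boundary varies between samples.
    k = int(rng.choice([5, 11, 17]))
    mask = cv2.GaussianBlur(mask, (k, k), 0)
    mask = mask[..., None]

    # 3. Composite the donor face onto the target background.
    blended = mask * source.astype(np.float32) + (1.0 - mask) * target.astype(np.float32)

    # 4. Face X-ray ground truth marks the blending boundary: B = 4 * M * (1 - M).
    xray = (4.0 * mask * (1.0 - mask))[..., 0]
    return blended.astype(np.uint8), xray
```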

I don't think this is caused by not using the raw videos. In most face forgery detection works, the high-quality (c23) videos are used for training. In my case, the distribution of the generated BI database seems closer to those of DF and FS than to those of F2F and NT.

yzhang2016 avatar Jun 03 '20 01:06 yzhang2016

Thank you for your reply. I notice that in the limitation section of the paper the authors say "We test our framework on the HQ version (a light compression) and the LQ version (a heavy compression) of FF++ dataset and the overall AUC are 87.35% and 61.6% respectively." It seems that the compression level matters. Nevertheless, I'll keep trying with c23 images.

wshenx avatar Jun 03 '20 02:06 wshenx

Could you tell me what your accuracy is on c23 Deepfakes and Face2Face?

skJack avatar Aug 12 '20 02:08 skJack

Hi, @yzhang2016. When I re-implemented Face X-Ray, I also ran into the overfitting problem on the generated data, a similar situation to @jerry4h's. Training on the BI dataset, the model only caught the blending fingerprint on the BI evaluation set, but failed to detect the blending boundaries of the deepfakes in the FF++ c23 dataset. I checked some examples in my BI dataset, and the generated fake faces are hard for me to distinguish, so I think the generated data is fine; the reason for the poor generalization may be that the blending operation is far from the synthesis process used in FF++. Can you tell me the detailed parameters of your experiment? I also noticed you used random noise and blurring to augment the data; was this applied to the foreground face or to the whole generated image? Hoping for your reply, thank you.

LoveSiameseCat avatar Nov 12 '20 09:11 LoveSiameseCat

@yzhang2016, @ChineseboyLuo, @jerry4h Hi guys, do you mind sharing your neural network architecture, specifically the init and forward functions? (It was called NNb in the paper.) It would be much appreciated and helpful.

AugustasMacys avatar Jan 28 '21 13:01 AugustasMacys
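
Not the authors' code, but since the architecture question keeps coming up: re-implementations typically build NNb as an HRNet backbone (the commenters above mention HRNet-W18) whose multi-resolution features are fused into a 1-channel X-ray map, with NNc as global average pooling plus a linear layer on that map. The timm model name, head widths, and pooling choice below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm

class FaceXRayNet(nn.Module):
    """Sketch of NNb (X-ray predictor) + NNc (classifier). The timm backbone
    name, head channel widths, and use of global average pooling are assumptions."""
    def __init__(self, num_classes=2):
        super().__init__()
        # Multi-resolution features from a pretrained HRNet-W18 (via timm).
        self.backbone = timm.create_model('hrnet_w18', pretrained=True, features_only=True)
        total_ch = sum(self.backbone.feature_info.channels())
        # NNb: fuse upsampled features and predict a 1-channel face X-ray map.
        self.xray_head = nn.Sequential(
            nn.Conv2d(total_ch, 128, kernel_size=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # NNc: global average pooling of the X-ray map followed by a linear layer.
        self.classifier = nn.Linear(1, num_classes)

    def forward(self, x):
        feats = self.backbone(x)             # list of feature maps at several strides
        size = feats[0].shape[-2:]
        fused = torch.cat(
            [F.interpolate(f, size=size, mode='bilinear', align_corners=False) for f in feats],
            dim=1)
        xray = self.xray_head(fused)         # B x 1 x h x w X-ray map
        pooled = xray.mean(dim=(2, 3))       # B x 1, global average pooling
        logits = self.classifier(pooled)     # B x num_classes
        return xray, logits
```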

Hello, may I ask you some questions about Face X-Ray? I've followed your GitHub ID; can you tell me your email?

byx-123 avatar Feb 18 '21 01:02 byx-123

Hi folks! Have you had a chance to overcome the generalization problem and reach results similar to the original paper?

gleonato avatar Sep 09 '21 01:09 gleonato

Hi,

@gleonato

I did not manage to get similar results on the Deepfake Detection Challenge. However, what I noticed is that how you generate the deepfakes for this paper is very important: if your generated deepfakes differ from the test data, the model will not generalize.

AugustasMMatches avatar Sep 09 '21 08:09 AugustasMMatches