
The experimental data of the paper cannot be reproduced

Open chinahappyking opened this issue 3 years ago • 9 comments

Hi Guo, I have tried many times, and the following results are always the same, which is far from the results in the paper. Is there any difference between the code here and the setup used for the paper?

Could you add me on WeChat for a private chat?

dataset: hdfs
git branch: main
==================== logbert ====================
best threshold: 0, best threshold ratio: 0.0
TP: 7602, TN: 549880, FP: 3488, FN: 3045
Precision: 68.55%, Recall: 71.40%, F1-measure: 69.95%
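For anyone comparing runs in this thread, the printed percentages can be recomputed from the confusion counts. A small sanity check (counts taken from the output above) confirms the reported numbers are internally consistent:

```python
# Recompute precision, recall, and F1 from the confusion-matrix counts
# reported above, to verify the printed percentages.
TP, TN, FP, FN = 7602, 549880, 3488, 3045

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

# Matches the printed line: Precision 68.55%, Recall 71.40%, F1 69.95%
print(f"Precision: {precision:.2%}, Recall: {recall:.2%}, F1-measure: {f1:.2%}")
```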

chinahappyking avatar Mar 06 '22 12:03 chinahappyking

Can you share your email?

Thanks

HelenGuohx avatar May 08 '22 17:05 HelenGuohx

I have the same issue. Did you ever manage to reproduce the results?

hniu1 avatar Oct 12 '22 17:10 hniu1

Can you share your email?

Thanks

[email protected]

chinahappyking avatar Oct 22 '22 14:10 chinahappyking

Thanks for reaching out!! My email is @.***

Best, Nick


hniu1 avatar Oct 22 '22 15:10 hniu1

I just tried and have the same results:

best threshold: 0, best threshold ratio: 0.0
TP: 7643, TN: 549806, FP: 3562, FN: 3004
Precision: 68.21%, Recall: 71.79%, F1-measure: 69.95%

I haven't looked deeply into the code yet, but is the training data really limited to n=4855 sequences, as line 122 of data_process.py seems to indicate?

generate_train_test(log_sequence_file, n=4855)

How can I train for better results?
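For context, here is a hypothetical sketch of what a cap like `n=4855` does to the split: only the first n sequences feed the train/test split, so dropping the argument lets the split use the whole corpus. The function name mirrors the call quoted above, but the body, parameters, and 80/20 ratio here are illustrative assumptions, not the repository's actual implementation:

```python
import random

def generate_train_test(sequences, n=None, seed=42):
    """Illustrative stand-in: shuffle sequences, optionally cap the pool
    at the first n items, then do an 80/20 train/test split."""
    rng = random.Random(seed)
    rng.shuffle(sequences)
    if n is not None:
        sequences = sequences[:n]      # the cap that limits training data
    split = len(sequences) * 4 // 5    # 80/20 split (integer arithmetic)
    return sequences[:split], sequences[split:]

# With the cap, only 4855 sequences are used regardless of corpus size:
train, test = generate_train_test(list(range(100_000)), n=4855)
print(len(train), len(test))   # 3884 971

# Without the cap, the whole corpus is split:
train, test = generate_train_test(list(range(100_000)))
print(len(train), len(test))   # 80000 20000
```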

jplasser avatar Jan 09 '23 20:01 jplasser

I removed n=4855 from the line of code quoted in my previous comment, and now a lot more training data is available. I'll post the new results.

jplasser avatar Jan 09 '23 21:01 jplasser

Here are my results after applying the above changes:

best threshold: 0, best threshold ratio: 0.0
TP: 6996, TN: 390662, FP: 95, FN: 3651
Precision: 98.66%, Recall: 65.71%, F1-measure: 78.88%

Recall and F1 are still lower than in the paper, which reports P=87.02%, R=78.10%, and F1=82.32%. Caveat: I stopped training after 60 epochs, so that could explain the underperformance.

jplasser avatar Jan 10 '23 10:01 jplasser

One more result, after finishing training on HDFS with a batch size of 512: val loss=0.183, train loss=0.178, 135 epochs, about 35 minutes on an RTX 3090.

best threshold: 0, best threshold ratio: 0.0
TP: 7583, TN: 390484, FP: 273, FN: 3064
Precision: 96.52%, Recall: 71.22%, F1-measure: 81.97%

jplasser avatar Jan 10 '23 11:01 jplasser

Here is my result, training on HDFS with a batch size of 512: val loss=0.537, train loss=0.451, 87 epochs, about 39 minutes on an RTX 3090.

best threshold: 0, best threshold ratio: 0.0
TP: 7908, TN: 389836, FP: 921, FN: 2739
Precision: 89.57%, Recall: 74.27%, F1-measure: 81.21%
elapsed_time: 561.5744488239288

Yudi-Pan avatar Apr 19 '23 03:04 Yudi-Pan