
Caution: Please be suspicious of this project and code

Open subeom527 opened this issue 1 year ago • 3 comments

This problem is obvious and has already been clearly pointed out by other questioners: https://github.com/thuml/Anomaly-Transformer/issues/4. In that issue you can easily see the large gap in results before and after adding the suspicious code called "detection adjustment".

The author gives the following answers, which are neither convincing nor clear:

  1. It is a widely accepted practice in this field of academia.

-> The two papers you link to as evidence are not published in official journals, and the same author is credited on both of them. If that is your argument, please provide review papers published in validated journals that support your claim. And even if you are right, if tuning a model's low performance up to this level really is standard practice in academia, then that practice should be eliminated.

  2. This adjustment is useful in an actual industrial environment because it is possible to receive information and make adjustments in real time(?)

-> Remember that real-time industrial data has no labels, especially when it comes to anomaly detection!

subeom527 avatar Feb 21 '24 04:02 subeom527

Yes, I agree with your points. Adding to what you listed above, one thing I found is that with the evaluation method used in this work, even a random model (a randomly initialized model without any training!) can yield good performance. You can test it simply by commenting out the torch.load lines that load the checkpoint in the test function.
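
For anyone who wants to reproduce that quickly, here is a rough sketch of the idea (the function and path names below are my own assumptions, not the repository's exact identifiers): the test routine normally restores the trained checkpoint via torch.load before scoring, and skipping that restore evaluates the randomly initialized model while the rest of the pipeline (threshold, detection adjustment, metrics) runs unchanged.

```python
import torch

# Illustrative only -- evaluate, ckpt_path, and loader are assumed names,
# not the repository's exact code. The key is the load_checkpoint flag.
def evaluate(model, loader, ckpt_path="checkpoints/checkpoint.pth", load_checkpoint=True):
    if load_checkpoint:
        # This is the line to comment out / disable for the "random model" test.
        model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
    model.eval()
    scores = []
    with torch.no_grad():
        for batch, _ in loader:
            # anomaly scores from the (possibly untrained) model
            scores.append(model(batch))
    return torch.cat(scores)

# evaluate(model, loader, load_checkpoint=False) feeds untrained-model scores into
# the same thresholding and detection-adjustment code, yet still reports strong F1.
```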

Plus, leveraging the anomaly labels from the "testing dataset" when computing the anomaly threshold seems simply wrong, regardless of whether some previous works did it the same way (that cannot justify the validity of a method that is obviously faulty). Besides, in the provided data_loader.py code, the validation set is simply equal to the test set (except for the SMD data). That is, there is no actual validation set, unlike what is explained in the paper.
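
To illustrate the thresholding point (again a toy example of my own, not the repository's code), compare a threshold fixed from unlabeled training scores with an "oracle" threshold tuned against the test labels; only the former is something an operator could actually deploy.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
train_scores = rng.normal(0.0, 1.0, 5000)                  # scores on (assumed clean) training data
test_scores = np.concatenate([rng.normal(0.0, 1.0, 4500),  # normal test points
                              rng.normal(1.5, 1.0, 500)])  # anomalous test points
test_labels = np.concatenate([np.zeros(4500, dtype=int), np.ones(500, dtype=int)])

# Deployable: threshold from training scores only (e.g., 99th percentile).
thr_train = np.percentile(train_scores, 99)

# Oracle: sweep candidate thresholds and keep the one with the best F1 on the test labels.
candidates = np.percentile(test_scores, np.linspace(80, 99.9, 200))
thr_oracle = max(candidates, key=lambda t: f1_score(test_labels, (test_scores > t).astype(int)))

for name, thr in [("train-quantile", thr_train), ("oracle (uses test labels)", thr_oracle)]:
    f1 = f1_score(test_labels, (test_scores > thr).astype(int))
    print(f"{name:26s} thr={thr:.2f} F1={f1:.3f}")
```

The oracle threshold always looks better on paper precisely because it has already seen the test labels, which is the circularity being objected to here.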

Unfortunately, this problem is propagating through the community: I found a few other works that adopt this evaluation method for anomaly detection, and exactly the same issue appears in them as well. I do not believe the authors of those works have noticed the incorrectness of the method.

Be aware of the works that follow the same evaluation method (I found the two below):

- "TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis", ICLR 2023.
- (same author) "One Fits All: Power General Time Series Analysis by Pretrained LM", NeurIPS 2023.

cooma04 avatar Mar 06 '24 08:03 cooma04

I agree with your point of view; the arguments in the references cited by the author are not sufficient. Given the rigor expected of scientific research, I believe this method should not continue to be used.

The views in the original text are as follows:

""" In real applications, the human operators generally do not care about the point-wise metrics. It is acceptable for an algorithm to trigger an alert for any point in a contiguous anomaly segment, if the delay is not too long. Some metrics for anomaly detection have been proposed to accommodate this preference, e.g., [22], but most are not widely accepted, likely because they are too complicated. We instead use a simple strategy: if any point in an anomaly segment in the ground truth can be detected by a chosen threshold, we say this segment is detected correctly, and all points in this segment are treated as if they can be detected by this threshold. Meanwhile, the points outside the anomaly segments are treated as usual. The precision, recall, AUC, F-score and best F-score are then computed accordingly. This approach is illustrated in Fig. 7. """
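
To make the effect of that strategy concrete, here is a minimal, self-contained sketch (my own illustration, not the repository's code) of the quoted "detection adjustment", together with a quick check of how strongly it inflates metrics: a purely random detector is scored with and without the adjustment on synthetic labels containing long anomaly segments.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def point_adjust(pred, gt):
    """If any point inside a ground-truth anomaly segment is flagged,
    mark the whole segment as detected (the strategy described above)."""
    pred = pred.copy()
    i, n = 0, len(gt)
    while i < n:
        if gt[i] == 1:
            j = i
            while j < n and gt[j] == 1:
                j += 1
            if pred[i:j].any():   # any hit inside the segment...
                pred[i:j] = 1     # ...counts the whole segment as detected
            i = j
        else:
            i += 1
    return pred

rng = np.random.default_rng(0)
n = 20_000
gt = np.zeros(n, dtype=int)
for start in rng.integers(0, n - 200, size=40):   # ~40 anomaly segments, ~200 points each
    gt[start:start + 200] = 1

pred = (rng.random(n) < 0.01).astype(int)          # random detector, 1% alarm rate

for name, p in [("raw", pred), ("point-adjusted", point_adjust(pred, gt))]:
    prec, rec, f1, _ = precision_recall_fscore_support(gt, p, average="binary", zero_division=0)
    print(f"{name:15s} P={prec:.3f} R={rec:.3f} F1={f1:.3f}")
```

With these settings the raw F1 stays near zero while the point-adjusted F1 typically lands around 0.9 or higher, which is consistent with the random-model observation earlier in this thread.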

caocaort avatar Mar 27 '24 14:03 caocaort