anomalous-acm

Ground truth labels of datasets you used

Open BuleSky233 opened this issue 2 years ago • 3 comments

Could you please share the ground truth labels of your Yahoo real datasets, synthetic datasets, or the datasets in the folder 'data' on this website?

BuleSky233 avatar May 23 '22 04:05 BuleSky233

Real data doesn't have ground truth labels. If it did, we wouldn't need to do anomaly detection. For the synthetic data, the outliers are so extreme that even simple summary statistics will find them. For example,

which(colMeans(dat5) > 1e6)
# X49 X65 X78 
#  49  65  78 
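
To try the same kind of check without the repository's data, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the matrix `sim` is a hypothetical stand-in and not the actual `dat5`, and the dimensions, the size of the shift, and the choice of shifted columns are arbitrary.

```r
# Hypothetical stand-in for a synthetic dataset such as dat5 (NOT the real data):
# 200 observations of 100 series, three of which are shifted to an extreme level.
set.seed(1)
n_obs <- 200
n_series <- 100
sim <- matrix(rnorm(n_obs * n_series), nrow = n_obs, ncol = n_series)
colnames(sim) <- paste0("X", seq_len(n_series))

# Columns to contaminate; chosen arbitrarily for illustration
outlier_cols <- c(49, 65, 78)
sim[, outlier_cols] <- sim[, outlier_cols] + 1e7

# Flag every series whose mean exceeds the threshold
flagged <- which(colMeans(sim) > 1e6)
print(flagged)
```

A threshold on the column means only works here because the shifted series are orders of magnitude away from the rest; it is not a general anomaly detection method.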

robjhyndman avatar May 23 '22 22:05 robjhyndman

How did you evaluate the performance of your algorithm in your paper without ground truth labels for the real datasets you used?

BuleSky233 avatar May 24 '22 11:05 BuleSky233

See https://robjhyndman.com/papers/icdm2015.pdf for a description of the evaluation. The original data from Yahoo did contain labels based on their internal assessment of evidence of malicious activity, new feature deployment or a traffic shift. We did not have permission to make those assessments publicly available. In any case, they were not "ground truth" labels, but are simply based on an alternative assessment of unusual behaviour using additional information.

robjhyndman avatar May 24 '22 23:05 robjhyndman