anomalous-acm
Ground-truth labels of datasets you used
Could you please share the ground-truth labels for your real Yahoo datasets, the synthetic datasets, or the datasets in the 'data' folder on this website?
Real data doesn't have ground-truth labels. If it did, we wouldn't need to do anomaly detection. For the synthetic data, the outliers are so extreme that even simple summary statistics will find them. For example,
which(colMeans(dat5) > 1e6)
# X49 X65 X78
# 49 65 78
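Because `dat5` itself is not reproduced here, the following self-contained sketch illustrates the same point on hypothetical data. The matrix `dat`, its size, the offset of 1e7 and the choice of columns 49, 65 and 78 are all assumptions made for illustration; they are not the actual synthetic data. Three series are shifted by an enormous constant, and a plain column mean is enough to single them out.

```r
# Hypothetical stand-in for dat5: 100 series (columns), 50 observations each.
set.seed(1)
n_obs <- 50
n_series <- 100
dat <- matrix(rnorm(n_obs * n_series, mean = 10, sd = 2),
              nrow = n_obs, ncol = n_series)
colnames(dat) <- paste0("X", seq_len(n_series))

# Assumed outliers: shift three arbitrarily chosen series by a huge constant.
outlier_cols <- c(49, 65, 78)
dat[, outlier_cols] <- dat[, outlier_cols] + 1e7

# A simple summary statistic (the column mean) flags the extreme series.
flagged <- which(colMeans(dat) > 1e6)
print(flagged)
# X49 X65 X78
#  49  65  78
```

With outliers this extreme, any threshold between the typical column mean (about 10 here) and the shifted means (about 1e7) separates the two groups, which is why such data cannot meaningfully test a detection method.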
How did you evaluate the performance of the algorithm in your paper without ground-truth labels for the real datasets you used?
See https://robjhyndman.com/papers/icdm2015.pdf for a description of the evaluation. The original data from Yahoo did contain labels based on their internal assessment of evidence of malicious activity, new feature deployment or a traffic shift. We did not have permission to make those assessments publicly available. In any case, they were not "ground truth" labels; they were simply based on an alternative assessment of unusual behaviour that used additional information.