pygod Flickr is not consistent with the "Flickr" dataset in pyg

Open goldenNormal opened this issue 3 years ago • 4 comments

The Flickr dataset used in some graph outlier detection papers is the Flickr dataset in PyG, but the 'inj_flickr' dataset implemented in your library is very different. I hope you can use the correct dataset.

Sep 08 '22 07:09 goldenNormal

Thanks for your issue. 'inj_flickr' in our library is based on the Flickr dataset in PyG. Since the original Flickr dataset has no ground-truth anomalies, we need to inject anomalies into the graph. Specifically, we use the built-in generators to inject the outliers; more details are available in our benchmark paper.

Sep 10 '22 21:09 kayzliu

Sorry, I just found that there are two Flickr datasets in PyG. The Flickr dataset I am referring to is the one under AttributedGraphDataset. Many graph anomaly detection papers are based on this dataset. You can see that the two have different node attributes and numbers of edges.

Sep 11 '22 04:09 goldenNormal

We did notice the difference between the two datasets. We provide the 'inj_flickr' dataset NOT for comparison with the other Flickr dataset. The Flickr dataset we use is larger in terms of both the number of nodes and the number of edges, and we would like to offer datasets that are diverse in scale for future evaluation. Also, the structural outliers injected in previous work are fully connected, which makes them too easy to detect. We add an edge drop probability for the fully connected clusters, making the structural outliers less explicit. For more details, check our benchmark paper.
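
To illustrate the edge drop idea described above, here is a rough sketch in plain Python. It is not PyGOD's actual generator: the function name `inject_cliques` and all of its parameters are hypothetical, and it only returns the new edges instead of modifying a PyG graph. Each group of selected nodes would form a fully connected clique, but every clique edge is dropped independently with probability `drop_prob`.

```python
import itertools
import random


def inject_cliques(num_nodes, clique_size, num_cliques, drop_prob=0.0, seed=None):
    """Hypothetical sketch: pick nodes, connect them as cliques, drop edges at random.

    Returns (new_edges, outlier_nodes). With drop_prob=0.0 every clique is
    fully connected; larger values make the injected structure less dense.
    """
    rng = random.Random(seed)
    # Sample distinct nodes so that the cliques do not overlap.
    chosen = rng.sample(range(num_nodes), clique_size * num_cliques)
    new_edges = []
    for c in range(num_cliques):
        members = chosen[c * clique_size:(c + 1) * clique_size]
        for u, v in itertools.combinations(members, 2):
            # Keep the edge unless the random draw falls below drop_prob.
            if rng.random() >= drop_prob:
                new_edges.append((u, v))
    return new_edges, sorted(chosen)


full_edges, outliers = inject_cliques(100, clique_size=5, num_cliques=3, seed=0)
sparse_edges, _ = inject_cliques(100, clique_size=5, num_cliques=3, drop_prob=0.5, seed=0)
print(len(full_edges), len(sparse_edges), len(outliers))
```

With `drop_prob=0.0` this reduces to the fully connected injection used in earlier work, so the parameter only controls how far the injected groups depart from a perfect clique.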

Sep 12 '22 20:09 kayzliu

The Flickr dataset under AttributedGraphDataset in PyG has a feature dimension of 12407, which differs from the feature dimension of 500 in the other one. Other differences include the average node degree. These will significantly change the performance of the different baselines.
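
For anyone who wants to check these differences locally, here is a minimal sketch. The helper names `graph_stats` and `load_flickr_variants` are made up for illustration, and the root paths are placeholders. The loader assumes `torch_geometric` is installed and will download both datasets on first use; I have not pinned a version, so treat the attribute access as an assumption.

```python
def graph_stats(num_nodes, num_edges, num_features):
    """Summarize a graph; num_edges counts directed entries as in PyG's edge_index."""
    return {
        "num_nodes": num_nodes,
        "num_edges": num_edges,
        "num_features": num_features,
        "avg_degree": num_edges / num_nodes,
    }


def load_flickr_variants(root="data"):
    """Load both PyG datasets named Flickr and return their statistics.

    Imports are done lazily so graph_stats can be used without PyG installed.
    """
    from torch_geometric.datasets import AttributedGraphDataset, Flickr

    variants = {
        "Flickr": Flickr(f"{root}/Flickr")[0],
        "AttributedGraphDataset/Flickr": AttributedGraphDataset(
            f"{root}/Attributed", name="Flickr"
        )[0],
    }
    return {
        name: graph_stats(data.num_nodes, data.num_edges, data.x.size(-1))
        for name, data in variants.items()
    }


# Toy numbers only, to show the output format without downloading anything.
print(graph_stats(num_nodes=4, num_edges=8, num_features=3))
```

Calling `load_flickr_variants()` should print-ready return one entry per dataset, which makes the gap in feature dimension and average degree easy to see side by side.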

Sep 13 '22 03:09 goldenNormal
