pygod
Flickr is not consistent with the "Flickr" dataset in PyG
The Flickr dataset used in some graph outlier detection (GOD) papers is the Flickr dataset in PyG, but the inj_flickr dataset implemented in your library is very different from it. I hope you can use the correct dataset.
Thanks for your issue. 'inj_flickr' in our library is based on the Flickr dataset in PyG. As there are no ground-truth anomaly labels in the original Flickr dataset, we need to inject anomalies into the graph. Specifically, we use the built-in generators to inject the outliers; more details are available in our benchmark paper.
Sorry, I just found that there are two Flickr datasets in PyG. The Flickr dataset I am referring to is the one under AttributedGraphDataset. Many graph anomaly detection papers are based on this dataset. You can see that the two have different node attributes and different numbers of edges.
We are aware of the difference between the two datasets. We provide the 'inj_flickr' dataset NOT for comparison with the other Flickr dataset. The Flickr dataset we use is larger in terms of the number of nodes and the number of edges, and we would like to offer datasets that are diverse in scale for future evaluation. Also, structural outliers in previous injection schemes are fully connected clusters, which are too easy to detect. We add an edge drop probability for the fully connected clusters, making the structural outliers less explicit. For more details, check our benchmark paper.
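To illustrate the edge-drop idea described above, here is a minimal sketch in plain Python. It is not PyGOD's implementation: the function name `inject_structural_outliers` and its parameters (`clique_size`, `num_cliques`, `drop_prob`) are hypothetical and chosen only for this example. It picks disjoint groups of nodes, connects each group as a clique, and drops each clique edge with a given probability.

```python
import itertools
import random


def inject_structural_outliers(num_nodes, edges, clique_size, num_cliques,
                               drop_prob=0.0, seed=None):
    """Hypothetical sketch of clique injection with edge dropping.

    edges is a set of undirected edges stored as (u, v) tuples with u < v.
    Returns the new edge set and the sorted list of outlier node ids.
    """
    rng = random.Random(seed)
    # Sample disjoint groups of nodes that will become dense clusters.
    chosen = rng.sample(range(num_nodes), clique_size * num_cliques)
    new_edges = set(edges)
    for c in range(num_cliques):
        group = chosen[c * clique_size:(c + 1) * clique_size]
        for u, v in itertools.combinations(sorted(group), 2):
            # Keep each clique edge with probability 1 - drop_prob.
            if rng.random() >= drop_prob:
                new_edges.add((u, v))
    return new_edges, sorted(chosen)


if __name__ == "__main__":
    # drop_prob=0.0 gives fully connected clusters; larger values make
    # the injected clusters sparser and therefore harder to spot.
    dense, nodes = inject_structural_outliers(100, set(), 5, 2, 0.0, seed=0)
    sparse, _ = inject_structural_outliers(100, set(), 5, 2, 0.3, seed=0)
    print(len(nodes), len(dense), len(sparse))
```

With `drop_prob=0.0` this reduces to the fully connected injection used in earlier work; the drop probability is the only knob that changes how explicit the outlier clusters are.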
The Flickr dataset under AttributedGraphDataset in PyG has a feature dimension of 12407, which is different from the feature dimension of 500 in the other one. Other differences include the average node degree. These will significantly change the performance of different baselines.
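The differences listed above can be checked with a few summary statistics. The following is a small sketch in plain Python; the helper `graph_stats` is hypothetical and works on a toy edge list and feature matrix rather than on the real datasets. For a loaded PyG `Data` object, the corresponding quantities are available as `data.num_nodes`, `data.num_edges`, and `data.num_node_features`.

```python
def graph_stats(num_nodes, edges, features):
    """Hypothetical helper: summarize a graph for a quick comparison.

    edges is a list of directed (src, dst) pairs, so an undirected edge
    appears twice. features is a list with one feature list per node.
    """
    num_edges = len(edges)
    feat_dim = len(features[0]) if features else 0
    # Average degree counted over directed edges.
    avg_degree = num_edges / num_nodes if num_nodes else 0.0
    return {
        "num_nodes": num_nodes,
        "num_edges": num_edges,
        "feat_dim": feat_dim,
        "avg_degree": avg_degree,
    }


if __name__ == "__main__":
    # Toy graph: a path 0 - 1 - 2 plus an isolated node 3, 3 features each.
    toy_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
    toy_features = [[0.0, 0.0, 0.0] for _ in range(4)]
    print(graph_stats(4, toy_edges, toy_features))
```

Computing the same summary for both Flickr variants makes it easy to state which one a reported result refers to.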