
Inconsistent prediction: pred in logger vs pred from .predict function

Open chpoonag opened this issue 1 year ago • 2 comments

I have a GAE model trained on pyg_graph_train, and I use pyg_graph_test for prediction.

I tried this:

```python
pred, score = model.predict(pyg_graph_test,
                            label=pyg_graph_test.label,
                            return_score=True)
```

and the logger printed `Recall 0.7490 | Precision 0.7490 | AP 0.6226 | F1 0.7490`.

But when I check the pred and score myself:

```python
f1_score(y_true=pyg_graph_test.label, y_pred=pred)
```

I get 0.34680888045878483, which is inconsistent with the logged F1.

I found that the pred returned by the predict function is not the same as the one computed in the logger function (pygod.utils.utility), because the two use different threshold values. In the logger function:

```python
contamination = sum(target) / len(target)
threshold = np.percentile(score, 100 * (1 - contamination))
pred = (score > threshold).long()
```

In contrast, in the predict function (pygod.detector.base):

```python
if return_pred:
    pred = (score > self.threshold_).long()
```

where `self.threshold_` is determined in `_process_decision_score` as:

```python
self.threshold_ = np.percentile(self.decision_score_,
                                100 * (1 - self.contamination))
```

So which prediction (i.e., which threshold value) is correct? Or is there something I may have missed?

chpoonag commented on Nov 05 '24

Sorry for the confusion.

If you do have the labels, or you know exactly how many outliers are in the dataset, e.g., 15%, you can specify the contamination when initializing the detector, for example `model = DOMINANT(contamination=0.15)`. The model will then make the binary prediction pred based on this contamination.
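For instance, a minimal sketch (assuming binary 0/1 labels stored in `pyg_graph_train.label`, as in the original question) that sets the contamination from the observed outlier rate:

```python
from pygod.detector import DOMINANT

# Fraction of labeled outliers in the training graph (assumes 0/1 labels).
contamination = pyg_graph_train.label.float().mean().item()

# predict() will then threshold scores at the (1 - contamination) percentile.
model = DOMINANT(contamination=contamination)
```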

However, in many cases our users do not have any labels, so we set the default contamination to 0.1, and the threshold changes correspondingly. That's why you got ~0.3 F1 from the returned pred. The ~0.7 F1 in the logger is evaluated with labels, which means the contamination is effectively set to its ideal value.
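If the model is already fitted, you can also re-threshold the returned raw scores yourself instead of refitting. A minimal sketch, assuming the `score` tensor (on CPU) and the test labels from your example:

```python
import numpy as np

# True outlier rate of the test set (assumes 0/1 labels).
contamination = pyg_graph_test.label.float().mean().item()

# Apply the same percentile rule the logger uses to the raw scores.
threshold = np.percentile(score.numpy(), 100 * (1 - contamination))
pred_at_true_rate = (score > threshold).long()
```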

To avoid setting a threshold at all, we also provide threshold-free metrics, AUC, AP, and Recall@k, for easier evaluation.
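A minimal sketch of that threshold-free evaluation using the helpers in `pygod.metric` (the choice of `k=100` here is purely illustrative):

```python
from pygod.metric import (eval_roc_auc,
                          eval_average_precision,
                          eval_recall_at_k)

# All three take the binary labels and the raw outlier scores;
# none of them depends on a decision threshold.
auc = eval_roc_auc(pyg_graph_test.label, score)
ap = eval_average_precision(pyg_graph_test.label, score)
rec = eval_recall_at_k(pyg_graph_test.label, score, k=100)
```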

kayzliu commented on Nov 08 '24

Hello. I've also been using GAE for anomaly detection recently, but I keep running into errors. Could you share your working code for reference?

Here is my error message. Thank you very much.

```
RuntimeError: pyg::neighbor_sample() Expected a value of type 'Optional[Tensor]' for argument 'edge_weight' but instead found type 'bool'.
```

And here is my code:

```python
from pygod.detector import GAE
from pygod.utils import load_data
from sklearn.metrics import roc_auc_score, average_precision_score

# Function to train the anomaly detector
def train_anomaly_detector(model, graph):
    return model.fit(graph)

# Function to evaluate the anomaly detector
def eval_anomaly_detector(model, graph):
    outlier_scores = model.decision_function(graph)
    auc = roc_auc_score(graph.y.numpy(), outlier_scores)
    ap = average_precision_score(graph.y.numpy(), outlier_scores)
    print(f'AUC Score: {auc:.3f}')
    print(f'AP Score: {ap:.3f}')

graph = load_data('weibo')

# Initialize and evaluate the model
graph.y = graph.y.bool()

if hasattr(graph, 'edge_weight'):
    graph.edge_weight = None

model = GAE(epoch=100)
model = train_anomaly_detector(model, graph)
eval_anomaly_detector(model, graph)
```

withMoonstar commented on Nov 27 '24