pygod
Inconsistent prediction: pred in logger vs pred from .predict function
I have got a GAE model trained with data pyg_graph_train. Then, I use pyg_graph_test for model prediction.
I tried this:
pred, score = model.predict(pyg_graph_test, label=pyg_graph_test.label, return_score=True)
And I got "Recall 0.7490 | Precision 0.7490 | AP 0.6226 | F1 0.7490"
But when I check the pred and score:
f1_score(y_true=pyg_graph_test.label, y_pred=pred)
I got 0.34680888045878483, which is inconsistent with the logged F1.
I found that the pred returned by the predict function is not the same as the one computed in the logger function (pygod.utils.utility), because they use different threshold values.
In the logger function:
contamination = sum(target) / len(target)
threshold = np.percentile(score, 100 * (1 - contamination))
pred = (score > threshold).long()
In contrast, in the predict function (pygod.detector.base):
if return_pred:
    pred = (score > self.threshold_).long()
The "self.threshold_" is determined in _process_decision_score as:
self.threshold_ = np.percentile(self.decision_score_, 100 * (1 - self.contamination))
So, which prediction (i.e. which threshold value) is correct? Or is there something I may have missed/overlooked instead?
Sorry for the confusion.
If you do have the labels, or you know exactly how many outliers are in the dataset, e.g., 15%, you can specify the contamination when initializing the detector, for example model = DOMINANT(contamination=0.15). The model will then make the binary prediction pred based on this contamination.
However, in many cases our users do not have any labels, so we set a default contamination of 0.1, and the threshold changes correspondingly. That is why you got ~0.35 for F1 from the returned pred. The ~0.75 F1 is evaluated with the labels, which means the contamination is effectively set to its ideal value.
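To make the difference concrete, here is a small self-contained sketch (plain NumPy and scikit-learn, with synthetic scores and labels, not PyGOD itself) showing how the same scores yield different predictions under the label-derived contamination versus the default 0.1:

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# Synthetic scores: 200 normal nodes and 50 outliers with higher scores
# (true contamination = 50 / 250 = 0.2)
score = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(3.0, 1.0, 50)])
label = np.concatenate([np.zeros(200), np.ones(50)]).astype(int)

# Logger-style threshold: contamination is taken from the true labels
contamination_true = label.sum() / len(label)
thr_logger = np.percentile(score, 100 * (1 - contamination_true))
pred_logger = (score > thr_logger).astype(int)

# predict()-style threshold: the default contamination of 0.1
thr_default = np.percentile(score, 100 * (1 - 0.1))
pred_default = (score > thr_default).astype(int)

# With the default, only the top 10% of nodes are flagged, so recall
# (and hence F1) drops even though the scores are identical
print(f1_score(label, pred_logger))
print(f1_score(label, pred_default))
```

The scores are never recomputed; only the cutoff moves, which is exactly the gap between the ~0.75 logged F1 and the ~0.35 F1 from the returned pred.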
To avoid setting the threshold, we also provide AUC, AP, and Recall@k for easier evaluation.
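As a sketch of the threshold-free route (again with synthetic scores; in practice you would pass the detector's decision scores), AUC and AP only rank the scores, so no contamination value is involved:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(1)
# Synthetic scores: 200 normal nodes and 50 outliers with higher scores
score = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(3.0, 1.0, 50)])
label = np.concatenate([np.zeros(200), np.ones(50)]).astype(int)

# Ranking metrics use the raw scores directly; no threshold is chosen
auc = roc_auc_score(label, score)
ap = average_precision_score(label, score)
print(f'AUC: {auc:.3f}  AP: {ap:.3f}')
```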
Hello. I've also been using GAE for anomaly detection recently, but I keep running into errors. Could I refer to your usage code?
The following is my error message. Thank you very much.
RuntimeError: pyg::neighbor_sample() Expected a value of type 'Optional[Tensor]' for argument 'edge_weight' but instead found type 'bool'.
The following is my code:
from pygod.detector import GAE
from pygod.utils import load_data
from sklearn.metrics import roc_auc_score, average_precision_score

# Function to train the anomaly detector
def train_anomaly_detector(model, graph):
    return model.fit(graph)

# Function to evaluate the anomaly detector
def eval_anomaly_detector(model, graph):
    outlier_scores = model.decision_function(graph)
    auc = roc_auc_score(graph.y.numpy(), outlier_scores)
    ap = average_precision_score(graph.y.numpy(), outlier_scores)
    print(f'AUC Score: {auc:.3f}')
    print(f'AP Score: {ap:.3f}')

graph = load_data('weibo')

# Initialize and evaluate the model
graph.y = graph.y.bool()
if hasattr(graph, 'edge_weight'):
    graph.edge_weight = None
model = GAE(epoch=100)
model = train_anomaly_detector(model, graph)
eval_anomaly_detector(model, graph)