Question regarding the predicted variable

Open nagsubhadeep opened this issue 5 years ago • 7 comments

Yifan,

Source: LogKeyModel_predict.py

In the code below, can you please explain the difference between the output and predicted variables? Is predicted simply output after sorting? Also, shouldn't the value of predicted be binary, so that we can determine whether the predicted outcome is anomalous or not?

output = model(seq)
predicted = torch.argsort(output, 1)[0][-num_candidates:]

Thanks, Deep

nagsubhadeep avatar Sep 03 '20 13:09 nagsubhadeep

Deep, the output is a probability distribution over the log keys: for each key, it gives the probability that the key appears next given the history.

wuyifan18 avatar Sep 03 '20 13:09 wuyifan18

Shouldn't the value of the predicted variable be something binary so that we can determine whether the predicted outcome is anomalous or not? I am getting a one-dimensional array instead.

nagsubhadeep avatar Sep 03 '20 13:09 nagsubhadeep

Sort the possible log keys based on their probabilities and treat a key value as normal if it’s among the top g candidates. A log key is flagged as being from an abnormal execution otherwise.

You can read the paper for details.
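
As a rough illustration of that rule, here is a minimal sketch in plain Python. It assumes a single row of per-key scores (the index is the log key id) and uses sorted() as a stand-in for torch.argsort so it runs without PyTorch. The helper names top_candidates and is_anomalous and the score values are hypothetical and are not part of the repository.

```python
def top_candidates(scores, num_candidates):
    """Return the indices of the num_candidates highest scores.

    Indices come back in ascending score order, like
    torch.argsort(output, 1)[0][-num_candidates:] for one row of scores.
    """
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    return order[-num_candidates:]


def is_anomalous(scores, actual_key, num_candidates):
    """True if the key that actually came next is not a top candidate."""
    return actual_key not in top_candidates(scores, num_candidates)


# Hypothetical scores for 5 log keys (index = log key id).
scores = [0.05, 0.40, 0.10, 0.30, 0.15]

predicted = top_candidates(scores, num_candidates=2)
print(predicted)                   # [3, 1]: the two most likely keys
print(is_anomalous(scores, 3, 2))  # False: key 3 is among the top 2
print(is_anomalous(scores, 0, 2))  # True: key 0 is not among the top 2
```

So predicted is a list of candidate key ids rather than a binary value. The binary decision comes from checking whether the key that actually followed is in that list.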

wuyifan18 avatar Sep 03 '20 14:09 wuyifan18

@wuyifan18 where can I modify top g in your code?

Rufaida94 avatar Jun 23 '21 20:06 Rufaida94

@Rufaida94 here https://github.com/wuyifan18/DeepLog/blob/502aaf05be4c1251b7dc96f6439025c4fc988c66/LogKeyModel_predict.py#L51

wuyifan18 avatar Jun 24 '21 16:06 wuyifan18

Thank you @wuyifan18. I know that num_candidates is a hyperparameter that is supposed to be changed according to the dataset. My question is: if my data has 24297 num_classes (while your HDFS dataset has only 28 num_classes), what would be a reasonable num_candidates? For example, is 1000 too high or too low? I know this is a very vague question, but any pointers are appreciated.

Rufaida94 avatar Jul 03 '21 21:07 Rufaida94

@Rufaida94 num_candidates is a hyperparameter, which means you should tune it against evaluation metrics such as the F1 measure.
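
One way such tuning could look is sketched below in plain Python. It assumes a labeled validation set and that, for each session, you have recorded the worst 0-based rank the true next key received in any window. A session is then flagged when that rank is at least num_candidates. The function names, the candidate values, and the data are hypothetical and only illustrate the sweep.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sweep_num_candidates(worst_ranks, labels, candidates):
    """Return {g: F1}, flagging a session when its worst rank >= g.

    worst_ranks[i]: largest 0-based rank (0 = most likely) given to the
                    true next key anywhere in session i.
    labels[i]:      True if session i is really anomalous.
    """
    results = {}
    for g in candidates:
        flagged = [rank >= g for rank in worst_ranks]
        tp = sum(f and y for f, y in zip(flagged, labels))
        fp = sum(f and not y for f, y in zip(flagged, labels))
        fn = sum((not f) and y for f, y in zip(flagged, labels))
        results[g] = f1_score(tp, fp, fn)
    return results


# Hypothetical validation data: 6 sessions, 2 of them anomalous.
worst_ranks = [0, 2, 1, 15, 40, 7]
labels = [False, False, False, True, True, False]

f1_by_g = sweep_num_candidates(worst_ranks, labels, candidates=[1, 5, 10, 50])
best_g = max(f1_by_g, key=f1_by_g.get)
print(f1_by_g)
print(best_g)  # 10 for this toy data
```

Because the ranks are computed once, trying many values of num_candidates is cheap even with a large number of classes. The best value for a real dataset depends on the data and cannot be read off this toy example.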

wuyifan18 avatar Jul 05 '21 02:07 wuyifan18