DeepLog
Question regarding the predicted variable
Yifan,
Source: LogKeyModel_predict.py
In the code below, can you please explain the difference between the output and predicted variables? Is predicted the same as output, just sorted and still a tensor? Also, shouldn't the value of predicted be binary, so that we can determine whether the predicted outcome is anomalous or not?
output = model(seq)
predicted = torch.argsort(output, 1)[0][-num_candidates:]
Thanks, Deep
Deep, the output is a probability distribution over log keys: for each log key, it gives the probability that the key appears next given the history.
Shouldn't the value of the predicted variable be something binary so that we can determine whether the predicted outcome is anomalous or not? I am getting a one-dimensional array instead.
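Sort the possible log keys by probability and treat a key value as normal if it is among the top g candidates (num_candidates in the code). Otherwise, the log key is flagged as coming from an abnormal execution. So predicted holds the top g candidate keys rather than a binary label; the binary decision comes from checking whether the actual next key is among them.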
You can read the paper for details.
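The top-g check described above can be sketched in plain Python. This is only an illustration, not the repository's code: the helper names `top_g_keys` and `is_anomalous` and the toy scores are hypothetical, and a real run would pass the model's scores for one window (for example `output[0].tolist()`) instead.

```python
def top_g_keys(scores, num_candidates):
    """Return the indices of the num_candidates highest scores, lowest first."""
    # Same idea as torch.argsort(output, 1)[0][-num_candidates:]:
    # sort key indices by score (ascending) and keep the tail.
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    return order[max(0, len(order) - num_candidates):]


def is_anomalous(scores, label, num_candidates):
    """Binary decision: True if the observed next key is outside the top-g set."""
    return label not in top_g_keys(scores, num_candidates)


# Hypothetical scores for 5 log keys (list index = log key id).
scores = [0.05, 0.40, 0.10, 0.30, 0.15]

print(top_g_keys(scores, 2))       # [3, 1]
print(is_anomalous(scores, 3, 2))  # False: key 3 is among the top 2
print(is_anomalous(scores, 4, 2))  # True: key 4 is not among the top 2
```

Only the ordering of the scores matters here, so the result is the same whether the model emits probabilities or unnormalized scores.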
@wuyifan18 where can I modify top g in your code?
@Rufaida94 here https://github.com/wuyifan18/DeepLog/blob/502aaf05be4c1251b7dc96f6439025c4fc988c66/LogKeyModel_predict.py#L51
Thank you @wuyifan18. I know that num_candidates here is a hyperparameter that is supposed to be changed according to the dataset. But my question is: if my data has 24297 num_classes (while your HDFS dataset has only 28 num_classes), what would be a reasonable num_candidates? For example, is 1000 too high or too low for num_candidates? I know this is a very vague question, but any pointers are appreciated.
@Rufaida94 num_candidates is a hyperparameter, which means you should tune it against evaluation metrics such as the F1 measure.
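One way to tune num_candidates against F1 is a simple sweep over candidate values on labeled validation sessions. The sketch below is a hypothetical setup, not part of the repository: it assumes that for each session you have already recorded the rank of every true next key in the model's sorted output (rank 1 = most likely), that a session is flagged when any rank exceeds g, and that ground-truth labels are available. The function names and toy data are made up for illustration.

```python
def f1_for_g(session_ranks, session_is_abnormal, g):
    """F1 score when a session is flagged if any true next key ranks outside the top g."""
    tp = fp = fn = 0
    for ranks, abnormal in zip(session_ranks, session_is_abnormal):
        flagged = any(r > g for r in ranks)
        if flagged and abnormal:
            tp += 1
        elif flagged and not abnormal:
            fp += 1
        elif abnormal:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_g(session_ranks, session_is_abnormal, candidates):
    """Return the candidate g with the highest F1 (first one wins on ties)."""
    return max(candidates, key=lambda g: f1_for_g(session_ranks, session_is_abnormal, g))


# Hypothetical validation data: per-session ranks of the true next keys.
session_ranks = [[1, 2, 1], [3, 1, 2], [1, 1, 4], [1, 9, 2], [6, 1, 1]]
session_is_abnormal = [False, False, False, True, True]

for g in [1, 3, 5, 8, 10]:
    print(g, round(f1_for_g(session_ranks, session_is_abnormal, g), 3))
print("best g:", best_g(session_ranks, session_is_abnormal, [1, 3, 5, 8, 10]))
```

With a large number of classes, the candidate list could be spread over several orders of magnitude first and then refined around the best value; which range works depends on the dataset.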