DeepLog
Reproducing from the HDFS logs including parsing and encoding
The data in this repo is already encoded. I looked through the other issues to understand how to reproduce the results from the original HDFS dataset, but I still can't work out what to do.
I understand that the data needs to be parsed and encoded, and that Drain is a recommended tool for parsing. Beyond that, it isn't clear whether Drain is the tool actually used here, or which parts of the parsed data to use. The conclusion of the paper says: "DeepLog learns and encodes entire log message including timestamp, log key, and parameter values."
I'm unsure whether this implementation does the same.
This repo only implements the log key anomaly detection model.
Thanks @wuyifan18. How did you go about tokenizing the log keys and creating the numerical representation from them?
@c1505 Just encode the log keys as integers, from 0 up to the number of distinct log keys minus one.
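A minimal sketch of the encoding described above, assuming each parsed log line already carries an event template string (for example from a parser such as Drain). The function name `encode_log_keys` and the sample templates are hypothetical and not taken from this repo.

```python
def encode_log_keys(templates):
    """Map each distinct template to an integer id, in order of first appearance."""
    key_to_id = {}
    encoded = []
    for template in templates:
        if template not in key_to_id:
            # The next free id is simply the number of keys seen so far.
            key_to_id[template] = len(key_to_id)
        encoded.append(key_to_id[template])
    return encoded, key_to_id


# Hypothetical templates, for illustration only.
templates = [
    "Receiving block <*> src: <*> dest: <*>",
    "PacketResponder <*> for block <*> terminating",
    "Receiving block <*> src: <*> dest: <*>",
]
encoded, key_to_id = encode_log_keys(templates)
print(encoded)  # [0, 1, 0]
```

With this scheme the number of classes equals `len(key_to_id)`. It is worth checking whether the training script expects ids that start at 0 or at 1, since a mismatch shifts every label.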
Could you share the original labeled logs used in this code?
@wuyifan18 I have the same question. Could you please share the <Event Template> encoding technique? If you'd rather not, please point to some articles!
@shoaib-intro can you please check https://github.com/wuyifan18/DeepLog/issues/41#issuecomment-1087807237 it may help although I am not so sure
The encoding technique uses loghub and logparser: the first provides the original log files and the second provides the log template generator. Both can be found on GitHub.
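Once the logs are parsed, the event ids still have to be grouped into one sequence per session. The sketch below groups by HDFS block id. It assumes each parsed row exposes `Content` and `EventId` fields and that block ids match the pattern `blk_-?\d+`; both are assumptions to verify against your own parser output. The helper `group_by_block` and the sample rows are hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern for HDFS block ids, e.g. blk_-1608999687919862906.
BLOCK_ID = re.compile(r"blk_-?\d+")


def group_by_block(rows):
    """Collect event ids per block id, preserving the order of the log lines."""
    sessions = defaultdict(list)
    for row in rows:
        # A line may mention the same block twice; count it once per block.
        for block_id in sorted(set(BLOCK_ID.findall(row["Content"]))):
            sessions[block_id].append(row["EventId"])
    return dict(sessions)


# Hypothetical parsed rows, for illustration only.
rows = [
    {"Content": "Receiving block blk_-1608999687919862906 src: /10.0.0.1:1 dest: /10.0.0.2:2", "EventId": "E5"},
    {"Content": "PacketResponder 1 for block blk_-1608999687919862906 terminating", "EventId": "E11"},
    {"Content": "Receiving block blk_42 src: /10.0.0.3:1 dest: /10.0.0.4:2", "EventId": "E5"},
]
sessions = group_by_block(rows)
print(sessions["blk_-1608999687919862906"])  # ['E5', 'E11']
```

For logs without a block id, the same grouping works with any other session key, such as a component name, by swapping the regex lookup for that field.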
@shoaib-intro can you please check #41 (comment) it may help although I am not so sure
Yes, I have gone through it, thanks. The problem is that a block ID is not always available when dealing with application logs. In that case I combined the log keys by Component, which is unique in my case, and some components end up with a sequence length of **213k**.
There I hit an index-out-of-bounds error, IndexError: Target -1 is out of bounds.
It is raised at the line loss = criterion(output, label.to(device))
Any idea what causes that?
@shoaib-intro I am sorry, but I really don't have any idea about that.
This happened because my training data contained negative numbers. After I removed them, the issue was resolved.
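A cross-entropy loss over class indices expects every target to lie in the range 0 to C-1, where C is the number of classes, so a check before training can catch this kind of problem early. The sketch below is one way to do it in plain Python; `find_invalid_keys` and `make_windows` are hypothetical helpers, not functions from this repo, and the sample sequences are made up.

```python
def find_invalid_keys(sequences, num_classes):
    """Return (sequence index, position, value) for ids outside [0, num_classes)."""
    return [
        (i, j, key)
        for i, seq in enumerate(sequences)
        for j, key in enumerate(seq)
        if not 0 <= key < num_classes
    ]


def make_windows(seq, window_size):
    """Yield (input window, next key) pairs for next-key prediction."""
    for i in range(len(seq) - window_size):
        yield seq[i:i + window_size], seq[i + window_size]


# Made-up encoded sequences; the second one contains an invalid id.
sequences = [[0, 1, 2, 1, 0], [2, -1, 1]]

bad = find_invalid_keys(sequences, num_classes=3)
print(bad)  # [(1, 1, -1)]

pairs = list(make_windows(sequences[0], window_size=3))
print(pairs)  # [([0, 1, 2], 1), ([1, 2, 1], 0)]
```

If `bad` is non-empty, it is safer to trace where the invalid ids come from (for example an off-by-one in the encoding or a placeholder value for unknown keys) than to drop them silently.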