DeepLog

Reproducing results from the HDFS logs, including parsing and encoding

Open c1505 opened this issue 4 years ago • 10 comments

The data in this repo is already encoded. I looked through the other issues to understand how to reproduce the results from the original HDFS dataset, but I still can't tell what to do.

I understand that the data needs to be parsed and encoded, and that Drain is a recommended parsing tool. Beyond that, it isn't clear whether Drain is the tool actually used here, or which part or parts of the parsed data to use. The conclusion of the paper says: "DeepLog learns and encodes entire log message including timestamp, log key, and parameter values." I'm unsure whether this implementation does the same.

c1505 avatar Aug 02 '20 20:08 c1505

This repo only implements the log key anomaly detection model.

wuyifan18 avatar Aug 03 '20 06:08 wuyifan18

Thanks @wuyifan18. How did you go about tokenizing the log keys and creating their numerical representation?

c1505 avatar Aug 03 '20 17:08 c1505

@c1505 Just encode log keys from 0 to the number of log keys.
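
One way to read this, as a minimal sketch and not necessarily how the data in this repo was produced: give each distinct log key an integer ID in first-seen order, so that N distinct keys map to 0 through N-1. The function names and the sample templates below are hypothetical.

```python
def build_key_index(log_keys):
    """Map each distinct log key to an integer ID in first-seen order."""
    index = {}
    for key in log_keys:
        if key not in index:
            index[key] = len(index)
    return index


def encode(log_keys, index):
    """Replace every log key with its integer ID."""
    return [index[key] for key in log_keys]


# Hypothetical templates, shaped like what a parser such as Drain might emit.
parsed = [
    "Receiving block <*> src: <*> dest: <*>",
    "PacketResponder <*> for block <*> terminating",
    "Receiving block <*> src: <*> dest: <*>",
    "Verification succeeded for <*>",
]

key_index = build_key_index(parsed)
encoded = encode(parsed, key_index)
print(encoded)  # [0, 1, 0, 2]
```

With this scheme the largest ID is always one less than the number of distinct keys, which is the range a classifier over N classes expects for its targets.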

wuyifan18 avatar Aug 04 '20 01:08 wuyifan18

Could you share the original labeled logs used in this code?

Nothing-bit avatar Nov 29 '20 07:11 Nothing-bit

@wuyifan18 I have the same question. Could you please share the <Event Template> encoding technique? If you'd rather not, please point me to some articles!

shoaib-intro avatar Apr 04 '22 11:04 shoaib-intro

@shoaib-intro This may help, although I am not so sure. Can you please check https://github.com/wuyifan18/DeepLog/issues/41#issuecomment-1087807237

OutOfBoundCats avatar Apr 04 '22 17:04 OutOfBoundCats

> @wuyifan18 I have the same question. Could you please share the <Event Template> encoding technique? If you'd rather not, please point me to some articles!

The encoding technique uses loghub and logparser: the former provides the original log files and the latter provides the log template generator. Both can be found on GitHub.
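
As a rough sketch of the step between parsing and encoding: once a parser has produced one row per log line, the rows can be grouped into one event sequence per HDFS block. This assumes the parser output is a structured CSV with `Content` and `EventId` columns and that block IDs look like `blk_<number>`; the rows and event IDs below are made up for illustration.

```python
import csv
import io
import re

# Assumed shape of an HDFS block ID inside the log message.
BLOCK_ID = re.compile(r"blk_-?\d+")


def sessions_by_block(rows):
    """Group event IDs into one ordered sequence per block ID."""
    sessions = {}
    for row in rows:
        # A line may mention the same block more than once; count it once.
        for blk in sorted(set(BLOCK_ID.findall(row["Content"]))):
            sessions.setdefault(blk, []).append(row["EventId"])
    return sessions


# Made-up rows standing in for a parser's structured CSV output.
sample_csv = """LineId,Content,EventId
1,Receiving block blk_-1608999687919862906 src: /10.250.19.102:54106 dest: /10.250.19.102:50010,E1
2,PacketResponder 1 for block blk_-1608999687919862906 terminating,E2
3,Receiving block blk_7503483334202473044 src: /10.251.215.16:55695 dest: /10.251.215.16:50010,E1
"""

rows = csv.DictReader(io.StringIO(sample_csv))
sessions = sessions_by_block(rows)
for blk, events in sessions.items():
    print(blk, events)
```

Each resulting sequence of event IDs can then be mapped to integers before training.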

Nothing-bit avatar Apr 07 '22 11:04 Nothing-bit

> @shoaib-intro can you please check #41 (comment) it may help although I am not so sure

Yes, I have gone through it, thanks. The problem is that a block ID is not always available for application logs, so I grouped the log keys by Component, which is unique in my case. Some components have a sequence length of **213k**, and there I hit an index-out-of-bounds error, `IndexError: Target -1 is out of bounds.`, at the line `loss = criterion(output, label.to(device))`. Any idea about that?

shoaib-intro avatar Apr 07 '22 11:04 shoaib-intro

@shoaib-intro I am sorry, but I really don't have any idea about that.

OutOfBoundCats avatar Apr 07 '22 13:04 OutOfBoundCats

> > @shoaib-intro can you please check #41 (comment) it may help although I am not so sure
>
> Yes, I have gone through it, thanks. The problem is that a block ID is not always available for application logs, so I grouped the log keys by Component, which is unique in my case. Some components have a sequence length of **213k**, and there I hit an index-out-of-bounds error, `IndexError: Target -1 is out of bounds.`, at the line `loss = criterion(output, label.to(device))`. Any idea about that?

This happened because my training data contained negative numbers. I removed them and the issue was resolved.
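
For anyone hitting the same error: a classification loss over N classes expects every target to lie in the range 0 to N-1, so a check before training can point at the offending keys. The sketch below uses plain Python; the function names, the sample sequences, and the window size are hypothetical, and the sliding-window split is an assumed next-key prediction setup.

```python
def find_invalid_keys(sequences, num_classes):
    """Return (sequence_index, position, key) for each key outside [0, num_classes)."""
    bad = []
    for i, seq in enumerate(sequences):
        for j, key in enumerate(seq):
            if not 0 <= key < num_classes:
                bad.append((i, j, key))
    return bad


def sliding_windows(seq, window_size):
    """Yield (input_window, next_key) pairs for next-key prediction."""
    for i in range(len(seq) - window_size):
        yield seq[i:i + window_size], seq[i + window_size]


# Hypothetical encoded sequences; the second one contains a negative key.
sequences = [[0, 1, 2, 1], [3, -1, 2, 0]]
num_classes = 4

bad = find_invalid_keys(sequences, num_classes)
print(bad)  # [(1, 1, -1)]

# Keep only sequences whose keys are all valid targets.
clean = [s for s in sequences if all(0 <= k < num_classes for k in s)]
pairs = list(sliding_windows(clean[0], window_size=3))
print(pairs)  # [([0, 1, 2], 1)]
```

Whether to drop the whole sequence or only the invalid entries depends on where the bad values come from, so it is worth inspecting them first.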

shoaib-intro avatar Aug 03 '22 09:08 shoaib-intro