DeepLog
Meaning of numbers in dataset
Hi,
I'm looking at the data folder of this repo. Can someone explain what the numbers in these files mean?
Here's an example of the numbers in the file I'm referring to: https://github.com/wuyifan18/DeepLog/blob/master/data/hdfs_train
Any help would be appreciated.
@danielhanbitlee Each row is the integer representation of the event (log) sequence for one unique block_id/session ID.
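To make the row format concrete, here is a minimal sketch of reading such a file into sequences. It assumes each line holds space-separated integers, one line per block_id/session; `parse_sessions` and the sample rows are made up for illustration and are not copied from the repo.

```python
def parse_sessions(lines):
    """Turn each non-empty line of space-separated integers into a list of ints."""
    sessions = []
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            sessions.append([int(tok) for tok in tokens])
    return sessions


# Hypothetical rows (not real data): each row is one session's log key sequence.
sample = ["5 5 5 22 11 9", "22 5 5 5 11", ""]
sessions = parse_sessions(sample)
print(sessions)
```

For a real file, the same function can be fed an open file handle, for example `parse_sessions(open(path))`, where `path` points at a local copy of the data file.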
@riyu94 Thanks for the quick response.
@riyu94 One more question. How are the rows divided? When do you create a new row?
@danielhanbitlee Rows are divided by unique block_id: we start a new row whenever a new block_id appears in the structured event logs.
@riyu94 I see. How do you create a block_id?
You need to look for the unique block ID in the message field of each event log.
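A rough sketch of that grouping step, under several assumptions: the log keys have already been assigned by a log parser, block IDs look like `blk_` followed by an optionally negative number (the usual HDFS form), and the event tuples below are invented for illustration. `BLOCK_ID` and `group_keys_by_block` are hypothetical names, not part of this repo.

```python
import re
from collections import OrderedDict

# Assumption: HDFS-style block IDs such as blk_-1608999687919862906.
BLOCK_ID = re.compile(r"blk_-?\d+")


def group_keys_by_block(events):
    """Group log keys by block ID, preserving log order.

    events: iterable of (log_key, message) pairs, in the order they were logged.
    Returns an OrderedDict mapping block ID -> list of log keys.
    """
    sequences = OrderedDict()
    for log_key, message in events:
        # dict.fromkeys de-duplicates IDs repeated within one message, keeping order.
        for block_id in dict.fromkeys(BLOCK_ID.findall(message)):
            sequences.setdefault(block_id, []).append(log_key)
    return sequences


# Hypothetical parsed events: (log key, raw message).
events = [
    (5, "Receiving block blk_-1608999687919862906 src: /10.0.0.1:50010"),
    (22, "BLOCK* NameSystem.allocateBlock: /tmp/file. blk_7503483334202473044"),
    (5, "Receiving block blk_-1608999687919862906 src: /10.0.0.2:50010"),
    (11, "PacketResponder 1 for block blk_7503483334202473044 terminating"),
    (9, "Received block blk_-1608999687919862906 of size 91178"),
]

sequences = group_keys_by_block(events)
# One output row per block ID, written as space-separated log keys.
rows = [" ".join(str(key) for key in keys) for keys in sequences.values()]
print(rows)
```

A new row starts the first time a block ID is seen, and later events carrying the same ID are appended to that row.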
@riyu94 Thanks for your answer!
Hello @riyu94, you explained it very well, but I still don't understand how a log key sequence is actually generated. Could you elaborate on the process of generating a log key sequence from the log file, and on which log file should be used?
@riyu94 I have another question, as I'm a little confused about the steps. Which of the following sequences of steps generates the numbers seen here?
- logs -> Spell -> sort log keys based on block id
- logs -> sort log keys based on block id -> Spell for each block id separately
I'm thinking the first one is the way it's done. Just want to confirm.