Meaning of numbers in dataset

Open danielhanbitlee opened this issue 6 years ago • 10 comments

Hi,

I'm looking at the data folder of this repo. Can someone explain what the numbers in these files mean?

Here's an example of the numbers in the file I'm referring to: https://github.com/wuyifan18/DeepLog/blob/master/data/hdfs_train

Any help would be appreciated.

danielhanbitlee avatar May 17 '19 21:05 danielhanbitlee

@danielhanbitlee Each row is a sequence of events (log entries), encoded as integers, for one unique block_id/session ID.
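
To make that concrete, here is a minimal sketch of how such a file could be read back into sequences. It assumes each line holds whitespace-separated integers, one line per block_id/session; the function name `parse_sessions` and the sample values are made up for illustration and are not taken from the repo's code.

```python
def parse_sessions(text):
    """Turn each non-empty line into a list of integer log keys."""
    sessions = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            sessions.append([int(token) for token in line.split()])
    return sessions


# Illustrative sample only: two sessions, one per line.
sample = "5 5 22 11\n9 11 9\n"
sessions = parse_sessions(sample)
for index, keys in enumerate(sessions):
    print(f"session {index}: {len(keys)} events -> {keys}")
```

For a real file, something like `parse_sessions(open("data/hdfs_train").read())` should work if the assumption about the format holds.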

riyu94 avatar May 17 '19 23:05 riyu94

@riyu94 Thanks for the quick response.

danielhanbitlee avatar May 18 '19 00:05 danielhanbitlee

@riyu94 One more question. How are the rows divided? When do you create a new row?

danielhanbitlee avatar May 18 '19 00:05 danielhanbitlee

@danielhanbitlee Rows are divided by block_id: we start a new row whenever a new block_id appears in the structured event logs.

riyu94 avatar May 20 '19 18:05 riyu94

@riyu94 I see. How do you create a block_id?

danielhanbitlee avatar May 20 '19 18:05 danielhanbitlee

You need to look for the unique block ID in the message field of each event log.
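
A rough sketch of that grouping step, under some assumptions: block IDs in HDFS messages look like `blk_` followed by an optionally negative number, and every log line has already been assigned an integer event key by a log parser. The `group_by_block` helper, the sample messages, and the key values are hypothetical, not the repo's actual preprocessing.

```python
import re

# Assumed block ID shape, e.g. "blk_123" or "blk_-123".
BLOCK_ID = re.compile(r"blk_-?\d+")


def group_by_block(structured_events):
    """Map each block ID to its event keys, in order of appearance."""
    sequences = {}
    for key, message in structured_events:
        # dict.fromkeys removes repeated IDs within one message, keeping order.
        for block_id in dict.fromkeys(BLOCK_ID.findall(message)):
            sequences.setdefault(block_id, []).append(key)
    return sequences


# Made-up (event_key, message) pairs for illustration.
events = [
    (5, "Receiving block blk_1 src: /10.0.0.1 dest: /10.0.0.2"),
    (22, "BLOCK* NameSystem.allocateBlock: /tmp/file blk_1"),
    (5, "Receiving block blk_-2 src: /10.0.0.3 dest: /10.0.0.4"),
    (11, "PacketResponder 1 for block blk_1 terminating"),
    (9, "Received block blk_-2 of size 100"),
]
sequences = group_by_block(events)
for block_id, keys in sequences.items():
    # One output row per block ID, like one line in the data file.
    print(block_id, " ".join(str(k) for k in keys))
```

Each printed row corresponds to one block ID, so a new row starts whenever a previously unseen block ID shows up.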

riyu94 avatar May 21 '19 00:05 riyu94

@riyu94 Thanks for your answer!

wuyifan18 avatar May 21 '19 02:05 wuyifan18

Hello @riyu94, you explained it very well, but I still don't understand how a log key sequence is actually generated. Could you elaborate on the process of generating a log key sequence from the log file, and on which log file should be used?

RahulShrivastava22 avatar May 21 '19 05:05 RahulShrivastava22

@riyu94 I have another question, as I'm a little confused about the steps. Which of the following pipelines do we use to generate the numbers seen here?

  1. logs -> Spell -> sort log keys based on block id
  2. logs -> sort log keys based on block id -> Spell for each block id separately

I'm thinking 1 is the way it's done. Just want to confirm.

danielhanbitlee avatar May 21 '19 22:05 danielhanbitlee