Timesketch_importer duplicates jsonl events (imports twice)

Open boingomw opened this issue 1 year ago • 11 comments

Describe the bug: timesketch_importer runs twice when executed on json_line files, resulting in duplicate events.

To Reproduce Steps to reproduce the behavior:

  1. Create a sample timeline:
    log2timeline.py --storage_file example.plaso /usr/bin
  2. Verify how many lines are in the file:
    pinfo.py example.plaso
  3. Import into Timesketch:
    timesketch_importer --host http://127.0.0.1:81 -u examiner -p xxxxxx --sketch_id 19 example.plaso
  4. Verify in the GUI that you have 1 import and the correct number of events.
  5. Convert the plaso file to json_line (normally this is done for a reason, such as slicing):
    psort.py -o json_line -w example.jsonl example.plaso
  6. Get a word count to verify the number of messages hasn't changed:
    wc example.jsonl
  7. Import this into Timesketch:
    timesketch_importer --host http://127.0.0.1:81 -u examiner -p xxxxx --sketch_id 19 example.jsonl
  8. Check the GUI:
    example-jsonl 14.1K events (2 imports: details)
    example 7K events (imported with CLI importer tool)
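
The line count in step 6 can be made stricter with a short Python check that parses every line and reports exact duplicates already present in the file. This is only a sketch: the function name count_jsonl_events is made up for illustration and is not part of Timesketch or plaso. If it reports zero duplicates for example.jsonl, any doubling seen in the GUI comes from the import step rather than from the file.

```python
import collections
import json


def count_jsonl_events(lines):
    """Return (total events, exact duplicate events) for an iterable of JSONL lines.

    Hypothetical helper for illustration; not part of Timesketch or plaso.
    """
    counts = collections.Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are not events, although wc would count them
        event = json.loads(line)  # raises ValueError if a line is not valid JSON
        # Re-serialize with sorted keys so key order does not hide duplicates.
        counts[json.dumps(event, sort_keys=True)] += 1
    total = sum(counts.values())
    duplicates = total - len(counts)
    return total, duplicates
```

To use it on the file from step 5, open it and pass the file handle, for example: with open("example.jsonl", encoding="utf-8") as handle: print(count_jsonl_events(handle)).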

Expected behavior: the events should be imported only once, not duplicated.

Screenshots image

Desktop (please complete the following information):

  • OS: ubuntu
  • Browser: chrome
  • Version: 22

latest docker install, as of 6/15/2023

boingomw avatar Jun 15 '23 17:06 boingomw

Same issue as https://github.com/google/timesketch/issues/2334

S-Nicholas avatar Jun 17 '23 19:06 S-Nicholas

@boingomw I see that you have 2 imports reported. This indicates that you ran the importer twice with the same timeline name. That will put the events in the same timeline.

Can you confirm that this isn't the case? I can't reproduce this issue on my end.

berggren avatar Sep 19 '23 12:09 berggren

I did run it twice: once for the .json file and once for the .plaso file. The issue is that both files had the same number of lines in them, but when I imported the json file, it ended up with 2x the number of events.

So when you follow the process above, do you end up with 7k for the .json and 7k for the .plaso file?

boingomw avatar Sep 22 '23 14:09 boingomw

I can confirm that timesketch_importer is also creating doubled sources for JSONL imports, doubling the events in searches. The same does not apply to web imports, which seem to import correctly.

# timesketch_importer --version
API Client Version: 20230721
Importer Client Version: 20230721

If you need a sample jsonl, I can supply.

Regards,

arisjr avatar Sep 25 '23 12:09 arisjr

I just tried it with the CLI tool, timesketch --sketch 1 import /usr/local/src/timesketch/temp/sigma_temp.jsonl, with the file containing (https://github.com/google/timesketch/blob/master/test_tools/test_events/sigma_events.jsonl):

{"message": "A message","timestamp": 123456789,"datetime": "2015-07-24T19:01:01+00:00","timestamp_desc": "Write time","extra_field_1": "foo"}
{"message": "Another message","timestamp": 123456790,"datetime": "2015-07-24T19:01:02+00:00","timestamp_desc": "Write time","extra_field_1": "bar"}
{"message": "Yet more messages","timestamp": 123456791,"datetime": "2015-07-24T19:01:03+00:00","timestamp_desc": "Write time","extra_field_1": "baz"}
{"message": "Install: zmap:amd64 (1.1.0-1) [Commandline: apt-get install zmap]","timestamp": 123456791,"datetime": "2015-07-24T19:01:03+00:00","timestamp_desc": "foo","command":"Commandline: apt-get install zmap","data_type":"apt:history:line","display_name":"GZIP:/var/log/apt/history.log.1.gz","filename":"/var/log/apt/history.log.1.gz","packages":"Install: zmap:amd64 (1.1.0-1)","parser":"apt_history"}
{"message": "[11 / 0x000b] Source Name: Microsoft-Windows-Sysmon Strings: ['DLL', '2022-01-22 23:07:43.492', '{C784477D-8DE8-61EC-AAAA-000000003C00}', '7812', 'C:\\Windows\\tifubjdl\\lysjbpb.exe', 'C:\\Windows\\itfnduuui\\Corporate\\mimilib.dll', '2022-01-22 23:07:43.492'] Computer Name: DESKTOP-B0TAAAA Record Number: 913 Event Level: 4","computer_name":"DESKTOP-B0TAAAA","data_type":"windows:evtx:record","datetime":"2022-01-22T23:07:43.502205+00:00","display_name":"OS:/data/input/Microsoft-Windows-Sysmon%4Operational.evtx","event_identifier":"11","event_level":"4","message_identifier":"11","parser":"winevtx","source_name":"Microsoft-Windows-Sysmon","timestamp":"1642892863502205","timestamp_desc":"Creation Time" }

And I got a new timeline with 5 events.

jaegeral avatar Sep 25 '23 13:09 jaegeral

Maybe it's volume related and 5 lines aren't enough to trigger it.

boingomw avatar Sep 25 '23 16:09 boingomw

sample.zip

@jaegeral, try this. Password: sample123. It has 484 events, but doubles up on importing.

Regards

arisjr avatar Sep 25 '23 19:09 arisjr

@jaegeral I just realized that you're using the timesketch CLI instead of timesketch-import-client (timesketch_importer). Is there any difference between the two approaches?

arisjr avatar Sep 25 '23 21:09 arisjr

Hm indeed, it is importing them twice.

jaegeral avatar Sep 26 '23 07:09 jaegeral

FWIW, I am still working on this. It seems my e2e tests in https://github.com/google/timesketch/pull/2976 do not trigger it.

jaegeral avatar Nov 28 '23 17:11 jaegeral

Still seeing this bug in the latest version of TS. Looking at the code, this flush call isn't needed, since the stream close method already calls flush(). We are seeing duplicates because flush() is called twice (and the _data_lines buffer isn't cleared directly by flush(), which makes the method name a bit misleading).

mari0d avatar Jun 27 '24 02:06 mari0d
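
The double-flush explanation in the last comment can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual importer code: the class ToyStreamer, its uploaded list, and the clear_on_flush switch are all hypothetical, and only the _data_lines name and the flush()/close() relationship are taken from the comment above.

```python
class ToyStreamer:
    """Hypothetical stand-in for a buffered uploader; NOT the real importer class."""

    def __init__(self, clear_on_flush):
        self._data_lines = []  # pending events, named after the buffer in the comment
        self.uploaded = []     # stands in for what the server would receive
        self._clear_on_flush = clear_on_flush

    def add(self, line):
        self._data_lines.append(line)

    def flush(self):
        # Send everything that is currently buffered.
        self.uploaded.extend(self._data_lines)
        if self._clear_on_flush:
            # Clearing here makes repeated flush() calls harmless.
            self._data_lines = []

    def close(self):
        # close() flushes on its own, so an explicit flush() before it is redundant.
        self.flush()


def import_events(lines, clear_on_flush):
    """Mimic a caller that flushes explicitly and then closes the stream."""
    streamer = ToyStreamer(clear_on_flush)
    for line in lines:
        streamer.add(line)
    streamer.flush()  # explicit flush by the caller
    streamer.close()  # second flush, triggered by close()
    return streamer.uploaded
```

With clear_on_flush=False, every buffered event is sent twice, which matches the doubled counts reported in this thread. In this model, either dropping the explicit flush() call or clearing the buffer inside flush() removes the duplicates.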