
Support psort filters when ingesting into Timesketch

Open mpilking opened this issue 3 years ago • 7 comments

Is your feature request related to a problem? Please describe. Some hosts produce very large plaso data sets. As an example, a domain controller produced nearly 18 million parsed events when processed with log2timeline. Often I don't want to ingest all of those events. Many of them are from a timeframe that is irrelevant. Also, the majority of parsed events from the DC are Windows event logs. The way log2timeline parses Windows event logs results in duplicate events: one for Creation Time and one for Last Modification Time. I understand where these come from, but 99.99% of the time I only care about the Creation Time events. So, I have a psort filter I can use that will output just the Windows event log Creation events and also filter to a time range of interest. This works great for psort output to CSV. However, I don't know of a way to do an equivalent filter when importing into Timesketch. It would be great to be able to use psort filters with timesketch_importer.

Describe the solution you'd like Here is an example of a psort filter that will narrow down those 18 million events to under 2 million. This is what I'd like to replicate with timesketch_importer or a similar option.

psort.py --output-time-zone 'UTC' -o dynamic -w dc-triage.csv dc-triage.plaso "(((parser == 'winevtx') and (timestamp_desc == 'Creation Time')) or (parser != 'winevtx')) and ( date > datetime('2021-02-01T00:00:00'))"

Describe alternatives you've considered The only other option I'm aware of is to use timesketch_importer or the webui to upload all the data and then go back and delete unwanted documents from the Elasticsearch index. For example, this will delete documents prior to 2021-02-01 and will delete any winevtx events with the timestamp_desc containing "Modification":

curl -XPOST --header 'Content-Type: application/json' localhost:9200/plaso-dc-triage-index/_delete_by_query -d '{"query": {"range": {"datetime": {"time_zone": "+00:00","lt": "2021-02-01T00:00:00"}}}}'
curl -XPOST --header 'Content-Type: application/json' localhost:9200/plaso-dc-triage-index/_delete_by_query -d '{"query" : {"bool" : {"must": [{"match": {"parser": "winevtx"}}, {"match": {"timestamp_desc": "Modification"}}]}}}'
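For anyone scripting this cleanup, here is a minimal sketch that builds the same two request bodies as Python dicts, so the cutoff date and parser name are not hard-coded in a shell string. The function name `build_cleanup_queries` and its parameters are hypothetical; only the query structure is taken from the curl commands above, and sending the bodies to the `_delete_by_query` endpoint is left to whatever HTTP client you use.

```python
import json


def build_cleanup_queries(cutoff_iso, parser="winevtx", desc="Modification"):
    """Return the two _delete_by_query bodies from the curl commands as dicts.

    Hypothetical helper: it only builds the JSON bodies and sends nothing.
    """
    # Body 1: every document with a datetime earlier than the cutoff (UTC).
    before_cutoff = {
        "query": {
            "range": {
                "datetime": {"time_zone": "+00:00", "lt": cutoff_iso}
            }
        }
    }
    # Body 2: events from the given parser whose timestamp_desc matches desc.
    duplicate_events = {
        "query": {
            "bool": {
                "must": [
                    {"match": {"parser": parser}},
                    {"match": {"timestamp_desc": desc}},
                ]
            }
        }
    }
    return before_cutoff, duplicate_events


if __name__ == "__main__":
    for body in build_cleanup_queries("2021-02-01T00:00:00"):
        # Each printed line can be passed to curl via -d.
        print(json.dumps(body))
```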

Additional context This issue is tangentially related to Plaso issue #3813

mpilking avatar Aug 28 '21 00:08 mpilking

This is a great idea, and it shouldn't be too difficult to implement. We call psort in the background worker, and we can pass in arguments to the command. We can add a filter argument and let the user (timesketch_importer for example) set that. This should replicate exactly what you do with normal CSV output.
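A rough sketch of the idea, not the actual worker code: build the psort argument list and append the filter expression only when the user supplied one. The function name `build_psort_command` is hypothetical, and the flags simply mirror the CSV command from the original post; the real worker writes to a different output module with its own arguments.

```python
import shlex


def build_psort_command(storage_file, output_file, event_filter=None):
    """Build a psort.py argument list, optionally ending with a filter.

    Hypothetical helper; flags mirror the CSV example in this issue.
    """
    command = [
        "psort.py",
        "--output-time-zone", "UTC",
        "-o", "dynamic",
        "-w", output_file,
        storage_file,
    ]
    # The filter is a single positional argument, so it stays one list item
    # and needs no shell quoting when passed to subprocess.run(command).
    if event_filter:
        command.append(event_filter)
    return command


if __name__ == "__main__":
    user_filter = (
        "(((parser == 'winevtx') and (timestamp_desc == 'Creation Time')) "
        "or (parser != 'winevtx')) and "
        "(date > datetime('2021-02-01T00:00:00'))"
    )
    cmd = build_psort_command("dc-triage.plaso", "dc-triage.csv", user_filter)
    # shlex.join shows the shell-quoted form for copy and paste.
    print(shlex.join(cmd))
```

Keeping the command as a list means the filter expression, with all its quotes and parentheses, is never re-parsed by a shell.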

berggren avatar Sep 12 '21 22:09 berggren

Just wanted to say it would be awesome if we could implement this! I had actually completely reverted to using psort.py as the timesketch_importer is missing this.

Is this still being worked on?

56616c6f72 avatar Oct 13 '21 17:10 56616c6f72

Hey, afaik this is currently not being worked on.

jaegeral avatar Oct 13 '21 20:10 jaegeral

This issue is currently being worked on (ref: https://github.com/google/timesketch/pull/1987)

jleaniz avatar Nov 09 '21 15:11 jleaniz

Hi all :) Do you know if a maintainer had the opportunity to take a look at the pull request?

WoBuGs avatar Feb 05 '23 15:02 WoBuGs

Would love to see this implemented. My current workflow uses psort externally with opensearch_ts so I can add filters - https://timesketch.org/developers/api-upload-data/#import-data-already-ingested-into-opensearch

Unless I'm mistaken, this assumes you already have an existing timeline; in my case, however, the evidence is always a new timeline. I generally just try to add 1 to the previous ID passed to --timeline_identifier, but this doesn't always work with concurrency. So I may later have to go in and re-index the OpenSearch documents to update the timeline ID.

POST timesketch_index/_update_by_query
{
  "script": {
    "source": "ctx._source.__ts_timeline_id = 4",
    "lang": "painless"
  }
}
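If you script this, a small sketch like the following builds the same request body with the ID passed as a script parameter rather than hard-coded in the script source. The function name `build_timeline_id_update` is hypothetical, and the optional `term` query on `__ts_timeline_id` is an assumption about how that field is indexed; without it, the update applies to every document in the index, as in the request above.

```python
import json


def build_timeline_id_update(new_id, old_id=None):
    """Build an _update_by_query body that sets __ts_timeline_id.

    Hypothetical helper: it only builds the JSON body and sends nothing.
    """
    body = {
        "script": {
            # params keeps the script source constant across different IDs.
            "source": "ctx._source.__ts_timeline_id = params.timeline_id",
            "lang": "painless",
            "params": {"timeline_id": new_id},
        }
    }
    # Assumption: restrict the update to documents carrying the old ID.
    if old_id is not None:
        body["query"] = {"term": {"__ts_timeline_id": old_id}}
    return body


if __name__ == "__main__":
    print(json.dumps(build_timeline_id_update(4, old_id=3), indent=2))
```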

Unless I'm mistaken there's no easy way to create a timeline ID prior to uploading content?

pemontto avatar Jul 31 '23 13:07 pemontto

This issue would really help reduce timelines. Is this still being worked on?

Camel0101 avatar Mar 21 '24 19:03 Camel0101