timesketch
Support psort filters when ingesting into Timesketch
Is your feature request related to a problem? Please describe.
Some hosts produce very large plaso data sets. As an example, a domain controller produced nearly 18 million parsed events when processed with log2timeline. Often I don't want to ingest all of those events. Many of them are from a timeframe that is irrelevant. Also, the majority of parsed events from the DC are Windows event logs. The way log2timeline parses Windows event logs results in duplicate events: one for Creation Time and one for Last Modification Time. I understand where these come from, but 99.99% of the time I only care about the Creation Time events. So, I have a psort filter I can use that will output just the Windows event log Creation Time events and also filter to a time range of interest. This works great for psort output to CSV. However, I don't know of a way to do an equivalent filter when importing into Timesketch. It would be great to be able to use psort filters with timesketch_importer.
Describe the solution you'd like
Here is an example of a psort filter that will narrow down those 18 million events to under 2 million. This is what I'd like to replicate with timesketch_importer or a similar option.
psort.py --output-time-zone 'UTC' -o dynamic -w dc-triage.csv dc-triage.plaso "(((parser == 'winevtx') and (timestamp_desc == 'Creation Time')) or (parser != 'winevtx')) and ( date > datetime('2021-02-01T00:00:00'))"
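For anyone who wants to reuse this filter with a different parser or start date, the expression can be generated rather than edited by hand. This is only a sketch: `build_psort_filter` and its parameter names are made up for illustration and are not part of Plaso or Timesketch. It just reproduces the filter string from the command above.

```python
def build_psort_filter(parser: str, timestamp_desc: str, start: str) -> str:
    """Build a psort event filter string (hypothetical helper, not a Plaso API).

    Keeps only `timestamp_desc` events for `parser`, keeps all events from
    every other parser, and keeps only events dated after `start`.
    """
    parser_clause = (
        f"(((parser == '{parser}') and (timestamp_desc == '{timestamp_desc}')) "
        f"or (parser != '{parser}'))"
    )
    date_clause = f"( date > datetime('{start}'))"
    return f"{parser_clause} and {date_clause}"


if __name__ == "__main__":
    # Prints the same filter expression used in the psort.py command above.
    print(build_psort_filter("winevtx", "Creation Time", "2021-02-01T00:00:00"))
```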
Describe alternatives you've considered
The only other option I'm aware of is to use timesketch_importer or the webui to upload all the data and then go back and delete unwanted documents from the Elasticsearch index. For example, this will delete documents prior to 2021-02-01 and will delete any winevtx events with the timestamp_desc containing "Modification":
curl -XPOST --header 'Content-Type: application/json' localhost:9200/plaso-dc-triage-index/_delete_by_query -d '{"query": {"range": {"datetime": {"time_zone": "+00:00","lt": "2021-02-01T00:00:00"}}}}'
curl -XPOST --header 'Content-Type: application/json' localhost:9200/plaso-dc-triage-index/_delete_by_query -d '{"query" : {"bool" : {"must": [{"match": {"parser": "winevtx"}}, {"match": {"timestamp_desc": "Modification"}}]}}}'
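If the cutoff date or parser changes between cases, the two request bodies can be built in code instead of being retyped inside the curl quoting. A minimal sketch, assuming the same field names as the curl commands above (`datetime`, `parser`, `timestamp_desc`); the function names are hypothetical, and nothing here talks to Elasticsearch, it only produces the JSON to pass to `-d`.

```python
import json


def delete_before(cutoff: str) -> dict:
    """Body for _delete_by_query: match documents with datetime before `cutoff` (UTC)."""
    return {
        "query": {
            "range": {"datetime": {"time_zone": "+00:00", "lt": cutoff}}
        }
    }


def delete_parser_events(parser: str, timestamp_desc: str) -> dict:
    """Body for _delete_by_query: match documents on both parser and timestamp_desc."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"parser": parser}},
                    {"match": {"timestamp_desc": timestamp_desc}},
                ]
            }
        }
    }


if __name__ == "__main__":
    # Print the two JSON bodies used in the curl commands above.
    print(json.dumps(delete_before("2021-02-01T00:00:00")))
    print(json.dumps(delete_parser_events("winevtx", "Modification")))
```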
Additional context
This issue is tangentially related to Plaso issue #3813
This is a great idea, and it shouldn't be too difficult to implement. We call psort in the background worker, and we can pass in arguments to the command. We can add a filter argument and let the user (timesketch_importer for example) set that. This should replicate exactly what you do with normal CSV output.
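A rough sketch of that idea, to show how small the change could be. This is not the actual worker code: `build_psort_command` and its parameters are hypothetical, and the output options simply mirror the CSV command from the issue description rather than whatever the worker really passes. The only point is that the filter is an optional trailing argument after the storage file.

```python
from typing import List, Optional


def build_psort_command(
    storage_file: str,
    output_file: str,
    event_filter: Optional[str] = None,
    psort_path: str = "psort.py",
) -> List[str]:
    """Assemble a psort argument list, appending the event filter only if one is set."""
    command = [
        psort_path,
        "--output-time-zone", "UTC",
        "-o", "dynamic",
        "-w", output_file,
        storage_file,
    ]
    if event_filter:
        # The filter expression is passed as a single positional argument.
        command.append(event_filter)
    return command


if __name__ == "__main__":
    print(build_psort_command(
        "dc-triage.plaso",
        "dc-triage.csv",
        event_filter="parser != 'winevtx'",
    ))
```

Building the command as a list and handing it to something like `subprocess.run(command, check=True)` avoids shell quoting problems with the quotes and parentheses inside the filter.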
Just wanted to say it would be awesome if we could implement this! I had actually completely reverted to using psort.py as the timesketch_importer is missing this.
Is this still being worked on?
Hey, afaik this is currently not being worked on.
This issue is currently being worked on (ref: https://github.com/google/timesketch/pull/1987)
Hi all :) Do you know if a maintainer has had the opportunity to take a look at the pull request?
Would love to see this implemented. My current workflow uses psort externally with opensearch_ts so I can add filters - https://timesketch.org/developers/api-upload-data/#import-data-already-ingested-into-opensearch
Unless I'm mistaken this assumes you already have an existing timeline; however, in my case the evidence is always a new timeline. I generally just attempt to add 1 to the previous ID for --timeline_identifier, however this doesn't always work with concurrency. So I may later have to go in and re-index the OpenSearch documents to update the timeline ID.
POST timesketch_index/_update_by_query
{
  "script": {
    "source": "ctx._source.__ts_timeline_id = 4",
    "lang": "painless"
  }
}
Unless I'm mistaken there's no easy way to create a timeline ID prior to uploading content?
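For reference, the update-by-query body above can be generated so the timeline ID isn't hard-coded. This is a sketch under assumptions: `build_timeline_id_update` is a made-up helper, and it passes the ID through the script's `params` instead of writing the literal `4` into the source. As in the request above, there is no `query` clause, so it would apply to every document in the target index.

```python
import json


def build_timeline_id_update(timeline_id: int) -> dict:
    """Body for _update_by_query that sets __ts_timeline_id on matched documents."""
    if isinstance(timeline_id, bool) or not isinstance(timeline_id, int) or timeline_id < 1:
        raise ValueError("timeline_id must be a positive integer")
    return {
        "script": {
            # The ID is supplied via params rather than inlined in the script.
            "source": "ctx._source.__ts_timeline_id = params.timeline_id",
            "lang": "painless",
            "params": {"timeline_id": timeline_id},
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_timeline_id_update(4), indent=2))
```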
This feature would really help reduce the size of timelines. Is this still being worked on?