suricata
suricata copied to clipboard
pgsql/logger: open json object from logger fn - v3
Previous PR: https://github.com/OISF/suricata/pull/10997
Changes from last PR:
- remove changes to
template.rsas per Jason's analysis
Link to redmine ticket: https://redmine.openinfosecfoundation.org/issues/6983 https://redmine.openinfosecfoundation.org/issues/7000
Provide values to any of the below to override the defaults.
SV_BRANCH=https://github.com/OISF/suricata-verify/pull/1810
I don't understand the tx_id explanation, can you expand on it? What was happening before? What is the tx_id space? 0-N or 1-N?
I don't understand the tx_id explanation, can you expand on it? What was happening before? What is the tx_id space? 0-N or 1-N?
Pgsql's tx_id space is 1-N.
When Suricata does app-layer event logging, it uses JsonPgsqlLogger, which will use data from output-tx, it starts from tx_id, as that's the value that it gets from AppLayerParserState, from log_id.
When Suricata does alert event logging and adds app-layer metadata, the tx_id used comes from the PacketAlert, whose info comes from AppLayerParserGetTransactionInspectId, and that one seems to have discrepancy, so the metadata logging, without this fix, will always look for the alert's tx_id+1, and either log the incorrect tx data, or nothing.
If, when creating Pgsql Events, I call CreateEveHeaderWithTx instead of CreateEveHeader, I see the tx_id discrepancy in the Pgsql EVE logs. I think that's because the logging function won't actually use the tx_id passed for calling the parser logging function, and will only add that to the event header.
I hope this all make sense. I could update the commit message for bug 7000 to be something like:
pgsql: tx_id doesn't need to be adjusted
PgsqlTransaction.tx_id space is 1-N. Alerts are generated using the actual
tx_id from the parser, so, if tx_id == 1, PacketAlert.tx_id == 1.
Without this fix, the EVE alert metadata for Pgsql will always log tx_id+1, and
thus either log the wrong tx, or no tx at all.
When creating a new PgsqlTransaction, we first increment PgsqlState's
tx_id and then pass that to the new tx's tx_id. As Pgsql txs are not created
first with id 0, there's no need to adjust when freeing or searching for txs by
id.
Extra: there is still a discrepancy between PgsqlState's `tx_id` and
AppLayerState's log_id, as that is initialized as zero.
Bug #7000
About that extra part - feels like something that should be documented.
I'm initially curious why other protocols aren't having the same issue, or maybe they are. As this pattern used throughout.
Observation.
In master branch without this patch:
- by default, the
alertrecord DOES NOT get thepgsqlmetadata - with
--simulate-ips, thealertrecords DOES get thepgsqlmetadata
With this patch:
- by default, the
alertrecord DOES get thepgsqlmetadata - with
--simulate-ips, thealertrecord DOES NOT get thepgsqlmetadata
While I haven't yet, this should be looked at in another protocol as well to see if its a common pattern.
Information:
ERROR: QA failed on SURI_TLPW2_autofp_suri_time.
| field | baseline | test | % |
|---|---|---|---|
| SURI_TLPW2_autofp_stats_chk | |||
| .uptime | 138 | 146 | 105.8% |
Pipeline 20450
Observation.
In master branch without this patch:
- by default, the
alertrecord DOES NOT get thepgsqlmetadata- with
--simulate-ips, thealertrecords DOES get thepgsqlmetadataWith this patch:
- by default, the
alertrecord DOES get thepgsqlmetadata- with
--simulate-ips, thealertrecord DOES NOT get thepgsqlmetadataWhile I haven't yet, this should be looked at in another protocol as well to see if its a common pattern.
I vaguely remember a discussion where something connected was mentioned - by you, I think? Where we'd see different results between IDS/IPS...
The tx_id for detection, in IDS, is recovered by AppLayerParserGetTransactionInspectId, which takes direction into consideration. Logging doesn't, so uses something different. I'm curious with what happens with IPS.
I will try to craft SV tests for other protocols, having versions with --simulate-ips (shall also add one of those for pgsql, too).
I'm initially curious why other protocols aren't having the same issue, or maybe they are. As this pattern used throughout.
Maybe stateless parsers won't be affected by this? Afaik we don't have this fix for the alert metadata for all app-protos, yet - might help in us not seeing this?
Wonder if the fact that we don't add the tx_id to the EVE header could help in masking the issue, or more likely how protocol transactions differ. Or none of the above. Just hope I'm not making us waste time in something that isn't important >__<'
I vaguely remember a discussion where something connected was mentioned - by you, I think? Where we'd see different results between IDS/IPS...
There are differences. Detection is run earlier that can lead to some differing results, such as packet logged, what transaction data might be available - but from adding some prints its clear the transaction is there, just the indexing is off.
However, I don't think the app-layer should need to worry if we're in IDS or IPS mode, that should be done more inside the engine, and this is where things seem to be off here.
Closing this as I'm still investigating. I want to submit the next PR once we have a deeper understanding of what's going on, and whether the fix should include other protocols - or follow a different approach, entirely.