suricata icon indicating copy to clipboard operation
suricata copied to clipboard

pgsql/logger: open json object from logger fn - v3

Open jufajardini opened this issue 2 months ago • 8 comments

Previous PR: https://github.com/OISF/suricata/pull/10997

Changes from last PR:

  • remove changes to template.rs as per Jason's analysis

Link to redmine ticket: https://redmine.openinfosecfoundation.org/issues/6983 https://redmine.openinfosecfoundation.org/issues/7000

Provide values to any of the below to override the defaults.

SV_BRANCH=https://github.com/OISF/suricata-verify/pull/1810

jufajardini avatar May 01 '24 14:05 jufajardini

I don't understand the tx_id explanation, can you expand on it? What was happening before? What is the tx_id space? 0-N or 1-N?

victorjulien avatar May 01 '24 17:05 victorjulien

I don't understand the tx_id explanation, can you expand on it? What was happening before? What is the tx_id space? 0-N or 1-N?

Pgsql's tx_id space is 1-N.

When Suricata does app-layer event logging, it uses JsonPgsqlLogger, which will use data from output-tx, it starts from tx_id, as that's the value that it gets from AppLayerParserState, from log_id.

When Suricata does alert event logging and adds app-layer metadata, the tx_id used comes from the PacketAlert, whose info comes from AppLayerParserGetTransactionInspectId, and that one seems to have discrepancy, so the metadata logging, without this fix, will always look for the alert's tx_id+1, and either log the incorrect tx data, or nothing.

If, when creating Pgsql Events, I call CreateEveHeaderWithTx instead of CreateEveHeader, I see the tx_id discrepancy in the Pgsql EVE logs. I think that's because the logging function won't actually use the tx_id passed for calling the parser logging function, and will only add that to the event header.

I hope this all make sense. I could update the commit message for bug 7000 to be something like:

    pgsql: tx_id doesn't need to be adjusted

   PgsqlTransaction.tx_id space is 1-N. Alerts are generated using the actual 
   tx_id from the parser, so, if tx_id == 1, PacketAlert.tx_id == 1.

   Without this fix, the EVE alert metadata for Pgsql will always log tx_id+1, and 
   thus either log the wrong tx, or no tx at all.
   
   When creating a new PgsqlTransaction, we first increment PgsqlState's
   tx_id and then pass that to the new tx's tx_id. As Pgsql txs are not created
   first with id 0, there's no need to adjust when freeing or searching for txs by 
   id.
   
   Extra: there is still a discrepancy between PgsqlState's `tx_id` and 
   AppLayerState's log_id, as that is initialized as zero.

   Bug #7000

About that extra part - feels like something that should be documented.

jufajardini avatar May 01 '24 21:05 jufajardini

I'm initially curious why other protocols aren't having the same issue, or maybe they are. As this pattern used throughout.

jasonish avatar May 01 '24 22:05 jasonish

Observation.

In master branch without this patch:

  • by default, the alert record DOES NOT get the pgsql metadata
  • with --simulate-ips, the alert records DOES get the pgsql metadata

With this patch:

  • by default, the alert record DOES get the pgsql metadata
  • with --simulate-ips, the alert record DOES NOT get the pgsql metadata

While I haven't yet, this should be looked at in another protocol as well to see if its a common pattern.

jasonish avatar May 01 '24 23:05 jasonish

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

field baseline test %
SURI_TLPW2_autofp_stats_chk
.uptime 138 146 105.8%

Pipeline 20450

suricata-qa avatar May 02 '24 00:05 suricata-qa

Observation.

In master branch without this patch:

  • by default, the alert record DOES NOT get the pgsql metadata
  • with --simulate-ips, the alert records DOES get the pgsql metadata

With this patch:

  • by default, the alert record DOES get the pgsql metadata
  • with --simulate-ips, the alert record DOES NOT get the pgsql metadata

While I haven't yet, this should be looked at in another protocol as well to see if its a common pattern.

I vaguely remember a discussion where something connected was mentioned - by you, I think? Where we'd see different results between IDS/IPS...

The tx_id for detection, in IDS, is recovered by AppLayerParserGetTransactionInspectId, which takes direction into consideration. Logging doesn't, so uses something different. I'm curious with what happens with IPS.

I will try to craft SV tests for other protocols, having versions with --simulate-ips (shall also add one of those for pgsql, too).

jufajardini avatar May 02 '24 12:05 jufajardini

I'm initially curious why other protocols aren't having the same issue, or maybe they are. As this pattern used throughout.

Maybe stateless parsers won't be affected by this? Afaik we don't have this fix for the alert metadata for all app-protos, yet - might help in us not seeing this?

Wonder if the fact that we don't add the tx_id to the EVE header could help in masking the issue, or more likely how protocol transactions differ. Or none of the above. Just hope I'm not making us waste time in something that isn't important >__<'

jufajardini avatar May 02 '24 12:05 jufajardini

I vaguely remember a discussion where something connected was mentioned - by you, I think? Where we'd see different results between IDS/IPS...

There are differences. Detection is run earlier that can lead to some differing results, such as packet logged, what transaction data might be available - but from adding some prints its clear the transaction is there, just the indexing is off.

However, I don't think the app-layer should need to worry if we're in IDS or IPS mode, that should be done more inside the engine, and this is where things seem to be off here.

jasonish avatar May 02 '24 13:05 jasonish

Closing this as I'm still investigating. I want to submit the next PR once we have a deeper understanding of what's going on, and whether the fix should include other protocols - or follow a different approach, entirely.

jufajardini avatar May 08 '24 14:05 jufajardini