etl
parser: add filter.IsOAM logic to standard parsers
Today, we manually enumerate the OAM IPs in views. Ultimately, the parser should receive a list of OAM IPs from configuration at run time and set a standard "filter.IsOAM" field accordingly.
"35.193.254.117", -- script-exporter VMs in GCE, sandbox.
"35.225.75.192", -- script-exporter VM in GCE, staging.
"35.192.37.249", -- script-exporter VM in GCE, oti.
"23.228.128.99", "2605:a601:f1ff:fffe::99", -- ks addresses.
"45.56.98.222", "2600:3c03::f03c:91ff:fe33:819", -- eb addresses.
"35.202.153.90", "35.188.150.110" -- Static IPs from GKE VMs for e2e tests.
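As a rough illustration of the parser-side proposal, the sketch below builds a lookup set from a configured address list and sets the flag on each row. It is written in Python for brevity and is not the parser's actual code: the names `OAM_ADDRESSES`, `build_oam_set`, `is_oam`, `label_row`, and the `client_ip` row field are all hypothetical, and the nested `filter` / `IsOAM` layout is only one reading of the "filter.IsOAM" column name. The addresses are the ones enumerated in the view fragment above.

```python
import ipaddress

# Hypothetical run-time configuration, seeded with the addresses
# currently enumerated in the views.
OAM_ADDRESSES = [
    "35.193.254.117",
    "35.225.75.192",
    "35.192.37.249",
    "23.228.128.99", "2605:a601:f1ff:fffe::99",
    "45.56.98.222", "2600:3c03::f03c:91ff:fe33:819",
    "35.202.153.90", "35.188.150.110",
]


def build_oam_set(addresses):
    """Parse the configured strings once so lookups compare normalized addresses."""
    return frozenset(ipaddress.ip_address(a) for a in addresses)


def is_oam(client_ip, oam_set):
    """Return True if client_ip is a configured OAM address.

    Unparseable input is treated as non-OAM rather than raising.
    """
    try:
        return ipaddress.ip_address(client_ip) in oam_set
    except ValueError:
        return False


def label_row(row, oam_set):
    """Return a copy of row with filter.IsOAM set (assumed field layout)."""
    labeled = dict(row)
    existing = row.get("filter", {})
    labeled["filter"] = {**existing, "IsOAM": is_oam(row["client_ip"], oam_set)}
    return labeled


OAM_SET = build_oam_set(OAM_ADDRESSES)
```

Parsing the addresses instead of comparing strings means that alternate spellings of the same IPv6 address (upper case, expanded zeros) still match.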
I think this needs to be more agile than can easily be done in the parser.
What requirement is not met by the parser configuration including these values? Please describe a scenario that cannot be met.
OAM addresses can change as an unexpected side effect of operational events. If IsOAM is bound in the parser, the parser has to be updated before the data arrives from the fleet; otherwise, OAM data leaks into BQ. If OAM filtering is done in the views, we can retroactively remove OAM data, and we don't have to be quite as prompt with the update.
My thought is that we do not want the views to be a receptacle for post-hoc configurations. The set of such things is unbounded. Preferably, static filter logic is managed by the parsers, and optionally in the views as a "hot fix". View-based management is an expediency, not a preferable design (imo). Ideally, we would never archive the OAM measurements.
A query to detect OAM traffic found more than 60 likely candidate clients. See: OAM Client Scan 2022-08-19 - Sheets
Although a small number appear to be spurious (e.g. 192.168.0.192), the vast majority are pretty clearly legit.
I strongly advocate marking isOAM as part of a late stage materialized join.
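To make the late-stage idea concrete, here is a minimal sketch in Python, under the assumption that archived rows carry no OAM label and the flag is computed only when the materialized output is built. The function name `materialize`, the `client_ip` and `id` fields, and the sample rows are hypothetical; in practice this would presumably be a join in BigQuery rather than application code.

```python
import ipaddress


def materialize(rows, oam_addresses):
    """Join archived rows against the current OAM list at materialization time.

    The archived rows are not modified; each output row is a copy with an
    IsOAM flag derived from whatever list is supplied for this run.
    """
    oam = {ipaddress.ip_address(a) for a in oam_addresses}
    return [
        {**row, "IsOAM": ipaddress.ip_address(row["client_ip"]) in oam}
        for row in rows
    ]


# Hypothetical archived rows, stored without any OAM label.
archived = [
    {"id": 1, "client_ip": "35.202.153.90"},
    {"id": 2, "client_ip": "203.0.113.7"},
]

# First run: the OAM list does not yet include the new address.
before = materialize(archived, [])

# Later run: the list has been updated, and the old row is relabeled
# without reparsing or rewriting the archive.
after = materialize(archived, ["35.202.153.90"])
```

Because the label is derived on every run, a late update to the address list corrects earlier rows the next time the join is materialized.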
Note that we do need to be able to do post-hoc configurations in order to properly label canary data, because we don't know whether we trust new deployments until they have collected significant data. In nearly all cases we treat canary data as valid; indeed, it matches future production data. However, in the rare cases where we roll back canaries (which we have done), we need the capability to retroactively mark the data as non-production.
@mattmathis regarding the canaries, is there a way to automate the retroactive labeling without human intervention? Or via the same signals that we use operationally when rolling forward or backward?
@mattmathis you performed some work for the fixit. Is that enough to close this issue? Or is there some remaining work to capture the result of your query above?