etl

parser: add filter.IsOAM logic to standard parsers

Open stephen-soltesz opened this issue 4 years ago • 7 comments

Today, we manually enumerate the OAM IPs in views. Ultimately, the parser should receive a list of OAM IPs from configuration at run time and set a standard "filter.IsOAM" field accordingly.

         "35.193.254.117", -- script-exporter VMs in GCE, sandbox.            
          "35.225.75.192", -- script-exporter VM in GCE, staging.              
          "35.192.37.249", -- script-exporter VM in GCE, oti.                  
          "23.228.128.99", "2605:a601:f1ff:fffe::99", -- ks addresses.         
          "45.56.98.222", "2600:3c03::f03c:91ff:fe33:819", -- eb addresses.    
          "35.202.153.90", "35.188.150.110" -- Static IPs from GKE VMs for e2e tests.

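A minimal sketch of what run-time labeling in the parser could look like, assuming the OAM IPs arrive as a list of strings from configuration. The `OAMFilter` type and its methods are hypothetical names used for illustration, not existing etl code.

```go
package main

import (
	"fmt"
	"net/netip"
)

// OAMFilter holds the OAM client addresses loaded from configuration.
// Hypothetical type for illustration; not part of the etl codebase.
type OAMFilter struct {
	addrs map[netip.Addr]struct{}
}

// NewOAMFilter parses the configured IPs and fails on the first invalid one,
// so that a bad configuration is caught at startup rather than at parse time.
func NewOAMFilter(ips []string) (*OAMFilter, error) {
	f := &OAMFilter{addrs: make(map[netip.Addr]struct{}, len(ips))}
	for _, s := range ips {
		a, err := netip.ParseAddr(s)
		if err != nil {
			return nil, fmt.Errorf("invalid OAM address %q: %w", s, err)
		}
		// Unmap normalizes IPv4-mapped IPv6 addresses to plain IPv4.
		f.addrs[a.Unmap()] = struct{}{}
	}
	return f, nil
}

// IsOAM reports whether clientIP is in the configured OAM set.
// Unparseable client addresses are treated as non-OAM.
func (f *OAMFilter) IsOAM(clientIP string) bool {
	a, err := netip.ParseAddr(clientIP)
	if err != nil {
		return false
	}
	_, ok := f.addrs[a.Unmap()]
	return ok
}

func main() {
	f, err := NewOAMFilter([]string{"35.193.254.117", "2605:a601:f1ff:fffe::99"})
	if err != nil {
		panic(err)
	}
	// The result would populate the row's filter.IsOAM field.
	fmt.Println(f.IsOAM("35.193.254.117"), f.IsOAM("192.0.2.1"))
}
```

Comparing parsed addresses rather than raw strings means different textual spellings of the same IPv6 address still match.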
stephen-soltesz avatar Jun 02 '20 17:06 stephen-soltesz

I think this needs to be more agile than can easily be done in the parser.

mattmathis avatar May 17 '22 04:05 mattmathis

What requirement is not met by the parser configuration including these values? Please describe a scenario that cannot be met.

stephen-soltesz avatar May 17 '22 10:05 stephen-soltesz

OAM addresses can change as an unexpected side effect of operational events. If IsOAM is bound in the parser, it has to be updated before the data arrives from the fleet; otherwise, OAM data leaks into BQ. If OAM is done in the Views, we can retroactively remove OAM data, and don't have to be quite as prompt on the update.

mattmathis avatar May 18 '22 14:05 mattmathis

My thought is that we do not want the views to be a receptacle for post-hoc configurations. The set of such things is unbounded. Preferably, static filter logic is managed by the parsers and optionally in the views as a "hot fix". View-based management is an expediency, not a preferable design (imo). Ideally, we would never archive the OAM measurements.

stephen-soltesz avatar May 18 '22 17:05 stephen-soltesz

A query to detect OAM traffic found more than 60 likely candidate clients. See: OAM Client Scan 2022-08-19 - Sheets

Although a small number appear to be spurious (e.g. 192.168.0.192), the vast majority are pretty clearly legit.

I strongly advocate marking IsOAM as part of a late-stage materialized join.

Note that we do need to be able to do post-hoc configurations in order to properly label canary data, because we don't know if we trust new deployments until they collect significant data. In nearly all cases we treat canary data as valid: indeed, it matches future production data. However, in the rare cases where we roll back canaries (which we have done), we need to have the capability to retroactively mark the data as non-production.
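To illustrate the idea of late-stage labeling, here is a rough sketch that recomputes labels on already-archived rows from the current OAM set and the current set of rolled-back deployments. The `Row` type, its `Version` field, and `Relabel` are assumptions made for this example; how canary rows are actually identified is not specified here.

```go
package main

import "fmt"

// Row is a hypothetical, simplified archived measurement.
type Row struct {
	ClientIP     string
	Version      string // assumed: the deployment version that collected the row
	IsOAM        bool
	IsProduction bool
}

// Relabel models a late-stage join: labels are derived from whatever the
// OAM and rollback sets contain when it runs, not when the row was parsed.
// It assumes IPs are already normalized strings and returns a new slice.
func Relabel(rows []Row, oamIPs, rolledBack map[string]bool) []Row {
	out := make([]Row, len(rows))
	for i, r := range rows {
		r.IsOAM = oamIPs[r.ClientIP]
		r.IsProduction = !rolledBack[r.Version]
		out[i] = r
	}
	return out
}

func main() {
	rows := []Row{
		{ClientIP: "35.193.254.117", Version: "v1"},
		{ClientIP: "192.0.2.1", Version: "v2-canary"},
	}
	oam := map[string]bool{"35.193.254.117": true}
	// The canary is rolled back after its data was already archived.
	rolledBack := map[string]bool{"v2-canary": true}
	for _, r := range Relabel(rows, oam, rolledBack) {
		fmt.Printf("%s oam=%t production=%t\n", r.ClientIP, r.IsOAM, r.IsProduction)
	}
}
```

Because the labels are derived at join time, adding an address to the OAM set or recording a rollback changes the labels on old rows without reparsing them.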

mattmathis avatar Aug 19 '22 15:08 mattmathis

@mattmathis regarding the canaries, is there a way to automate the retroactive labeling without human intervention? Or via the same signals that we use operationally when rolling forward or backward?

stephen-soltesz avatar Aug 19 '22 16:08 stephen-soltesz

@mattmathis you performed some work for the fixit - is that enough to close this issue? Or, is there some remaining work to capture the result of your query above?

stephen-soltesz avatar Aug 22 '22 21:08 stephen-soltesz