Provide clear privacy and security guidelines

Open rmarx opened this issue 3 years ago • 6 comments

qlog should be specific about which fields are potentially privacy sensitive and the possible actions that can be taken to mitigate these issues (e.g., hashing, exclusion, mapping, ...).

The current proposal is to work with multiple sanitization levels, depending on the intended use case. The way this is expressed in the drafts depends on the data definition format we end up adopting.
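As a rough sketch of what level-based sanitization could look like, assuming a qlog event is handled as a flat dictionary: the level names, the set of sensitive field names, and the `sanitize` helper below are all hypothetical and not taken from any qlog draft. It shows two of the mitigations mentioned above, hashing and exclusion.

```python
import hashlib
import hmac
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical sanitization levels, not defined by any qlog draft."""
    RAW = 0      # keep everything, e.g. for local debugging
    SHARED = 1   # pseudonymize sensitive fields, e.g. for sharing with a partner
    PUBLIC = 2   # exclude sensitive fields, e.g. for public datasets


# Assumed example set of sensitive field names; purely illustrative.
SENSITIVE_FIELDS = {"src_ip", "dst_ip", "connection_id"}


def keyed_hash(value: str, key: bytes) -> str:
    """Pseudonymize a value with HMAC-SHA256 so equal inputs stay linkable."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def sanitize(event: dict, level: Level, key: bytes) -> dict:
    """Return a copy of the event sanitized for the requested level."""
    out = {}
    for name, value in event.items():
        if name not in SENSITIVE_FIELDS or level == Level.RAW:
            out[name] = value                         # keep as-is
        elif level == Level.SHARED:
            out[name] = keyed_hash(str(value), key)   # hashing
        # Level.PUBLIC: the sensitive field is simply left out (exclusion)
    return out
```

A keyed hash is used here instead of a plain hash because the IPv4 address space is small enough to brute-force unkeyed digests; that choice is an assumption of this sketch, not something decided in this issue.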

One potential source of inspiration is https://www.tracewrangler.com, which anonymizes pcaps.

Another related consideration is whether to provide options to encrypt the qlogs themselves.

This is intended to be an overarching summary issue, with sub-issues expected for specific approaches/proposals.

rmarx avatar Mar 31 '21 08:03 rmarx

On the topic of encrypting the logs themselves, I think that shouldn't be specified. However the owner of the log wants to secure the file at rest should be up to them.

nibanks avatar Mar 31 '21 14:03 nibanks

I think that makes sense. I feel some of the remarks on this point were mainly tied to the serialization format selection. For example, if we use CBOR, we get COSE (https://tools.ietf.org/html/rfc8152) "for free", which in turn might be an additional reason to choose CBOR as the basic format, as that gives a standard/default option for encryption for people who don't have a specific opinion on how they'd want to do it otherwise.

rmarx avatar Mar 31 '21 14:03 rmarx

Ok. Well, see the other issue on my feelings about a complicated format. :) Keep it simple, IMO.

nibanks avatar Mar 31 '21 14:03 nibanks

This was discussed again at IETF 113.

Some of the main notes from queue comments:

Brian Trammell:
	Let's talk about sec and privacy stuff offline.
	We semi-tackled it in the IPFIX and TCPM WGs.

Eric Kinnear:
	I advocate for per-field indicators in addition to whatever general guidance we're going to give.
	We internally found that it's extremely helpful to have local indication of concerns, to make sure implementers understand it in context while looking at qlog.

Jana Iyengar:
	Agree with Eric. Per-field indicators are useful.
	Caution against going too deep in that rabbit hole though.
	Lots of local semantics are attached to what data means and what value there is in indicating levels of sensitivity,
	because you don't have a global view of how these traces are being used, e.g., in tandem with other logs that also exist.
	There should be considerations and not rules.
	Per-field indicators are useful, but don't lose yourself in there.

@britram is willing to help out on this one.

He suggested looking at RFC 6235, noting that "it's primarily digging into how to anonymize fields (using state-of-the-art anonymization of identifiers circa 2008) as opposed to the sensitivity of the data itself (it primarily concerns itself with flow-level data, which is a tiny subset of the sensitivity of per-field information in qlog)".

If we go with per-field indicators, my main question is whether we just indicate the "sensitivity level" of the field (e.g., IPs are dangerous) or ALSO how best to encode the field (e.g., this is how you should encode/hash IPs at various levels of anonymization).
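To make the second option concrete, here is a minimal sketch of a per-field indicator that carries both a sensitivity label and an encoder per anonymization level. Everything in it is an assumption for illustration: the `FieldPolicy` type, the level names `"low"` and `"high"`, the /24 and /48 truncation prefixes, and the demo key are not from the qlog drafts or from RFC 6235.

```python
import hashlib
import hmac
import ipaddress
from dataclasses import dataclass
from typing import Callable, Dict


def truncate_ip(value: str) -> str:
    """Zero the host bits, keeping /24 for IPv4 and /48 for IPv6 (assumed prefixes)."""
    addr = ipaddress.ip_address(value)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def make_ip_hasher(key: bytes) -> Callable[[str], str]:
    """Build an encoder that replaces an IP with a truncated HMAC-SHA256 digest."""
    def hash_ip(value: str) -> str:
        return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return hash_ip


@dataclass(frozen=True)
class FieldPolicy:
    """Hypothetical per-field indicator: a label plus one encoder per level."""
    sensitivity: str                            # e.g. "high"; labels are made up
    encoders: Dict[str, Callable[[str], str]]   # anonymization level -> encoder


# Example policy for an IP address field; the key is a placeholder.
IP_POLICY = FieldPolicy(
    sensitivity="high",
    encoders={
        "low": truncate_ip,                     # coarse location stays visible
        "high": make_ip_hasher(b"demo-key"),    # only equality stays visible
    },
)


def encode_field(policy: FieldPolicy, level: str, value: str) -> str:
    """Encode a field value for the requested anonymization level."""
    return policy.encoders[level](value)
```

With only the `sensitivity` label, an implementer still has to decide what to do with the field; attaching encoders answers that question but also bakes specific techniques into the spec, which is the trade-off raised above.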

rmarx avatar Mar 22 '22 11:03 rmarx

This topic is important and needs the right level of sensitive treatment. But my 2c are that we shouldn't try to boil the ocean on the first pass.

I propose we add some well-considered text in the short term and ask for an early review from the relevant areas while we work on other design issues in parallel.

LPardue avatar Sep 08 '22 01:09 LPardue

Agreed with @LPardue here.

Also had a meeting with @britram about this a few weeks ago, who similarly proposes to keep the text in the qlog docs relatively short with high-level guidance (e.g., not tagging sensitive fields per event).

He additionally proposes to start a parallel project for (a) new document(s) that discuss in a more general scope how to deal with sensitive information in all types of logs (since we lack such guidance at the IETF atm). If this effort is at least started by last call for qlog, it would help to have something to point to if people worry about a lack of in-depth guidance here.

Will add some basic text here ASAP, ready for internal review soon, and then have something to bring to the wg for initial review by IETF 115.

rmarx avatar Sep 29 '22 14:09 rmarx

Closing as completed

LPardue avatar Nov 08 '23 10:11 LPardue