Update the documentation on small number suppression

Open wjchulme opened this issue 2 years ago • 1 comments

I think the redaction / rounding section of the OpenSAFELY documentation could do with a bit of a refresh, both in terms of how we explain disclosure control / suppression and what we recommend.

We say here that the general principle is that any statistic describing 5 or fewer patients, either directly or indirectly, should be redacted. I wrote this (or maybe it's since been tweaked), but I now think it's misleading.

The key principle is the suppression of information about groups of size 5 or fewer. We shouldn't know the size of these groups (i.e., we should only know that the group size is 5 or fewer), and we shouldn't know any details in addition to what we already know about how that group is defined. So for example, if we know there are "at most five people in [study population] with [attributes a,b,c]", we are not allowed to know anything more about them (e.g., average age, proportion who died...). I think this principle is best expressed, in general terms, as something like:

All information about groups of size 5 or fewer should be suppressed. This includes the size of the group and any other demographic or clinical information such as the average age, proportion with a particular disease, or number who have died. This principle applies for both primary and secondary disclosure, so if any additional information about such a group can be inferred from information released elsewhere then further suppression is required.
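To make the proposed wording concrete, here is a minimal Python sketch of what it implies for a table of per-group summaries. It is only an illustration: the function names, the row layout (`group`, `n`, `mean_age`, `n_died`) and the `"<=5"` marker are assumptions made up for this example, not part of any OpenSAFELY tooling.

```python
THRESHOLD = 5  # groups of this size or smaller are suppressed (value taken from the issue text)


def suppress_small_groups(rows, threshold=THRESHOLD):
    """Return new rows in which every detail of a small group is withheld.

    Each row is a dict with a "group" label, a group size "n", and any
    number of further summaries (hypothetical layout for this sketch).
    """
    released = []
    for row in rows:
        if row["n"] <= threshold:
            # Keep only how the group is defined; withhold everything else.
            masked = {key: None for key in row}
            masked["group"] = row["group"]
            # The size is reported only as "at most threshold".
            masked["n"] = f"<={threshold}"
            released.append(masked)
        else:
            released.append(dict(row))
    return released


def single_suppressed_group(rows, threshold=THRESHOLD):
    """Flag a simple secondary-disclosure risk.

    If exactly one group is suppressed and the overall total is released
    elsewhere, the hidden size can be recovered by subtraction.
    """
    return sum(row["n"] <= threshold for row in rows) == 1
```

When `single_suppressed_group` returns `True`, further suppression (for example of the total or of a second group) would be needed before release. This check covers only the simplest subtraction case; real outputs can leak information in other ways.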

I think this wording is better as it doesn't imply that redaction is necessary for suppression, and it doesn't use the term "statistic" which is probably unhelpful.

A few examples would then help explain these principles in practice, using redaction and/or rounding as the means of suppression. The existing examples are probably good enough for this, but we should review them to make sure they focus on our primary recommendations for disclosure control.
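As a rough idea of what a combined redaction-and-rounding example might look like, here is a small Python sketch. The function name `disclose_count`, the rounding base of 5 and the use of `None` for a redacted value are all assumptions chosen for illustration; they are not the documentation's recommendations.

```python
def disclose_count(count, threshold=5, base=5):
    """Redact small counts and round the rest to the nearest multiple of base.

    Returns None when the count is at or below the threshold (redacted),
    otherwise the rounded count. Ties round upwards.
    """
    if count <= threshold:
        return None  # redacted: only "threshold or fewer" may be known
    # Integer arithmetic avoids the round-half-to-even behaviour of round().
    return base * ((count + base // 2) // base)


# Hypothetical counts, for illustration only.
counts = {"group a": 3, "group b": 12, "group c": 13}
released = {name: disclose_count(n) for name, n in counts.items()}
```

Rounding means that differences between released counts no longer reveal exact small numbers, which is one reason it can complement plain redaction.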

wjchulme avatar Nov 23 '22 14:11 wjchulme