web-monitoring-processing

Testing against sensitive data

Open danielballan opened this issue 7 years ago • 5 comments

We will want to run tests against real lists of changes that were flagged for review. Some of the elements of these lists are already public because they are the subject of old EDGI reports. Some are not public -- they were interesting enough to trigger careful review, but ultimately did not generate a report. Finally, there will be an incoming stream of new entries to the list that are not yet sorted into one of those two categories.

After a conversation with @trinberg, I propose the following system:

  • Make public a list of the changes that generated old reports, so that any interested developer can use them for testing.
  • Maintain a private, more complete list. Only distribute this to established contributors.
  • Use a secure token to make the complete list available to CI for testing. Apply "skiptest" features to these tests so that developers can still run the test suite locally even if they don't have access to the complete list.

attn @janakrajchadha

danielballan • Sep 09 '17 18:09

I had a couple of questions here.

  • Do we have an existing list of changes which generated old reports?
  • "skiptest" features would also skip the tests in CI and accessing the complete list would not be useful. Am I missing something here?

janakrajchadha • Sep 09 '17 20:09

Apologies for my overdue reply, @janakrajchadha.

> Do we have an existing list of changes which generated old reports?

I'm not sure that we have uuids for them, but we have Versionista links which we can convert into uuids.

> "skiptest" features would also skip the tests in CI and accessing the complete list would not be useful. Am I missing something here?

I'm imagining a conditional skip (such as with pytest.mark.skipif) that checks for an env variable containing authentication info.
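
Roughly, that conditional skip might look like the sketch below. The variable name `SENSITIVE_DATA_TOKEN`, the helper `has_sensitive_data_access`, and both test functions are hypothetical placeholders, not anything that exists in the repo; only `pytest.mark.skipif` itself is a real pytest feature.

```python
import os

import pytest

# Hypothetical name: the thread does not say which variable would hold the token.
TOKEN_ENV_VAR = "SENSITIVE_DATA_TOKEN"


def has_sensitive_data_access(environ=os.environ):
    """Return True when a non-empty token is present in the given environment."""
    return bool(environ.get(TOKEN_ENV_VAR))


# Reusable marker: the condition is evaluated once, when this module is imported.
requires_sensitive_data = pytest.mark.skipif(
    not has_sensitive_data_access(),
    reason=f"{TOKEN_ENV_VAR} is not set; skipping tests that need the private list",
)


def test_public_changes():
    # Placeholder for a test that only needs the public list; always runs.
    assert True


@requires_sensitive_data
def test_private_changes():
    # Placeholder for a test that needs the private list.
    # Runs in CI (token set); reported as skipped on machines without the token.
    assert os.environ[TOKEN_ENV_VAR]
```

With this shape, CI (where the token is set) would run both tests, while a contributor without the token would see `test_private_changes` reported as skipped rather than failed.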

danielballan • Oct 09 '17 00:10

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

stale[bot] • Jan 10 '19 01:01

This is still relevant — outside contributors really need decent data they can grapple with. However, is the long trudge towards a more public staging server the better solution here? (See also edgi-govdata-archiving/web-monitoring-db#34 and edgi-govdata-archiving/web-monitoring-ui#220)

Mr0grog • Jan 10 '19 04:01

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

stale[bot] • Jul 09 '19 05:07