web-monitoring-processing Testing against sensitive data

We will want to run tests against real lists of changes that were flagged for review. Some of the elements of these lists are already public because they are the subject of old EDGI reports. Some are not public -- they were interesting enough to trigger careful review, but ultimately did not generate a report. Finally, there will be an incoming stream of new entries to the list that are not yet sorted into one of those two categories.

After a conversation with @trinberg, I propose the following system:

Make public a list of the changes that generated old reports, so that any interested developer can use them for testing.
Maintain a private, more complete list. Only distribute this to established contributors.
Use a secure token to make the complete list available to CI for testing. Apply "skiptest" features to these tests so that developers can run the full test suite locally even if they don't have access to the complete list.

attn @janakrajchadha

Sep 09 '17 18:09 danielballan

I had a couple of questions here.

Do we have an existing list of changes which generated old reports?
"skiptest" features would also skip the tests in CI and accessing the complete list would not be useful. Am I missing something here?

Sep 09 '17 20:09 janakrajchadha

Apologies for my overdue reply, @janakrajchadha.

Do we have an existing list of changes which generated old reports?

I'm not sure that we have uuids for them, but we have Versionista links which we can convert into uuids.

"skiptest" features would also skip the tests in CI and accessing the complete list would not be useful. Am I missing something here?

I'm imagining a conditional skip (such as with pytest.mark.skipif) that checks for an env variable containing authentication info.
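
A minimal sketch of that conditional skip, to make the idea concrete. The variable name `SENSITIVE_DATA_TOKEN`, the helper `has_sensitive_data_access`, and the test bodies are all hypothetical placeholders; nothing here is settled, and the real check might need to validate the token rather than just look for it.

```python
import os

import pytest

# Hypothetical name: the thread does not say which variable CI would set.
TOKEN_ENV_VAR = "SENSITIVE_DATA_TOKEN"


def has_sensitive_data_access(environ=None):
    """Return True when a non-empty access token is present in the environment."""
    environ = os.environ if environ is None else environ
    return bool(environ.get(TOKEN_ENV_VAR, "").strip())


# Reusable marker: the decorated test is skipped unless the token is present.
requires_sensitive_data = pytest.mark.skipif(
    not has_sensitive_data_access(),
    reason=f"{TOKEN_ENV_VAR} is not set; skipping tests that need the private list",
)


def test_public_changes():
    # Runs everywhere: would use only the public list (placeholder body).
    assert True


@requires_sensitive_data
def test_private_changes():
    # Runs only where the token is set, e.g. in CI (placeholder body).
    assert has_sensitive_data_access()
```

With this arrangement CI, which has the token, would run both tests, while a contributor without it would see `test_private_changes` reported as skipped rather than failed.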

Oct 09 '17 00:10 danielballan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

Jan 10 '19 01:01 stale[bot]

This is still relevant — outside contributors really need decent data they can grapple with. However, is the long trudge towards a more public staging server the better solution here? (See also edgi-govdata-archiving/web-monitoring-db#34 and edgi-govdata-archiving/web-monitoring-ui#220)

Jan 10 '19 04:01 Mr0grog

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

Jul 09 '19 05:07 stale[bot]
