
Add a config option to save the hashed content of data URLs

Open • englehardt opened this issue 7 years ago • 1 comment

  • I'm submitting a ...
      [ ] bug report
      [X] feature request
      [ ] question about the decisions made in the repository
      [ ] question about how to use this project

  • Summary: In #31 and #34 we added an option to strip the content of data: URLs to avoid large URL records in the dataset. When the stripDataUrlData configuration option is set, the actual content of each data: URL is replaced with <data-stripped>.

In https://github.com/mozilla/openwpm-webext-instrumentation/issues/23#issuecomment-440460289 we identified that it would be useful to change this config option (or perhaps add a new config parameter) to save a hash of the data: URL content in place of <data-stripped>. This adds relatively little storage overhead and keeps URLs with different content distinguishable from one another.
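
A minimal sketch of the idea, not the project's implementation: it assumes SHA-256 as the hash, hashes the payload exactly as it appears in the URL (without base64-decoding it), and uses a hypothetical `<data-hashed:sha256:...>` placeholder in place of `<data-stripped>`. The function name `hash_data_url` is also an assumption made for illustration.

```python
import hashlib


def hash_data_url(url: str) -> str:
    """Replace the payload of a data: URL with a SHA-256 digest of that payload.

    Hypothetical helper: the placeholder format is an assumption, not OpenWPM's.
    """
    if not url.lower().startswith("data:"):
        return url  # non-data URLs are returned unchanged

    # A data: URL has the form data:[<mediatype>][;base64],<data>;
    # everything after the first comma is the payload.
    header, sep, payload = url.partition(",")
    if not sep:
        return url  # no comma found, so there is no payload to hash

    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{header},<data-hashed:sha256:{digest}>"
```

Two data: URLs with identical payloads map to the same string and URLs with different payloads map to different strings, so the records stay distinguishable while each one has a fixed, small payload size.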

englehardt, Nov 27 '18 03:11

We should always save the hash of the content in the data URL, but have a config option that allows us to decide whether we should save the actual content.
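
One way this could look, as a rough sketch under stated assumptions: the hash is always computed, and a hypothetical `save_content` flag (not an existing OpenWPM option) controls whether the raw payload is kept as well. The `DataUrlRecord` type and its field names are invented for illustration, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataUrlRecord:
    prefix: str             # e.g. "data:image/png;base64"
    content_hash: str       # always present
    content: Optional[str]  # present only when save_content is True


def record_data_url(url: str, save_content: bool = False) -> DataUrlRecord:
    """Always hash the data: URL payload; keep the payload only if asked to."""
    if not url.lower().startswith("data:"):
        raise ValueError("not a data: URL")

    # Split at the first comma: the left side is the media-type header,
    # the right side is the payload (empty if the comma is missing).
    prefix, _, payload = url.partition(",")
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return DataUrlRecord(
        prefix=prefix,
        content_hash=digest,
        content=payload if save_content else None,
    )
```

With this shape, turning the flag off only drops the `content` field; the hash column is populated either way, so records collected under both settings remain comparable.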

englehardt, Nov 12 '19 09:11