Add a config option to save the hashed content of data URLs
I'm submitting a ...
- [ ] bug report
- [X] feature request
- [ ] question about the decisions made in the repository
- [ ] question about how to use this project
Summary

In #31 and #34 we added an option to strip `data:` URL content to avoid large URL records in the dataset. When the `stripDataUrlData` configuration option is set, we replace the actual content of `data:` URLs with `<data-stripped>`.
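For reference, the stripping described above can be sketched roughly as follows. This is a minimal illustration, not the code from #31/#34: the helper name `stripDataUrl` is hypothetical, and it assumes the `data:[<mediatype>][;base64],` header is kept and only the payload after the first comma is replaced.

```typescript
// Placeholder written in place of the data: URL payload.
const STRIPPED_PLACEHOLDER = "<data-stripped>";

// Hypothetical helper; the actual implementation in #31/#34 may differ.
function stripDataUrl(url: string): string {
  // Only data: URLs are affected; other schemes pass through unchanged.
  if (!url.toLowerCase().startsWith("data:")) {
    return url;
  }
  // The payload follows the first comma: data:[<mediatype>][;base64],<data>
  const commaIndex = url.indexOf(",");
  if (commaIndex === -1) {
    return url; // malformed data: URL, leave it untouched
  }
  // Keep the header (media type, ;base64 flag) and drop the payload.
  return url.slice(0, commaIndex + 1) + STRIPPED_PLACEHOLDER;
}
```

With this sketch, `stripDataUrl("data:image/png;base64,iVBORw0KGgo=")` yields `data:image/png;base64,<data-stripped>`, so every PNG data URL collapses to the same string, which is the loss of uniqueness discussed below.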
In https://github.com/mozilla/openwpm-webext-instrumentation/issues/23#issuecomment-440460289 we identified that it would be useful to change this config option (or perhaps add a new config parameter) to instead save a hash of the `data:` URL content in place of `<data-stripped>`. This adds relatively little storage overhead while preserving the uniqueness of the URLs.
We should always save a hash of the `data:` URL content, and add a config option that controls whether the actual content is saved as well.
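A rough sketch of the proposed behaviour, assuming a hypothetical `saveDataUrlContent` option and a hypothetical record with a separate `contentHash` field (neither exists in the codebase today). It uses SHA-256 via Node's `crypto` module for brevity; inside the extension, `crypto.subtle.digest` would be the browser equivalent, though it is asynchronous.

```typescript
import { createHash } from "crypto";

// Hypothetical config shape; `saveDataUrlContent` is an illustrative name only.
interface DataUrlConfig {
  saveDataUrlContent: boolean;
}

// Hypothetical record: the stored URL plus a hash of its data: payload.
interface DataUrlRecord {
  url: string;
  contentHash: string | null; // null when the URL is not a data: URL
}

function processUrl(url: string, config: DataUrlConfig): DataUrlRecord {
  // Non-data: URLs are stored as-is without a hash.
  if (!url.toLowerCase().startsWith("data:")) {
    return { url, contentHash: null };
  }
  const commaIndex = url.indexOf(",");
  if (commaIndex === -1) {
    return { url, contentHash: null }; // malformed data: URL
  }
  const header = url.slice(0, commaIndex + 1);
  const payload = url.slice(commaIndex + 1);
  // Always hash the payload (as written in the URL, not base64-decoded),
  // so identical contents map to identical hashes.
  const contentHash = createHash("sha256").update(payload).digest("hex");
  return {
    // Keep the full URL only when the config asks for the actual content.
    url: config.saveDataUrlContent ? url : header + "<data-stripped>",
    contentHash,
  };
}
```

Keeping the hash in its own field means it is recorded whether or not the content is kept, and records can be grouped by `contentHash` without parsing URLs. Substituting the hash directly for `<data-stripped>` in the URL, as first suggested in the linked comment, would work as well.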