
The privacy concern

Open jarauh opened this issue 5 years ago • 3 comments

In the README, the main goal mentioned for the package is reducing object size. My main motivation for removing training data from model objects is privacy: I want to be able to publish the model without publishing the training data. (Of course, the model itself depends on the training data, so fully separating the two might not be possible, but at least I want to make sure that my model does not contain an explicit copy of the training data.)

Would it make sense to expand the scope of the package to address this goal?

Sometimes privacy and size constraints may lead to different approaches:

  • If I care about privacy, I might decide to keep only those parts of the model that I am sure I will use later. That is, I need a list of components to keep, and everything else is dropped.
  • If I worry about size, I might decide to drop only those parts of the model that I surely don't need. That is, I would work with a list of components that I can safely drop without reducing the desired functionality, and everything else is kept.
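
As a minimal sketch of the two approaches above, assuming the model is an ordinary list-based S3 object such as a glm fit: the helpers `keep_only()` and `drop_only()` below are hypothetical names chosen for illustration and are not part of butcher or strip, and the component lists are examples only, not a vetted set for prediction.

```r
# Hypothetical helpers for illustration; not part of butcher or strip.

# "Positive list": keep only the named components, drop everything else.
keep_only <- function(model, keep) {
  drop <- setdiff(names(model), keep)
  model[drop] <- NULL   # assigning NULL removes list elements; class is kept
  model
}

# "Negative list": drop only the named components, keep everything else.
drop_only <- function(model, drop) {
  model[intersect(names(model), drop)] <- NULL
  model
}

# Example fit on a built-in data set.
fit <- glm(mpg ~ wt + hp, data = mtcars)

# Privacy-oriented: anything not explicitly listed is removed.
private <- keep_only(fit, c("coefficients", "family", "terms", "call"))

# Size-oriented: only components known to be unnecessary are removed.
small <- drop_only(fit, c("data", "model", "y", "residuals", "fitted.values"))

names(private)
names(small)
```

With the positive list, a component added by a future version of the modelling function is dropped by default; with the negative list it is kept by default. Note that filtering by component name is not a complete privacy guarantee: kept components such as `terms` carry an environment attribute that may still reference the training data, so it would need to be checked separately.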

What are your thoughts on this?

jarauh, Feb 06 '20 12:02

I have been attempting to write a more general axe method, and your idea here (listing the components to keep or listing the components to drop) is a great way to implement it. If you have particular models (or packages) in mind, please let me know; I could experiment with those specifically when prototyping this function. Thank you for this! Very interesting use case.

jyuu, Feb 06 '20 12:02

I mostly use gam or glm objects.

I guess you are aware of the strip package? Interestingly, it uses a hybrid approach (see https://github.com/paulponcet/strip/blob/master/R/strip_.glm.R): when axing for prediction, a "negative list" of components to drop is used, while when axing for printing, a "positive list" of components to keep is used.

jarauh, Feb 06 '20 13:02

I was not aware, but this is very useful. Again, thank you!

jyuu, Feb 06 '20 13:02

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions[bot], Mar 21 '23 01:03