butcher
The privacy concern
The README states that the main goal of the package is to reduce object size. My main motivation for removing training data from model objects is privacy: I want to be able to publish a model without publishing the training data. (Of course, the model itself depends on the training data, so a full separation of the two may not be possible, but at least I want to make sure that the model does not contain an explicit copy of the training data.)
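To make the concern concrete, here is a minimal sketch (plain `stats::glm()` on simulated data, not butcher itself) of some components of a fitted glm that hold copies of, or row-level information derived from, the training data:

```r
# Sketch: inspect which parts of a fitted glm carry the training data.
# Uses only base R / stats; the data frame `d` is simulated for illustration.
set.seed(1)
d <- data.frame(x = rnorm(50), y = rbinom(50, 1, 0.5))
fit <- glm(y ~ x, family = binomial(), data = d)

# Components containing an explicit copy of (or per-row information about)
# the training data:
str(fit$data)            # the `data` argument, stored as-is
str(fit$model)           # the model frame: one row per training observation
head(fit$y)              # the response vector
head(fit$residuals)      # per-observation residuals
head(fit$fitted.values)  # per-observation fitted values

# Less obvious: the terms/formula environment can also capture objects from
# the calling environment, so dropping the components above is not always enough.
environment(fit$terms)
```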
Would it make sense to expand the scope of the package to address this goal?
Sometimes privacy and size constraints may lead to different approaches:
- If I care about privacy, I might decide to keep only those parts of the model that I am sure I will use later. That is, I need a list of components which I cannot drop.
- If I worry about size, I might decide to drop only those parts of the model that I definitely don't need. That is, I would work with a list of components that I can safely drop without losing the desired functionality.
What are your thoughts on this?
I have been attempting to write a more general axe method, and incorporating your idea here (listing the components to keep, or listing the components to drop) is a great way to implement it. If you have particular models (or packages) in mind, please let me know; I could experiment with those specifically while prototyping this function. Thank you for this! Very interesting use case.
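One way such a method could look is sketched below. This is only a sketch, not part of butcher's API; the function name `axe_components()` and its arguments are made up for illustration, and the component lists are examples rather than verified minimal sets.

```r
# Hypothetical sketch of a generalised axe that accepts either a keep-list
# (privacy-first) or a drop-list (size-first). Not part of butcher's API.
axe_components <- function(object, keep = NULL, drop = NULL) {
  if (!is.null(keep) && !is.null(drop)) {
    stop("Supply either `keep` or `drop`, not both.")
  }
  nms <- names(object)
  if (!is.null(keep)) {
    drop <- setdiff(nms, keep)  # privacy-first: drop everything not whitelisted
  }
  for (nm in intersect(drop, nms)) {
    object[[nm]] <- NULL        # removing list elements preserves the class
  }
  object
}

fit <- glm(y ~ x, family = binomial(),
           data = data.frame(x = rnorm(50), y = rbinom(50, 1, 0.5)))

# Privacy-first: keep only components believed necessary for predict()
# (illustrative list; the true minimal set depends on intended use).
slim <- axe_components(
  fit,
  keep = c("coefficients", "terms", "family", "rank", "qr", "xlevels", "call")
)

# Size-first: drop only components believed safe to remove.
small <- axe_components(fit, drop = c("model", "data", "y", "residuals"))
```

The keep-list variant is the stricter one for privacy, since any component added to the object by a future version of the fitting function would be dropped by default rather than silently retained.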
I mostly use gam or glm objects.
I guess that you are aware of the strip package? Interestingly, it uses a hybrid approach:
https://github.com/paulponcet/strip/blob/master/R/strip_.glm.R
When axing for prediction, a "negative list" of components to drop is used, while when axing for printing, a "positive list" of components to keep is used.
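In spirit, the hybrid approach looks roughly like the sketch below. This is an illustrative reimplementation, not strip's code, and the two component lists are examples; see the linked strip_.glm.R for the authoritative lists.

```r
# Illustrative reimplementation of the hybrid idea (not strip's actual code).
fit <- glm(y ~ x, family = binomial(),
           data = data.frame(x = rnorm(50), y = rbinom(50, 1, 0.5)))

# "Negative list": components assumed safe to drop while predict(fit, newdata)
# keeps working.
drop_for_predict <- c("model", "data", "y", "residuals",
                      "fitted.values", "linear.predictors", "effects")
for (nm in drop_for_predict) fit[[nm]] <- NULL

# "Positive list": the only components assumed needed by print.glm().
keep_for_print <- c("call", "coefficients", "df.residual", "df.null",
                    "null.deviance", "deviance", "aic")
fit_small <- fit[intersect(names(fit), keep_for_print)]
class(fit_small) <- class(fit)

predict(fit, newdata = data.frame(x = 0), type = "response")
print(fit_small)
```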
I was not aware, but this is very useful. Again, thank you!