Anes Benmerzoug issues

Results: 33 issues of Anes Benmerzoug

Deprecate compute_*_values interface

To prepare for a newer interface for game-theoretic data valuation methods (see #467), we should deprecate the [`compute_shapley_values`](https://github.com/aai-institute/pyDVL/blob/96326ae75600391e927b01bf2e31a93f12915159/src/pydvl/value/shapley/common.py) and [`compute_least_core_values`](https://github.com/aai-institute/pyDVL/blob/96326ae75600391e927b01bf2e31a93f12915159/src/pydvl/value/least_core/__init__.py#L42) functions.

enhancement
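As a rough illustration of the deprecation described above, here is a minimal sketch of a decorator that emits a `DeprecationWarning` while still calling the wrapped function. The `deprecated` decorator, the `compute_values_legacy` function, and the replacement text are hypothetical placeholders, not pyDVL's actual code or the interface proposed in #467.

```python
import functools
import warnings


def deprecated(replacement: str):
    """Return a decorator that warns on every call and names a replacement."""

    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical stand-in for a legacy entry point (assumption, not pyDVL code).
@deprecated(replacement="the new valuation interface")
def compute_values_legacy(data):
    return [0.0 for _ in data]
```

Existing callers keep working and only see a warning, which leaves room to remove the old functions in a later release.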

Implement Variance Reduced Data Shapley (VRDS)

Introduced in Wu, M., Jia, R., Huang, W., & Chang, X. (2022). [Robust Data Valuation via Variance Reduced Data Shapley](https://arxiv.org/abs/2210.16835). arXiv preprint arXiv:2210.16835. The idea is to use stratified sampling...

new-method
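To make the stratified sampling idea mentioned above concrete, here is a minimal sketch of a Shapley estimator that treats each coalition size as one stratum. It is an assumption-laden illustration, not the VRDS algorithm from the paper: it draws the same number of samples in every stratum, whereas the paper's contribution concerns how samples are allocated across strata. The function name `stratified_shapley` and its parameters are hypothetical.

```python
import random
from typing import Callable, FrozenSet, List


def stratified_shapley(
    n: int,
    utility: Callable[[FrozenSet[int]], float],
    samples_per_stratum: int = 20,  # equal allocation (assumption, not VRDS)
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values for n points, stratifying by coalition size."""
    rng = random.Random(seed)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        stratum_means = []
        for k in range(n):  # coalition sizes 0 .. n-1, one stratum each
            total = 0.0
            for _ in range(samples_per_stratum):
                subset = frozenset(rng.sample(others, k))
                # Marginal contribution of point i to the sampled coalition.
                total += utility(subset | {i}) - utility(subset)
            stratum_means.append(total / samples_per_stratum)
        # The Shapley value is the uniform average over the n strata.
        values.append(sum(stratum_means) / n)
    return values


# Toy additive utility: each point's Shapley value equals its own weight.
weights = [1, 2, 3, 4]
estimates = stratified_shapley(4, lambda s: sum(weights[j] for j in s))
```

For an additive utility every marginal contribution of a point equals its weight, so the toy example is a convenient sanity check for the estimator.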

Move tolerate fixture to separate repository

With the changes in #529, we no longer use the tolerate fixture and we should therefore remove it from the repository. However, I think it is useful and there may...

utils

cleanup

Issue with SemiValue batching and parallelization

While working on PR #341, I realized that there is a bug in the batching feature of semivalues when using `n_jobs` > 1. The results are almost the same but...

bug

Consider publishing the examples, or more detailed versions of them, as Colab notebooks.

documentation

Explain that data valuation works not only with sklearn models but with any model that implements the sklearn Predictor interface.

documentation

enhancement
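To illustrate the point about the Predictor interface, here is a minimal sketch of a model that does not inherit from any sklearn class but still exposes `fit`, `predict`, and `score`. It assumes that these three methods are what the interface requires; the protocol name `SupervisedModelLike` and the `MeanRegressor` class are hypothetical and only serve the example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupervisedModelLike(Protocol):  # hypothetical name (assumption)
    def fit(self, x, y): ...

    def predict(self, x): ...

    def score(self, x, y) -> float: ...


class MeanRegressor:
    """Plain Python model: always predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = 0.0

    def fit(self, x, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x):
        return [self.mean_ for _ in x]

    def score(self, x, y) -> float:
        # Negative mean squared error, so that higher is better.
        preds = self.predict(x)
        return -sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


model = MeanRegressor().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
```

The `isinstance` check against a runtime-checkable protocol only verifies that the methods exist, which matches the duck-typing spirit of the sklearn conventions.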

Add notebook showcasing use of PyTorch models with Data Valuation

Improve `Dataset` class

Caching documentation should be consolidated into one page/section.

Create notebook showcasing use of Caching