rcps
Official codebase for "Distribution-Free, Risk-Controlling Prediction Sets"
Paper
Distribution-Free, Risk-Controlling Prediction Sets
@article{bates-rcps,
title={Distribution-Free, Risk-Controlling Prediction Sets},
author={Bates, Stephen and Angelopoulos, Anastasios N and Lei, Lihua and Malik, Jitendra and Jordan, Michael I},
journal={arXiv preprint arXiv:2101.02703},
year={2020}
}
Basic Overview
For general information about RCPS, you can check our blog post.
This GitHub repository contains the code we used for the experiments in the RCPS paper.
Each experiment lives in a different, appropriately named folder.
The directory core contains code common to all of our experiments, including the implementations of the concentration bounds and the choice of lambda hat.
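As a rough illustration of what "concentration bounds and choice of lambda hat" refers to, here is a minimal sketch and not the code in core. It assumes losses bounded in [0, 1], a risk that is non-increasing in lambda, a finite grid of lambda values, and the simple Hoeffding bound; the function names `hoeffding_ucb` and `choose_lambda_hat` are hypothetical and chosen only for this example.

```python
import numpy as np


def hoeffding_ucb(mean_loss, n, delta):
    """Hoeffding upper confidence bound on the risk for losses in [0, 1].

    Holds with probability at least 1 - delta for a fixed lambda.
    """
    return mean_loss + np.sqrt(np.log(1.0 / delta) / (2.0 * n))


def choose_lambda_hat(losses, lambdas, alpha, delta):
    """Pick lambda hat on a grid (illustrative, discretized version).

    losses:  array of shape (n, m); losses[i, j] is the loss of calibration
             example i at lambdas[j], assumed to lie in [0, 1].
    lambdas: increasing 1-D array of m candidate values.
    Returns the smallest grid value such that the upper confidence bound is
    below alpha there and at every larger grid value, or None if even the
    largest lambda fails.
    """
    losses = np.asarray(losses, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)
    n = losses.shape[0]
    ucb = hoeffding_ucb(losses.mean(axis=0), n, delta)

    lambda_hat = None
    # Scan from the largest lambda downward and stop at the first violation.
    for j in range(len(lambdas) - 1, -1, -1):
        if ucb[j] >= alpha:
            break
        lambda_hat = lambdas[j]
    return lambda_hat
```

Given a calibration loss table, a call would look like `choose_lambda_hat(losses, lambdas, alpha=0.1, delta=0.1)`. Hoeffding is used here only because it is the simplest bound to state; it is typically loose, and any valid upper confidence bound can be substituted in `hoeffding_ucb`.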
The repository is still a work in progress; we will be continually updating the code to make it more user-friendly and remove clutter from our development.
If you have trouble reproducing our results, please email [email protected].
Getting Started
We store some large files in our git repo via git-lfs; you may need to install and configure it first.
After installing git-lfs, you can clone this repository.
Then, you can create the rcps conda environment by running the following line:
conda create --name rcps --file ./requirements.txt
Each experiment requires different datasets.
For the ./imagenet and ./hierarchical_imagenet experiments, you will need to point the scripts towards the val directory of your local copy of the ImageNet dataset.
Similarly, for ./coco, you need to point the scripts towards your local copy of the 2017 version of MS COCO.
For the ./polyps and ./proteins examples, a bit more work must be done.
Polyp data
We used data from five different datasets: HyperKvasir-SEG, CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and ETIS-LaribPolypDB.
Download each of these datasets and unzip them into the folder ./polyps/PraNet/data/TestDataset/{datasetname}.
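Before running the processing script, it may help to confirm that every dataset was unzipped where expected. The following is a small sketch, not part of the repository; the folder names in `DATASET_NAMES` are assumptions taken from the dataset names listed above, and the helper `missing_datasets` is hypothetical, so adjust both to match your local layout.

```python
from pathlib import Path

# Assumed folder names, one per dataset listed above. These are guesses:
# rename them to match the {datasetname} folders you actually created.
DATASET_NAMES = [
    "HyperKvasir-SEG",
    "CVC-ClinicDB",
    "Kvasir-SEG",
    "CVC-ColonDB",
    "ETIS-LaribPolypDB",
]


def missing_datasets(root, names=DATASET_NAMES):
    """Return the names whose folder under root is absent or empty."""
    root = Path(root)
    missing = []
    for name in names:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


# Path taken from the instructions above, relative to the repository root.
missing = missing_datasets("./polyps/PraNet/data/TestDataset")
if missing:
    print("Missing or empty dataset folders:", ", ".join(missing))
else:
    print("All dataset folders found.")
```

Run it from the repository root; an empty result only means the folders exist and are non-empty, not that their contents are complete.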
Then run the script ./polyps/PraNet/process_all_data.py, which should store the outputs of the tumor prediction model in the proper directory so you can run our experiments.
Protein data
For the AlphaFoldv1 experiments in ./proteins, you can point the scripts to the AlphaFold CASP-13 test set.