Dan Saattrup Nielsen

74 issues by Dan Saattrup Nielsen

This is an attempt to collect all the sklearn dependency issues currently present. **Presort parameter** `The parameter 'presort' is deprecated and has no effect. It will be removed in v0.24....

Adds the [MuMiN dataset](https://mumin-dataset.github.io/) to the readme. (My editor also removed some trailing whitespace, I hope that's okay)

Conformal Quantile Regression was introduced in [Romano, Patterson & Candès](https://arxiv.org/abs/1905.03222) and is a variant of quantile regression which calibrates the prediction intervals, yielding narrower intervals, while preserving theoretical coverage guarantees....

enhancement
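A minimal sketch of the calibration step behind Conformal Quantile Regression, assuming the split-conformal procedure described in the linked paper: conformity scores are computed on a held-out calibration set, and a single offset is then added to both ends of each predicted interval. The function names `cqr_offset` and `cqr_interval` are hypothetical and do not come from any library; the quantile predictions are assumed to come from an already fitted quantile regressor.

```python
import math


def cqr_offset(cal_lower, cal_upper, cal_y, alpha):
    """Return the conformal offset Q computed from a calibration set.

    cal_lower / cal_upper: predicted lower and upper quantiles (assumed inputs).
    cal_y: true targets for the same calibration samples.
    alpha: target miscoverage rate, e.g. 0.1 for 90% intervals.
    """
    # Score is positive when the target falls outside the predicted interval
    # and negative when it lies strictly inside it.
    scores = sorted(
        max(lo - y, y - hi) for lo, hi, y in zip(cal_lower, cal_upper, cal_y)
    )
    n = len(scores)
    # Finite-sample corrected rank of the empirical quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration samples for this alpha: the interval is unbounded.
        return math.inf
    return scores[rank - 1]


def cqr_interval(lower, upper, offset):
    """Widen (or shrink, if offset is negative) a predicted interval."""
    return lower - offset, upper + offset


# Usage with made-up calibration data.
offset = cqr_offset(
    cal_lower=[0.0, 0.0, 0.0, 0.0],
    cal_upper=[1.0, 1.0, 1.0, 1.0],
    cal_y=[0.5, 1.25, -0.125, 2.0],
    alpha=0.5,
)
interval = cqr_interval(0.0, 1.0, offset)
```

A negative offset shrinks the intervals, which is how the calibration can yield narrower intervals when the underlying quantile model is too conservative.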

The Inductive Venn-Abers predictors (IVAPs) and Cross Venn-Abers predictors (CVAPs) were introduced in [Vovk, Petej & Fedorova (2015)](https://arxiv.org/abs/1511.00213), and can provide lower and upper bounds for probabilities in classification models....

enhancement

So far all tests are doctests, which double as explanatory examples. However, we also need unit tests that test the edge cases of the functions.

enhancement
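To illustrate the distinction, here is a small sketch of a function whose doctest doubles as an explanatory example, paired with unit tests that cover edge cases. The function `safe_mean` and the test class are hypothetical and not taken from the repository in question.

```python
import unittest


def safe_mean(values):
    """Return the arithmetic mean of ``values``.

    The doctest below documents the typical use case:

    >>> safe_mean([1.0, 2.0, 3.0])
    2.0
    """
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    return sum(values) / len(values)


class SafeMeanEdgeCases(unittest.TestCase):
    """Edge cases that would clutter the docstring if written as doctests."""

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            safe_mean([])

    def test_single_value(self):
        self.assertEqual(safe_mean([4.0]), 4.0)
```

If this were saved as a module, the doctest could be run with `python -m doctest <file>` and the unit tests with `python -m unittest <file>`.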

This adds the [MuMiN dataset](https://mumin-dataset.github.io/) to the readme. (My editor also removed trailing whitespace, I hope that's okay)

Adds the [MuMiN dataset](https://mumin-dataset.github.io/) to the readme. (My editor also removed trailing whitespace, I hope that's alright)

### 🚀 The feature, motivation and pitch The current `pyproject.toml` file has [dependencies with no lower or upper bounds on them](https://github.com/allenai/OLMo/blob/3be4c1ec367213ff96ccc168ba2f7c27be6d5bc7/pyproject.toml#L15-L54), which means that package management systems that check for...

type/feature

**Describe the model** Meta has pretrained and RLHF'd models on primarily English data, though the data also includes many other languages. They should be benchmarked on all languages. All models can be found [on...

model evaluation request
large model (>7B)

### 🚀 The feature, motivation and pitch Supporting these architectures would enable benchmarking popular models such as T5 and BART. Most of the benchmarking suite should be identical to the...

enhancement