
Document strategies to use more samples

Open · noahho opened this issue 11 months ago · 4 comments

Document the dataset constraints for TabPFN in more detail, and document strategies for using more samples (a subsampling ensemble; SklearnBasedRandomForestTabPFN in tabpfn-extensions: https://github.com/PriorLabs/tabpfn-extensions/blob/dbc3f5da25821135602fdc4d95cc8c217afbc3b0/src/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN.py#L106).

noahho, Jan 13 '25 17:01
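The subsampling-ensemble strategy mentioned in the issue can be sketched as follows: fit several models, each on a random subsample that stays within the row limit, and average their predicted probabilities. This is a minimal sketch, not the tabpfn-extensions implementation. `subsample_ensemble_predict_proba`, `make_model`, `n_estimators` and `max_rows` are hypothetical names, and the sketch assumes a scikit-learn-style classifier that exposes `fit`, `predict_proba` and `classes_`.

```python
import random
from typing import Any, Callable, Hashable, List, Sequence, Tuple

# Row limit quoted in this thread for the TabPFN pretraining setup.
MAX_TRAIN_ROWS = 10_000


def subsample_ensemble_predict_proba(
    make_model: Callable[[], Any],
    X_train: Sequence,
    y_train: Sequence,
    X_test: Sequence,
    n_estimators: int = 4,
    max_rows: int = MAX_TRAIN_ROWS,
    seed: int = 0,
) -> Tuple[List[Hashable], List[List[float]]]:
    """Hypothetical helper: fit `n_estimators` fresh models, each on a random
    subsample of at most `max_rows` training rows, and average their
    predicted class probabilities on `X_test`."""
    rng = random.Random(seed)
    n_rows = len(X_train)
    k = min(max_rows, n_rows)
    classes = sorted(set(y_train))  # global class order for the output columns
    column = {c: j for j, c in enumerate(classes)}
    summed = [[0.0] * len(classes) for _ in range(len(X_test))]

    for _ in range(n_estimators):
        idx = rng.sample(range(n_rows), k)  # rows drawn without replacement
        model = make_model()
        model.fit([X_train[i] for i in idx], [y_train[i] for i in idx])
        proba = model.predict_proba(X_test)
        # A subsample may miss a rare class, so map each model's columns
        # onto the global class order via its `classes_` attribute.
        for r, row in enumerate(proba):
            for c, p in zip(model.classes_, row):
                summed[r][column[c]] += float(p)

    averaged = [[p / n_estimators for p in row] for row in summed]
    return classes, averaged
```

With TabPFN installed, `make_model` could be `lambda: TabPFNClassifier()` (after `from tabpfn import TabPFNClassifier`), since that class follows the scikit-learn `fit`/`predict_proba`/`classes_` interface. Each ensemble member stays within the row limit while the ensemble as a whole sees more of the training data; the approach in tabpfn-extensions may differ in how it draws and combines subsamples.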

Where can I read about the constraints?

jokus-pokus, Jan 14 '25 00:01

Yes, where can we read about the constraints?

Daniel-KK-world, Jan 14 '25 09:01

Some information about the pretraining limits is currently documented here: https://github.com/PriorLabs/TabPFN/blob/main/src/tabpfn/classifier.py#L219-L238

The current pretraining limits are:

  - 10_000 samples/rows
  - 500 features/columns
  - 10 classes; this limit cannot be ignored, and using the model with more classes raises an error.

This is not shown in the docs, so it would be good to add it there as well.

Otherwise, as a starting point, I recommend reading the User Guide in our paper.

LennartPurucker, Jan 15 '25 12:01
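The pretraining limits listed above can be checked before fitting with a small helper. This is a minimal sketch: `check_pretraining_limits` and the constant names are hypothetical and not part of the TabPFN API, and TabPFN's own validation in classifier.py may behave differently. It assumes `X` is a list of rows or a 2-D NumPy array (not a pandas DataFrame) and that `y` holds the training labels.

```python
from typing import List, Sequence

# Pretraining limits as quoted in this thread (Jan 2025); they may change.
MAX_SAMPLES = 10_000
MAX_FEATURES = 500
MAX_CLASSES = 10


def check_pretraining_limits(X: Sequence, y: Sequence) -> List[str]:
    """Hypothetical helper: raise if the class limit is exceeded, and return
    one message per exceeded sample/feature limit (empty list if none)."""
    n_samples = len(X)
    n_features = len(X[0]) if n_samples else 0
    n_classes = len(set(y))

    # Per the thread, the class limit cannot be ignored: TabPFN raises an error.
    if n_classes > MAX_CLASSES:
        raise ValueError(f"{n_classes} classes exceed the limit of {MAX_CLASSES}")

    exceeded = []
    if n_samples > MAX_SAMPLES:
        exceeded.append(f"{n_samples} samples exceed the pretraining limit of {MAX_SAMPLES}")
    if n_features > MAX_FEATURES:
        exceeded.append(f"{n_features} features exceed the pretraining limit of {MAX_FEATURES}")
    return exceeded
```

A non-empty result is a signal to consider one of the strategies discussed in this issue, such as subsampling, rather than passing the full dataset to a single model.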

An example of how to use TabPFN on large datasets can be found at: https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py

noahho, Sep 03 '25 17:09