TabPFN Document strategies to use more samples

Document the dataset constraints for TabPFN in more detail, and document strategies for using more samples (subsampling ensemble, SklearnBasedRandomForestTabPFN in tabpfn-extensions: https://github.com/PriorLabs/tabpfn-extensions/blob/dbc3f5da25821135602fdc4d95cc8c217afbc3b0/src/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN.py#L106).

Jan 13 '25 17:01 noahho

Where can I read about the constraints?

Jan 14 '25 00:01 jokus-pokus

Yes, where can we read about the constraints?

Jan 14 '25 09:01 Daniel-KK-world

Some information about the pretraining limits is currently documented here: https://github.com/PriorLabs/TabPFN/blob/main/src/tabpfn/classifier.py#L219-L238

The current pre-training limits are:

  - 10_000 samples/rows
  - 500 features/columns
  - 10 classes. This limit cannot be ignored: the model raises an error if it is used with more classes.
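
As a rough illustration of the limits listed above, a dataset's shape could be checked before fitting. This is only a sketch: `PretrainingLimits` and `check_limits` are hypothetical names made up for this comment and are not part of the TabPFN API. The numbers are the ones quoted in this thread and may change between versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainingLimits:
    # Values quoted in this thread; hypothetical container, not a TabPFN class.
    max_samples: int = 10_000
    max_features: int = 500
    max_classes: int = 10


def check_limits(n_samples, n_features, n_classes, limits=PretrainingLimits()):
    """Return a list of human-readable violations (empty if within all limits)."""
    problems = []
    if n_samples > limits.max_samples:
        problems.append(f"{n_samples} samples exceeds {limits.max_samples}")
    if n_features > limits.max_features:
        problems.append(f"{n_features} features exceeds {limits.max_features}")
    if n_classes > limits.max_classes:
        # Per the list above, this is the limit that raises an error in the model.
        problems.append(f"{n_classes} classes exceeds {limits.max_classes}")
    return problems
```

For a feature matrix `X` and label vector `y`, one would pass the number of rows, the number of columns, and the number of distinct labels, then decide whether to subsample or reduce features based on the returned messages.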

These limits are not shown in the docs, so it would be good to add them there as well.

Otherwise, as a starting point, I recommend reading the User Guide in our paper.

Jan 15 '25 12:01 LennartPurucker

An example of how to use TabPFN on large datasets can be found at: https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py
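
For the subsampling ensemble idea mentioned at the top of this issue, a minimal sketch is shown below. It is not taken from the linked example and does not use any TabPFN API. `subsample_indices`, `fit_predict_ensemble`, and the `make_model` factory are hypothetical names, and the only assumption about the model is a scikit-learn style `fit` / `predict_proba` / `classes_` interface.

```python
import random
from collections import defaultdict

MAX_ROWS = 10_000  # row limit quoted earlier in this thread


def subsample_indices(n_rows, max_rows=MAX_ROWS, n_members=4, seed=0):
    """Draw one index subset per ensemble member (no repeats within a subset)."""
    rng = random.Random(seed)
    size = min(n_rows, max_rows)
    return [sorted(rng.sample(range(n_rows), size)) for _ in range(n_members)]


def fit_predict_ensemble(make_model, X_train, y_train, X_test,
                         max_rows=MAX_ROWS, n_members=4, seed=0):
    """Fit one model per subsample and average the predicted probabilities.

    make_model is a hypothetical zero-argument factory returning an object
    with fit(X, y), predict_proba(X) and a classes_ attribute.
    Returns one {class_label: mean_probability} dict per test row.
    """
    totals = [defaultdict(float) for _ in X_test]
    for idx in subsample_indices(len(X_train), max_rows, n_members, seed):
        model = make_model()
        model.fit([X_train[i] for i in idx], [y_train[i] for i in idx])
        proba = model.predict_proba(X_test)
        # Accumulate by label: a subsample may miss a rare class,
        # so the probability columns can differ between members.
        for row_total, row in zip(totals, proba):
            for label, p in zip(model.classes_, row):
                row_total[label] += float(p) / n_members
    return [dict(t) for t in totals]
```

Each member sees at most `max_rows` training rows, so every individual fit stays within the row limit while the ensemble as a whole draws on more of the data. Averaging by class label rather than by column position keeps the result well defined when a member never saw some class.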

Sep 03 '25 17:09 noahho