Document strategies for using more samples
Document the dataset constraints for TabPFN more thoroughly, and document strategies to use for larger sample counts (e.g. a subsampling ensemble, or SklearnBasedRandomForestTabPFN from tabpfn-extensions: https://github.com/PriorLabs/tabpfn-extensions/blob/dbc3f5da25821135602fdc4d95cc8c217afbc3b0/src/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN.py#L106).
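For illustration, here is a minimal sketch of the subsampling-ensemble idea, assuming numpy inputs and the public TabPFNClassifier fit/predict_proba interface; the ensemble size and the 10,000-row cap are illustrative choices, not library defaults:

```python
# Minimal sketch of a subsampling ensemble (illustrative, not an official API):
# fit several TabPFN classifiers on random row subsamples that stay within the
# ~10,000-row pretraining limit and average their predicted class probabilities.
import numpy as np
from tabpfn import TabPFNClassifier


def subsample_ensemble_proba(X, y, X_test, n_members=5, max_rows=10_000, seed=0):
    rng = np.random.default_rng(seed)
    probas = []
    for _ in range(n_members):
        # Random subsample no larger than the pretraining limit.
        # (For simplicity this assumes every subsample still contains all classes;
        # in practice a stratified subsample is safer.)
        idx = rng.choice(len(X), size=min(max_rows, len(X)), replace=False)
        clf = TabPFNClassifier()
        clf.fit(X[idx], y[idx])
        probas.append(clf.predict_proba(X_test))
    # Average class probabilities across ensemble members.
    return np.mean(probas, axis=0)
```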
Where can I read about the constraints?
Yes, where can we read about the constraints?
Some information about the pretraining limits is currently documented here: https://github.com/PriorLabs/TabPFN/blob/main/src/tabpfn/classifier.py#L219-L238
The current pre-training limits are:
- 10_000 samples/rows
- 500 features/columns
- 10 classes; this limit cannot be ignored, and the model will raise an error if used with more classes.
These limits are not shown in the docs, so it would also be good to add them there.
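Until that is in the docs, here is a small user-side sketch that checks a dataset against the numbers above before fitting; the check function and the suggestions it prints are my illustrations, not part of the library:

```python
# Sketch of a user-side check against the documented pretraining limits
# (the thresholds are simply the numbers listed above).
import numpy as np

MAX_ROWS, MAX_FEATURES, MAX_CLASSES = 10_000, 500, 10


def check_tabpfn_limits(X, y):
    n_rows, n_features = np.asarray(X).shape
    n_classes = len(np.unique(y))
    if n_rows > MAX_ROWS:
        print(f"{n_rows} rows > {MAX_ROWS}: consider subsampling or an ensemble")
    if n_features > MAX_FEATURES:
        print(f"{n_features} features > {MAX_FEATURES}: consider feature selection/reduction")
    if n_classes > MAX_CLASSES:
        # Unlike the row/feature limits, the class limit is hard: TabPFN raises an error.
        raise ValueError(f"{n_classes} classes > {MAX_CLASSES}: not supported by TabPFN")
```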
Otherwise, as a starting point, I recommend reading the User Guide in our paper.
An example of how to use TabPFN on large datasets can be found at: https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py
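For readers who do not want to follow the link right away, a rough sketch in the spirit of that example; note that the RandomForestTabPFNClassifier import path and its tabpfn constructor argument are assumptions on my part, so please check the linked script for the exact API:

```python
# Rough sketch only: the import path and constructor argument below are assumptions,
# verify them against the linked large_datasets_example.py.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from tabpfn import TabPFNClassifier
from tabpfn_extensions.rf_pfn import RandomForestTabPFNClassifier  # assumed import path

# Synthetic dataset larger than the 10,000-row pretraining limit.
X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The tree-based wrapper splits the data so each leaf only sees a TabPFN-sized chunk.
clf = RandomForestTabPFNClassifier(
    tabpfn=TabPFNClassifier(),  # assumed parameter name for the base TabPFN model
)
clf.fit(X_train, y_train)
print(clf.predict_proba(X_test)[:5])
```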