noahho

34 comments of noahho

Agreed with @dholzmueller. There could be a default option that uses estimated memory_usage and in this way adapts the batch size by default. This parameter could then be overwritten by...
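A minimal sketch of the adaptive default suggested above, where the batch size is derived from an estimated per-sample memory cost. The function name, the cost model, and the memory budget are all hypothetical illustrations, not TabPFN's actual API:

```python
def adaptive_batch_size(n_features: int,
                        bytes_per_cell: int = 4,
                        overhead_factor: int = 1000,
                        budget_bytes: int = 2 * 1024**3) -> int:
    """Pick a batch size that fits a rough memory budget (assumed cost model).

    Estimated cost per sample = n_features * bytes_per_cell * overhead_factor,
    where overhead_factor is a placeholder for activation/attention overhead.
    """
    per_sample = n_features * bytes_per_cell * overhead_factor
    # Never return a batch size below 1, even for very wide datasets.
    return max(1, budget_bytes // per_sample)
```

A user-supplied batch size parameter would then simply override this estimate when set.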

Hi! It does not currently do so, but we will support this soon (the code is mostly ready in the tabpfn-extensions repository). Are you looking for a point prediction or distribution...

This is just being built here: https://github.com/PriorLabs/tabpfn-extensions/pull/207 and will be merged soon. You can already use this by installing from that branch

Yes exactly, I'd just put this at 1000 samples and also include a reference to using the API if no GPU is available: https://github.com/PriorLabs/tabpfn-client Longer term, I'd make a larger change...

Thanks so much for this change! Would you be able to add a test for this change, i.e. one that tests if the preprocessing runs on datasets of > 10,000...

Great, this looks really good. There seems to be a tiny ruff issue at this point. Do you know how to resolve it? "ruff check . --fix" with ruff version 0.8.6

Ohh also something that copilot just caught: The 'quantile_uni_coarse' transformer now caps n_quantiles to 10,000, yet the 'quantile_uni' transformer remains uncapped.

The two open ones don't seem to be automatically fixable:
src/tabpfn/regressor.py:723:89: E501 Line too long (89 > 88)
tests/test_preprocessing.py:12:9: NPY002 Replace legacy `np.random.rand` call with `np.random.Generator`
An LLM will know...
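For reference, the NPY002 fix replaces NumPy's legacy global-state random functions with the modern `Generator` API; a minimal before/after sketch (the array shape is arbitrary):

```python
import numpy as np

# Legacy style flagged by ruff's NPY002 rule:
#   X = np.random.rand(100, 5)

# Modern replacement using the Generator API:
rng = np.random.default_rng(seed=42)  # seeded for reproducible tests
X = rng.random((100, 5))              # uniform samples in [0, 1)
```

Using a seeded `default_rng` instance also makes test fixtures deterministic, which is usually what you want in a test file.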

Thanks a lot for continuing to work on this. It seems there were a few changes made for the linting that weren't right (such as adding ""). I'll look into...

When using ignore_pretraining_limits=True in TabPFN, the training data is subsampled (typically to 10,000 samples) before fitting the preprocessing pipeline. Currently, quantile transformers in our pipeline (configured in ReshapeFeatureDistributionsStep.get_all_preprocessors) use the original dataset...
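A minimal sketch of the capping behaviour discussed in this thread, using scikit-learn's QuantileTransformer directly. The cap value (10,000) comes from the discussion above; the standalone setup here is an illustrative assumption, not TabPFN's actual pipeline code:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

n_samples = 3_000  # e.g. data already subsampled before preprocessing

# Cap n_quantiles at both the (subsampled) dataset size and 10,000,
# so the transformer never requests more quantiles than samples.
n_quantiles = min(n_samples, 10_000)

qt = QuantileTransformer(n_quantiles=n_quantiles,
                         output_distribution="uniform")

rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, 2))
Xt = qt.fit_transform(X)  # values mapped into [0, 1]
```

Without the `min(...)` cap, passing an n_quantiles larger than the number of samples makes scikit-learn emit a warning and clip it internally, which is the mismatch the fix avoids.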