noahho

34 comments of noahho

Agreed with @dholzmueller. There could be a default option that uses estimated memory_usage and in this way adapts the batch size by default. This parameter could then be overwritten by...
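A minimal sketch of the adaptive default suggested above, where the batch size is derived from an estimated per-sample memory cost. The function name, the cost model, and the memory budget are all hypothetical illustrations, not TabPFN's actual API:

```python
def adaptive_batch_size(n_features: int,
                        bytes_per_cell: int = 4,
                        overhead_factor: int = 1000,
                        budget_bytes: int = 2 * 1024**3) -> int:
    """Pick a batch size that fits a rough memory budget (assumed cost model).

    Estimated cost per sample = n_features * bytes_per_cell * overhead_factor,
    where overhead_factor is a placeholder for activation/attention overhead.
    """
    per_sample = n_features * bytes_per_cell * overhead_factor
    # Never return a batch size below 1, even for very wide datasets.
    return max(1, budget_bytes // per_sample)
```

A user-supplied batch size parameter would then simply override this estimate when set.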

Hi! It does not currently do so, but we will support this soon (the code is mostly ready in the tabpfn-extensions repository). Are you looking for a point prediction or distribution...

This is just being built here: https://github.com/PriorLabs/tabpfn-extensions/pull/207 and will be merged soon. You can already use this by installing from that branch

Yes exactly, I'd just put this at 1000 samples and also include a reference to using the API if no GPU is available: https://github.com/PriorLabs/tabpfn-client Longer term, I'd make a larger change...

Thanks so much for this change! Would you be able to add a test for this change, i.e. one that tests if the preprocessing runs on datasets of > 10,000...

Great, this looks really good. There seems to be a tiny ruff issue at this point. Do you know how to resolve it? "ruff check . --fix" with ruff version 0.8.6

Ohh also something that copilot just caught: The 'quantile_uni_coarse' transformer now caps n_quantiles to 10,000, yet the 'quantile_uni' transformer remains uncapped.

The two open ones don't seem to be automatically fixable:
src/tabpfn/regressor.py:723:89: E501 Line too long (89 > 88)
tests/test_preprocessing.py:12:9: NPY002 Replace legacy `np.random.rand` call with `np.random.Generator`
An LLM will know...
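For reference, the NPY002 fix replaces NumPy's legacy global-state random functions with the modern `Generator` API; a minimal before/after sketch (the array shape is arbitrary):

```python
import numpy as np

# Legacy style flagged by ruff's NPY002 rule:
#   X = np.random.rand(100, 5)

# Modern replacement using the Generator API:
rng = np.random.default_rng(seed=42)  # seeded for reproducible tests
X = rng.random((100, 5))              # uniform samples in [0, 1)
```

Using a seeded `default_rng` instance also makes test fixtures deterministic, which is usually what you want in a test file.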

Thanks a lot for continuing to work on this. It seems there were a few changes made for the linting that weren't right (such as adding ""). I'll look into...

When using ignore_pretraining_limits=True in TabPFN, the training data is subsampled (typically to 10,000 samples) before fitting the preprocessing pipeline. Currently, quantile transformers in our pipeline (configured in ReshapeFeatureDistributionsStep.get_all_preprocessors) use the original dataset...
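A minimal sketch of the capping behaviour discussed in this thread, using scikit-learn's QuantileTransformer directly. The cap value (10,000) comes from the discussion above; the standalone setup here is an illustrative assumption, not TabPFN's actual pipeline code:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

n_samples = 3_000  # e.g. data already subsampled before preprocessing

# Cap n_quantiles at both the (subsampled) dataset size and 10,000,
# so the transformer never requests more quantiles than samples.
n_quantiles = min(n_samples, 10_000)

qt = QuantileTransformer(n_quantiles=n_quantiles,
                         output_distribution="uniform")

rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, 2))
Xt = qt.fit_transform(X)  # values mapped into [0, 1]
```

Without the `min(...)` cap, passing an n_quantiles larger than the number of samples makes scikit-learn emit a warning and clip it internally, which is the mismatch the fix avoids.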