TabPFN icon indicating copy to clipboard operation
TabPFN copied to clipboard

TabPFNRegressor preprocessing fails on bigger datasets fix

Open Krishnadubey1008 opened this issue 9 months ago • 10 comments

This PR fixes #169 chnages made- took n_quantiles=min(n_quantiles, 10_000) in TabPFN/src/tabpfn/model/preprocessing.py

Krishnadubey1008 avatar Mar 26 '25 14:03 Krishnadubey1008

Thanks so much for this change! Would yu be able to add a test for this change, i.e. one that tests if the preprocessing runs on datasets of > 10,000 samples. We can't run the inference step unfortunately as ofc, it would be way too slow. Only way to test the inference on lareg datasets was if we provided a tiny tabpfn checkpoint, a very small model that is randomly initialized but that would be a project in itself.

noahho avatar Mar 26 '25 14:03 noahho

@noahho I had added test_preprocessing.py please suggest changes if any

Krishnadubey1008 avatar Mar 26 '25 15:03 Krishnadubey1008

Great, this looks really good. There seems to be a tiny ruff issue at this point. Do you know how to resolve? "ruff check . --fix" with ruff version 0.8.6

noahho avatar Mar 26 '25 15:03 noahho

Ohh also something that copilot just caught: The 'quantile_uni_coarse' transformer now caps n_quantiles to 10,000, yet the 'quantile_uni' transformer remains uncapped.

noahho avatar Mar 26 '25 15:03 noahho

i will fix it now

Krishnadubey1008 avatar Mar 26 '25 15:03 Krishnadubey1008

@noahho I had ran ruff check . --fix but still ruff linting test is failing

Krishnadubey1008 avatar Mar 26 '25 15:03 Krishnadubey1008

The two open ones don't seem to be automatically fixable: src/tabpfn/regressor.py:723:89: E501 Line too long (89 > 88) tests/test_preprocessing.py:12:9: NPY002 Replace legacy np.random.rand call with np.random.Generator

An LLM will know how to fix number 2 and by deleting a character in src/tabpfn/regressor.py:723:89 you fix no1

noahho avatar Mar 26 '25 16:03 noahho

@noahho Please review

Krishnadubey1008 avatar Mar 27 '25 04:03 Krishnadubey1008

Thanks a lot for continuing to work on this. It seems there were a few changes made for the linting that weren't right (such as adding ""). I'll look into the PR and fix those things, if you'd like me to.

noahho avatar Mar 27 '25 09:03 noahho

Yes ,sure

Krishnadubey1008 avatar Mar 27 '25 09:03 Krishnadubey1008

Any update on this PR? #169 already solved?

mohammadp1001 avatar May 16 '25 22:05 mohammadp1001