xgboost icon indicating copy to clipboard operation
xgboost copied to clipboard

[R] Use inplace predict

Open david-cortes opened this issue 1 year ago • 28 comments

This PR switches the predict method to use inplace prediction when possible (dense matrix / CSR matrix as inputs).

It seems that there are quite a lot of limitations around when it is or isn't possible to use these functions which are not documented in the readthedocs page, so I am not sure if the conditions that I am adding here to check for whether inplace predict is supported are correct.

I'm also not sure what's the most efficient way of dealing with all the JSON manipulation here.

Along the way, since it is refactoring some of the earlier code introduced for processing the "missing" parameter, it also modifies the function that creates DMatrix objects from CSR/CSC. It doesn't make a practical difference there though, since currently only classes inheriting from "DsparseMatrix" (float64 values) are supported, but might come handy in the future if other sparse matrix classes (like "LsparseMatrix") become supported in the future.

david-cortes avatar Nov 30 '23 21:11 david-cortes

ref https://github.com/dmlc/xgboost/issues/9810 fixes https://github.com/dmlc/xgboost/issues/7616

david-cortes avatar Nov 30 '23 21:11 david-cortes

I see there's a failing MSVC job, but it doesn't seem to contain any test failure, only warnings. No idea what went wrong there.

david-cortes avatar Dec 01 '23 17:12 david-cortes

Always fun to have R failing on Windows when one thinks everything is ready. ;-) I restarted the CI and it seems to be failing again. I can help debug it on Monday next week.

trivialfis avatar Dec 01 '23 23:12 trivialfis

Let's hope the R update in the msvc job helps here.

david-cortes avatar Dec 06 '23 18:12 david-cortes

Let me try to reproduce it.

trivialfis avatar Dec 11 '23 01:12 trivialfis

I can reproduce the error on my machine, and somehow magically got it working by cout << res_code << endl in the new prediction function. The log from MSVC looks suspicious. Other than the error, the prediction function needs to consider base margin as well, which requires the use of proxy DMatrix. I will look into the QDM PR for the proxy implementation in R.

 C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.37.32822\include\stdarg.h(29,9): warning C4068: unknown pragma 'Rf_warning' [D:\a\xgboost\xgboost\build\R-package\xgboost-r.vcxproj]
C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\ucrt\corecrt_malloc.h(13,9): warning C4068: unknown pragma 'Rf_warning' [D:\a\xgboost\xgboost\build\R-package\xgboost-r.vcxproj]
C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\ucrt\corecrt_malloc.h(14,9): warning C4068: unknown pragma 'Rf_warning' [D:\a\xgboost\xgboost\build\R-package\xgboost-r.vcxproj]

trivialfis avatar Dec 11 '23 07:12 trivialfis

Stupid question: wouldn't setting a proxy dmatrix inside the prediction function be slower and/or consume the same amount of memory as creating a dmatrix to use for predict instead? Or am I misunderstanding the mechanism?

david-cortes avatar Dec 11 '23 18:12 david-cortes

BTW, while trying to add base margin here, I ran into an issue which manifested itself with both inplace predictions and dmatrix predictions: https://github.com/dmlc/xgboost/issues/9869

david-cortes avatar Dec 11 '23 20:12 david-cortes

wouldn't setting a proxy dmatrix inside the prediction function be slower and/or consume the same amount of memory

No, the proxy DMatrix is just a reference to the original data, so, don't free the data without first freeing the proxy dmatrix ...

trivialfis avatar Dec 12 '23 03:12 trivialfis

With hindsight, the inplace prediction can be completely implemented with proxy DMatrix without those special parameters for csr/dense, but at this point, there's no real benefit to making such breaking changes.

trivialfis avatar Dec 12 '23 03:12 trivialfis

Added base_margin as a parameter to predict, plus a function to use the in-place prediction on data frames.

david-cortes avatar Dec 12 '23 19:12 david-cortes

@trivialfis After adding support for categorical features for data frames, I just realized that passing them to predict will now require re-identifying which of those earlier factor columns were categorical, given that they need to be manually converted to base-0 indexing.

What would be the best way to retrieve that information (which columns are categorical) from the booster?

david-cortes avatar Dec 12 '23 21:12 david-cortes

@david-cortes In Python, the booster has the same feature_type parameter, search XGBoosterGetStrFeatureInfo in core.py.

trivialfis avatar Dec 13 '23 00:12 trivialfis

@trivialfis Thanks for the tip. Would be very helpful if you could investigate the failing MSVC job here so that we can merge this PR soon, as I'm finding some of the code here will be needed for the QuantileDMatrix and other PRs too.

I unfortunately don't know how best to debug an MSVC-compiled module that needs to be imported in R, as it can't use typical tools like sanitizers with a regular R build.

david-cortes avatar Dec 17 '23 22:12 david-cortes

I can try, but I admit I don't use Windows very often.

trivialfis avatar Dec 19 '23 02:12 trivialfis

Is this PR independent of #9902?

I use Windows as a daily driver, so should be able to help debug the error in this PR.

hcho3 avatar Dec 20 '23 06:12 hcho3

Is this PR independent of https://github.com/dmlc/xgboost/pull/9902?

yes.

trivialfis avatar Dec 20 '23 06:12 trivialfis

Tried to use the sanitizer bundled in vs, but ran into a compiler internal error ...

trivialfis avatar Jan 02 '24 07:01 trivialfis

Tried to use the sanitizer bundled in vs, but ran into a compiler internal error ...

Does that compiler error still happen if you comment out the R-specific code introduced here?

david-cortes avatar Jan 02 '24 18:01 david-cortes

The compiler error with sanitizer was about a different file in XGB (refresher), I will try again tomorrow.

trivialfis avatar Jan 02 '24 18:01 trivialfis

I have made no progress on the Windows error, we will have to delay this PR for a while.

trivialfis avatar Jan 15 '24 09:01 trivialfis

Discussion with @hcho3 , we are likely to move away from MSVC for the R package, along with which, we will drop support for building R package with GPU on Windows using MSVC. We haven't ruled out the possibility of using rtools and CMake to build GPU support, in addition, recommending WSL2 is a potential option as well.

trivialfis avatar Jan 16 '24 08:01 trivialfis

Discussion with @hcho3 , we are likely to move away from MSVC for the R package, along with which, we will drop support for building R package with GPU on Windows using MSVC. We haven't ruled out the possibility of using rtools and CMake to build GPU support, in addition, recommending WSL2 is a potential option as well.

Is that just for the R package, or also for the python package for windows? If just for R, have there been MSVC-specific issues like this before? Curious about what makes it problematic.

david-cortes avatar Jan 16 '24 17:01 david-cortes

It's just the R package,mostly because I couldn't figure out the compiler warning and the error in this PR. So far, it seems MSVC is not handling the C headers from R very well.

trivialfis avatar Jan 16 '24 18:01 trivialfis

Could you please help rebase the PR?

trivialfis avatar Jan 20 '24 16:01 trivialfis

Updated, but I'm now not 100% sure if everything is correct.

I suddenly started getting an error in this test and the one right after it after merging the latest master: https://github.com/dmlc/xgboost/blob/c5d0608057f3c0b335fd31594c65a3a3ca4afca3/R-package/tests/testthat/test_basic.R#L142

Before, it was using ntreelimit, which I guess meant that the code was converting to DMatrix, while now it's actually using inplace predict.

It looks like predictions from in-place match those from DMatrix and from xgb.train up to only about ~1e-7 in this particular case (dart booster), while elsewhere they seem to match closer.


BTW, would be ideal if you could merge the PR about data iterators before this one: https://github.com/dmlc/xgboost/pull/9913

david-cortes avatar Jan 20 '24 19:01 david-cortes

Updated, but I'm now not 100% sure if everything is correct.

Please let me know how things are going. ;-) Is there anything I can help?

trivialfis avatar Feb 18 '24 20:02 trivialfis

Updated, but I'm now not 100% sure if everything is correct.

Please let me know how things are going. ;-) Is there anything I can help?

PR is ready for review. Would be ideal if you could look into this issue with loss of floating point precision for dart. Since the difference between inplace and dmatrix is below $10^{-6}$ I guess it might not be a correctness problem, but it's odd that it gets a larger difference with dart than with gbtree.

david-cortes avatar Feb 19 '24 18:02 david-cortes

Thank you for the work on inplace prediction! The R package is looking much better now thanks to your great work.

trivialfis avatar Feb 23 '24 16:02 trivialfis