Large scale dataset training
Hi, I have encountered an issue where the dataset I load is too large to be read; if it is particularly large, the process gets Killed. For example:
```
Loading extension module split_decision...
Using /root/.cache/torch_extensions/py38_cu118 as PyTorch extensions root...
No modifications detected for re-loaded extension module split_decision, skipping build step...
Loading extension module split_decision...
Killed
```
How can I solve this problem? Does PGBM support batch training? Thanks
Thanks for reporting, looking into it. PGBM currently doesn't support batch training, unfortunately. I'd suggest trying the CPU version based on scikit-learn - let me know if that one works for you.