LightGBM icon indicating copy to clipboard operation
LightGBM copied to clipboard

[RFC] remove HDFS support?

Open jameslamb opened this issue 9 months ago • 3 comments

Proposal

I'm requesting comment on the following proposal:

  • remove CMake option USE_HDFS
  • remove all docs and code related to HDFS

And doing all of these only after issuing build-time and run-time deprecation warnings for 1 release.

Summary

6+ years ago, #1243 was merged, adding "experimental" support for Hadoop Filesystem ("HDFS") to LightGBM.

Specifically, that meant adding the ability to provide an HDFS URI to Dataset creation methods that accept files, and have LightGBM read the file from HDFS.

Since then:

  • we've received very very few issues / PRs related to that support
  • 0 tests have ever been run on that build of the library
  • XGBoost has removed their equivalent support (https://github.com/dmlc/xgboost/pull/9504)

I don't believe SynapseML depends on the HDFS build of LightGBM (GitHub search). Hopefully @imatiach-msft or @mhamilton723 can confirm.

Searching the rest of GitHub (GitHub search), I just see other forks of LightGBM and two recipes from repackages:

Motivation

Simplifies the project's public interface (build system options, installation docs, etc.).

Simplifies file-reading C++ code, making it easier to understand and refactor in the future.

Here in 2024, users have other and better options for using LightGBM distributed training on data in HDFS, like:

  • lightgbm.dask (Dask docs on Hadoop)
  • Spark via SynapseML (https://github.com/microsoft/SynapseML)

jameslamb avatar May 01 '24 03:05 jameslamb

@jameslamb Thank you for referring. For Macports: Since we do not enable this option presently, removing it won’t break anything. So we are good.

barracuda156 avatar May 01 '24 07:05 barracuda156

I'm all for this change, simplification is always good and if we neither have tests nor indication that nobody has used this (extensively), that's even more of an argument :+1:

borchero avatar May 01 '24 18:05 borchero

Thanks for the heads up. will remove the arg from nixpkgs when updating for 4.4.0

nviets avatar May 03 '24 17:05 nviets