xgboost-operator A potential refinement on document

A potential refinement on document

Open 0as1s opened this issue 4 years ago • 0 comments


When I started deploying xgboost-operator on my Kubeflow cluster, I used https://github.com/kubeflow/xgboost-operator/blob/master/config/samples/xgboost-dist/utils.py#L47 as a reference for writing my own data-reading function. Following that function, I manually read only the part of the data that belongs to each worker's rank, which I expect is a common approach.

However, I found that DMatrix already has internal logic that reads only part of the data when it detects distributed mode. Combined with my manual split, each rank ends up reading only 1/(N*N) of the data (1/N of a slice that is already 1/N of the whole) instead of 1/N, where N is the number of workers.
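To illustrate the arithmetic, here is a minimal pure-Python sketch. It does not call xgboost at all: `shard` and `rows_per_rank` are hypothetical helpers, and the contiguous-slice split is only an assumption standing in for both my manual per-rank read and the internal split I observed. The real partitioning scheme may differ.

```python
def shard(rows, rank, world_size):
    """Return the contiguous 1/world_size slice of `rows` owned by `rank`."""
    per_rank = len(rows) // world_size
    start = rank * per_rank
    return rows[start:start + per_rank]


def rows_per_rank(total_rows, world_size, manual_split):
    """Count the rows each rank finally sees.

    manual_split=True models reading a per-rank slice by hand before the
    loader applies its own per-rank split (the situation in this issue).
    """
    rows = list(range(total_rows))
    counts = []
    for rank in range(world_size):
        # Optional manual split done in user code.
        data = shard(rows, rank, world_size) if manual_split else rows
        # Hypothetical stand-in for the loader's internal per-rank split.
        counts.append(len(shard(data, rank, world_size)))
    return counts


if __name__ == "__main__":
    print(rows_per_rank(1600, 4, manual_split=False))
    print(rows_per_rank(1600, 4, manual_split=True))
```

With 1600 rows and 4 workers, passing the whole dataset gives each rank 400 rows (1/N), while splitting manually first leaves each rank with 100 rows (1/(N*N)), so three quarters of the data are never read by any worker.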

I think it would be better to add a comment to that function explaining this, to guide users who rewrite it.

Aug 09 '21 04:08 0as1s
