xgboost-operator
A potential refinement to the documentation
When I started deploying xgboost-operator on my Kubeflow cluster, I used https://github.com/kubeflow/xgboost-operator/blob/master/config/samples/xgboost-dist/utils.py#L47 as a reference to implement my own version that reads my own data. Following this function, I manually read a part of the whole dataset according to the rank, which I assume is a very common approach.
However, I found that DMatrix already has internal logic to read only part of the data when it detects distributed mode. As a result, my manual data reading causes each rank to read only 1/(N*N) of the data instead of 1/N, where N is the number of workers.
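To illustrate the arithmetic, here is a minimal sketch in plain Python. It does not call XGBoost at all: the `shard` helper is hypothetical and stands in for both the manual split in user code and the loader's internal split, under the assumption that both split the rows contiguously by rank.

```python
def shard(rows, rank, world_size):
    """Return the contiguous slice of rows assigned to this rank (hypothetical helper)."""
    per_rank = len(rows) // world_size
    start = rank * per_rank
    # The last rank also picks up any remainder.
    end = len(rows) if rank == world_size - 1 else start + per_rank
    return rows[start:end]


rows = list(range(16))  # stand-in for the full dataset
world_size = 4          # N workers

sizes = []
for rank in range(world_size):
    manual = shard(rows, rank, world_size)    # manual split in user code: 1/N
    double = shard(manual, rank, world_size)  # simulated second split in the loader: 1/(N*N)
    sizes.append((len(manual), len(double)))
    print(f"rank {rank}: manual={len(manual)} rows, after second split={len(double)} rows")

# Rows actually used across all ranks after the double split.
total_used = sum(d for _, d in sizes)
print(f"rows used in total: {total_used} of {len(rows)}")
```

With 16 rows and 4 workers, each rank ends up with 1 row instead of 4, so only 4 of the 16 rows are used for training across the whole job.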
I think it would be better to add a comment in that function to guide users who rewrite it.