xgboost_ray
add multi-label support
Hi, I added support for passing the label as a list, so that we can read data with multiple labels. This should resolve https://github.com/ray-project/xgboost_ray/issues/286. The new unit tests pass, and all tests in test_matrix.py pass in my local setup. I also verified the change locally by training an XGBoost model on Parquet data, and it works well. So far it should work for the Parquet data format. Thank you!
I verified that the change works with the code example below:
from sklearn.datasets import make_multilabel_classification
import pandas as pd
import numpy as np
n_classes = 5
random_state = 0
X, y = make_multilabel_classification(n_samples=32, n_classes=5, n_labels=3, random_state=random_state)
features = [f"f{i}" for i in range(len(X[0]))]
labels = [f"label_{i}" for i in range(n_classes)]
X_df = pd.DataFrame(X, columns = features)
y_df = pd.DataFrame(y, columns = labels)
data = pd.concat([X_df, y_df], axis = 1)
data.to_parquet("~/Desktop/sample_data/data.parquet")
from xgboost_ray import RayDMatrix, RayParams, train, RayFileType
n_classes = 5
features = [f"f{i}" for i in range(20)]
labels = [f"label_{i}" for i in range(n_classes)]
training_data = "~/Desktop/sample_data"
train_set = RayDMatrix(training_data, labels, columns = features + labels, filetype=RayFileType.PARQUET)
evals_result = {}
bst = train(
    {
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "random_state": random_state,
    },
    train_set,
    num_boost_round=1,
    evals_result=evals_result,
    evals=[(train_set, "train")],
    verbose_eval=False,
    ray_params=RayParams(
        num_actors=1,  # Number of remote actors
        cpus_per_actor=1))
#bst.save_model("model.xgb")
#print("Final training error: {:.4f}".format(
# evals_result["train"]["error"][-1]))
from xgboost_ray import predict
pred_ray = predict(bst, train_set, ray_params=RayParams(num_actors=1))
print(pred_ray)
import xgboost as xgb
clf = xgb.XGBClassifier(tree_method="hist", n_estimators = 1, random_state=0)
clf.fit(X, y)
expected = clf.predict_proba(X)
np.testing.assert_allclose(expected, pred_ray)
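For readers skimming the thread, here is a minimal sketch of what "label as a list" means for the data split. It is an illustration only and is not the code in this PR: `split_features_labels` is a hypothetical helper, and it uses plain dicts instead of a DataFrame to stay dependency-free. A string label yields one target value per row, while a list of label names yields one row of targets per sample.

```python
from typing import Dict, List, Sequence, Tuple, Union


def split_features_labels(
    rows: Sequence[Dict[str, float]],
    label: Union[str, Sequence[str]],
) -> Tuple[List[List[float]], list]:
    """Hypothetical helper: split rows into features X and targets y.

    `label` may be a single column name or a list of column names.
    """
    multi = not isinstance(label, str)
    label_cols = list(label) if multi else [label]
    if not rows:
        return [], []
    # Every column that is not a label column is treated as a feature.
    feature_cols = [c for c in rows[0] if c not in label_cols]
    X = [[row[c] for c in feature_cols] for row in rows]
    if multi:
        # Multi-label: a 2-D target, one column per label name.
        y = [[row[c] for c in label_cols] for row in rows]
    else:
        # Single label: a flat 1-D target.
        y = [row[label_cols[0]] for row in rows]
    return X, y


rows = [
    {"f0": 1.0, "f1": 2.0, "label_0": 0, "label_1": 1},
    {"f0": 3.0, "f1": 4.0, "label_0": 1, "label_1": 0},
]
X, y = split_features_labels(rows, ["label_0", "label_1"])
```

Note that with a single string label, any remaining `label_*` columns would be treated as features in this sketch, which is why the example above also restricts `columns` explicitly.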
@Yard1 can you help take a look when you get a chance? thanks!
Hi @Yard1, may I ask how to fix the lint test? It seems to still be blocking the merge. Thank you!
Can you run the ./format.sh script in the root of the repo?
@louis-huang can you please run the above test?