ipex-llm
classification model tutorial
Xshards currently does not support: 1) shuffling a dataframe, 2) astype (data type change), 3) train_test_split, 4) duplicating the whole dataframe according to one column.
I think for train_test_split, astype, and duplicate we can use the transform_shard api provided in shards? Also, I am wondering why we need to shuffle the dataframe; I think during training our optimizer will shuffle the data.
duplicate is not supported now. shuffle is indeed unnecessary. train_test_split and astype can be performed via transform_shard.
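As a minimal sketch of the idea, here is what such per-shard helpers might look like, assuming each shard holds a pandas DataFrame. The helper names (astype_shard, split_shard) are hypothetical; with real XShards they would be passed to transform_shard rather than called directly. Note that a per-shard split divides each shard locally, not globally across the whole dataset:

```python
# Hypothetical per-shard helpers that could be passed to
# XShards.transform_shard; each shard is assumed to be a pandas DataFrame.
import pandas as pd

def astype_shard(df, col, dtype):
    # cast one column to a basic type within a shard
    df[col] = df[col].astype(dtype)
    return df

def split_shard(df, test_ratio=0.2, seed=42):
    # per-shard train/test split; this splits within each shard,
    # not globally across the whole dataset
    test = df.sample(frac=test_ratio, random_state=seed)
    train = df.drop(test.index)
    return train, test

# Local demonstration on a plain DataFrame (standing in for one shard):
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "label": [0, 1, 0, 1, 0]})
df = astype_shard(df, "label", "float64")
train, test = split_shard(df, test_ratio=0.4)
```

With actual XShards, the usage would presumably be along the lines of `shards.transform_shard(astype_shard, "label", "float64")`, though the exact call signature should be checked against the shards api.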
But is it reasonable for users to call astype after using MinMaxScaler or LabelEncoder? MLlib types should be hidden, and the output types should be common basic types.
We should not expose implementation details (e.g., MLlib types) to the user.
Updated the code to change the MLlib vectors type to array. I think if it's for changing the array type to a PyTorch tensor type, we may need to use the transform_shard api. astype is more like changing one basic type to another basic type (e.g., double to int)?
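A hedged sketch of the vector-to-array conversion mentioned above: anything exposing a toArray() method (as MLlib dense and sparse vectors do) is flattened to a plain Python list, so downstream code never sees MLlib types. FakeDenseVector is a stand-in class so the example runs without Spark, and vector_to_list is a hypothetical helper name:

```python
# Hypothetical helper that hides MLlib vector types from the user:
# vector-like values (anything with toArray(), e.g.
# pyspark.ml.linalg.DenseVector) become plain lists; basic types pass through.
def vector_to_list(value):
    if hasattr(value, "toArray"):
        return list(value.toArray())
    return value

# Stand-in for an MLlib DenseVector, so this sketch runs without Spark:
class FakeDenseVector:
    def __init__(self, values):
        self._values = values

    def toArray(self):
        return self._values

row = {"features": FakeDenseVector([0.1, 0.9]), "label": 1}
cleaned = {k: vector_to_list(v) for k, v in row.items()}
```

Applied per shard (for example via transform_shard), this would leave the user with only common basic types, as suggested above.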
I think in the Estimator we would help to convert to torch tensor; could you check it? @yexinyinancy