ipex-llm classification model tutorial

Jun 30 '22 09:06 yexinyinancy

XShards currently does not support: 1) shuffling a dataframe, 2) astype (changing data types), 3) train_test_split, 4) duplicating the whole dataframe according to one column.

Jun 30 '22 13:06 yexinyinancy

I think train_test_split, astype, and duplicate can be done with the transform_shard API provided by the shards? Also, I am wondering why we need to shuffle the dataframe; I think our optimizer shuffles the data during training.

Jun 30 '22 17:06 dding3

duplicate is not supported now. shuffle is indeed unnecessary. train_test_split and astype can be performed via transform_shard.
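
As a rough illustration of the point above, the sketch below writes astype and a train/test split as plain pandas functions that work on one shard at a time. It assumes each shard holds a pandas DataFrame and that transform_shard accepts a function plus extra positional arguments; cast_columns and take_split are hypothetical helper names, not part of the library.

```python
import pandas as pd


def cast_columns(df, dtypes):
    """Per-shard astype: dtypes maps column name -> target dtype."""
    return df.astype(dtypes)


def take_split(df, part, test_size=0.2, seed=0):
    """Return the 'train' or 'test' portion of a single shard.

    The shard is shuffled locally with a fixed seed, so calling this twice
    on the same shard (once per part) yields disjoint, complementary pieces.
    """
    shuffled = df.sample(frac=1.0, random_state=seed)
    n_test = int(round(len(shuffled) * test_size))
    if part == "test":
        return shuffled.iloc[:n_test]
    return shuffled.iloc[n_test:]


# Assumed usage with an existing XShards object named `shards` (not run here):
#   shards = shards.transform_shard(cast_columns, {"label": "int64"})
#   train_shards = shards.transform_shard(take_split, "train")
#   test_shards = shards.transform_shard(take_split, "test")
```

Note that the split is applied shard by shard, so the test fraction holds within each shard rather than being sampled globally.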

Jul 01 '22 05:07 yexinyinancy

But is it reasonable for users to have to call astype after using MinMaxScaler or LabelEncoder? MLlib types should be hidden, and the output types should be common basic types, right?

Jul 01 '22 05:07 hkvision

We should not expose implementation details (e.g., MLlib types) to the user.

Jul 01 '22 22:07 jason-dai

Updated the code to convert the MLlib vector type to array. If the goal is to convert the array type to a PyTorch tensor, I think we may need to use the transform_shard API. astype is more for changing one basic type to another (e.g., double to int)?
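
To make the distinction concrete, here is a minimal sketch of a per-shard function that turns a column of arrays into a dense NumPy array, which is the kind of conversion astype does not cover. It assumes each shard is a pandas DataFrame whose column holds equal-length lists; column_to_ndarray is a hypothetical helper name, and whether the result should be wrapped as a tensor depends on what the Estimator expects.

```python
import numpy as np
import pandas as pd


def column_to_ndarray(df, col, dtype="float32"):
    """Stack a column of equal-length lists/arrays into a 2-D ndarray."""
    rows = [np.asarray(v) for v in df[col]]
    return np.stack(rows).astype(dtype)


# Assumed usage with an existing XShards object named `shards` (not run here):
#   feature_shards = shards.transform_shard(column_to_ndarray, "features")
# If a tensor is needed, torch.from_numpy(array) can wrap the result.
```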

Jul 02 '22 00:07 dding3

I think the Estimator will handle the conversion to torch tensors for us; could you check? @yexinyinancy

Jul 04 '22 01:07 hkvision
