DeepTables varlen features implementations

varlen features implementations

Open minarastgar opened this issue 3 years ago • 12 comments

Do you have a plan to implement Varlen sparse features and different pooling layers?

Aug 28 '20 20:08 minarastgar

Pooling layers? Could you be more specific?

Aug 29 '20 01:08 jackguagua

I mostly meant varlen sparse features, for example a sequence of item_ids. Every item_id is a sparse feature, and the last 10 items purchased by a user form a sequence of item_id embeddings, which can be aggregated with a pooling layer such as average pooling.
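To make the idea concrete, here is a minimal pure-Python sketch of average pooling over a variable-length sequence of item_id embeddings. It is not DeepTables code: the helper names, the `|` separator, the embedding size, and the random initialisation are all assumptions chosen for illustration.

```python
import random

EMBED_DIM = 4  # hypothetical embedding size


def build_embedding_table(vocab, dim, seed=0):
    """Assign each item id a random vector (a stand-in for a trained embedding)."""
    rng = random.Random(seed)
    return {item: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for item in vocab}


def average_pool(vectors, dim):
    """Element-wise mean of the vectors; an empty sequence pools to zeros."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pool_sequence(raw, table, dim, sep="|"):
    """Split a delimited string of item ids, look each one up, and pool."""
    item_ids = [token for token in raw.split(sep) if token]
    return average_pool([table[item_id] for item_id in item_ids], dim)


table = build_embedding_table(["item_1", "item_2", "item_3"], EMBED_DIM)

# Sequences of different lengths both pool to a vector of length EMBED_DIM.
short_history = pool_sequence("item_1|item_2", table, EMBED_DIM)
long_history = pool_sequence("item_1|item_2|item_3", table, EMBED_DIM)
```

Whatever the sequence length, the pooled output has a fixed size, which is what lets it sit next to ordinary sparse features in a model.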

Aug 29 '20 02:08 minarastgar

Got it. It will be supported in the next release.

Aug 29 '20 02:08 jackguagua

Thanks for your quick reply. Looking forward to it. Any ETA for the next release?

Sep 03 '20 17:09 minarastgar

It should be around October this year.

Sep 04 '20 00:09 jackguagua

Sorry for bugging you. I wonder if the release mentioned above is available yet. Thank you very much.

Oct 31 '20 01:10 minarastgar

I'm very sorry that the original plan has been delayed; some other urgent tasks came up over the past two months. I will strive to release this new feature by the end of November. Sorry again.

Oct 31 '20 03:10 jackguagua

@minarastgar varlen features are ready. See here for details: https://github.com/DataCanvasIO/DeepTables/pull/44#issue-527389069

Nov 26 '20 05:11 jackguagua

@jackguagua thank you so much. This is absolutely fantastic

Nov 30 '20 00:11 minarastgar

Hi @jackguagua, I have a quick question about varlen features. Say there is a varlen feature such as a stream of movie_ids, plus a categorical feature, movie_id, for the movie we want to show to the user. We want a single movie_id embedding that is used both by movie_id and by the stream of movie_ids. How can I specify that stream_of_movie_ids and movie_id share the same embedding?

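```python
# The opening of this call was cut off in the post. `ModelConfig(` is assumed,
# and any arguments that preceded `task` are not recoverable.
from deeptables.models.deeptable import ModelConfig
from deeptables.utils import consts

conf = ModelConfig(
    task=consts.TASK_REGRESSION,
    categorical_columns=["movie_id", "user_id", "gender", "occupation", "zip", "title", "age"],
    metrics=["mse"],
    fixed_embedding_dim=True,
    embeddings_output_dim=4,
    apply_gbm_features=False,
    apply_class_weight=True,
    earlystopping_patience=5,
    var_len_categorical_columns=[("stream_of_movie_ids", "|", "max")],
)
```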

Jan 28 '21 00:01 minarastgar

DT can't do what you want right now. I'm not entirely clear on the purpose of doing this. If you have code that implements it in Keras, please send it to me for reference.

Jan 31 '21 12:01 jackguagua

Let me clarify. Say we have a list of movie_ids [movie_id1, movie_id2, ..., movie_id10], the last 10 movies watched by the user. We also have a target movie, movie_id100 (a sparse feature). For both the stream (the list of movie_ids) and the sparse feature (target_title), we want to build the embeddings from movie_id. We do not want separate embeddings for entities in the stream and in the sparse feature, since they come from the same root, movie_id.
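As a rough illustration of the shared-embedding idea (plain Python, not DeepTables or Keras code), a single lookup table can serve both the target movie and the watch history. The table contents, the helper names, and the choice of max pooling are assumptions made only for this sketch.

```python
import random

EMBED_DIM = 4  # hypothetical embedding size
rng = random.Random(0)

# One table keyed by movie_id. Random values stand in for trained weights.
movie_ids = ["movie_id1", "movie_id2", "movie_id3", "movie_id100"]
movie_embedding = {
    movie_id: [rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)]
    for movie_id in movie_ids
}


def embed_target(movie_id):
    """Sparse feature: a single lookup in the shared table."""
    return movie_embedding[movie_id]


def embed_history(history, sep="|"):
    """Varlen feature: look up every id in the same table, then max-pool."""
    vectors = [movie_embedding[m] for m in history.split(sep) if m]
    return [max(column) for column in zip(*vectors)]


target_vec = embed_target("movie_id100")
history_vec = embed_history("movie_id1|movie_id2|movie_id100")

# Concatenate both views to form the input of the downstream network.
features = target_vec + history_vec
```

Because both functions read from `movie_embedding`, a given movie has one vector whether it appears as the target or inside the history, which is the behaviour being asked for here.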

Feb 04 '21 01:02 minarastgar
