DeepTables

varlen features implementations

Open minarastgar opened this issue 3 years ago • 12 comments

Do you have a plan to implement Varlen sparse features and different pooling layers?

minarastgar avatar Aug 28 '20 20:08 minarastgar

Pooling layers? Can you be more specific?

jackguagua avatar Aug 29 '20 01:08 jackguagua

I mostly meant varlen sparse features, for example a sequence of item_ids. Every item id is a sparse feature, and the last 10 items purchased by a user form a sequence of item-id embeddings, which can be aggregated with a pooling layer such as average pooling.
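To make the idea concrete, here is a minimal pure-Python sketch of average pooling over a padded sequence of item ids. It is not DeepTables code: the names `make_embedding_table` and `average_pool`, the use of id 0 as padding, and the table sizes are all assumptions made for illustration.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """Hypothetical helper: one small random vector per item id."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)]


def average_pool(table, item_ids, pad_id=0):
    """Average the embeddings of all non-padding ids in the sequence.

    Returns a zero vector when the sequence contains only padding.
    """
    dim = len(table[0])
    vectors = [table[i] for i in item_ids if i != pad_id]
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Assumed setup: 6 item ids (0 is reserved for padding), 4-dim embeddings.
table = make_embedding_table(vocab_size=6, dim=4)
last_items = [3, 5, 1, 0, 0]  # purchase history padded to length 5
user_vector = average_pool(table, last_items)
```

Whatever the history length, the result is one fixed-size vector per user, which is what lets a variable-length feature feed into the rest of the network.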

minarastgar avatar Aug 29 '20 02:08 minarastgar

Got it. It will be supported in the next release.

jackguagua avatar Aug 29 '20 02:08 jackguagua

Thanks for your quick reply. Looking forward to it. Any ETA for the next release?

minarastgar avatar Sep 03 '20 17:09 minarastgar

It should be around October this year.

jackguagua avatar Sep 04 '20 00:09 jackguagua

Sorry for bugging you. I wonder whether the release mentioned above is available yet. Thank you very much.

minarastgar avatar Oct 31 '20 01:10 minarastgar

I'm very sorry that the original plan was delayed by some other urgent tasks over the past two months. I will strive to release this new feature by the end of November. Sorry again.

jackguagua avatar Oct 31 '20 03:10 jackguagua

@minarastgar varlen features are ready. See here for details: https://github.com/DataCanvasIO/DeepTables/pull/44#issue-527389069

jackguagua avatar Nov 26 '20 05:11 jackguagua

@jackguagua thank you so much. This is absolutely fantastic

minarastgar avatar Nov 30 '20 00:11 minarastgar

Hi @jackguagua, I have a quick question about varlen features. Suppose there is a varlen feature such as a stream of movie_ids, and a categorical feature that is the movie_id we want to show to the user. We want a single movie_id embedding that is used by movie_id as well as by the stream of movie_ids. How can I specify that stream_of_movie_ids and movie_id use the same embedding?

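```
# The opening of this call is cut off in the post; ModelConfig is presumed from the keyword arguments.
conf = ModelConfig(
    task=consts.TASK_REGRESSION,
    categorical_columns=["movie_id", "user_id", "gender", "occupation", "zip", "title", "age"],
    metrics=['mse'],
    fixed_embedding_dim=True,
    embeddings_output_dim=4,
    apply_gbm_features=False,
    apply_class_weight=True,
    earlystopping_patience=5,
    # Each entry is (column name, separator, pooling strategy).
    var_len_categorical_columns=[('stream_of_movie_ids', "|", "max")])
```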

minarastgar avatar Jan 28 '21 00:01 minarastgar

DT can't do what you want at the moment. I'm not clear on the purpose of doing this. If you have code that implements it in Keras, please send it to me for reference.

jackguagua avatar Jan 31 '21 12:01 jackguagua

Let me clarify. Say we have a list of movie ids [movie_id1, movie_id2, ..., movie_id10], which are the last 10 movies watched by the user. We also have a target movie, movie_id100, which is a sparse feature. For both the stream (the list of movie ids) and the sparse feature (the target title), we want to build the embeddings from movie_ids. We do not want separate embeddings for the entities in the stream and in the sparse feature, because they come from the same root, which is movie_id.
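A minimal pure-Python sketch of what I mean, not DeepTables or Keras code. The helper names (`parse_stream`, `max_pool`, `encode_row`), the tiny hand-written table, and the assumption that movie ids are integer row indices are all made up for illustration. The point is that a single table serves both lookups. In Keras, the analogous pattern would be to call one Embedding layer instance on both inputs, which shares its weights.

```python
def parse_stream(cell, sep="|"):
    """Split a raw cell such as "1|2" into a list of integer movie ids."""
    return [int(token) for token in cell.split(sep) if token]


def max_pool(table, ids):
    """Element-wise max over the embeddings of the given ids."""
    dim = len(table[0])
    vectors = [table[i] for i in ids]
    if not vectors:
        return [0.0] * dim
    return [max(v[d] for v in vectors) for d in range(dim)]


def encode_row(table, stream_cell, target_movie_id):
    """Encode the history and the target with the SAME embedding table."""
    history_vec = max_pool(table, parse_stream(stream_cell))
    target_vec = table[target_movie_id]
    return history_vec + target_vec  # list concatenation


# Assumed toy table: row i is the 2-dim embedding of movie id i.
shared_table = [
    [0.0, 0.0],
    [0.1, 0.9],
    [0.5, 0.2],
    [0.3, 0.4],
]
features = encode_row(shared_table, "1|2", 3)
```

Because both lookups read from `shared_table`, any update to the row for a movie changes its representation in the stream and as the target at the same time, which is the behavior I am after.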

minarastgar avatar Feb 04 '21 01:02 minarastgar