DeepTables
varlen features implementations
Do you have a plan to implement Varlen sparse features and different pooling layers?
Pooling layers? Can you be more specific?
I mostly meant varlen sparse features, for example a sequence of item_ids. Every item ID is a sparse feature, and the last 10 items purchased by a user form a sequence of item-ID embeddings, which can be aggregated with a pooling layer such as average pooling.
Got it. It will be supported in the next release.
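To make the request concrete, here is a minimal NumPy sketch of average pooling over a variable-length sequence of item IDs. It is only an illustration and not DeepTables code: the vocabulary size, the use of ID 0 for padding, and the helper name `mean_pool_embeddings` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ITEMS = 100      # hypothetical vocabulary size; ID 0 is reserved for padding
EMBEDDING_DIM = 4

# One embedding vector per item ID (row 0 belongs to the padding ID).
embedding_table = rng.normal(size=(NUM_ITEMS + 1, EMBEDDING_DIM))

def mean_pool_embeddings(item_ids, table, pad_id=0):
    """Average the embeddings of the non-padding IDs in each row."""
    item_ids = np.asarray(item_ids)
    mask = (item_ids != pad_id).astype(table.dtype)[..., None]  # (batch, seq, 1)
    vectors = table[item_ids]                                    # (batch, seq, dim)
    counts = np.maximum(mask.sum(axis=1), 1.0)                   # avoid division by zero
    return (vectors * mask).sum(axis=1) / counts                 # (batch, dim)

# Two users with purchase histories of different lengths, right-padded with 0.
histories = [[5, 17, 42, 0],
             [8, 0, 0, 0]]
pooled = mean_pool_embeddings(histories, embedding_table)
print(pooled.shape)  # (2, 4)
```

Each user ends up with one fixed-size vector regardless of how many items are in the history, which is what lets a varlen feature sit next to ordinary categorical embeddings.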
Thanks for your quick reply. Looking forward to it. Any ETA for the next release?
It should be around October this year.
Sorry for bugging you. I wonder if the release mentioned above is available yet. Thank you very much.
I'm very sorry for the delay to the original plan; some other urgent tasks came up in the past two months. I will strive to release this new feature by the end of November. Sorry again.
@minarastgar varlen features are ready. See here for details: https://github.com/DataCanvasIO/DeepTables/pull/44#issue-527389069
@jackguagua thank you so much. This is absolutely fantastic
Hi @jackguagua, I have a quick question about varlen features. Let's say there is a varlen feature like `stream_of_movie_ids`, and a categorical feature `movie_id`, which is the movie we want to show to the user. We want a single embedding for movie IDs that is used by `movie_id` as well as by `stream_of_movie_ids`. How can I specify that the embedding used for `stream_of_movie_ids` and `movie_id` is the same?
```python
task=consts.TASK_REGRESSION,
categorical_columns=["movie_id", "user_id", "gender", "occupation", "zip", "title", "age"],
metrics=['mse'],
fixed_embedding_dim=True,
embeddings_output_dim=4,
apply_gbm_features=False,
apply_class_weight=True,
earlystopping_patience=5,
var_len_categorical_columns=[('stream_of_movie_ids', "|", "max")])
```
DT can't do that at the moment. I'm not very clear about the purpose of doing this. If you have Keras code that implements it, please send it to me for reference.
Let me clarify. Say we have a list of movie IDs, [movie_id1, movie_id2, ..., movie_id10], which are the last 10 movies watched by the user. We also have a target movie, movie_id100 (a sparse feature). For both the stream (the list of movie IDs) and the sparse feature (target_title), we want to build the embeddings from movie IDs. We do not want separate embeddings for entities in the stream and in the sparse feature, because they come from the same root, which is movie_id.
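For reference, a minimal Keras sketch of this idea could look like the following. It is not DeepTables code, just an illustration of one embedding layer shared by the history and the target movie. The vocabulary size, the history length, the layer name `shared_movie_embedding`, and the use of ID 0 for padding are assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

NUM_MOVIES = 1000    # hypothetical vocabulary size; ID 0 is reserved for padding
EMBEDDING_DIM = 4
HISTORY_LEN = 10

history_in = layers.Input(shape=(HISTORY_LEN,), dtype="int32", name="stream_of_movie_ids")
target_in = layers.Input(shape=(1,), dtype="int32", name="movie_id")

# A single Embedding layer instance, so both inputs read the same weight matrix.
shared_emb = layers.Embedding(NUM_MOVIES + 1, EMBEDDING_DIM, mask_zero=True,
                              name="shared_movie_embedding")

# mask_zero=True lets the pooling layer skip padded positions in the history.
history_vec = layers.GlobalAveragePooling1D()(shared_emb(history_in))  # (batch, dim)
# Pooling over a length-1 sequence simply returns that one embedding vector.
target_vec = layers.GlobalAveragePooling1D()(shared_emb(target_in))    # (batch, dim)

merged = layers.Concatenate()([history_vec, target_vec])
output = layers.Dense(1)(merged)  # regression head, e.g. a predicted rating
model = tf.keras.Model([history_in, target_in], output)
model.compile(optimizer="adam", loss="mse")

# Tiny made-up batch: two users, histories right-padded with 0.
history = np.array([[3, 7, 9, 0, 0, 0, 0, 0, 0, 0],
                    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], dtype="int32")
target = np.array([[100], [7]], dtype="int32")
preds = model.predict([history, target], verbose=0)
print(preds.shape)  # (2, 1)
```

Because `shared_emb` is called on both inputs, gradients from the history and from the target movie update the same vectors, so movie 7 has one representation whether it appears in the stream or as the target.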