TiSASRec.pytorch
self.time_matrix_K_emb
https://github.com/pmixer/TiSASRec.pytorch/blob/e87342ead6e90898234432f7d9b86e76695008bc/model.py#L104
Thanks for your nice work implementing the 'TiSASRec' model with PyTorch. Here is an issue: 'self.item_num+1' should probably be 'args.time_span+1', although under the condition item_num > time_span it has no effect on the result.
Yes, you are right, thanks for pointing out this typo!
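For reference, a minimal sketch of what the corrected construction would look like (the variable names and sizes below are illustrative, not copied from the repo):

```python
import torch

# sketch: the time-interval embedding table should be sized by the clipped
# time span, not by the item count
time_span, hidden_units = 256, 50  # illustrative values
time_matrix_K_emb = torch.nn.Embedding(time_span + 1, hidden_units)
```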
https://github.com/pmixer/TiSASRec.pytorch/blob/059c6a6193ad6b93ac318f43ddbdcf5d428c9d42/model.py#L66
Hi, pmixer:
Referring to your full implementation of the transformer encoder, I got a RuntimeError in [ model.py:66 ] when I tried to change args.num_heads==1 to args.num_heads==4.
I modified the code to time_mask = time_mask.unsqueeze(-1).repeat(self.head_num, 1, attn_weights.shape[-1]), and then the code works well.
Here is the difference between torch.Tensor.expand() and torch.Tensor.repeat().
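For anyone hitting the same error, a minimal illustration of the expand/repeat distinction (the shapes here are made up, not the actual mask shapes in model.py):

```python
import torch

mask = torch.zeros(2, 3)                  # illustrative (batch, seq_len) mask
view = mask.unsqueeze(0).expand(4, 2, 3)  # OK: expand() only broadcasts size-1 dims and returns a view
tiled = mask.repeat(4, 1)                 # shape (8, 3): repeat() copies data and can tile any dim
# mask.expand(8, 3)                       # RuntimeError: non-singleton dim 0 (size 2) cannot expand to 8
```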
@youngzw THANK YOU again for pointing out the bug! Yeah, I used expand instead of repeat just to save GPU memory, which may not be such a good habit 😿; it has been fixed with the most recent commit. BTW, just out of curiosity, what hidden size did you use for num_heads=4? The default hidden size=50 does not actually support num_heads=4.
Yeah, I found that hidden size=50 does not support num_heads=4, so I adjusted hidden_size to 64. For this problem, adding an FC layer behind the multi-head layer might be one way, but I haven't tried that, and it may affect the experiment results :)
Never mind, there is no need for an FC layer; I just wanted to confirm that you changed the hidden size. Please just set num_heads to a divisor of the default hidden size, or change the hidden size to a multiple of num_heads, as that's a natural requirement for multi-head attention. I will add a checker in the code later to remind users about this issue (a sketch of such a check is below).
Regards, Zan Huang
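A hypothetical sketch of such a checker (the function and argument names are made up for illustration, not taken from the repo):

```python
def check_heads(hidden_units: int, num_heads: int) -> None:
    # fail fast if the attention head count does not divide the hidden size
    if hidden_units % num_heads != 0:
        raise ValueError(
            f"hidden_units ({hidden_units}) must be a multiple of "
            f"num_heads ({num_heads}) for multi-head attention")

check_heads(64, 4)    # OK
# check_heads(50, 4)  # raises ValueError
```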
Well done!!!
Hi friends, could you tell me how long it takes for one epoch?
It depends on your dataset size; for the default ml-1m dataset, expect a few minutes per epoch, even just with CPU.
Thanks! Maybe I have a bug. I tried to replace your time-interval data with distance intervals, but I've been waiting for half an hour. Could you tell me a bit more about how you calculate the time interval when the items are padded in the sequence? (e.g., maxlen=50, where the actual sequence length=40 and the number of padded items=10)
@xiqxin1 It's been quite a while, but if I remember correctly, the time interval should be the time span between the interaction timestamps of two items: https://github.com/pmixer/TiSASRec.pytorch/blob/5ab23975fe695cdba1fbc7b466710652b9fac9b1/utils.py#L15 Please consider opening a new issue if you are still having problems. BTW, the TiSASRec paper should provide more info for your reference if you hope to modify the model based on the former work.
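For anyone with the same question, a rough sketch of the idea (illustrative only, not the exact utils.py code; I'm assuming padded positions simply keep a timestamp of 0, so their intervals just get clipped like any other large gap):

```python
import numpy as np

def relation_matrix(time_seq, time_span):
    # pairwise |t_i - t_j| time intervals, clipped at time_span
    diff = np.abs(time_seq[:, None] - time_seq[None, :])
    return np.minimum(diff, time_span)

# padded positions (timestamp 0) produce large but clipped intervals
print(relation_matrix(np.array([0, 0, 5, 8, 300]), time_span=256))
```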
Hi, pmixer: I want to test the model and see its performance, so I just ran the command
python main.py --dataset=ml-1m --train_dir=default --dropout_rate=0.2 --device=cuda --state_dict_path='ml-1m_default/TiSASRec.epoch=601.lr=0.001.layer=2.head=1.hidden=50.maxlen=200.pth' --inference_only=true --maxlen=200
as in your README. But I get this error:
size mismatch for time_matrix_K_emb.weight: copying a param with shape torch.Size([3417, 50]) from checkpoint, the shape in current model is torch.Size([257, 50]).
I notice in your code the time span should be 256, so why does the pretrained model you provided use 3417?
Never mind, I just trained a new model, and the testing performance is even a bit better than in your paper. Thanks for your solid work~
Sounds good; 3417 sounds like the user/item num. BTW, I'm not the paper author, I just ported the code from TF to PyTorch.
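For anyone hitting the same size-mismatch error, one way to see what a checkpoint actually contains is to print its parameter shapes before loading (this assumes the file is a plain state_dict, as --state_dict_path suggests):

```python
import torch

ckpt = 'ml-1m_default/TiSASRec.epoch=601.lr=0.001.layer=2.head=1.hidden=50.maxlen=200.pth'
state = torch.load(ckpt, map_location='cpu')
for name, tensor in state.items():
    print(name, tuple(tensor.shape))  # e.g. time_matrix_K_emb.weight (3417, 50)
```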