
self.time_matrix_K_emb

Open youngzw opened this issue 4 years ago • 13 comments

https://github.com/pmixer/TiSASRec.pytorch/blob/e87342ead6e90898234432f7d9b86e76695008bc/model.py#L104 Thanks for your nice work implementing the TiSASRec model in PyTorch. Here is an issue: `self.item_num+1` should probably be `args.time_span+1`. Since item_num > time_span here, it has no effect on the result, but the table is larger than needed.

youngzw avatar Nov 05 '20 12:11 youngzw

Yes, you are right, thanks for pointing out this typo!
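The fix amounts to sizing the relative-time embedding table by the time span rather than the item count. A minimal sketch, with illustrative sizes (the real values come from the dataset and `args`):

```python
import torch

# Hypothetical sizes for illustration only.
item_num, time_span, hidden_units = 3416, 256, 50

# Before (model.py#L104): table sized by item count. Harmless while
# item_num > time_span, but larger than needed.
time_matrix_K_emb_buggy = torch.nn.Embedding(item_num + 1, hidden_units)

# After the fix: relative-time indices are clipped to [0, time_span],
# so time_span + 1 rows suffice.
time_matrix_K_emb = torch.nn.Embedding(time_span + 1, hidden_units)
print(time_matrix_K_emb.weight.shape)  # torch.Size([257, 50])
```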

pmixer avatar Nov 06 '20 02:11 pmixer

https://github.com/pmixer/TiSASRec.pytorch/blob/059c6a6193ad6b93ac318f43ddbdcf5d428c9d42/model.py#L66

Hi, pmixer:

Referring to your full implementation of the transformer encoder, I got a RuntimeError at [ model.py:66 ] when I tried to change args.num_heads from 1 to 4.

I modified the code to `time_mask = time_mask.unsqueeze(-1).repeat(self.head_num, 1, attn_weights.shape[-1])`, and then it works well.

This is the difference between torch.Tensor.expand() and torch.Tensor.repeat().
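The distinction can be shown in a few lines (toy shapes assumed here; in the repo the mask comes from the padded time sequence): `expand` is a zero-copy view that can only broadcast size-1 dims, while `repeat` copies and can tile any dim, which is what tiling the batch dim per attention head requires.

```python
import torch

batch, seq_len, num_heads = 2, 5, 4
mask = torch.zeros(batch, seq_len, dtype=torch.bool)  # toy stand-in for time_mask

# repeat() copies data, so it can tile a NON-singleton dim: multi-head
# attention needs the batch dim tiled num_heads times.
tiled = mask.unsqueeze(-1).repeat(num_heads, 1, seq_len)
print(tiled.shape)  # torch.Size([8, 5, 5])

# expand() only broadcasts size-1 dims as a view; asking it to grow the
# batch dim from 2 to 8 raises a RuntimeError.
try:
    mask.unsqueeze(-1).expand(num_heads * batch, -1, seq_len)
except RuntimeError as err:
    print("expand failed:", type(err).__name__)
```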

youngzw avatar Nov 20 '20 07:11 youngzw

@youngzw THANK YOU again for pointing out the bug! Yeah, I used expand instead of repeat just to save GPU memory, which may not be such a good habit 😿; it has been fixed in the most recent commit. BTW, just out of curiosity, what hidden size did you use for num_heads=4? The default hidden size=50 does not actually support num_heads=4.

pmixer avatar Nov 22 '20 03:11 pmixer

Yeah, I found hidden size=50 does not support num_heads=4, so I adjusted hidden_size=64. For this problem, adding an FC layer after the multi-head layer might be one option, but I haven't tried that, and it may affect the experimental results :)

youngzw avatar Nov 22 '20 03:11 youngzw

Never mind, no need for an FC layer; I just wanted to confirm that you changed the hidden size. Please set num_heads to a divisor of the default hidden size, or change the hidden size to a multiple of num_heads, as that's a natural requirement for multi-head attention. I will add a checker to the code later to remind users about this issue.
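Such a checker could be as simple as the sketch below (the parameter names follow the repo's `--hidden_units` / `--num_heads` flags; the exact message wording is assumed):

```python
def check_heads(hidden_units: int, num_heads: int) -> int:
    """Fail fast unless hidden_units splits evenly across attention heads."""
    if hidden_units % num_heads != 0:
        raise ValueError(
            f"hidden_units={hidden_units} must be a multiple of num_heads={num_heads}"
        )
    return hidden_units // num_heads  # dimension each head receives

print(check_heads(64, 4))  # -> 16
```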

Regards, Zan Huang



pmixer avatar Nov 22 '20 03:11 pmixer

Well done!!!

youngzw avatar Nov 22 '20 03:11 youngzw

Hi friends, could you tell me how long one epoch takes?

xiqxin1 avatar Jul 04 '22 14:07 xiqxin1

Depends on your dataset size; for the default ml-1m dataset, expect a few minutes, even on CPU only.

pmixer avatar Jul 04 '22 14:07 pmixer

Thanks! Maybe I have a bug. I tried to replace your time intervals with distance intervals, but I've been waiting for half an hour. Could you tell me a bit more about how you calculate the time interval when items are padded in the sequence? (e.g., maxlen=50 where the actual sequence length is 40, so there are 10 padded items)

xiqxin1 avatar Jul 04 '22 14:07 xiqxin1

@xiqxin1 It's been a while, but if I remember correctly, the time interval is the span between the interaction timestamps of two items: https://github.com/pmixer/TiSASRec.pytorch/blob/5ab23975fe695cdba1fbc7b466710652b9fac9b1/utils.py#L15 Please consider opening a new issue if you still have problems. BTW, the TiSASRec paper should provide more information for your reference if you hope to modify the model based on the original work.
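A simplified sketch of the idea behind the relation matrix in utils.py (the function name here is mine, and the repo additionally rescales spans per user before clipping, so treat this only as an illustration of how padded slots behave):

```python
import numpy as np

def relation_matrix(time_seq, time_span):
    """Pairwise clipped time intervals for one padded user sequence.

    time_seq is left-padded with timestamp 0 for missing items, so padded
    positions just yield (clipped) spans against timestamp 0; those
    positions are presumably masked out later by the attention mask.
    """
    size = len(time_seq)
    matrix = np.zeros((size, size), dtype=np.int64)
    for i in range(size):
        for j in range(size):
            matrix[i][j] = min(abs(time_seq[i] - time_seq[j]), time_span)
    return matrix

# maxlen=5 with an actual length of 3: two left-padded slots (timestamp 0)
print(relation_matrix([0, 0, 10, 13, 20], time_span=4))
```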

pmixer avatar Jul 14 '22 14:07 pmixer

Hi, pmixer: I want to test the model and check its performance, so I just ran the command

python main.py --dataset=ml-1m --train_dir=default --dropout_rate=0.2 --device=cuda --state_dict_path='ml-1m_default/TiSASRec.epoch=601.lr=0.001.layer=2.head=1.hidden=50.maxlen=200.pth' --inference_only=true --maxlen=200

as in your README. But I get this error:

size mismatch for time_matrix_K_emb.weight: copying a param with shape torch.Size([3417, 50]) from checkpoint, the shape in current model is torch.Size([257, 50]).

I notice that in your code the time span should be 256, so why does the pretrained model you provided use 3417?

Hang-Z avatar Dec 09 '22 06:12 Hang-Z

Never mind, I just trained a new model, and the test performance is even a bit better than in the paper. Thanks for your solid work~

Hang-Z avatar Dec 09 '22 06:12 Hang-Z

Sounds good. 3417 sounds like the user/item count. BTW, I'm not the paper author; I just ported the code from TensorFlow to PyTorch.
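One way to diagnose such a size mismatch before calling `load_state_dict` is to inspect the checkpoint's tensor shapes directly. A toy state dict stands in for the real file below; with the actual checkpoint you would load the `.pth` path from the command above via `torch.load(path, map_location="cpu")` instead:

```python
import torch

# Toy stand-in for torch.load("ml-1m_default/TiSASRec.epoch=601...pth", map_location="cpu")
state_dict = {"time_matrix_K_emb.weight": torch.zeros(3417, 50)}

# Print each parameter's shape to compare against the freshly built model.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))  # reveals (3417, 50) vs the expected (257, 50)
```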

pmixer avatar Dec 13 '22 01:12 pmixer