Calculation of NHP's log-likelihood
Hi. I have a question about how the code calculates NHP's log-likelihood, which is specified by equation (8) in the NHP paper:
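$$
\ell \;=\; \sum_{i:\, t_i \le T} \log \lambda_{k_i}(t_i) \;-\; \int_{0}^{T} \lambda(t)\, dt,
\qquad \lambda(t) = \sum_{k=1}^{K} \lambda_k(t),
$$

where $k_i$ is the type of the $i$-th event and $T$ is the end of the observation window.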
The former part of equation (8) is the sum of individual-type log intensities at all event occurrences. In the compute_loglikelihood method in torch_basemodel.py, however, I see that the type dimension (the last dimension) of lambda_at_event is summed over:

https://github.com/ant-research/EasyTemporalPointProcess/blob/01551fb7c7adb3553d33fa1aab0266d139a80aea/easy_tpp/model/torch_model/torch_basemodel.py#L104

which doesn't seem to comply with equation (8) shown above. FYI, the caller of this method in NHP is

https://github.com/ant-research/EasyTemporalPointProcess/blob/01551fb7c7adb3553d33fa1aab0266d139a80aea/easy_tpp/model/torch_model/torch_nhp.py#L265

where lambda_at_event has shape [batch_size, num_times=max_len-1, num_event_types], as the comments mention.
The latter part of equation (8), according to Algorithm 1 of the NHP paper:
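$$
\int_{0}^{T} \lambda(t)\, dt \;\approx\; \frac{T}{N} \sum_{j=1}^{N} \lambda(u_j),
\qquad u_j \sim \mathrm{Uniform}(0, T),
$$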
is calculated by uniformly sampling the intensity function over the whole time frame of the event sequence. In the code, however, the sampling seems to be done by drawing equidistant samples within each inter-event interval:

https://github.com/ant-research/EasyTemporalPointProcess/blob/01551fb7c7adb3553d33fa1aab0266d139a80aea/easy_tpp/model/torch_model/torch_nhp.py#L255

which, I imagine, can give quite different results from uniform sampling for event sequences with large inter-event-time variance.
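To make the contrast concrete, here is a minimal sketch of the two schemes (tensor names and shapes are hypothetical, loosely mimicking the code):

```python
import torch

batch_size, num_times, num_sample = 2, 4, 10
# dt_i = t_i - t_{i-1}, one entry per inter-event interval (made-up values)
time_delta_seq = torch.rand(batch_size, num_times)

# what the code appears to do: equidistant ratios in (0, 1), scaled per interval
eq_ratios = torch.linspace(0.0, 1.0, steps=num_sample + 2)[1:-1]
eq_dtimes = time_delta_seq[:, :, None] * eq_ratios[None, None, :]  # [2, 4, 10]

# Algorithm 1 as I read it: uniform random draws over the whole window [0, T]
T = time_delta_seq.sum(dim=-1, keepdim=True)                # [batch_size, 1]
mc_times = torch.rand(batch_size, num_times * num_sample) * T
```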
Maybe I'm missing something. Thank you for your help.
Hi,
Regarding the sampling method used here, we are working on publishing a newer version that fixes the relevant issues. This will be done shortly.
Thank you. It'll be really helpful for my research. For now, it's a bit hard to reproduce the results from the original papers.
We will have quite a few PRs in the next month, fixing potential problems in the log-likelihood computation, some inconsistencies in reproducing the models, and other issues.
In addition, we will publish a tech report to explain the implementation details and re-publish the experiment results.
> The former part of equation (8) is the sum of individual-type log intensities at all event occurrences.
Not exactly. It refers to the sum of the log intensities of the event types that actually happen, not the sum of the intensities of all types.
See page 15 of https://arxiv.org/pdf/1612.09328.
> it seems like the sampling is done by drawing equidistant samples in each inter-event interval:
Technically we should uniformly sample the intensity function over the whole time frame, but in practice we sample over the known inter-event intervals. What I did follows this implementation: https://github.com/yangalan123/anhp-andtt/blob/master/anhp/model/xfmr_nhp_fast.py
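Concretely, sampling within the known intervals just decomposes the integral interval by interval; roughly,

$$
\int_{0}^{T} \lambda(t)\, dt \;=\; \sum_{i} \int_{t_{i-1}}^{t_i} \lambda(t)\, dt
\;\approx\; \sum_{i} \frac{\Delta t_i}{J} \sum_{j=1}^{J} \lambda(s_{i,j}),
$$

where $\Delta t_i = t_i - t_{i-1}$ and $s_{i,j}$ are the $J$ points placed inside the $i$-th interval.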
If you find it is indeed necessary to modify the code, you are welcome to open a PR.
Thank you for your response. The sentence
> former part of equation (8) is the sum of individual-type log intensities at all event occurrences.
does seem misleading as I read it now; sorry for my poor English. I meant what you said: it's the sum of the log intensities of the event types that actually happen. What I noticed is that the code sums over all types:
which I think is not right.
Hey,
lambda_type_mask [batch_size, seq_len, type_dim] is one-hot: along the last type_dim dimension, the position corresponding to the type that actually occurred is 1, and the others are 0.
lambda_at_event [batch_size, seq_len, type_dim] is not one-hot.
Multiplying the two keeps the intensity corresponding to the type that actually occurred and zeros out the types that did not occur. Summing then gives the intensities of all the events that actually happened.
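A toy example of this masking (made-up values, names mirroring the code):

```python
import torch
import torch.nn.functional as F

batch_size, seq_len, type_dim = 1, 3, 4
lambda_at_event = torch.rand(batch_size, seq_len, type_dim)  # not one-hot

# one-hot mask built from the observed event types (types made up here)
event_types = torch.tensor([[0, 2, 1]])
lambda_type_mask = F.one_hot(event_types, num_classes=type_dim).float()

# multiply, then sum over the type dim: at each position only the
# intensity of the type that actually occurred survives
event_lambdas = (lambda_at_event * lambda_type_mask).sum(dim=-1)
event_ll = torch.log(event_lambdas).sum()  # first term of equation (8)
```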
Agreed?
Agreed. I misread it earlier and thought lambda_type_mask was only used to mark padding. Thanks! I'm a student and still learning, please bear with me haha.
Nice. Closed.