Calculation of NHP's log-likelihood
Hi. I have a question about how the code calculates NHP's log-likelihood, which is specified by equation (8) in the NHP paper:
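$$
\ell \;=\; \sum_{i:\, t_i \le T} \log \lambda_{k_i}(t_i) \;-\; \int_{0}^{T} \lambda(t)\, dt,
\qquad \lambda(t) = \sum_{k=1}^{K} \lambda_k(t),
$$

where $k_i$ is the type of the $i$-th event and $T$ is the end of the observation window.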
The former part of equation (8) is the sum of individual-type log intensities at all event occurrences. In the compute_loglikelihood method in torch_basemodel.py, however, I see that the type dimension (the last dimension) of lambda_at_event is summed over:

https://github.com/ant-research/EasyTemporalPointProcess/blob/01551fb7c7adb3553d33fa1aab0266d139a80aea/easy_tpp/model/torch_model/torch_basemodel.py#L104

which doesn't seem to comply with equation (8) shown above. FYI, the caller of this method in NHP is

https://github.com/ant-research/EasyTemporalPointProcess/blob/01551fb7c7adb3553d33fa1aab0266d139a80aea/easy_tpp/model/torch_model/torch_nhp.py#L265

where lambda_at_event has shape [batch_size, num_times=max_len-1, num_event_types], as the comments mention.
The latter part of equation (8), according to Algorithm 1 of the NHP paper:
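$$
\int_{0}^{T} \lambda(t)\, dt \;\approx\; \frac{T}{N} \sum_{j=1}^{N} \lambda(u_j),
\qquad u_j \sim \mathrm{Uniform}(0, T),
$$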
is calculated by uniformly sampling the intensity function over the whole time frame of the event sequence. In the code, however, the sampling seems to be done by drawing equidistant samples within each inter-event interval:

https://github.com/ant-research/EasyTemporalPointProcess/blob/01551fb7c7adb3553d33fa1aab0266d139a80aea/easy_tpp/model/torch_model/torch_nhp.py#L255

which, I imagine, can give quite different results from uniform sampling for event sequences with large inter-event-time variance.
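To make the contrast concrete, here is a minimal sketch of the two schemes (tensor names and shapes are hypothetical, loosely mimicking the code):

```python
import torch

batch_size, num_times, num_sample = 2, 4, 10
# dt_i = t_i - t_{i-1}, one entry per inter-event interval (made-up values)
time_delta_seq = torch.rand(batch_size, num_times)

# what the code appears to do: equidistant ratios in (0, 1), scaled per interval
eq_ratios = torch.linspace(0.0, 1.0, steps=num_sample + 2)[1:-1]
eq_dtimes = time_delta_seq[:, :, None] * eq_ratios[None, None, :]  # [2, 4, 10]

# Algorithm 1 as I read it: uniform random draws over the whole window [0, T]
T = time_delta_seq.sum(dim=-1, keepdim=True)                # [batch_size, 1]
mc_times = torch.rand(batch_size, num_times * num_sample) * T
```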
Maybe I'm missing something. Thank you for your help.
Hi,
Regarding the sampling method used here, we are working on publishing a newer version that fixes the relevant issues. This will be done shortly.
Thank you. It'll be really helpful for my research. For now, it's a bit hard to reproduce the results from the original papers.
We will have quite a few PRs in the next month, fixing potential problems in the log-likelihood computation, some inconsistencies in reproducing the models, and other issues.
In addition, we will publish a tech report to explain the implementation details and re-publish the experiment results.
> The former part of equation (8) is the sum of individual-type log intensities at all event occurrences.
Not exactly. It refers to the sum of the log intensities of the event types that actually happen, not the sum of the intensities of all types.
See page 15 of https://arxiv.org/pdf/1612.09328.
> it seems like the sampling is done by drawing equidistant samples in each inter-event interval:
Technically we should uniformly sample the intensity function over the whole time frame, but in practice we sample over the known inter-event intervals. What I did follows this implementation: https://github.com/yangalan123/anhp-andtt/blob/master/anhp/model/xfmr_nhp_fast.py
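Concretely, sampling within the known intervals just decomposes the integral interval by interval; roughly,

$$
\int_{0}^{T} \lambda(t)\, dt \;=\; \sum_{i} \int_{t_{i-1}}^{t_i} \lambda(t)\, dt
\;\approx\; \sum_{i} \frac{\Delta t_i}{J} \sum_{j=1}^{J} \lambda(s_{i,j}),
$$

where $\Delta t_i = t_i - t_{i-1}$ and $s_{i,j}$ are the $J$ points placed inside the $i$-th interval.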
If you find it is indeed necessary to modify the code, you are welcome to open a PR.
Thank you for your response. The sentence
> former part of equation (8) is the sum of individual-type log intensities at all event occurrences.
does seem misleading as I read it now; sorry for my poor English. I meant what you said: it's the sum of the log intensities of the event types that actually happen. What I noticed is that the code sums over all types:
which I think is not right.
Hey,
lambda_type_mask [batch_size, seq_len, type_dim] is one-hot: along the last type_dim dimension, the position corresponding to the type that actually occurred is 1, and the others are 0.
lambda_at_event [batch_size, seq_len, type_dim] is not one-hot.
Multiplying the two keeps the intensity corresponding to the type that actually occurred and zeros out the types that did not occur. Summing then gives the intensities of all the events that actually happened.
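A toy example of this masking (made-up values, names mirroring the code):

```python
import torch
import torch.nn.functional as F

batch_size, seq_len, type_dim = 1, 3, 4
lambda_at_event = torch.rand(batch_size, seq_len, type_dim)  # not one-hot

# one-hot mask built from the observed event types (types made up here)
event_types = torch.tensor([[0, 2, 1]])
lambda_type_mask = F.one_hot(event_types, num_classes=type_dim).float()

# multiply, then sum over the type dim: at each position only the
# intensity of the type that actually occurred survives
event_lambdas = (lambda_at_event * lambda_type_mask).sum(dim=-1)
event_ll = torch.log(event_lambdas).sum()  # first term of equation (8)
```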
Agreed?
Agreed. I misread it earlier and thought lambda_type_mask was only used to mark padding. Thanks! I'm a student and still learning, please bear with me haha.
Nice. Closed.