tick
tick copied to clipboard
Question: Separate background from trigerred events in Hawkes simulation
Your library is great! I am using SimuHawkesExpKernels to simulate a Hawkes process in a network. I was wondering if it was possible to separate events/timestamps caused by the baseline intensities and the events/timestamps caused by the kernel functions? And ideally which events triggered which events?
Hello,
No sadly, there is no such a thing. I could be possible to add it in the simulation process by modifying this function https://github.com/X-DataInitiative/tick/blob/master/lib/cpp/hawkes/simulation/simu_point_process.cpp#L107.
But, to separate what was caused by the baseline, you can have a look at the intensity λ at the time of the event (using track_intensity
) and look at the ratio between the baseline μ and λ. This event was caused by the baseline with a probability μ / λ.
Ok thanks for letting me know! I'll have a look at modifying the function. If I'm successful do you accept merge requests?
Oh yes, they are very welcome :)
So I've been doing some digging. From my understanding you do not create a set of base events/timestamps and then calculate the triggered events/timestamps. Instead, you calculate the time until the next event based on the current intensity which is the sum of the base intensity plus contributions from kernels. Hence I don't think it's possible to pinpoint which event triggered further events. Am I correct?
Instead, what do you think of separating the contributions from the kernels and the base? So we add a new parameter called tracked_intensity_node which is a matrix of [nodes,nodes,time] and has the intensity from all nodes j to a node i at time t
To me it is strictly equivalent to assign the simulated event to a node / baseline at simulation time or afterwards, by recalculating the contributions while simulation in finished.
Indeed, such a tracked_intensity_per_node
could be helpful to have a full vision of what happened. It should have its own flag and be disabled by default (since it should slow down things a lot) but is probably the cleanest way to obtain the branching structure of the simulation.
Would you need any help for this ?
Yes please! Let me do some more research and will come back with list of questions :-)
On Thu, 7 May 2020, 08:00 Martin, [email protected] wrote:
To me it is strictly equivalent to assign the simulated event to a node / baseline at simulation time or afterwards, by recalculating the contributions while simulation in finished.
Indeed, such a tracked_intensity_per_node could be helpful to have a full vision of what happened. It should have its own flag and be disabled by default (since it should slow down things a lot) but is probably the cleanest way to obtain the branching structure of the simulation.
Would you need any help for this ?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/X-DataInitiative/tick/issues/438#issuecomment-625069473, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFII2WC4RKZW4AFM2QUUEKLRQJMCJANCNFSM4MXUFIVA .
Hey so I just started to work through your developer documentation and familiarising myself with swig. I'm currently re-building the extensions every time I modify the cpp code which takes ages with python setup.py build_ext --inplace. Is there a trick to speeding this up? If I'm only modifying simu_hawkes.cpp file the build process is intelligent enough to build the hawkes directory and ignores the other modules. Is there a way of only building simu_hawkes.cpp and not the other files in the hawkes directory?
The easiest way to iterate quickly is to test it directly in C++ (for example by modifying this file https://github.com/X-DataInitiative/tick/blob/master/lib/cpp-test/hawkes/simulation/hawkes_simulation.cpp). Otherwise, to speed up the compilations you can use ccache but @Dekken would explain this much better than I would.
CC="ccache g++" python3 setup.py build_ext --inplace
this should work
alternatively there's another build tool support which is explained here: https://github.com/X-DataInitiative/tick/blob/master/INSTALL.md#alternative-installation-method
so you could just do
./sh/mkn.sh hawkes/simulation
to build just hawkes simulation
this open PR might be useful https://github.com/X-DataInitiative/tick/pull/440/files
Thanks for the quick responses! I decided to go with ccache since it seemed the most straightforward and works really well. I've added a line to INSTALL.md.
The easiest way to iterate quickly is to test it directly in C++ (for example by modifying this file https://github.com/X-DataInitiative/tick/blob/master/lib/cpp-test/hawkes/simulation/hawkes_simulation.cpp). Otherwise, to speed up the compilations you can use ccache but @Dekken would explain this much better than I would.
I also tried this approach since I will have to run the unit tests anyway.
I first installed gtest using this approach instead of cloning gtest: gtest
I then compiled lib/cpp-test/hawkes/simulation/CMakeLists.txt: cmake lib/cpp-test/hawkes/simulation/CMakeLists.txt
This creates a few files include a Makefile within the simulation directory as expected.
However if I try to run the Makefile I get the following error message: make -C lib/cpp-test/hawkes/simulation/
fatal error: tick/hawkes/simulation/hawkes_kernels/hawkes_kernel_exp.h: No such file or directory 4 | #include "tick/hawkes/simulation/hawkes_kernels/hawkes_kernel_exp.h" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Any idea what I'm doing wrong? Apologies if this is a beginner question - completely new to integrating C++/python!
I believe to utilize the cmake configurations you must go through python also such like
python3 setup.py build_ext --inplace cpptest
That didn't work either ... Am I supposed to run sh/gtest.sh perhaps?
"./sh/gtest.sh" uses the other build tool "mkn".
I believe python3 setup.py build_ext --inplace cpptest
should work, perhaps you can show me the error.
These are the steps I followed:
- Delete all the files I created using the CMakeLists.txt (I split up cpptest into makeccptest and runcpptest to make things easier)
2a) Execute python3 setup.py build_ext --inplace makecpptest which completes
2b) Execute python3 setup.py build_ext --inplace runcpptest which throws following error:
make[3]: *** [CMakeFiles/check.dir/build.make:57: CMakeFiles/check] Segmentation fault (core dumped)
make[2]: *** [CMakeFiles/Makefile2:506: CMakeFiles/check.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:513: CMakeFiles/check.dir/rule] Error 2
make: *** [Makefile:129: check] Error 2
Traceback (most recent call last):
File "setup.py", line 929, in
Segmentation fault
that's not so good.
This is probably unrelated but the .cpp files include header files such as "tick/hawkes/simulation/simu_hawkes.h" which do not exist in my repo. What command is supposed to generate the header files?
pretty sure it's already there.
file lib/include/tick/hawkes/simulation/simu_hawkes.h
lib/include/tick/hawkes/simulation/simu_hawkes.h: C++ source, ASCII text
oops my bad :-)
Summary of proposed changes
New variables
intensity_contributions
- tracks current intensity of each component per node and the baseline
- ArrayDouble2d(n_nodes,n_nodes+1)
- populated by Hawkes::update_time_shift_()
itr_contributions
- The track records of the intensity per node and the baseline
- VArrayDoublePtrList2d (n_nodes,nodes+1,timestamps)
- populated by itr_process()
Process / pseudo code
If this was python I would do something like:
for i in range(n_nodes):
for j in range(n_nodes+1):
itr_contributions[i,j].append(intensity_contributions[i,j])
Status
The C++ code doesn't like me referencing the i,j element of itr_contributions and fails with the following error
lib/cpp/hawkes/simulation/simu_point_process.cpp:81:27: error: left operand of comma operator has no effect [-Werror=unused-value]
81 | itr_contributions[i,j] = VArrayDouble::new_ptr();
| ^
lib/cpp/hawkes/simulation/simu_point_process.cpp:81:56: error: no match for ‘operator=’ (operand types are ‘__gnu_cxx::__alloc_traits<std::allocator<std::vector<std::shared_ptr<VArray
Questions
- What do you think of approach?
- How to do I refer to i,j element of itr_contributions?
That looks great, I would just add a flag to enable / disable this as it can slow down the simulation.
Concerning refering to element i,j have you tried doing itr_contributions[i][j]
?
After some investigation I found that intensity_contributions[i,j] or [i][j] does't work with ArrayDouble2d but it works with ArrayDoubleList1D.
For the flag - should I add a new track_intensity_contributions flag or use the existing track_intensity flag?
Yes you are right the current intensity flag should be enough.
I've made some progress and I now return a (node,node+1,timestamp) matrix when track_intensity = 1. The sum of the individual contributions add up to the tracked_intensity apart from t=0 as you can see if you run the new python unit test I added. I was wondering if someone would be willing to discuss the code with me, perhaps over zoom, to find out what I'm doing wrong? There are sections of the code I still don't understand.
Summary of Issues
-
The intensity at t=0 is currently set to 0 instead of the baseline. I have added init_intensity_contributions_() to simu_hawkes to replicate the init_intensity_() but I couldn't see where this is called in simu_point_process()
-
I ran the python unit tests and they pass apart from two (see below). I still can't run the cpp tests unfortunately.
====================================================================== FAIL: test_simulation_1d_inhomogeneous_poisson (tick.hawkes.simulation.tests.simu_inhomogeneous_poisson_test.Test) ...Test if simulation of a 1d inhomogeneous Poisson process
Traceback (most recent call last): File "/mnt/c/Users/souma/PycharmProjects/tick/tick/hawkes/simulation/tests/simu_inhomogeneous_poisson_test.py", line 36, in test_simulation_1d_inhomogeneous_poisson self.assertEqual(np.prod(tf.value(timestamps[0]) > 0), 1) AssertionError: 0 != 1
====================================================================== FAIL: test_track_intensity (tick.hawkes.simulation.tests.simu_point_process_test.Test) ...Test that point process intensity is tracked correctly
Traceback (most recent call last): File "/mnt/c/Users/souma/PycharmProjects/tick/tick/hawkes/simulation/tests/simu_point_process_test.py", line 88, in test_track_intensity tf_1.value(intensity_times)) File "/home/soumau/miniconda3/envs/tick_env/lib/python3.6/site-packages/numpy/testing/_private/utils.py", line 1047, in assert_array_almost_equal precision=decimal) File "/home/soumau/miniconda3/envs/tick_env/lib/python3.6/site-packages/numpy/testing/_private/utils.py", line 846, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not almost equal to 6 decimals
Mismatched elements: 347 / 348 (99.7%) Max absolute difference: 1. Max relative difference: 3.8e+13 x: array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5 , 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0. 5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0. 5, 0.5,... y: array([5.000000e-01, 5.987649e-01, 6.952869e-01, 6.975299e-01, 7.943037e-01, 8.857679e-01, 9.772321e-01, 1.061026e+00, 1.138429e+00, 1.215831e+00, 1.277047e+00, 1.334666e+00, 1.367059e+ 00, 1.390099e+00, 1.423676e+00, 1.457252e+00, 1.481183e+00, 1.488235e+00, 1.491171e+00, 1.495287e+00, 1.485128e+00, 1.465134e+00, 1.445140e+00, 1.401902e+00, 1.369592e+00, 1.356340e+00, 1.306742e+00, 1.238980e+00, 1.186962e+00, 1.171218e+00, 1.095642e+00, 1.086823e+00, 1.085701e+00, 1.010689e+00, 9.257364e-01, 8.328482e-01, 7.369841e-01, 6.411200e-01, 5.414310e-01,. ..
I fixed the errors and all python unit tests now pass on my side. However when I raised a merge request test_s_sparse_array2d_memory_leaks() is failing which is strange since I did not modify the array module...
I want to add another attribute called something like intensity_tracked_node. This would be a vector of size (timestamps) to record at which node the timestamp occurred. In the case where no tick occured (because dt = 0.01) the value would be recorded as -1. This would make it easier to extract and visualise the cascade of timestamps. What do you think?
Hey so I have just realised a big problem - we track the intensity immediately "after" the timestamp has occurred, instead of the intensity when the timestamp has occurred. If I want to trace the contagion chain I need the intensity contributions when the timestamp has occurred. Going to investigate how to best fix this.
Updated Summary of Changes
I have completely re-written the changes. SimuPointProcess() now has three new properties:
-
contribution_intensities -(nodes+1,n_total_jumps) array -records the intensity contributions only at the node where a timestamp occurred, when the timestamp occurred -this minimises the size of the array by discarding superfluous information -unfortunately it means that summing this array will not equate to tracked_intensities which could be confusing
-
contribution_nodes -(n_total_jumps) array -node where timestamp occurred
-
contribution_timestamps -(n_total_jumps) array -time that timestamp/tick occurred
The last two properties can be somewhat derived from timestamps but would require searching on a double which did not seem right
More work to be done, but this shows how the contribution_intensities can be used to generate a contagion cascade for a hawkes simulation with 100 nodes as example: