tick icon indicating copy to clipboard operation
tick copied to clipboard

Question: Separate background from trigerred events in Hawkes simulation

Open sumau opened this issue 4 years ago • 31 comments

Your library is great! I am using SimuHawkesExpKernels to simulate a Hawkes process in a network. I was wondering if it was possible to separate events/timestamps caused by the baseline intensities and the events/timestamps caused by the kernel functions? And ideally which events triggered which events?

sumau avatar May 02 '20 10:05 sumau

Hello, No sadly, there is no such a thing. I could be possible to add it in the simulation process by modifying this function https://github.com/X-DataInitiative/tick/blob/master/lib/cpp/hawkes/simulation/simu_point_process.cpp#L107. But, to separate what was caused by the baseline, you can have a look at the intensity λ at the time of the event (using track_intensity) and look at the ratio between the baseline μ and λ. This event was caused by the baseline with a probability μ / λ.

Mbompr avatar May 04 '20 07:05 Mbompr

Ok thanks for letting me know! I'll have a look at modifying the function. If I'm successful do you accept merge requests?

sumau avatar May 04 '20 14:05 sumau

Oh yes, they are very welcome :)

Mbompr avatar May 04 '20 15:05 Mbompr

So I've been doing some digging. From my understanding you do not create a set of base events/timestamps and then calculate the triggered events/timestamps. Instead, you calculate the time until the next event based on the current intensity which is the sum of the base intensity plus contributions from kernels. Hence I don't think it's possible to pinpoint which event triggered further events. Am I correct?

Instead, what do you think of separating the contributions from the kernels and the base? So we add a new parameter called tracked_intensity_node which is a matrix of [nodes,nodes,time] and has the intensity from all nodes j to a node i at time t

sumau avatar May 07 '20 01:05 sumau

To me it is strictly equivalent to assign the simulated event to a node / baseline at simulation time or afterwards, by recalculating the contributions while simulation in finished.

Indeed, such a tracked_intensity_per_node could be helpful to have a full vision of what happened. It should have its own flag and be disabled by default (since it should slow down things a lot) but is probably the cleanest way to obtain the branching structure of the simulation.

Would you need any help for this ?

Mbompr avatar May 07 '20 07:05 Mbompr

Yes please! Let me do some more research and will come back with list of questions :-)

On Thu, 7 May 2020, 08:00 Martin, [email protected] wrote:

To me it is strictly equivalent to assign the simulated event to a node / baseline at simulation time or afterwards, by recalculating the contributions while simulation in finished.

Indeed, such a tracked_intensity_per_node could be helpful to have a full vision of what happened. It should have its own flag and be disabled by default (since it should slow down things a lot) but is probably the cleanest way to obtain the branching structure of the simulation.

Would you need any help for this ?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/X-DataInitiative/tick/issues/438#issuecomment-625069473, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFII2WC4RKZW4AFM2QUUEKLRQJMCJANCNFSM4MXUFIVA .

sumau avatar May 07 '20 10:05 sumau

Hey so I just started to work through your developer documentation and familiarising myself with swig. I'm currently re-building the extensions every time I modify the cpp code which takes ages with python setup.py build_ext --inplace. Is there a trick to speeding this up? If I'm only modifying simu_hawkes.cpp file the build process is intelligent enough to build the hawkes directory and ignores the other modules. Is there a way of only building simu_hawkes.cpp and not the other files in the hawkes directory?

sumau avatar May 10 '20 01:05 sumau

The easiest way to iterate quickly is to test it directly in C++ (for example by modifying this file https://github.com/X-DataInitiative/tick/blob/master/lib/cpp-test/hawkes/simulation/hawkes_simulation.cpp). Otherwise, to speed up the compilations you can use ccache but @Dekken would explain this much better than I would.

Mbompr avatar May 10 '20 14:05 Mbompr

CC="ccache g++" python3 setup.py build_ext --inplace

this should work

PhilipDeegan avatar May 10 '20 17:05 PhilipDeegan

alternatively there's another build tool support which is explained here: https://github.com/X-DataInitiative/tick/blob/master/INSTALL.md#alternative-installation-method

so you could just do

./sh/mkn.sh hawkes/simulation

to build just hawkes simulation

this open PR might be useful https://github.com/X-DataInitiative/tick/pull/440/files

PhilipDeegan avatar May 10 '20 17:05 PhilipDeegan

Thanks for the quick responses! I decided to go with ccache since it seemed the most straightforward and works really well. I've added a line to INSTALL.md.

sumau avatar May 10 '20 22:05 sumau

The easiest way to iterate quickly is to test it directly in C++ (for example by modifying this file https://github.com/X-DataInitiative/tick/blob/master/lib/cpp-test/hawkes/simulation/hawkes_simulation.cpp). Otherwise, to speed up the compilations you can use ccache but @Dekken would explain this much better than I would.

I also tried this approach since I will have to run the unit tests anyway.

I first installed gtest using this approach instead of cloning gtest: gtest

I then compiled lib/cpp-test/hawkes/simulation/CMakeLists.txt: cmake lib/cpp-test/hawkes/simulation/CMakeLists.txt

This creates a few files include a Makefile within the simulation directory as expected.

However if I try to run the Makefile I get the following error message: make -C lib/cpp-test/hawkes/simulation/

fatal error: tick/hawkes/simulation/hawkes_kernels/hawkes_kernel_exp.h: No such file or directory 4 | #include "tick/hawkes/simulation/hawkes_kernels/hawkes_kernel_exp.h" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Any idea what I'm doing wrong? Apologies if this is a beginner question - completely new to integrating C++/python!

sumau avatar May 11 '20 01:05 sumau

I believe to utilize the cmake configurations you must go through python also such like

python3 setup.py build_ext --inplace cpptest

PhilipDeegan avatar May 11 '20 08:05 PhilipDeegan

That didn't work either ... Am I supposed to run sh/gtest.sh perhaps?

sumau avatar May 11 '20 11:05 sumau

"./sh/gtest.sh" uses the other build tool "mkn".

I believe python3 setup.py build_ext --inplace cpptest should work, perhaps you can show me the error.

PhilipDeegan avatar May 11 '20 11:05 PhilipDeegan

These are the steps I followed:

  1. Delete all the files I created using the CMakeLists.txt (I split up cpptest into makeccptest and runcpptest to make things easier)

2a) Execute python3 setup.py build_ext --inplace makecpptest which completes

2b) Execute python3 setup.py build_ext --inplace runcpptest which throws following error:

make[3]: *** [CMakeFiles/check.dir/build.make:57: CMakeFiles/check] Segmentation fault (core dumped) make[2]: *** [CMakeFiles/Makefile2:506: CMakeFiles/check.dir/all] Error 2 make[1]: *** [CMakeFiles/Makefile2:513: CMakeFiles/check.dir/rule] Error 2 make: *** [Makefile:129: check] Error 2 Traceback (most recent call last): File "setup.py", line 929, in 'License :: OSI Approved :: BSD License'], File "/home/soumau/miniconda3/envs/tick_env/lib/python3.6/site-packages/setuptools/init.py", line 144, in setup return distutils.core.setup(**attrs) File "/home/soumau/miniconda3/envs/tick_env/lib/python3.6/distutils/core.py", line 148, in setup dist.run_commands() File "/home/soumau/miniconda3/envs/tick_env/lib/python3.6/distutils/dist.py", line 955, in run_commands self.run_command(cmd) File "/home/soumau/miniconda3/envs/tick_env/lib/python3.6/distutils/dist.py", line 974, in run_command cmd_obj.run() File "setup.py", line 721, in run subprocess.check_call(make_cmd, cwd=self.cpp_build_dir) File "/home/soumau/miniconda3/envs/tick_env/lib/python3.6/subprocess.py", line 311, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['make', 'check']' returned non-zero exit status 2.

sumau avatar May 11 '20 13:05 sumau

Segmentation fault

that's not so good.

PhilipDeegan avatar May 11 '20 13:05 PhilipDeegan

This is probably unrelated but the .cpp files include header files such as "tick/hawkes/simulation/simu_hawkes.h" which do not exist in my repo. What command is supposed to generate the header files?

sumau avatar May 11 '20 14:05 sumau

pretty sure it's already there.

file lib/include/tick/hawkes/simulation/simu_hawkes.h
lib/include/tick/hawkes/simulation/simu_hawkes.h: C++ source, ASCII text

PhilipDeegan avatar May 11 '20 14:05 PhilipDeegan

oops my bad :-)

sumau avatar May 11 '20 14:05 sumau

Summary of proposed changes

New variables

intensity_contributions

  • tracks current intensity of each component per node and the baseline
  • ArrayDouble2d(n_nodes,n_nodes+1)
  • populated by Hawkes::update_time_shift_()

itr_contributions

  • The track records of the intensity per node and the baseline
  • VArrayDoublePtrList2d (n_nodes,nodes+1,timestamps)
  • populated by itr_process()

Process / pseudo code

If this was python I would do something like:

for i in range(n_nodes):
  for j in range(n_nodes+1):     
     itr_contributions[i,j].append(intensity_contributions[i,j])

Status

The C++ code doesn't like me referencing the i,j element of itr_contributions and fails with the following error

lib/cpp/hawkes/simulation/simu_point_process.cpp:81:27: error: left operand of comma operator has no effect [-Werror=unused-value] 81 | itr_contributions[i,j] = VArrayDouble::new_ptr(); | ^ lib/cpp/hawkes/simulation/simu_point_process.cpp:81:56: error: no match for ‘operator=’ (operand types are ‘__gnu_cxx::__alloc_traits<std::allocator<std::vector<std::shared_ptr<VArray > > >, std::vector<std::shared_ptr<VArray > > >::value_type’ {aka ‘std::vector<std::shared_ptr<VArray > >’} and ‘std::shared_ptr<VArray >’) 81 | itr_contributions[i,j] = VArrayDouble::new_ptr();

Questions

  1. What do you think of approach?
  2. How to do I refer to i,j element of itr_contributions?

sumau avatar May 11 '20 22:05 sumau

That looks great, I would just add a flag to enable / disable this as it can slow down the simulation. Concerning refering to element i,j have you tried doing itr_contributions[i][j] ?

Mbompr avatar May 12 '20 07:05 Mbompr

After some investigation I found that intensity_contributions[i,j] or [i][j] does't work with ArrayDouble2d but it works with ArrayDoubleList1D.

For the flag - should I add a new track_intensity_contributions flag or use the existing track_intensity flag?

sumau avatar May 12 '20 15:05 sumau

Yes you are right the current intensity flag should be enough.

Mbompr avatar May 12 '20 16:05 Mbompr

I've made some progress and I now return a (node,node+1,timestamp) matrix when track_intensity = 1. The sum of the individual contributions add up to the tracked_intensity apart from t=0 as you can see if you run the new python unit test I added. I was wondering if someone would be willing to discuss the code with me, perhaps over zoom, to find out what I'm doing wrong? There are sections of the code I still don't understand.

Summary of Issues

  1. The intensity at t=0 is currently set to 0 instead of the baseline. I have added init_intensity_contributions_() to simu_hawkes to replicate the init_intensity_() but I couldn't see where this is called in simu_point_process()

  2. I ran the python unit tests and they pass apart from two (see below). I still can't run the cpp tests unfortunately.

====================================================================== FAIL: test_simulation_1d_inhomogeneous_poisson (tick.hawkes.simulation.tests.simu_inhomogeneous_poisson_test.Test) ...Test if simulation of a 1d inhomogeneous Poisson process

Traceback (most recent call last): File "/mnt/c/Users/souma/PycharmProjects/tick/tick/hawkes/simulation/tests/simu_inhomogeneous_poisson_test.py", line 36, in test_simulation_1d_inhomogeneous_poisson self.assertEqual(np.prod(tf.value(timestamps[0]) > 0), 1) AssertionError: 0 != 1

====================================================================== FAIL: test_track_intensity (tick.hawkes.simulation.tests.simu_point_process_test.Test) ...Test that point process intensity is tracked correctly

Traceback (most recent call last): File "/mnt/c/Users/souma/PycharmProjects/tick/tick/hawkes/simulation/tests/simu_point_process_test.py", line 88, in test_track_intensity tf_1.value(intensity_times)) File "/home/soumau/miniconda3/envs/tick_env/lib/python3.6/site-packages/numpy/testing/_private/utils.py", line 1047, in assert_array_almost_equal precision=decimal) File "/home/soumau/miniconda3/envs/tick_env/lib/python3.6/site-packages/numpy/testing/_private/utils.py", line 846, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not almost equal to 6 decimals

Mismatched elements: 347 / 348 (99.7%) Max absolute difference: 1. Max relative difference: 3.8e+13 x: array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5 , 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0. 5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0. 5, 0.5,... y: array([5.000000e-01, 5.987649e-01, 6.952869e-01, 6.975299e-01, 7.943037e-01, 8.857679e-01, 9.772321e-01, 1.061026e+00, 1.138429e+00, 1.215831e+00, 1.277047e+00, 1.334666e+00, 1.367059e+ 00, 1.390099e+00, 1.423676e+00, 1.457252e+00, 1.481183e+00, 1.488235e+00, 1.491171e+00, 1.495287e+00, 1.485128e+00, 1.465134e+00, 1.445140e+00, 1.401902e+00, 1.369592e+00, 1.356340e+00, 1.306742e+00, 1.238980e+00, 1.186962e+00, 1.171218e+00, 1.095642e+00, 1.086823e+00, 1.085701e+00, 1.010689e+00, 9.257364e-01, 8.328482e-01, 7.369841e-01, 6.411200e-01, 5.414310e-01,. ..


sumau avatar May 13 '20 00:05 sumau

I fixed the errors and all python unit tests now pass on my side. However when I raised a merge request test_s_sparse_array2d_memory_leaks() is failing which is strange since I did not modify the array module...

sumau avatar May 14 '20 02:05 sumau

I want to add another attribute called something like intensity_tracked_node. This would be a vector of size (timestamps) to record at which node the timestamp occurred. In the case where no tick occured (because dt = 0.01) the value would be recorded as -1. This would make it easier to extract and visualise the cascade of timestamps. What do you think?

sumau avatar May 14 '20 17:05 sumau

Hey so I have just realised a big problem - we track the intensity immediately "after" the timestamp has occurred, instead of the intensity when the timestamp has occurred. If I want to trace the contagion chain I need the intensity contributions when the timestamp has occurred. Going to investigate how to best fix this.

sumau avatar May 15 '20 12:05 sumau

Updated Summary of Changes

I have completely re-written the changes. SimuPointProcess() now has three new properties:

  1. contribution_intensities -(nodes+1,n_total_jumps) array -records the intensity contributions only at the node where a timestamp occurred, when the timestamp occurred -this minimises the size of the array by discarding superfluous information -unfortunately it means that summing this array will not equate to tracked_intensities which could be confusing

  2. contribution_nodes -(n_total_jumps) array -node where timestamp occurred

  3. contribution_timestamps -(n_total_jumps) array -time that timestamp/tick occurred

The last two properties can be somewhat derived from timestamps but would require searching on a double which did not seem right

sumau avatar May 15 '20 16:05 sumau

More work to be done, but this shows how the contribution_intensities can be used to generate a contagion cascade for a hawkes simulation with 100 nodes as example:

nx_test

sumau avatar May 15 '20 18:05 sumau