tick icon indicating copy to clipboard operation
tick copied to clipboard

Methods needed to assess fit of Hawkes Process

Open sharonxu opened this issue 7 years ago • 6 comments

I think it would be extremely helpful to have methods to assess the fit of a Hawkes Process, namely:

  1. Residuals. To assess fit, we would like to check that the residual process is homogenous and has exponentially distributed inter-event times (F. Lorenzen: Analysis of Order Clustering Using High Frequency Data: A Point Process Approach pdf).

  2. Conditional intensity function given the history/timestamped data (+1 for plotting functionality with respect to the corresponding timestamped data).

sharonxu avatar Nov 21 '17 15:11 sharonxu

Hi, indeed it looks pretty interesting, we simply did not have enough time for that yet. I can help you if you want to add one of these features, otherwise, I might implement them but maybe not in a very near future.

Mbompr avatar Nov 24 '17 10:11 Mbompr

This is certainly interesting, let's schedule this for a milestone Version 0.5

stephanegaiffas avatar Jan 16 '18 16:01 stephanegaiffas

Hi, I have implemented a workaround to asses the quality of the fits. Not perfect, definitely slow but should work. See: https://github.com/majkee15/TICK-goodness-of-fit

majkee15 avatar May 22 '19 12:05 majkee15

Thanks @majkee15, I just used it on my 1d data. It seems to work fine.

mvicrob avatar May 25 '19 20:05 mvicrob

@majkee15 / @mvicrob

Would either of you be able to provide me with a small script which tests this so I can see about adding it

PhilipDeegan avatar May 26 '19 12:05 PhilipDeegan

In fact, the notebook provided by @majkee15 is probably what you are looking for https://github.com/majkee15/TICK-goodness-of-fit/blob/master/time_transform_with_tick.ipynb.

I slightly modified this notebook for my need:

  • For a 1d Hawkes process (I'm interested in an intertrade time series)
  • With Statsmodel QQ plot instead of Scipy Probplot (because of https://stackoverflow.com/questions/48108582/how-to-interpret-scipy-stats-probplot-results)
  • With a loop on potential decays, as in the example of https://stats.stackexchange.com/questions/366381/fit-hawkes-process-to-1d-data-using-python-package-tick

Below if a very draft idea of the result (tested with Python 3.6):

import matplotlib.pyplot as plt
import numpy as np
from tick.hawkes import SimuHawkesExpKernels, HawkesExpKern
from tick.plot import plot_point_process
from scipy import integrate
from scipy import stats
from tick.plot import plot_basis_kernels, plot_hawkes_kernels
import statsmodels.api as sm

plt.style.use('ggplot')


# Part 1: simulate and fit
###############################################################################################################################

end_time = 30000
decay = 0.3
decay_candidates = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 1.5, 2.0]
baseline = [0.12]
adjacency = [[0.1]]

hawkes_exp_kernels = SimuHawkesExpKernels(adjacency=adjacency, decays=decay, baseline=baseline, end_time=end_time, verbose=False, seed=1039)
hawkes_exp_kernels.track_intensity(0.1)
hawkes_exp_kernels.simulate()

best_score = -1e100
best_hawkes_learner = None
for i, decay in enumerate(decay_candidates):
    hawkes_learner = HawkesExpKern(decay, penalty='l1', C=10, gofit='likelihood', verbose=True, tol=1e-11, solver='svrg', step=1e-5, max_iter=10000)
    hawkes_learner.fit(hawkes_exp_kernels.timestamps)
    
    hawkes_score = hawkes_learner.score()
    if hawkes_score > best_score:
        print(f'Obtained {hawkes_score} with {decay}')
        best_hawkes_learner = hawkes_learner
        best_score = hawkes_score
        best_decay = decay

print('Best decay is:' + str(best_decay))
t_min = 100
t_max = 200
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
best_hawkes_learner.plot_estimated_intensity(hawkes_exp_kernels.timestamps, t_min=t_min, t_max=t_max, ax=ax)
plot_point_process(hawkes_exp_kernels, plot_intensity=True, t_min=t_min, t_max=t_max, ax=ax)

# Enhance plot
# Set labels to plots
ax.lines[0].set_label('Estimated')
ax.lines[1].set_label('Original')

# Change original intensity style
ax.lines[1].set_linestyle('--')
ax.set_alpha(0.8)

# avoid duplication of scatter plots of events
ax.collections[1].set_alpha(0)

ax.legend()
plt.title("1d Hawkes process with exponential kernel and it's likelihood best fit")

fig.tight_layout()
plt.savefig('Estimated_vs_Original.pdf', bbox_inches='tight', dpi=500)


# Part 2: residuals process computation and their QQ plot
###############################################################################################################################

arrivals = hawkes_exp_kernels.timestamps
step = 0.01
intensities = best_hawkes_learner.estimated_intensity(arrivals,step)[0]
intensities_times = best_hawkes_learner.estimated_intensity(arrivals,step)[1]

def resid(intensities_times, intensities, timestamps, dim, method):
    arrivals = timestamps[dim]
    thetas = np.zeros(len(arrivals)-1)
    ints = intensities[dim]
    for i in range(1, len(arrivals)):
        mask = (intensities_times <= arrivals[i]) & (intensities_times >= arrivals[i-1])
        xs = intensities_times[mask]
        ys = ints[mask]
        try:
            thetas[i - 1] = method(ys,xs)
        except:
            thetas[i - 1] = np.nan
    return thetas

def goodness_of_fit_par(learner, arrivals, step, method):
    dimension = learner.n_nodes
    intensities = learner.estimated_intensity(arrivals, step)[0]
    intensities_times = learner.estimated_intensity(arrivals, step)[1]
    residuals = [resid(intensities_times, intensities, arrivals, dim, method) for dim in range(dimension)]
    return residuals

def plot_resid(res):
    fig, ax = plt.subplots(nrows=1, ncols=1)
    fig.subplots_adjust(hspace=0.5)
    fig.suptitle('Goodness-of-fit for parametric Hawkes process')
    
    # QQ-plot
    probplot = sm.ProbPlot(res, stats.expon, fit=True)
    fig, ax = plt.subplots()
    fig.suptitle('QQ Plot', fontsize=20)
    probplot.qqplot(line='45')
    plt.savefig('QQPlot.pdf', bbox_inches='tight', dpi=500)

residuals = goodness_of_fit_par(best_hawkes_learner, arrivals, step, integrate.simps)
plot_resid(residuals[0])

mvicrob avatar May 26 '19 13:05 mvicrob