tick
tick copied to clipboard
Methods needed to assess fit of Hawkes Process
I think it would be extremely helpful to have methods to assess the fit of a Hawkes Process, namely:
-
Residuals. To assess fit, we would like to check that the residual process is homogenous and has exponentially distributed inter-event times (F. Lorenzen: Analysis of Order Clustering Using High Frequency Data: A Point Process Approach pdf).
-
Conditional intensity function given the history/timestamped data (+1 for plotting functionality with respect to the corresponding timestamped data).
Hi, indeed it looks pretty interesting, we simply did not have enough time for that yet. I can help you if you want to add one of these features, otherwise, I might implement them but maybe not in a very near future.
This is certainly interesting, let's schedule this for a milestone Version 0.5
Hi, I have implemented a workaround to asses the quality of the fits. Not perfect, definitely slow but should work. See: https://github.com/majkee15/TICK-goodness-of-fit
Thanks @majkee15, I just used it on my 1d data. It seems to work fine.
@majkee15 / @mvicrob
Would either of you be able to provide me with a small script which tests this so I can see about adding it
In fact, the notebook provided by @majkee15 is probably what you are looking for https://github.com/majkee15/TICK-goodness-of-fit/blob/master/time_transform_with_tick.ipynb.
I slightly modified this notebook for my need:
- For a 1d Hawkes process (I'm interested in an intertrade time series)
- With Statsmodel QQ plot instead of Scipy Probplot (because of https://stackoverflow.com/questions/48108582/how-to-interpret-scipy-stats-probplot-results)
- With a loop on potential decays, as in the example of https://stats.stackexchange.com/questions/366381/fit-hawkes-process-to-1d-data-using-python-package-tick
Below if a very draft idea of the result (tested with Python 3.6):
import matplotlib.pyplot as plt
import numpy as np
from tick.hawkes import SimuHawkesExpKernels, HawkesExpKern
from tick.plot import plot_point_process
from scipy import integrate
from scipy import stats
from tick.plot import plot_basis_kernels, plot_hawkes_kernels
import statsmodels.api as sm
plt.style.use('ggplot')
# Part 1: simulate and fit
###############################################################################################################################
end_time = 30000
decay = 0.3
decay_candidates = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 1.5, 2.0]
baseline = [0.12]
adjacency = [[0.1]]
hawkes_exp_kernels = SimuHawkesExpKernels(adjacency=adjacency, decays=decay, baseline=baseline, end_time=end_time, verbose=False, seed=1039)
hawkes_exp_kernels.track_intensity(0.1)
hawkes_exp_kernels.simulate()
best_score = -1e100
best_hawkes_learner = None
for i, decay in enumerate(decay_candidates):
hawkes_learner = HawkesExpKern(decay, penalty='l1', C=10, gofit='likelihood', verbose=True, tol=1e-11, solver='svrg', step=1e-5, max_iter=10000)
hawkes_learner.fit(hawkes_exp_kernels.timestamps)
hawkes_score = hawkes_learner.score()
if hawkes_score > best_score:
print(f'Obtained {hawkes_score} with {decay}')
best_hawkes_learner = hawkes_learner
best_score = hawkes_score
best_decay = decay
print('Best decay is:' + str(best_decay))
t_min = 100
t_max = 200
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
best_hawkes_learner.plot_estimated_intensity(hawkes_exp_kernels.timestamps, t_min=t_min, t_max=t_max, ax=ax)
plot_point_process(hawkes_exp_kernels, plot_intensity=True, t_min=t_min, t_max=t_max, ax=ax)
# Enhance plot
# Set labels to plots
ax.lines[0].set_label('Estimated')
ax.lines[1].set_label('Original')
# Change original intensity style
ax.lines[1].set_linestyle('--')
ax.set_alpha(0.8)
# avoid duplication of scatter plots of events
ax.collections[1].set_alpha(0)
ax.legend()
plt.title("1d Hawkes process with exponential kernel and it's likelihood best fit")
fig.tight_layout()
plt.savefig('Estimated_vs_Original.pdf', bbox_inches='tight', dpi=500)
# Part 2: residuals process computation and their QQ plot
###############################################################################################################################
arrivals = hawkes_exp_kernels.timestamps
step = 0.01
intensities = best_hawkes_learner.estimated_intensity(arrivals,step)[0]
intensities_times = best_hawkes_learner.estimated_intensity(arrivals,step)[1]
def resid(intensities_times, intensities, timestamps, dim, method):
arrivals = timestamps[dim]
thetas = np.zeros(len(arrivals)-1)
ints = intensities[dim]
for i in range(1, len(arrivals)):
mask = (intensities_times <= arrivals[i]) & (intensities_times >= arrivals[i-1])
xs = intensities_times[mask]
ys = ints[mask]
try:
thetas[i - 1] = method(ys,xs)
except:
thetas[i - 1] = np.nan
return thetas
def goodness_of_fit_par(learner, arrivals, step, method):
dimension = learner.n_nodes
intensities = learner.estimated_intensity(arrivals, step)[0]
intensities_times = learner.estimated_intensity(arrivals, step)[1]
residuals = [resid(intensities_times, intensities, arrivals, dim, method) for dim in range(dimension)]
return residuals
def plot_resid(res):
fig, ax = plt.subplots(nrows=1, ncols=1)
fig.subplots_adjust(hspace=0.5)
fig.suptitle('Goodness-of-fit for parametric Hawkes process')
# QQ-plot
probplot = sm.ProbPlot(res, stats.expon, fit=True)
fig, ax = plt.subplots()
fig.suptitle('QQ Plot', fontsize=20)
probplot.qqplot(line='45')
plt.savefig('QQPlot.pdf', bbox_inches='tight', dpi=500)
residuals = goodness_of_fit_par(best_hawkes_learner, arrivals, step, integrate.simps)
plot_resid(residuals[0])