Transformer-Hawkes-Process

Instructions to obtain Structured-THP datasets

Open airalcorn2 opened this issue 4 years ago • 0 comments

Could you please provide additional details on how to obtain the 911-Calls and Earthquake datasets used in your paper? The CSV at the provided webpage contains 663,522 calls, all of which fall into the EMS, fire, or traffic categories. The 75 most frequent ZIP codes in this dataset account for 582,045 calls, considerably more than the 290,293 listed in Table 1 (see the code below).

import pandas as pd

df = pd.read_csv("911.csv")
print(len(df))  # 663522
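# Category prefixes that appear in the "title" column.
cats = ["EMS: ", "Fire: ", "Traffic: "]
# Count the calls whose title contains at least one of the prefixes.
in_cats = 0
for title in df["title"]:
    for cat in cats:
        if cat in title:
            in_cats += 1
            break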

print(in_cats)  # 663522
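# Count calls per ZIP code (groupby drops rows with a missing zip), most frequent first.
zip_calls = (
    df.groupby("zip")
    .size()
    .reset_index(name="n_calls")
    .sort_values("n_calls", ascending=False)
)
# Total calls across the 75 most frequent ZIP codes.
print(zip_calls["n_calls"].iloc[:75].sum())  # 582045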

The paper also states that:

An undirected edge exists between two vertices if their zipcodes are within 10 of each other.

Does this mean two vertices were considered neighbors if abs(ZIP_{1} - ZIP_{2}) <= 10?
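To make the question concrete, here is a minimal sketch of how the graph would be built under that reading. The rule `abs(a - b) <= max_diff` is only my interpretation of the sentence in the paper, not a confirmed description of the authors' preprocessing. The function name `zip_edges`, the parameter `max_diff`, and the example ZIP codes are all made up for illustration.

```python
from itertools import combinations


def zip_edges(zips, max_diff=10):
    """Return undirected edges between ZIP codes, ASSUMING the rule
    abs(zip_a - zip_b) <= max_diff (an interpretation, not confirmed)."""
    # De-duplicate and sort so each vertex appears once and edges are ordered.
    unique = sorted(set(int(z) for z in zips))
    # Each unordered pair is visited once, so no self-loops or duplicate edges.
    return [(a, b) for a, b in combinations(unique, 2) if abs(a - b) <= max_diff]


# Illustrative ZIP codes only; not taken from the paper's preprocessing.
example_zips = [19401, 19403, 19464, 19446, 19440]
edges = zip_edges(example_zips)
print(edges)  # [(19401, 19403), (19440, 19446)]
```

Under this reading, 19464 would be an isolated vertex in the example, because no other ZIP code is within 10 of it.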

For the Earthquake dataset, the provided website is in Chinese and seems to host a number of datasets. Could you provide precise instructions on where to find the specific earthquake dataset used in your paper?

airalcorn2 · Oct 02 '21 17:10