cuspatial
[BUG] `derive_trajectories` throws cudaErrorIllegalAddress
When I pass columns from a cudf.DataFrame into the derive_trajectories function, I get the following error:
RuntimeError: scan failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Dataset:
The dataset used is schema_HWY_20_AND_LOCUST-filtered.json. The link to download this dataset is provided in the cuspatial/data folder.
Length of the column being passed is 1305760.
However, when I reduce the length of the column by half, the error disappears. The maximum amount of data I can pass is half of the dataset; anything more than that causes the error above.
Environment details: I am running the code on a 32GB GV100 with CUDA 11.0 installed. In addition, I installed the cuspatial library from source and conda-installed the latest version of cudf.
Reproducible code: the code below passes in 75% of the dataset and is followed by the stack trace:
In [1]:
   ...: import cudf
   ...: import cuspatial
   ...: df = cudf.read_json(
   ...:     'data/schema_HWY_20_AND_LOCUST-filtered.json', lines=True
   ...: )[["object", "@timestamp", "location", "lon", "alt"]]
   ...: # cudf.read_json has a few bugs reading nested JSON, so clean up a few
   ...: # names and cast to the correct dtypes
   ...: df = cudf.DataFrame({
   ...:     "longitude": df["lon"],
   ...:     "altitude": df["alt"].str.slice(0, -1).astype("float64"),
   ...:     "object_id": df["object"].str.slice(len('{"id":"')).astype("int32"),
   ...:     "latitude": df["location"].str.slice(len('{"lat":')).astype("float64"),
   ...:     "timestamp": df["@timestamp"].str.replace('-', '') \
   ...:                                  .str.replace('T', ' ') \
   ...:                                  .str.replace('Z', '') \
   ...:                                  .astype("datetime64[ms]")
   ...: })[["object_id", "longitude", "latitude", "altitude", "timestamp"]]
   ...: print(df.dtypes)
   ...: print("")
   ...: print(df)
   ...: l = int(len(df)*0.75)
   ...: ys = df.longitude[0:l]
   ...: xs = df.latitude[0:l]
   ...: ts = df.timestamp[0:l]
   ...: ids = df.object_id[0:l]
   ...:
   ...: num_traj, trajectories = cuspatial.derive_trajectories(ids, xs, ys, ts)
object_id int32
longitude float64
latitude float64
altitude float64
timestamp datetime64[ms]
dtype: object
object_id longitude latitude altitude timestamp
0 16820 0.0 0.0 0.0 2018-04-11 11:59:59.420
1 16821 0.0 0.0 0.0 2018-04-11 11:59:59.420
2 16822 0.0 0.0 0.0 2018-04-11 11:59:59.420
3 16823 0.0 0.0 0.0 2018-04-11 11:59:59.420
4 16824 0.0 0.0 0.0 2018-04-11 11:59:59.420
... ... ... ... ... ...
1305756 46694 0.0 0.0 0.0 2018-04-11 12:54:59.608
1305757 46689 0.0 0.0 0.0 2018-04-11 12:54:59.608
1305758 46685 0.0 0.0 0.0 2018-04-11 12:54:59.608
1305759 46679 0.0 0.0 0.0 2018-04-11 12:54:59.608
1305760 46637 0.0 0.0 0.0 2018-04-11 12:54:59.608
[1305761 rows x 5 columns]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-144163186541> in <module>
25 ids = df.object_id[0:l]
26
---> 27 num_traj, trajectories = cuspatial.derive_trajectories(ids, xs, ys, ts)
~/miniconda3/envs/cuspatial/lib/python3.8/site-packages/cuspatial/core/trajectory.py in derive_trajectories(object_ids, xs, ys, timestamps)
66 xs, ys = normalize_point_columns(as_column(xs), as_column(ys))
67 timestamps = normalize_timestamp_column(as_column(timestamps))
---> 68 objects, traj_offsets = cpp_derive_trajectories(
69 object_ids, xs, ys, timestamps
70 )
cuspatial/_lib/trajectory.pyx in cuspatial._lib.trajectory.derive_trajectories()
cuspatial/_lib/trajectory.pyx in cuspatial._lib.trajectory.derive_trajectories()
RuntimeError: scan failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
@Salonijain27 Those object_id and lat/lon columns look suspiciously like bad data. It's possible cudf.read_json has changed since we wrote that import logic, so we should double check whether the ingestion is still working correctly.
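A quick host-side check along these lines could flag this kind of ingestion problem before the columns reach derive_trajectories. This is only a sketch: the helper name coordinates_look_valid is hypothetical, and the rules it applies (all-zero coordinates, plain latitude/longitude range limits) are assumptions about what counts as bad data here.

```python
def coordinates_look_valid(lats, lons):
    """Return (ok, reason) for host-side latitude/longitude sequences.

    Hypothetical helper: the checks below are assumptions about what
    "bad data" means for this dataset, not part of cuspatial.
    """
    lats = list(lats)
    lons = list(lons)
    if len(lats) != len(lons):
        return False, "length mismatch"
    if not lats:
        return False, "empty input"
    # Every printed row in the report shows 0.0 for both coordinates.
    if all(v == 0.0 for v in lats) and all(v == 0.0 for v in lons):
        return False, "all coordinates are zero"
    # NaN fails both comparisons, so it is reported as out of range too.
    if any(not -90.0 <= v <= 90.0 for v in lats):
        return False, "latitude out of range"
    if any(not -180.0 <= v <= 180.0 for v in lons):
        return False, "longitude out of range"
    return True, "ok"
```

The function expects plain Python sequences, so a cudf column would need to be copied to the host first (for example with to_pandas()) before being passed in.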
@Salonijain27 this code will extract the lats and lons that you want from the above file. cudf.read_json is only sufficient for reading JSON objects with no object nesting. You'll have to use a host-based JSON reader until it is fixed.
import cuspatial
import json
import numpy as np

# Parse the file on the host, since cudf.read_json cannot handle nested objects
with open('schema_HWY_20_AND_LOCUST-filtered.json') as f:
    data = json.load(f)
lats = np.zeros(len(data), dtype="float64")
lons = np.zeros(len(data), dtype="float64")
# Pull the nested coordinates out of each record
for i in range(len(data)):
    lats[i] = data[i]['object']['location']['lat']
    lons[i] = data[i]['object']['location']['lon']
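The original repro reads the file with lines=True, which suggests it may be newline-delimited JSON rather than a single JSON array. If so, json.load on the whole file would fail, and a line-by-line reader would be needed. The following is a minimal sketch under that assumption. The helper name extract_points is hypothetical, and the record layout (an "@timestamp" field, plus "object" holding "id" and a nested "location" with "lat" and "lon") is inferred from the snippets in this thread, not confirmed against the file.

```python
import json


def extract_points(lines):
    """Parse newline-delimited JSON records on the host.

    Assumed (unverified) record layout:
    {"@timestamp": ..., "object": {"id": ..., "location": {"lat": ..., "lon": ...}}}
    """
    ids, lats, lons, stamps = [], [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        obj = record["object"]
        ids.append(int(obj["id"]))
        lats.append(float(obj["location"]["lat"]))
        lons.append(float(obj["location"]["lon"]))
        stamps.append(record["@timestamp"])
    return ids, lats, lons, stamps
```

Passing an open file handle works because iterating over a text file yields one line at a time, e.g. ids, lats, lons, stamps = extract_points(open(path)). The resulting lists can then be turned into cudf columns with the appropriate dtypes.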
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
When I try to repro this, I get a segfault in cudf.read_json. I suspect the file is too big.