[Error 5] Access is denied for create_chopped_dataset due to zarr append
Env: Windows 7, Python 3.7, l5kit 1.10, zarr 2.4, torchvision 0.7
In the "agent_motion_prediction" example, at the evaluation step, this line:

```python
eval_base_path = create_chopped_dataset(
    dm.require(eval_cfg["key"]),
    cfg["raster_params"]["filter_agents_threshold"],
    num_frames_to_chop,
    cfg["model_params"]["future_num_frames"],
    MIN_FUTURE_STEPS,
)
```
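For context, the setup around that call in the notebook is roughly the following (a sketch; the exact import location of MIN_FUTURE_STEPS is an assumption on my side):

```python
# rough context for the call above; import paths assumed from the example notebook
from l5kit.data import LocalDataManager
from l5kit.evaluation import create_chopped_dataset
from l5kit.evaluation.chop_dataset import MIN_FUTURE_STEPS  # assumed location of the constant

dm = LocalDataManager()   # resolves keys like eval_cfg["key"] to local paths
num_frames_to_chop = 100  # matches the "validate_chopped_100" directory below
```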
This call appends the four arrays of the chunked dataset in zarr_utils. It works fine with "sample.zarr"; however, when I use "validate.zarr" (much bigger), I get [Error 5] Access is denied in the "validate_chopped_100" output directory.
I created another chunked dataset that is not linked to any storage on the hard drive and the problem does not occur, so I strongly believe this is an issue with zarr's append method.
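Roughly, this is what I am comparing (a minimal hypothetical repro, not the actual l5kit code; names and sizes are made up):

```python
import numpy as np
import zarr

# directory-backed array: the kind of target where I see [Error 5] on Windows
on_disk = zarr.open_array("repro.zarr", mode="w", shape=(0,), chunks=(1000,), dtype="f8")

# in-memory array: no store argument, so nothing on the hard drive is touched
in_memory = zarr.zeros((0,), chunks=(1000,), dtype="f8")

for _ in range(1000):
    batch = np.zeros(100, dtype="f8")
    on_disk.append(batch)    # one filesystem round-trip per append
    in_memory.append(batch)  # pure RAM; this one never fails for me
```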
I made some modifications to zarr_utils and zarr_dataset in l5kit to get rid of this issue. Let me know if you want me to contribute, or if you want to see my modifications (you may be able to optimize them further). I am willing to share and discuss.
I closed a duplicate version of this issue that had the wrong format.
Mmm.. I've never experienced this issue under Linux with test.zarr (which is around the same size as validate.zarr).
> I created another chunked dataset that is not linked to any storage on the hard drive and the problem does not occur, so I strongly believe this is an issue with zarr's append method.
Do you mean you're holding everything in RAM?
I use a local object that is similar to the chunked dataset but not linked to any storage, inside zarr_scenes_chop (in zarr_utils). After the copy loop has appended everything to it, output_dataset (the one linked to storage) appends the elements of the local object, so append is called only once on output_dataset.
Btw, should I rather use validate.zarr or test.zarr to evaluate the model?
> Btw, should I rather use validate.zarr or test.zarr to evaluate the model?
test.zarr is used in the competition, but you don't have the GT for it stored in the zarr. So if you want to chop something and also have the GT, validate.zarr is better.
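For example, after chopping validate.zarr the notebook picks up the outputs like this (paths assumed from the example notebook; "validate_chopped_100" reflects num_frames_to_chop=100):

```python
from pathlib import Path

# assumed output layout of create_chopped_dataset, as used in the example notebook
eval_base_path = "scenes/validate_chopped_100"
eval_zarr_path = str(Path(eval_base_path) / "validate.zarr")  # chopped copy of the scenes
eval_mask_path = str(Path(eval_base_path) / "mask.npz")       # agents mask used for evaluation
eval_gt_path = str(Path(eval_base_path) / "gt.csv")           # GT for the chopped frames
```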
> I use a local object that is similar to the chunked dataset but not linked to any storage, inside zarr_scenes_chop (in zarr_utils). After the copy loop has appended everything to it, output_dataset (the one linked to storage) appends the elements of the local object, so append is called only once on output_dataset.
Can you maybe link your code here?
In zarr_utils.py:
```python
for idx in tqdm(range(len(input_dataset.scenes)), desc="copying"):
    # get data and immediately chop frames, agents and traffic lights
    scene = input_dataset.scenes[idx]
    first_frame_idx = scene["frame_index_interval"][0]

    frames = input_dataset.frames[first_frame_idx : first_frame_idx + num_frames_to_copy]
    agents = input_dataset.agents[get_agents_slice_from_frames(*frames[[0, -1]])]
    tl_faces = input_dataset.tl_faces[get_tl_faces_slice_from_frames(*frames[[0, -1]])]

    # reset interval relative to our output (subtract current history and add output history)
    scene["frame_index_interval"][0] = cur_frame_idx
    scene["frame_index_interval"][1] = cur_frame_idx + num_frames_to_copy  # account for the chopped frames
    frames["agent_index_interval"] += cur_agent_idx - frames[0]["agent_index_interval"][0]
    frames["traffic_light_faces_index_interval"] += (
        cur_tl_face_idx - frames[0]["traffic_light_faces_index_interval"][0]
    )

    # append to the in-memory temp_dataset instead of the storage-backed output_dataset
    temp_dataset.scenes.append(scene[None, ...])
    temp_dataset.frames.append(frames)
    temp_dataset.agents.append(agents)
    temp_dataset.tl_faces.append(tl_faces)

    cur_scene_idx += 1  # one scene per iteration
    cur_frame_idx += len(frames)
    cur_agent_idx += len(agents)
    cur_tl_face_idx += len(tl_faces)

# a single append per array into the storage-backed output_dataset
output_dataset.scenes.append(temp_dataset.scenes)
output_dataset.frames.append(temp_dataset.frames)
output_dataset.agents.append(temp_dataset.agents)
output_dataset.tl_faces.append(temp_dataset.tl_faces)
```
output_dataset is the one linked to the storage on the hard drive, and temp_dataset's constructor is almost identical to the chunked dataset's. The only tiny difference is in zarr_dataset.py, in
```python
def initialize(
    self, mode: str = "w", num_scenes: int = 0, num_frames: int = 0, num_agents: int = 0, num_tl_faces: int = 0
) -> "ChunkedDataset":
```
instead of
```python
self.root = zarr.open_group(self.path, mode=mode)
```

it is

```python
self.root = zarr.open_group(mode="w")  # store=None -> in-memory group, not linked to any storage
```
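Put together, the change looks roughly like this (a sketch of my local modification, not the upstream l5kit code; the helper name and signature are made up for illustration):

```python
import zarr

def open_dataset_root(path: str = None, mode: str = "w") -> zarr.hierarchy.Group:
    """Sketch of the one-line change in ChunkedDataset.initialize (zarr_dataset.py)."""
    if path is None:
        # store=None makes zarr fall back to a MemoryStore: nothing is written to disk
        return zarr.open_group(mode="w")
    # the upstream behaviour: a group backed by a directory store on the hard drive
    return zarr.open_group(path, mode=mode)
```

temp_dataset uses the in-memory variant during the copy loop, and only output_dataset (opened with a real path) ever touches the filesystem, with one append per array at the end.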