
[Error 5] Access is denied for create_chopped_dataset due to zarr append

Open quangsonle opened this issue 4 years ago • 6 comments

Environment: Windows 7, Python 3.7, l5kit 1.10, zarr 2.4, torchvision 0.7

In the "agent_motion_prediction" example, at evaluation, the line

    eval_base_path = create_chopped_dataset(
        dm.require(eval_cfg["key"]),
        cfg["raster_params"]["filter_agents_threshold"],
        num_frames_to_chop,
        cfg["model_params"]["future_num_frames"],
        MIN_FUTURE_STEPS,
    )

appends the four element arrays of the chunked dataset in zarr_utils. This works fine with sample.zarr, but when I use validate.zarr (much bigger), I get [Error 5] Access is denied in the "validate_chopped_100" directory. I created another chunked dataset that is not linked to any storage on the hard drive and do not get this problem, so I strongly believe this is an issue with zarr's append method. I made some modifications to zarr_utils and zarr_dataset in l5kit to get rid of the problem; let me know if you want me to contribute, or if you want to see my modifications (you may be able to optimize them further). I am willing to share and discuss.

quangsonle avatar Oct 15 '20 07:10 quangsonle

I closed a duplicate version of this issue that had the wrong format.

quangsonle avatar Oct 15 '20 07:10 quangsonle

Hmm, I've never experienced this issue under Linux with test.zarr (which is around the same size as validate.zarr).

I created another chunked dataset that is not linked to any storage on the hard drive and do not get this problem, so I strongly believe this is an issue with zarr's append method

Do you mean you're holding everything in RAM?

lucabergamini avatar Oct 15 '20 08:10 lucabergamini

I use a local object that is similar to the chunked dataset but not linked to any storage in zarr_scenes_chop (zarr_utils). After the copy loop has finished appending to it, output_dataset (the one linked to storage) appends the elements of the local object, so append is called only once on output_dataset.
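The idea can be sketched with plain numpy (the names below are illustrative, not l5kit's API): buffer the chopped per-scene slices in RAM and concatenate once, so the disk-backed array sees a single bulk write instead of one append per scene.

```python
import numpy as np

def copy_with_single_append(scene_arrays, num_frames_to_copy):
    """Buffer chopped slices in memory, then emit one concatenated array.

    Sketch of the workaround described above: many small per-scene appends
    to a disk-backed store are replaced by one bulk write at the end, so
    nothing touches the store inside the loop.
    """
    buffered = []
    for frames in scene_arrays:
        buffered.append(frames[:num_frames_to_copy])  # chop, keep in RAM
    # the single "append" that would go to the disk-backed dataset
    return np.concatenate(buffered)

# three toy scenes with 10, 20 and 5 frames each
scenes = [np.arange(10), np.arange(20), np.arange(5)]
chopped = copy_with_single_append(scenes, 4)  # 4 frames kept per scene
```
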

By the way, is it better to use validate.zarr or test.zarr to evaluate the model?

quangsonle avatar Oct 15 '20 09:10 quangsonle

By the way, is it better to use validate.zarr or test.zarr to evaluate the model?

test.zarr is used in the competition, but you don't have the GT for it stored in the zarr. So if you want to chop something and also have the GT, validate.zarr is better.

lucabergamini avatar Oct 15 '20 11:10 lucabergamini

I use a local object that is similar to the chunked dataset but not linked to any storage in zarr_scenes_chop (zarr_utils). After the copy loop has finished appending to it, output_dataset (the one linked to storage) appends the elements of the local object, so append is called only once on output_dataset.

Can you maybe link here your code?

lucabergamini avatar Oct 15 '20 11:10 lucabergamini

I use a local object that is similar to the chunked dataset but not linked to any storage in zarr_scenes_chop (zarr_utils). After the copy loop has finished appending to it, output_dataset (the one linked to storage) appends the elements of the local object, so append is called only once on output_dataset.

Can you maybe link here your code?

In zarr_utils.py:

    for idx in tqdm(range(len(input_dataset.scenes)), desc="copying"):
        # get data and immediately chop frames, agents and traffic lights
        scene = input_dataset.scenes[idx]
        first_frame_idx = scene["frame_index_interval"][0]
        frames = input_dataset.frames[first_frame_idx : first_frame_idx + num_frames_to_copy]
        agents = input_dataset.agents[get_agents_slice_from_frames(*frames[[0, -1]])]
        tl_faces = input_dataset.tl_faces[get_tl_faces_slice_from_frames(*frames[[0, -1]])]
        # reset intervals relative to our output (subtract current history and add output history)
        scene["frame_index_interval"][0] = cur_frame_idx
        scene["frame_index_interval"][1] = cur_frame_idx + num_frames_to_copy  # account for fewer frames
        frames["agent_index_interval"] += cur_agent_idx - frames[0]["agent_index_interval"][0]
        frames["traffic_light_faces_index_interval"] += (
            cur_tl_face_idx - frames[0]["traffic_light_faces_index_interval"][0]
        )
        # append only to the in-memory dataset; no disk I/O inside the loop
        temp_dataset.scenes.append(scene[None, ...])
        temp_dataset.frames.append(frames)
        temp_dataset.agents.append(agents)
        temp_dataset.tl_faces.append(tl_faces)
        cur_scene_idx += 1  # one scene per iteration (scene is a single record)
        cur_frame_idx += len(frames)
        cur_agent_idx += len(agents)
        cur_tl_face_idx += len(tl_faces)
    # a single append per array to the disk-backed output dataset
    output_dataset.scenes.append(temp_dataset.scenes)
    output_dataset.frames.append(temp_dataset.frames)
    output_dataset.agents.append(temp_dataset.agents)
    output_dataset.tl_faces.append(temp_dataset.tl_faces)

output_dataset is the one linked to storage on the hard drive, and temp_dataset's constructor is almost identical to that of the chunked dataset; the only difference is in zarr_dataset.py, in:

    def initialize(
        self, mode: str = "w", num_scenes: int = 0, num_frames: int = 0,
        num_agents: int = 0, num_tl_faces: int = 0,
    ) -> "ChunkedDataset":

instead of

    self.root = zarr.open_group(self.path, mode=mode)

it is

    self.root = zarr.open_group(mode="w")  # store=None -> in-memory, not linked to any storage

quangsonle avatar Oct 16 '20 10:10 quangsonle