
Codebase writes malformed `learned*ft.json` files

Open yangengineering opened this issue 1 year ago • 8 comments

Dear orr,

Thanks for open-sourcing your code again. When I use it, I run into a problem with exemplar_replay_prev_file. After I run the following command:

    PY_ARGS=${@:1}
    python -u main_open_world.py \
        --output_dir "${EXP_DIR}/t2" --dataset TOWOD --PREV_INTRODUCED_CLS 20 --CUR_INTRODUCED_CLS 20 \
        --train_set 'owod_t2_train' --test_set 'owod_all_task_test' --epochs 51 \
        --model_type 'prob' --obj_loss_coef 8e-4 --obj_temp 1.3 --freeze_prob_model \
        --wandb_name "${WANDB_NAME}_t2" \
        --exemplar_replay_selection --exemplar_replay_max_length 1743 --exemplar_replay_dir ${WANDB_NAME} \
        --exemplar_replay_prev_file "learned_owod_t1_ft.txt" --exemplar_replay_cur_file "learned_owod_t2_ft.txt" \
        --pretrain "${EXP_DIR}/t1/checkpoint0040.pth" --lr 2e-5 \
        --resume ./exps/MOWODB/PROB/t2/checkpoint0050.pth \
        ${PY_ARGS}

some images cannot be found, as follows (such as 02): 5c7afff3b5c2f00b7b16e30a8f37ddd

I have to delete these image names by hand before the next step can run. I don't know why this happens. I am looking forward to your help.

Best, Zhenni Yang

yangengineering avatar Jul 04 '24 15:07 yangengineering
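
The manual cleanup described above can be scripted. The following is a minimal sketch and not part of the PROB repository: it assumes the exemplar file is plain text with one image ID per line and that each image lives at `<image_dir>/<image_id>.jpg`. Both that layout and the helper name `filter_exemplar_file` are assumptions for illustration and should be adapted to the actual dataset layout.

```python
from pathlib import Path


def filter_exemplar_file(list_path, image_dir, suffix=".jpg"):
    """Rewrite list_path keeping only IDs whose image exists; return dropped IDs.

    Assumes one image ID per line (hypothetical layout, adjust as needed).
    """
    list_path = Path(list_path)
    image_dir = Path(image_dir)
    original_text = list_path.read_text()

    kept, dropped = [], []
    for line in original_text.splitlines():
        image_id = line.strip()
        if not image_id:
            continue  # skip blank lines
        if (image_dir / f"{image_id}{suffix}").is_file():
            kept.append(image_id)
        else:
            dropped.append(image_id)

    # Keep a backup of the original list before overwriting it.
    backup = list_path.with_name(list_path.name + ".bak")
    backup.write_text(original_text)
    list_path.write_text("".join(f"{image_id}\n" for image_id in kept))
    return dropped
```

Printing the returned list of dropped IDs before the next task starts makes it easy to check whether the entries are truncated names or IDs of images that are really missing.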

Hi @yangengineering,

I can't reproduce this bug. What GPU and CUDA version are you using? Did you follow my installation instructions?

Best, Orr

orrzohar avatar Jul 12 '24 02:07 orrzohar

I followed the installation instructions and am using four 3090 GPUs. I noticed in the previous issue that someone else also had this problem, but I don't know why it happens.

yangengineering avatar Aug 11 '24 13:08 yangengineering

This problem is the same as the one in issue #13.

yangengineering avatar Aug 11 '24 13:08 yangengineering

Hi @yangengineering,

Yes, this was an issue, but we debugged and fixed it (as indicated in issue #13) in this PR: https://github.com/orrzohar/PROB/pull/15.

When did you clone this repository? For now, I suggest you just delete the ones that are malformed. Looking at this now, the only thing I can think of is changing the condition at https://github.com/orrzohar/PROB/blob/28afbc1d7f5bdfb9a384ce09c870fcd829adc1b2/main_open_world.py#L385

to:

if args.exemplar_replay_selection and utils.is_main_process():

since multiple processes may be trying to write to the files at the same time, which could cause this issue.

Best, Orr

orrzohar avatar Aug 11 '24 18:08 orrzohar
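
The idea behind this suggestion, a single writer per file, can be sketched as follows. This is a minimal illustration and not the repository's code: `is_main_process_from_env`, `write_lines_atomically`, and `save_exemplar_list` are hypothetical helpers, and reading the rank from the `RANK` environment variable assumes a `torchrun`-style launcher. Writing to a temporary file and then renaming it also keeps readers from ever seeing a half-written file.

```python
import os
import tempfile


def is_main_process_from_env():
    """Return True on rank 0, or when RANK is unset (single-process runs)."""
    return int(os.environ.get("RANK", "0")) == 0


def write_lines_atomically(path, lines):
    """Write lines to a temp file in the target directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for line in lines:
                f.write(f"{line}\n")
        # The rename replaces the target in one step on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def save_exemplar_list(path, image_ids):
    """Only the main process writes; other ranks return without touching the file."""
    if not is_main_process_from_env():
        return False
    write_lines_atomically(path, image_ids)
    return True
```

Ranks that read the file afterwards still need to wait for the writer, for example with a `torch.distributed.barrier()` call that every rank reaches. A collective call placed inside the rank-0-only branch would instead hang the job, because the other ranks never join it.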

Also, please let me know whether this works so I can fix this issue permanently.

Best,

Orr

orrzohar avatar Aug 11 '24 18:08 orrzohar

@orrzohar I replaced `if args.exemplar_replay_selection:` with `if args.exemplar_replay_selection and utils.is_main_process():`, but this change caused a new problem. [image] When the first step finishes, [image] the `exemplar_replay_cur_file` is not saved in the `exemplar_replay_dir`, and the code stops here. [image] At the same time, the code reports the following errors: [image] [image]

Best, Zhenni Yang

yangengineering avatar Sep 11 '24 02:09 yangengineering

@orrzohar I also ran the M_benchmark again with the original condition `if args.exemplar_replay_selection:`. The result is as follows: [image] This is the same as what I reported on July 4. However, M_benchmark_random and S_benchmark do not have this issue.

Best, Zhenni Yang.

yangengineering avatar Sep 12 '24 09:09 yangengineering

Hi Zhenni,

Did you see the 11.jpg file? Does it exist, or is it not there at all?

Orr

orrzohar avatar Jan 15 '25 17:01 orrzohar