Codebase writes malformed `learned*ft.json` files
Dear orr,
Thanks again for open-sourcing your code. While using it, I ran into a problem with exemplar_replay_prev_file. After I run the following command:
PY_ARGS=${@:1}
python -u main_open_world.py \
    --output_dir "${EXP_DIR}/t2" --dataset TOWOD --PREV_INTRODUCED_CLS 20 --CUR_INTRODUCED_CLS 20 \
    --train_set 'owod_t2_train' --test_set 'owod_all_task_test' --epochs 51 \
    --model_type 'prob' --obj_loss_coef 8e-4 --obj_temp 1.3 --freeze_prob_model \
    --wandb_name "${WANDB_NAME}_t2" \
    --exemplar_replay_selection --exemplar_replay_max_length 1743 --exemplar_replay_dir ${WANDB_NAME} \
    --exemplar_replay_prev_file "learned_owod_t1_ft.txt" --exemplar_replay_cur_file "learned_owod_t2_ft.txt" \
    --pretrain "${EXP_DIR}/t1/checkpoint0040.pth" --lr 2e-5 \
    --resume ./exps/MOWODB/PROB/t2/checkpoint0050.pth \
    ${PY_ARGS}
some images cannot be found, as shown below (for example, 02):
I have to delete these image names manually before the next step can run.
I don't know why this happens. I am looking forward to your help.
Bests, Zhenni Yang
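The manual cleanup described above could be scripted. The following is only a minimal sketch, assuming the exemplar file lists one image ID per line and that each image is stored as `<id>.jpg` in a single directory. The function names `filter_existing_ids` and `rewrite_list_file` are hypothetical and not part of the PROB codebase, and the paths must be adapted to the local dataset layout.

```python
from pathlib import Path


def filter_existing_ids(list_file, image_dir, suffix=".jpg"):
    """Split the IDs in a one-ID-per-line list file into (kept, dropped).

    An ID is kept only if `<image_dir>/<id><suffix>` exists on disk.
    Assumption: one image ID per line; this layout is not confirmed by the thread.
    """
    image_dir = Path(image_dir)
    kept, dropped = [], []
    for line in Path(list_file).read_text().splitlines():
        image_id = line.strip()
        if not image_id:
            continue  # ignore blank lines
        if (image_dir / f"{image_id}{suffix}").is_file():
            kept.append(image_id)
        else:
            dropped.append(image_id)  # e.g. a truncated entry such as "02"
    return kept, dropped


def rewrite_list_file(list_file, kept):
    """Overwrite the list file so it contains only the kept IDs."""
    Path(list_file).write_text("".join(f"{image_id}\n" for image_id in kept))
```

Printing the `dropped` list before calling `rewrite_list_file` shows which entries are malformed, which may also help when narrowing down the cause.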
Hi @yangengineering,
I can't reproduce this bug. What GPU and CUDA version are you using? Did you follow my installation instructions?
Best, Orr
I followed the installation instructions and am using four 3090 GPUs. I noticed that someone reported the same problem in the previous issue, but I don't know why it happens.
This problem is the same as issue #13.
Hi @yangengineering,
Yes, this was an issue, but we debugged and fixed it (as indicated in issue #13) in this PR: https://github.com/orrzohar/PROB/pull/15.
When did you clone this repository? For now, I suggest you just delete the ones that are malformed. Looking at this now, the only thing I can think of is changing this line: https://github.com/orrzohar/PROB/blob/28afbc1d7f5bdfb9a384ce09c870fcd829adc1b2/main_open_world.py#L385
to:
if args.exemplar_replay_selection and utils.is_main_process():
since multiple processes may be trying to write to the files at the same time, which could cause this issue.
Best, Orr
Also, please let me know whether this works so I can fix this issue permanently.
Best,
Orr
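The idea behind the suggestion above, letting only one process write the exemplar file, can be sketched without any training code. This is an illustration under stated assumptions, not the repository's implementation: `is_main_process` here is a stand-in that reads the `RANK` environment variable (as exported by launchers such as torchrun) rather than the repository's `utils.is_main_process()`, and `write_ids_atomically` and `save_exemplar_list` are hypothetical helpers. The sketch also writes through a temporary file and renames it, so a reader never sees a half-written list.

```python
import os
import tempfile


def is_main_process():
    # Assumption: the launcher exports RANK; a missing RANK means a single process.
    return int(os.environ.get("RANK", "0")) == 0


def write_ids_atomically(path, image_ids):
    """Write one ID per line to a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for image_id in image_ids:
                f.write(f"{image_id}\n")
        # os.replace is atomic when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave a partial temp file behind
        raise


def save_exemplar_list(path, image_ids):
    """Write the list from the main process only; return True if written."""
    if not is_main_process():
        return False
    write_ids_atomically(path, image_ids)
    return True
```

One general caveat with this pattern: if the guarded block also contains a step that every process must take part in (for example, collective communication), restricting it to one rank can stall the run, so only the file write itself should sit behind the rank check. Whether that applies to this codebase is not established in this thread.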
@orrzohar
I have replaced `if args.exemplar_replay_selection:` with `if args.exemplar_replay_selection and utils.is_main_process():`.
However, this change causes a new problem.
After the first step finishes,
the exemplar_replay_cur_file is not saved in the exemplar_replay_dir, and the code stops there.
The code also reports the following errors:
Bests, Zhenni Yang
@orrzohar
I also tried the M_benchmark again with the original line, `if args.exemplar_replay_selection:`. The result is shown below:
This is the same problem I reported on July 4.
But M_benchmark_random and S_benchmark don't have these issues.
Best, Zhenni Yang.
Hi Zhenni,
Did you look for the 11.jpg file? Does it exist, or is it not there at all?
Orr