A11en4z
> > It is not related to the port. Make --nproc_per_node=1 pls
>
> I set --nproc_per_node=1, but I am still getting the error torch.distributed.elastic.multiprocessing.errors.ChildFailedError. How can I resolve this...
> I trained for 70 epochs, but the results are still bad, including the errors and the loss. The loss is always around 33, 34..., is it normal or something goes...
> ## ❓ Questions and Help
>
> 2024-03-08 21:33:06,932 maskrcnn_benchmark INFO: Using 1 GPUs
> 2024-03-08 21:33:06,932 maskrcnn_benchmark INFO: AMP_VERBOSE: False
> DATALOADER:
>   ASPECT_RATIO_GROUPING: True
>   NUM_WORKERS: 4
>   SIZE_DIVISIBILITY: 32
> DATASETS:
>   TEST: ('VG_stanford_filtered_with_attribute_test',)...