mseg-semantic
class
There's a small error related to the class count.
I changed one line of code, replacing args.tc.classes with args.tc.num_uclasses, as shown below.
Could you please tell me how to fix this error? Also, where is args.dataset_name set?
Is there a use_naive_taxonomy parameter missing here?
Author, I'm running into this problem at the moment and don't know how to solve it. The situation is urgent, thank you!!
This is how I changed the dataset mapping to use a single GPU. I don't know whether this means training has actually begun.
Author, could you spare a moment? Could you help me with the question I asked two days ago? Thank you.
Hi @luocmin , please pull the latest version. Let me know if this doesn't answer your questions:
(1) You're right that tc.classes has been changed to tc.num_uclasses, thanks for catching that. I've corrected the train.py script in my latest commit.
(2) dataset_name is set in Line 497: https://github.com/mseg-dataset/mseg-semantic/blob/training/mseg_semantic/tool/train.py#L497
(3) Please pull the latest master of mseg-semantic into your branch to see the parameter use_naive_taxonomy: https://github.com/mseg-dataset/mseg-semantic/blob/master/mseg_semantic/utils/transform.py#L54
(4) I didn't catch how you are doing the dataset-to-GPU mapping; could you explain in more detail here?
If you are limited by GPU RAM, you could also accumulate gradients in place over several forward/backward passes and then perform a single gradient update, as sketched below.
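For illustration, here's a minimal, self-contained sketch of gradient accumulation in PyTorch (the model, loss, optimizer, and data are toy stand-ins, not our training code):

```python
import torch
import torch.nn as nn

# Stand-ins for illustration only; in train.py these come from the config.
model = nn.Linear(8, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(8)]

accum_steps = 2  # forward/backward passes per optimizer update
optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = criterion(model(x), y) / accum_steps  # scale so the sum averages
    loss.backward()                              # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()        # one update per accum_steps mini-batches
        optimizer.zero_grad()
```

This halves the per-step memory footprint at the cost of extra forward passes, while keeping the effective batch size the same.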
@johnwlambert Hi,
1. What I want to ask about this screenshot is: what does the output look like when training has started successfully?
2. In addition, I modified some of the code in the mseg-3m.yaml file. May I ask whether this modification is correct? As shown below:
changed `dataset:` to `dataset: [ade20k-150-relabeled]`
changed `GPU_map:` to `dataset_gpu_mapping: {'ade20k-150-relabeled': [0]}`
3. If I modified this class, do I need to recompile mseg_semantic?
The following problems occurred when training two datasets with two GPUs. What is the reason, and how do I solve it?
Hi @luocmin, there is no compilation involved in our repo since all files are pure Python or bash.
Our configuration uses 7 processes, and each process handles one dataset: https://github.com/mseg-dataset/mseg-semantic/blob/f8afb3cb637bd5e921a1689681e5a7044a716b57/mseg_semantic/tool/train.py#L537
The GPU index (0, 1, 2, ..., 6) is the rank, and we call:
```python
from typing import Dict

def get_rank_to_dataset_map(args) -> Dict[int, str]:
    """Obtain a mapping from GPU rank (index) to the name of the dataset residing on this GPU.

    Args:
        args: experiment configuration, including args.dataset_gpu_mapping.

    Returns:
        rank_to_dataset_map: dictionary from GPU rank to dataset name.
    """
    rank_to_dataset_map = {}
    for dataset, gpu_idxs in args.dataset_gpu_mapping.items():
        for gpu_idx in gpu_idxs:
            rank_to_dataset_map[gpu_idx] = dataset
    print('Rank to dataset map: ', rank_to_dataset_map)
    return rank_to_dataset_map

# Each process then looks up its own dataset by rank and builds a loader for that dataset only:
args.dataset_name = rank_to_dataset_map[args.rank]
...
train_data = dataset.SemData(split='train', data_root=args.data_root[args.dataset_name], data_list=args.train_list[args.dataset_name], transform=train_transform)
...
train_loader = torch.utils.data.DataLoader(train_data, batch_size=args.batch_size, shuffle=(train_sampler is None), num_workers=args.workers, pin_memory=True, sampler=train_sampler, drop_last=True)
```
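For example, with a hypothetical two-dataset config (the dataset names here are just placeholders), the mapping resolves like this:

```python
from types import SimpleNamespace

# Hypothetical config: two datasets, one GPU each (names are placeholders).
args = SimpleNamespace(dataset_gpu_mapping={
    'ade20k-150-relabeled': [0],
    'coco-panoptic-133-relabeled': [1],
})
print(get_rank_to_dataset_map(args))
# {0: 'ade20k-150-relabeled', 1: 'coco-panoptic-133-relabeled'}
# The process with rank 0 trains only on the first dataset, rank 1 only on the second.
```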
Changing our config by mapping every dataset to the same GPU will mean that only one dataset is actually trained (the last one to hit line 394: https://github.com/mseg-dataset/mseg-semantic/blob/f8afb3cb637bd5e921a1689681e5a7044a716b57/mseg_semantic/tool/train.py#L394).
You will need a different strategy for using fewer GPUs. I mentioned a few already: concatenating all image IDs into a single dataset, which could then be sharded across 4 GPUs, or accumulating gradients in place over 2 forward/backward passes and then performing a single gradient update. A rough sketch of the first option follows.
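This sketch uses a generic torch ConcatDataset with toy tensors rather than our SemData class; note that mixing batches across datasets only makes sense once labels live in a shared taxonomy:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Toy stand-ins for two per-dataset SemData instances.
ds_a = TensorDataset(torch.randn(10, 3), torch.randint(0, 2, (10,)))
ds_b = TensorDataset(torch.randn(6, 3), torch.randint(0, 2, (6,)))

# One combined dataset spanning both sources; a DistributedSampler over it
# would shard the union of image IDs across however many GPUs you have.
combined = ConcatDataset([ds_a, ds_b])
loader = DataLoader(combined, batch_size=4, shuffle=True, drop_last=True)
for x, y in loader:
    pass  # each batch may mix samples from both datasets
```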
Thank you, but as a beginner I won't change the dataloader code for now.
I have been trying to run two datasets with two GPUs, but it still doesn't work. May I ask whether I need to use the script you mentioned before to run the command? There is no such script in the repo.
This is the command I ran on the server, which produced an error: