
class

Open luocmin opened this issue 4 years ago • 11 comments

There's a small error with the class count. [screenshot] I changed one line of code from args.tc.classes to args.tc.num_uclasses, as shown below. [screenshot]
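In other words, the change amounts to something like the following one-line fix (the surrounding context here is an assumption, not verbatim train.py):

```python
# Sketch of the fix; variable context assumed, not copied from train.py.
# Before (this attribute no longer exists on the TaxonomyConverter):
#   num_classes = args.tc.classes
num_classes = args.tc.num_uclasses  # number of universal-taxonomy classes
```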

luocmin avatar Oct 19 '20 02:10 luocmin

Could you please tell me how to fix this error? And where can I find args.dataset_name? [screenshot] Is a use_naive_taxonomy parameter missing here? [screenshot]

luocmin avatar Oct 19 '20 03:10 luocmin

Author, this problem is still present and I don't know how to solve it. The situation is urgent, thank you!!

luocmin avatar Oct 19 '20 06:10 luocmin

This is how I changed the number of datasets to match my number of GPUs (1). I don't know whether this means the training has begun. [screenshot]

luocmin avatar Oct 19 '20 06:10 luocmin

Author, could you spare me a moment? Could you help me with the question I asked two days ago? Thank you!

luocmin avatar Oct 22 '20 01:10 luocmin

Hi @luocmin , please pull the latest version. Let me know if this doesn't answer your questions:

(1) You're right that tc.classes has been changed to tc.num_uclasses, thanks for catching that. I've corrected the train.py script in my latest commit.

(2) dataset_name is set in Line 497: https://github.com/mseg-dataset/mseg-semantic/blob/training/mseg_semantic/tool/train.py#L497

(3) Please pull the latest master of mseg-semantic into your branch to see the use_naive_taxonomy parameter (see the sketch after this list): https://github.com/mseg-dataset/mseg-semantic/blob/master/mseg_semantic/utils/transform.py#L54

(4) I didn't catch how you are doing the dataset to GPU mapping, could you explain in more detail here?

If you are limited by GPU RAM, you could also concatenate all image IDs into a single dataset (sharded across fewer GPUs), or accumulate gradients in place over multiple forward and backward passes before performing a single gradient update.
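Regarding (3), here is a minimal sketch of how such a flag is typically consumed by a label-mapping transform (the class shape below is an assumption based on the linked file, not verbatim repo code):

```python
# Sketch only: names and signature are assumptions, not copied from transform.py.
class ToUniversalLabel:
    """Map a dataset's native label IDs into the universal taxonomy."""

    def __init__(self, dataset: str, use_naive_taxonomy: bool = False) -> None:
        self.dataset = dataset
        # When True, bypass the curated MSeg taxonomy mapping and fall back
        # to a naive one-to-one label correspondence.
        self.use_naive_taxonomy = use_naive_taxonomy
```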

johnwlambert avatar Oct 22 '20 03:10 johnwlambert

@johnwlambert Hi, 1. What I want to ask about this picture is: what should the output look like when training has started successfully? [screenshot]

2. In addition, I modified some of the code in the mseg-3m.yaml file. May I ask whether this modification is correct? As shown below, I changed the dataset field: [screenshot] to `dataset: [ade20k-150-relabeled]`, and the GPU mapping: [screenshot] to `dataset_gpu_mapping: {'ade20k-150-relabeled': [0]}`.
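In other words, the two edited entries would now read roughly as follows (a sketch of just these two fields, not the full mseg-3m.yaml):

```yaml
# Sketch: only the two fields discussed here; all other keys are unchanged.
dataset: [ade20k-150-relabeled]      # train on a single dataset
dataset_gpu_mapping:
  'ade20k-150-relabeled': [0]        # place that dataset on GPU 0
```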

luocmin avatar Oct 22 '20 03:10 luocmin

3. If I modified this class, do I need to recompile mseg_semantic? [screenshot]

luocmin avatar Oct 22 '20 03:10 luocmin

The following problems occurred when training two datasets with two cards (GPUs). What is the reason, and how can I solve it? [screenshot]

luocmin avatar Oct 22 '20 06:10 luocmin

Hi @luocmin, there is no compilation involved in our repo since all files are pure Python or bash.

Our configuration is to use 7 processes; each process handles one dataset: https://github.com/mseg-dataset/mseg-semantic/blob/f8afb3cb637bd5e921a1689681e5a7044a716b57/mseg_semantic/tool/train.py#L537

The GPU index (0, 1, 2, ..., 6) is the rank, and we call:

```python
from typing import Dict

def get_rank_to_dataset_map(args) -> Dict[int, str]:
    """
    Obtain a mapping from GPU rank (index) to the name of the dataset
    residing on this GPU.

    Args:
    -   args

    Returns:
    -   rank_to_dataset_map
    """
    rank_to_dataset_map = {}
    for dataset, gpu_idxs in args.dataset_gpu_mapping.items():
        for gpu_idx in gpu_idxs:
            rank_to_dataset_map[gpu_idx] = dataset
    print('Rank to dataset map: ', rank_to_dataset_map)
    return rank_to_dataset_map
```

followed by:

```python
args.dataset_name = rank_to_dataset_map[args.rank]
...
train_data = dataset.SemData(
    split='train',
    data_root=args.data_root[args.dataset_name],
    data_list=args.train_list[args.dataset_name],
    transform=train_transform,
)
...
train_loader = torch.utils.data.DataLoader(
    train_data,
    batch_size=args.batch_size,
    shuffle=(train_sampler is None),
    num_workers=args.workers,
    pin_memory=True,
    sampler=train_sampler,
    drop_last=True,
)
```

See here
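As a concrete example of the mapping above (using a SimpleNamespace as a stand-in for the parsed args; the dataset names and GPU counts are illustrative):

```python
from types import SimpleNamespace

# Stand-in for the parsed config; dataset_gpu_mapping mirrors the YAML field.
args = SimpleNamespace(dataset_gpu_mapping={
    'ade20k-150-relabeled': [0, 1, 2],
    'coco-panoptic-133-relabeled': [3, 4, 5, 6],
})
rank_map = get_rank_to_dataset_map(args)
# rank_map == {0: 'ade20k-150-relabeled', 1: 'ade20k-150-relabeled', ...,
#              6: 'coco-panoptic-133-relabeled'}
```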

Changing our config by mapping each dataset to the same GPU will mean that only one dataset is trained (the last one to hit line 394 https://github.com/mseg-dataset/mseg-semantic/blob/f8afb3cb637bd5e921a1689681e5a7044a716b57/mseg_semantic/tool/train.py#L394).

You will need a different strategy for using fewer GPUs. I mentioned a few already: concatenate all image IDs into a single dataset, which could be sharded across 4 GPUs, or accumulate gradients in place over 2 forward and backward passes and then perform a single gradient update (see the sketch below for the latter).
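For the gradient-accumulation option, a minimal PyTorch sketch (the toy model, loss, and loader below are stand-ins, not this repo's training code):

```python
import torch
import torch.nn as nn

# Toy stand-ins; in the real setup these come from train.py.
model = nn.Linear(8, 3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(4, 8), torch.randint(0, 3, (4,))) for _ in range(4)]

accum_steps = 2  # two forward/backward passes per parameter update

optimizer.zero_grad()
for step, (image, target) in enumerate(loader):
    # Scale the loss so the summed gradients match one large batch.
    loss = criterion(model(image), target) / accum_steps
    loss.backward()  # gradients accumulate in .grad across iterations
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # single update after accum_steps mini-batches
        optimizer.zero_grad()
```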

johnwlambert avatar Oct 22 '20 21:10 johnwlambert

Thank you, but as a beginner I won't change the dataloader code for now.

luocmin avatar Oct 23 '20 01:10 luocmin

I have been trying to run two datasets on two cards (GPUs), but it still doesn't work. May I ask whether I need to use the script you mentioned before to run the command? There is no such script in the repo. [screenshot] This is the command I ran on the server, with its error: [screenshot]

luocmin avatar Oct 23 '20 02:10 luocmin