EncoderDecoder is not in the model registry.

Open MatCorr opened this issue 1 year ago • 10 comments

I am installing mmsegmentation in a SIF image exactly as recommended in the official Dockerfile. The only big difference is that I am using an Nvidia base image. Here's what I am doing:

FROM nvcr.io/nvidia/pytorch:23.03-py3

# Install MMCV
ARG MMCV="2.0.0"
RUN ["/bin/bash", "-c", "pip install openmim"]
RUN ["/bin/bash", "-c", "mim install mmengine"]
RUN ["/bin/bash", "-c", "mim install mmcv==${MMCV}"]

# Install MMSEGMENTATION
RUN git clone -b main https://github.com/open-mmlab/mmsegmentation.git /mmsegmentation
WORKDIR /mmsegmentation
ENV FORCE_CUDA="1"
RUN pip install -v -e .

# Install requirements
COPY requirements_pip.txt /var/tmp/requirements_pip.txt
RUN pip --no-cache-dir install -r /var/tmp/requirements_pip.txt

Yet, when I try to run a training script via Slurm, this is the error I get:

Traceback (most recent call last):
  File "/mmsegmentation/tools/train.py", line 104, in <module>
    main()
  File "/mmsegmentation/tools/train.py", line 93, in main
    runner = Runner.from_cfg(cfg)
  File "/usr/local/lib/python3.8/dist-packages/mmengine/runner/runner.py", line 439, in from_cfg
    runner = cls(
  File "/usr/local/lib/python3.8/dist-packages/mmengine/runner/runner.py", line 406, in __init__
    self.model = self.build_model(model)
  File "/usr/local/lib/python3.8/dist-packages/mmengine/runner/runner.py", line 813, in build_model
    model = MODELS.build(model)
  File "/usr/local/lib/python3.8/dist-packages/mmengine/registry/registry.py", line 548, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/usr/local/lib/python3.8/dist-packages/mmengine/registry/build_functions.py", line 250, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/usr/local/lib/python3.8/dist-packages/mmengine/registry/build_functions.py", line 100, in build_from_cfg
    raise KeyError(
KeyError: 'EncoderDecoder is not in the model registry. Please check whether the value of `EncoderDecoder` is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'

In case it's important, here's the slurm call I am making:

IMAGE="path_to_image/image.sif"
CONFIG="path_to_config/config.py"
OUTDIR="path_to_dir/output_dir/"
COMMAND="python -u /mmsegmentation/tools/train.py ${CONFIG} --work-dir ${OUTDIR}"

srun -N 1 singularity exec --nv --pwd /mmsegmentation $IMAGE $COMMAND

Can anyone tell me what's wrong? I've been banging my head against this error for quite a while.

MatCorr avatar May 09 '23 18:05 MatCorr

Ok, I fixed this.

I am not sure what was causing the issue, but everything worked once I placed the config.py file inside the mmsegmentation/configs/setr path rather than in a random location.
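One plausible explanation, not confirmed anywhere in this thread: a registry only knows about classes whose defining modules have actually been imported, so a config that never triggers that import fails with exactly this kind of KeyError. Below is a minimal toy sketch of the mechanism. `ToyRegistry` and everything in it is hypothetical and is not mmengine's implementation.

```python
class ToyRegistry:
    """Toy name -> class registry (hypothetical; not mmengine's Registry)."""

    def __init__(self, name):
        self.name = name
        self._classes = {}

    def register(self, cls):
        # Used as a decorator, so it only runs when the defining code is executed.
        self._classes[cls.__name__] = cls
        return cls

    def build(self, cfg):
        args = dict(cfg)
        type_name = args.pop("type")
        if type_name not in self._classes:
            raise KeyError(f"{type_name} is not in the {self.name} registry")
        return self._classes[type_name](**args)


MODELS = ToyRegistry("model")
cfg = {"type": "EncoderDecoder", "num_classes": 19}

# Before the defining code has run, the lookup fails.
try:
    MODELS.build(cfg)
    failed_before_import = False
except KeyError:
    failed_before_import = True


# Stand-in for "importing the module that defines the model".
@MODELS.register
class EncoderDecoder:
    def __init__(self, num_classes):
        self.num_classes = num_classes


# After registration, the same config builds fine.
model = MODELS.build(cfg)
```

If this is what happened here, the config location would matter only indirectly, for example through the base files a config under configs/ inherits from.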

MatCorr avatar May 11 '23 20:05 MatCorr

@openmmlab-bot is there an official Singularity file provided for mmsegmentation, for those who run it on HPC?

bobleegogogo avatar Jul 19 '23 19:07 bobleegogogo

@MatCorr is it possible to share the SIF you used for Singularity? Otherwise, a def file would also be very helpful. Thank you in advance.

bobleegogogo avatar Jul 19 '23 19:07 bobleegogogo

@bobleegogogo, I'm sending you the Dockerfile I'm using to generate the SIF. I hope it's helpful!

FROM nvcr.io/nvidia/pytorch:23.04-py3

# SLURM PMI2 version 20.11.9
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        bzip2 \
        file \
        make \
        perl \
        tar \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp https://download.schedmd.com/slurm/slurm-20.11.9.tar.bz2 && \
    mkdir -p /var/tmp && tar -x -f /var/tmp/slurm-20.11.9.tar.bz2 -C /var/tmp -j && \
    cd /var/tmp/slurm-20.11.9 &&   ./configure --prefix=/usr/local/slurm-pmi2 && \
    cd /var/tmp/slurm-20.11.9 && \
    make -C contribs/pmi2 install && \
    rm -rf /var/tmp/slurm-20.11.9 /var/tmp/slurm-20.11.9.tar.bz2

# Install MMCV
ARG MMCV="2.0.0"
RUN ["/bin/bash", "-c", "pip install openmim"]
RUN ["/bin/bash", "-c", "mim install mmengine"]
RUN ["/bin/bash", "-c", "mim install mmcv==${MMCV}"]

# Install MMSEGMENTATION
RUN git clone -b main https://github.com/open-mmlab/mmsegmentation.git /mmsegmentation
WORKDIR /mmsegmentation
ENV FORCE_CUDA="1"
RUN pip install -v -e .

MatCorr avatar Jul 20 '23 10:07 MatCorr

@MatCorr great, really appreciate it ;)

bobleegogogo avatar Jul 21 '23 07:07 bobleegogogo

I didn't build mmsegmentation from source; I installed it as a dependency. Where should I place the config file? I get this error even though it is an official model type, which is very confusing.

Sere1nz avatar Oct 20 '23 08:10 Sere1nz

@Sere1nz, if I remember correctly, a new config file can't be placed just anywhere. It has to go inside the configs folder of mmsegmentation; that is, somewhere inside one of these folders.
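When mmsegmentation is installed as a dependency there is no cloned repository to look in, so the configs folder has to be found inside the installed package. Here is a rough sketch for locating it. `find_bundled_configs` is a hypothetical helper, and the `.mim/configs` layout is an assumption about mim-style installs that may not hold for every version or install method.

```python
import importlib.util
from pathlib import Path


def find_bundled_configs(package_name="mmseg"):
    """Return the configs directory shipped with an installed package, or None."""
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        package_dir = Path(location)
        # Assumption: mim-style installs keep configs under <package>/.mim/configs,
        # while an editable source install keeps them next to the package directory.
        candidates = (
            package_dir / ".mim" / "configs",
            package_dir.parent / "configs",
        )
        for candidate in candidates:
            if candidate.is_dir():
                return candidate
    return None


# Prints the directory if one is found, otherwise None.
print(find_bundled_configs())
```

The lookup uses `find_spec` rather than importing the package, so it does not need a working CUDA setup just to report a path.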

MatCorr avatar Oct 20 '23 10:10 MatCorr

This issue still persists even if we place the file under configs > any_model > your_config.py. I inspected all the model fields and still don't know what's causing it. [image]
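If moving the file does not help, the location itself may not be the real cause. One thing worth checking, offered as an assumption rather than a confirmed fix, is whether the config sets `default_scope` and whether the modules that register the model get imported at all. A standalone config that does not inherit from `_base_/default_runtime.py` could add something like the following hypothetical lines:

```python
# Hypothetical additions to a standalone config file.
# Assumption: these are not already inherited from _base_/default_runtime.py.

# Tells the runner to resolve `type` names in mmseg's registries.
default_scope = 'mmseg'

# Asks mmengine to import these modules before building anything,
# so that their registration decorators run.
custom_imports = dict(imports=['mmseg.models'], allow_failed_imports=False)
```

The error message itself links to the mmengine documentation section on importing custom modules, which describes `custom_imports`.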

nitec427 avatar Nov 29 '23 22:11 nitec427

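Even if we place the file under configs > any_model > your_config.py, this issue still persists. I don't know what is causing the problem, and I have checked all the model fields. [image]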

I have the same problem. Have you solved it?

reviewjie avatar Mar 11 '24 09:03 reviewjie