
Integration nnUNet-wandb

Open omaruus99 opened this issue 1 year ago • 6 comments

Hello @FabianIsensee, I'm currently working on integrating Weights & Biases with nnUNet. To do this, you first need to dockerize nnUNet. I've run various tests of nnUNet in Docker locally, and the results are good!

With the same Docker image and configuration, when I launch the job from wandb, nnUNet can't find the environment variables and returns the well-known error:

RuntimeError: Could not find a dataset with the ID 4. Make sure the requested dataset ID exists and that nnU-Net knows where raw and preprocessed data are located (see Documentation - Installation). Here are your currently defined folders:
nnUNet_preprocessed=/nnUNet_preprocessed
nnUNet_results=/nnUNet_results
nnUNet_raw=/nnUNet_raw
If something is not right, adapt your environment variables.
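For reference, the error itself already lists the three nnU-Net variables as defined inside the container, so the question is rather whether the mounted folders actually contain the data (for dataset 4, nnUNet_raw should contain a folder named Dataset004_<something>). A minimal diagnostic sketch, meant to be run inside the container (just an illustration, not part of nnU-Net):

import os

# Print the nnU-Net paths and what is actually visible underneath them.
# Empty folders usually mean the -v volume mounts were not applied for this run.
for var in ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results"):
    path = os.environ.get(var)
    print(f"{var} = {path}")
    if path and os.path.isdir(path):
        print("  contents:", os.listdir(path))
    else:
        print("  -> not set or not an existing directory")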

omaruus99 avatar Feb 07 '24 14:02 omaruus99

My configuration is as follows (for the Docker image that does the preprocessing):

Dockerfile:

FROM python:3.9-slim-bullseye

WORKDIR /nnunet

RUN apt-get update

RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

COPY . /nnunet

RUN cd /nnunet

RUN pip install -e .

RUN pip install wandb

ENV nnUNet_raw="/nnUNet_raw"
ENV nnUNet_preprocessed="/nnUNet_preprocessed"
ENV nnUNet_results="/nnUNet_results"

ENTRYPOINT ["nnUNetv2_plan_and_preprocess"]
CMD ["-d", "004", "--verify_dataset_integrity"]

For information: tracking, versioning, and many other wandb features work correctly with nnUNet; the problem is only with these environment variables.

omaruus99 avatar Feb 07 '24 16:02 omaruus99

Hey, did you check that the folders are properly mounted when running the Docker container? You can just run the image in interactive mode, navigate to them, and check that everything is there as it should be. I would also like to tag @seziegler, who already has a w&b integration (for logging) that doesn't require Docker.

FabianIsensee avatar Feb 08 '24 07:02 FabianIsensee

@FabianIsensee Yes, I did check the Docker image; I tested it and there's no problem, but when I launch this same image from wandb I get this error. Concerning logging, yes, I've done that too, without Docker, but to create jobs and automate the workflow you need to use Docker.

omaruus99 avatar Feb 08 '24 08:02 omaruus99

This is how I run my Docker image:

docker run -it --gpus all --ipc=host --name nn_mlops_container \
  -e WANDB_PROJECT="Hippocampus_seg" \
  -e WANDB_ENTITY="ip_team" \
  -e WANDB_API_KEY="XXX" \
  -e WANDB_DOCKER="nn_mlops_image:latest" \
  -v "/c/Users/dataset/nnUNet_raw:/nnUNet_raw" \
  -v "/c/Users/dataset/nnUNet_preprocessed:/nnUNet_preprocessed" \
  -v "/c/Users/dataset/nnUNet_results:/nnUNet_results" \
  nn_mlops_image:latest

omaruus99 avatar Feb 08 '24 08:02 omaruus99

Hey, so this goes quite beyond anything we have done with w&b so far - I am afraid we cannot really help you here ;-/ Or is there something I am not seeing, @seziegler?

FabianIsensee avatar Feb 08 '24 12:02 FabianIsensee

Hey, unfortunately I haven't used wandb in combination with docker yet so I'm afraid I also don't know what might cause this error.

seziegler avatar Feb 08 '24 13:02 seziegler

Hi @omaruus99, have you checked whether the read and write permissions are set properly for the mounted volumes? I have encountered such issues in other contexts when working with Docker. I hope this helps!
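A quick way to check this from inside the container is a small sketch like the following (the paths assume the mount points used above):

import os

# Report existence and read/write permissions of the mounted nnU-Net folders
for d in ("/nnUNet_raw", "/nnUNet_preprocessed", "/nnUNet_results"):
    print(d,
          "exists:", os.path.isdir(d),
          "readable:", os.access(d, os.R_OK),
          "writable:", os.access(d, os.W_OK))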

GregorKoehler avatar Apr 22 '24 13:04 GregorKoehler

Closing this issue for now, as it was stale for roughly a week. Please feel free to re-open if you're still facing this issue!

GregorKoehler avatar Apr 27 '24 08:04 GregorKoehler

Is there a fork of this repo that has some weights and biases integration set up? Just for logging stuff.

ckolluru avatar May 02 '24 05:05 ckolluru

@ckolluru You could do something like this to log stuff to wandb:

import torch
import wandb

from nnunetv2.training.nnUNetTrainer.nnUNetTrainer import nnUNetTrainer


class nnUNetTrainer_Wandb_Logger(nnUNetTrainer):
    def __init__(self, plans: dict, configuration: str, fold: int, dataset_json: dict, unpack_dataset: bool = True,
                 device: torch.device = torch.device('cuda')):
        """used for debugging plans etc"""
        super().__init__(plans, configuration, fold, dataset_json, unpack_dataset, device)

        # TODO: Change how I store keys for running wandb 
        with open("wandb_key", "r") as f:
            wandb_key = f.read()
        wandb.login(key=wandb_key)

        # Wandb init
        run = wandb.init(project="Calcium Scoring", entity="artillery",
                         name=f"{fold}_{configuration}_{self.__class__.__name__}",
                         tags=[self.plans_manager.dataset_name, self.plans_manager.plans_name, self.__class__.__name__,
                               configuration])

    def on_epoch_end(self):
        super().on_epoch_end()

        # To keep the epoch count consistent with the current epoch
        # Since we are executing the super().on_epoch_end() above
        self.current_epoch -= 1
       
        print("Logging to Wandb")
        self.logger.wandb_log()

        self.current_epoch += 1

In nnunet_logger.py, add this method to the nnUNetLogger class (you also need import wandb at the top of that file):

def wandb_log(self):
    # log the metrics of the epoch that just finished;
    # the lists in my_fantastic_logging can differ in length, hence the min()
    epoch = min([len(i) for i in self.my_fantastic_logging.values()]) - 1
    wandb.log(
        {
            "val_losses": self.my_fantastic_logging['val_losses'][epoch],
            "train_losses": self.my_fantastic_logging['train_losses'][epoch],
            "mean_fg_dice": self.my_fantastic_logging['mean_fg_dice'][epoch],
            "ema_fg_dice": self.my_fantastic_logging['ema_fg_dice'][epoch]
        }
    )
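Assuming the trainer class is placed where nnU-Net discovers trainers (e.g. somewhere under nnunetv2/training/nnUNetTrainer/), you can then select it on the command line via -tr, for example nnUNetv2_train 4 3d_fullres 0 -tr nnUNetTrainer_Wandb_Logger (dataset, configuration and fold are just example values).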

JackRio avatar Jun 10 '24 15:06 JackRio