InnerEye-DeepLearning
Medical Imaging Deep Learning library to train and deploy 3D segmentation models on Azure Machine Learning
We see most builds showing repeated errors saying "Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception cannot import name '_DistributedTraining' from 'azureml.train._distributed_training' (/home/jaalvare/miniconda3/envs/InnerEye/lib/python3.7/site-packages/azureml/train/_distributed_training.py).". Can those...
TensorBoard monitoring is presently hooked up to AzureML runs. It should be possible to point TensorBoard to a local folder and start the monitoring script for local runs. Upon job...
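A minimal sketch of what pointing TensorBoard at a local folder could look like, assuming the `tensorboard` executable is on the `PATH`. The function names `build_tensorboard_command` and `start_local_tensorboard` are hypothetical and not part of InnerEye; only the `--logdir` and `--port` flags come from TensorBoard itself.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tensorboard_command(log_dir: Path, port: int = 6006) -> List[str]:
    """Build the argument list for a TensorBoard process that reads a local folder."""
    return ["tensorboard", "--logdir", str(log_dir), "--port", str(port)]


def start_local_tensorboard(log_dir: Path, port: int = 6006) -> Optional[subprocess.Popen]:
    """Start TensorBoard on a local folder (hypothetical helper).

    Returns the running process, or None if the tensorboard executable
    cannot be found on the PATH.
    """
    if shutil.which("tensorboard") is None:
        return None
    return subprocess.Popen(build_tensorboard_command(log_dir, port))
```

Keeping the command construction separate from the process launch makes the former easy to test without starting a server.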
- GPU augmentations at training time should be configurable
- Document the benefits of this feature

[AB#3925](https://innereye.visualstudio.com/60ce1777-00d6-4015-82bc-488a0c00202f/_workitems/edit/3925)
TensorBoard monitoring could be made available as a command-line tool, without having to invoke it via "python ...". The same applies to other helpers. See https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html [AB#3924](https://innereye.visualstudio.com/60ce1777-00d6-4015-82bc-488a0c00202f/_workitems/edit/3924)
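One way to expose a helper as a command-line tool is a setuptools `console_scripts` entry point that refers to a `main` function. The sketch below is illustrative only: the command name `innereye-tensorboard`, the module path `my_package.tensorboard_cli`, and the arguments are assumptions, not the repository's actual layout.

```python
import argparse
from typing import List, Optional


def create_parser() -> argparse.ArgumentParser:
    """Create the argument parser for the hypothetical monitoring command."""
    parser = argparse.ArgumentParser(
        prog="innereye-tensorboard",
        description="Start TensorBoard monitoring (illustrative sketch).",
    )
    parser.add_argument("--logdir", default="outputs", help="Folder with event files")
    parser.add_argument("--port", type=int, default=6006, help="Port to serve on")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Entry point; argv=None makes argparse read sys.argv[1:]."""
    args = create_parser().parse_args(argv)
    # A real tool would start the monitoring here; this sketch only reports the settings.
    print(f"Would monitor {args.logdir} on port {args.port}")
    return 0


# In setup.py, the function is registered like this (module path is hypothetical):
# setup(
#     ...,
#     entry_points={
#         "console_scripts": ["innereye-tensorboard = my_package.tensorboard_cli:main"],
#     },
# )
```

After installing the package, the command would then be available on the shell as `innereye-tensorboard --logdir <folder>`.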
At the moment, the model summary code in `generate_and_print_model_summary` does not store any of its results. The number of trainable parameters is logged to AzureML, but not anywhere else. It...
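As a rough sketch of persisting such results, a summary could be written to a JSON file in the outputs folder. The function `save_model_summary`, the file name, and the dictionary keys are hypothetical; how `generate_and_print_model_summary` would produce the values is not shown here.

```python
import json
from pathlib import Path
from typing import Dict, Union


def save_model_summary(
    summary: Dict[str, Union[int, str]],
    outputs_folder: Path,
    file_name: str = "model_summary.json",
) -> Path:
    """Write a model summary dictionary to a JSON file and return the file path."""
    outputs_folder.mkdir(parents=True, exist_ok=True)
    path = outputs_folder / file_name
    path.write_text(json.dumps(summary, indent=2, sort_keys=True))
    return path
```

For example, `save_model_summary({"trainable_parameters": 1234}, Path("outputs"))` would create `outputs/model_summary.json`.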
After the Lightning refactoring, the average memory consumption graphs show that we have reduced memory consumption.
* Does that reflect reality, or is it an artefact of averaging?
* If it...
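The averaging concern can be illustrated with a small example that compares mean and peak memory. The sample values below are made up purely for illustration and are not measurements from InnerEye; `summarize` is a hypothetical helper.

```python
from statistics import mean
from typing import Dict, List


def summarize(samples_gb: List[float]) -> Dict[str, float]:
    """Return the mean and the peak of a series of memory samples (in GB)."""
    return {"mean": mean(samples_gb), "peak": max(samples_gb)}


# Made-up samples: steady usage versus mostly idle with one short spike.
before = [8.0, 8.0, 8.0, 8.0]
after = [2.0, 2.0, 2.0, 12.0]

print(summarize(before))
print(summarize(after))
```

Here the second series has the lower mean but the higher peak, so a drop in an averaged graph alone does not show that the maximum memory requirement has gone down.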
It is possible, however, that this will not work at all, or will work only with `ddp_spawn` as the accelerator. [AB#3913](https://innereye.visualstudio.com/60ce1777-00d6-4015-82bc-488a0c00202f/_workitems/edit/3913)
Regression models should, at rank zero, write a prediction/target plot, as the old code did:

    if self._should_save_regression_error_plot(self.current_epoch):
        error_plot_name = f"error_plot_{self.train_val_params.epoch}"
        path = str(self.config.outputs_folder / f"{error_plot_name}.png")
        plot_variation_error_prediction(epoch_metrics.get_labels(), epoch_metrics.get_predictions(), path)...
- Enable the removed tests, like `test_rnn_classifier_via_config_2`
- Ensure that temperature values are logged.

[AB#3912](https://innereye.visualstudio.com/60ce1777-00d6-4015-82bc-488a0c00202f/_workitems/edit/3912)
The mean teacher model fell victim to the PL refactoring and should be re-enabled.
- Add back all tests that use it (for example, `test_rnn_classifier_via_config_1`, `test_create_inference_pipeline`, `test_mean_teacher_model`, `test_recover_testing_from_run_recovery`).
- Constructor of `ScalarLightning`...