image-segmentation-keras
Tensorflow 2.5.0 compatibility with the package
I am trying to use your powerful library with an RTX 3070 Ti. After creating a Python 3.8 Anaconda environment with CUDA 11.3.1 and cuDNN 8.2.1, I found that the lowest TensorFlow version that properly utilizes my RTX 3070 Ti is 2.5.0. Training runs properly, but with one problem. Instead of creating the checkpoints as below:
mobilenet_segnet_no_aug.0
mobilenet_segnet_no_aug.1
mobilenet_segnet_no_aug.2
mobilenet_segnet_no_aug.3
mobilenet_segnet_no_aug.4
etc.
mobilenet_segnet_no_aug_config.json
the following checkpoints are produced:
mobilenet_segnet_no_aug.0.data-00000-of-00001
mobilenet_segnet_no_aug.0.index
mobilenet_segnet_no_aug.1.data-00000-of-00001
mobilenet_segnet_no_aug.1.index
mobilenet_segnet_no_aug.2.data-00000-of-00001
mobilenet_segnet_no_aug.2.index
mobilenet_segnet_no_aug.3.data-00000-of-00001
mobilenet_segnet_no_aug.3.index
mobilenet_segnet_no_aug.4.data-00000-of-00001
mobilenet_segnet_no_aug.4.index
etc.
mobilenet_segnet_no_aug_config.json
The problem is that after training I cannot find a way to load the new-style checkpoints so that I can evaluate my training with the predict_multiple function.
P.S. I use your framework without any problems on a GTX 1070 with CUDA 10.0 and TensorFlow 1.14.0; now we are trying to exploit the capabilities of a new-generation GPU.
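One possible way to work with the new-style files is to recover the per-epoch prefix from each ".index" file and hand that prefix to Keras. The sketch below is only an illustration: `tf_checkpoint_prefixes` is a hypothetical helper, not part of this library, and it assumes the files follow the `<checkpoints_path>.<epoch>.index` naming shown above.

```python
import os
import re


def tf_checkpoint_prefixes(checkpoints_path):
    """Map epoch number -> checkpoint prefix for files named
    '<checkpoints_path>.<epoch>.index' (hypothetical helper)."""
    directory = os.path.dirname(checkpoints_path) or "."
    base = os.path.basename(checkpoints_path)
    # Only the '.index' files are matched; the '.data-*' shards and the
    # '_config.json' file are ignored.
    pattern = re.compile(re.escape(base) + r"\.(\d+)\.index$")
    found = {}
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match:
            prefix = os.path.join(directory, name[: -len(".index")])
            found[int(match.group(1))] = prefix
    return found
```

With tf.keras, a prefix such as "mobilenet_segnet_no_aug.4" (without ".index") can normally be passed to `model.load_weights(prefix)`; whether the library's own checkpoint discovery accepts such prefixes is not confirmed in this thread.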
I have the same issue when trying to predict using the obtained checkpoints. The training was okay, but I can't do any prediction. Did you solve this issue? Thanks.
Unfortunately, I am still facing the same issue. Because it was a side project, I continued working with my GTX 1070 and the CUDA 10.0 Anaconda environment.
I have the issue the other way around. I still have checkpoints in the first format from training with TensorFlow 2.2: mobilenet_segnet_no_aug.0 mobilenet_segnet_no_aug.1 mobilenet_segnet_no_aug.2 mobilenet_segnet_no_aug.3
But after updating to a more recent version, loading the checkpoints generated with the old version fails. I suspect that TensorFlow may have changed the checkpoint format, which might be the issue for all of us.
I dug a bit deeper into this issue, and my assumption that the different file formats are the problem seems to be true.
TensorFlow currently supports three file formats: h5, tensorflow, and keras. The default is now the tensorflow format. I suspect that h5 was the default when this library was originally written.
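To check which format a given checkpoint on disk actually uses, something like the following sketch could help. `checkpoint_format` is a hypothetical helper for illustration; it assumes that a TensorFlow-format checkpoint has a sibling ".index" file and that an HDF5 file starts with the standard 8-byte HDF5 signature.

```python
import os

# Standard signature found at the start of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def checkpoint_format(path):
    """Guess the format of a checkpoint path: 'tf', 'h5', or 'unknown'."""
    # TensorFlow-format checkpoints are a prefix plus '.index' and '.data-*'.
    if os.path.isfile(path + ".index"):
        return "tf"
    # HDF5 checkpoints are a single file beginning with the HDF5 signature.
    if os.path.isfile(path):
        with open(path, "rb") as handle:
            if handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return "h5"
    return "unknown"
```

For example, `checkpoint_format("mobilenet_segnet_no_aug.0")` would be expected to report "tf" for the new-style files from the original post.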
To fix this, the library needs to explicitly give the checkpoints the file ending ".h5" so that TensorFlow knows which file format to use. This needs to happen everywhere the methods load_weights and save_weights are used (train.py and predict.py). That should solve the issue.
Maybe someone can open a PR for this; I don't have the time to do it at the moment.
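A minimal sketch of that idea, assuming checkpoints are named `<checkpoints_path>.<epoch>` as in the listings earlier in this thread. The helper names here are hypothetical and not taken from train.py or predict.py; `model` is any object exposing the Keras `save_weights` and `load_weights` methods.

```python
def h5_checkpoint_path(checkpoints_path, epoch):
    """Build a per-epoch checkpoint path that ends in '.h5'."""
    return f"{checkpoints_path}.{epoch}.h5"


def save_epoch_weights(model, checkpoints_path, epoch):
    """Save weights so the '.h5' ending selects the HDF5 format."""
    path = h5_checkpoint_path(checkpoints_path, epoch)
    model.save_weights(path)
    return path


def load_epoch_weights(model, checkpoints_path, epoch):
    """Load weights from the matching '.h5' checkpoint."""
    path = h5_checkpoint_path(checkpoints_path, epoch)
    model.load_weights(path)
    return path
```

Saving and loading have to agree on the ending, so both sides go through the same path helper. Newer Keras releases may expect a ".weights.h5" ending instead, so this should be checked against the installed version.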