best_model.hdf5

Open shahmustafa opened this issue 4 years ago • 10 comments

Does it generate best_model.hdf5, or how is this supposed to work?

OSError: Unable to open file (unable to open file: name = '/data1/prjs/code/ABTS/dl_4_tsc//results/fcn/UCRArchive_2018_itr_8/Coffee/best_model.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

shahmustafa avatar May 07 '20 09:05 shahmustafa

Yes, it uses a model checkpoint (Keras's ModelCheckpoint callback), so best_model.hdf5 is written during training.
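For reference, here is a minimal sketch of how such a checkpoint is typically wired up with Keras; the exact monitored metric, hyperparameters, and paths in dl-4-tsc may differ:

    from tensorflow import keras

    # Example path; in dl-4-tsc this is the classifier's output_directory.
    output_directory = './results/fcn/UCRArchive_2018_itr_8/Coffee/'

    # Write best_model.hdf5 whenever the monitored metric improves.
    model_checkpoint = keras.callbacks.ModelCheckpoint(
        filepath=output_directory + 'best_model.hdf5',
        monitor='loss',
        save_best_only=True)

    # model.fit(x_train, y_train, batch_size=16, epochs=2000,
    #           callbacks=[model_checkpoint])

Note that with save_best_only=True the file only appears after the first completed epoch, so a run that crashes earlier, or that cannot write to output_directory, leaves no best_model.hdf5 behind.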

hfawaz avatar May 08 '20 10:05 hfawaz

It's only generating last_model.hdf5 and model_init.hdf5, and I am getting a FileNotFoundError for best_model.hdf5.

shahmustafa avatar May 08 '20 10:05 shahmustafa

This means that your code was not executed successfully. Do you see any errors when running the code?

hfawaz avatar May 08 '20 11:05 hfawaz

python=3.6.8, tensorflow=1.14. When running

    python main.py UCRArchive_2018 Coffee fcn _itr_8

I am getting this error:

Traceback (most recent call last):
  File "main.py", line 152, in <module>
    fit_classifier()
  File "main.py", line 44, in fit_classifier
    classifier.fit(x_train, y_train, x_test, y_test, y_true)
  File "/data1/prjs/code/ABTS/dl_4_tsc/classifiers/fcn.py", line 80, in fit
    model = keras.models.load_model(self.output_directory+'best_model.hdf5')
  File "/data1/prjs/code/ABTS/venv/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 146, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File "/data1/prjs/code/ABTS/venv/lib/python3.6/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 200, in load_model_from_hdf5
    f = h5py.File(filepath, mode='r')
  File "/data1/prjs/code/ABTS/venv/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/data1/prjs/code/ABTS/venv/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = '/data1/prjs/code/ABTS/dl_4_tsc//results/fcn/UCRArchive_2018_itr_8/Coffee/best_model.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

shahmustafa avatar May 08 '20 11:05 shahmustafa

This means that the model was not saved; maybe recheck the paths. If that does not work, I believe you should install TF 2.0 and work with the new version. The code now works with TF 2.0.
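To confirm which stack a run is actually using, a quick version check with the standard version attributes:

    import tensorflow as tf
    import h5py

    print('tensorflow:', tf.__version__)
    print('h5py:      ', h5py.__version__)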

hfawaz avatar May 08 '20 11:05 hfawaz

With TF 2.0 I am getting this:

OSError: SavedModel file does not exist at: saved_model_dir/{saved_model.pbtxt|saved_model.pb}

shahmustafa avatar May 08 '20 11:05 shahmustafa

I think it may be a write-permission problem with the target directory, though I'm not quite sure.
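One quick way to test that hypothesis; the path below is a placeholder taken from the log above, so substitute the output_directory your run actually uses:

    import os

    # Placeholder; use the output_directory from your own run.
    out_dir = '/data1/prjs/code/ABTS/dl_4_tsc/results/fcn/UCRArchive_2018_itr_8/Coffee/'

    print('exists:  ', os.path.isdir(out_dir))
    print('writable:', os.access(out_dir, os.W_OK))

    # Creating the directory up front rules out a missing-path failure.
    os.makedirs(out_dir, exist_ok=True)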

hfawaz avatar May 11 '20 11:05 hfawaz

@hfawaz @shahmustafa I'm experiencing the same issue. Here is my env:

    Mac OS X: 10.15.5
    Python (conda): 3.8
    tensorflow: 2.2.0
    h5py: 2.10.0 py38h3134771_0
    hdf5: 1.10.4
    keras: 2.3.1

The error seems to suggest an out-of-memory problem when the code tries to save an intermediate result in HDF5; here are some clues:

https://github.com/h5py/h5py/issues/1176
https://stackoverflow.com/questions/44117315/goes-out-of-memory-when-saving-large-array-with-hdf5-py...
https://www.pytables.org/cookbook/inmemory_hdf5_files.html
https://stackoverflow.com/questions/40449659/does-h5py-read-the-whole-file-into-memory

In my case, the problem only arose when I started using slightly larger training data (1.8 MB vs. 30 KB); 30 KB, of course, wouldn't cause such a problem.
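If memory pressure during HDF5 writes really is the culprit, one of the workarounds discussed in the links above is h5py's in-memory "core" driver; a minimal illustrative sketch, not how dl-4-tsc itself saves models:

    import h5py
    import numpy as np

    # driver='core' keeps the file in RAM; backing_store=True flushes it
    # to disk when the file is closed.
    with h5py.File('scratch.hdf5', mode='w', driver='core', backing_store=True) as f:
        f.create_dataset('example', data=np.zeros((1024, 1024), dtype='float32'))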

nabito avatar Jun 14 '20 13:06 nabito

Here is the error log:

Traceback (most recent call last):
  File "main.py", line 155, in <module>
    fit_classifier()
  File "main.py", line 44, in fit_classifier
    classifier.fit(x_train, y_train, x_test, y_test, y_true)
  File "/mnt/batch/tasks/shared/LS_root/jobs/datascience-ml/azureml/resnet-timeseries_1592133278_dfbeddf7/mounts/workspaceblobstore/azureml/resnet-timeseries_1592133278_dfbeddf7/classifiers/resnet.py", line 142, in fit
    y_pred = self.predict(x_val, y_true, x_train, y_train, y_val,
  File "/mnt/batch/tasks/shared/LS_root/jobs/datascience-ml/azureml/resnet-timeseries_1592133278_dfbeddf7/mounts/workspaceblobstore/azureml/resnet-timeseries_1592133278_dfbeddf7/classifiers/resnet.py", line 160, in predict
    model = keras.models.load_model(model_path)
  File "/azureml-envs/azureml_eca0112c9008c12b467c806af1888db3/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 189, in load_model
    loader_impl.parse_saved_model(filepath)
  File "/azureml-envs/azureml_eca0112c9008c12b467c806af1888db3/lib/python3.8/site-packages/tensorflow/python/saved_model/loader_impl.py", line 110, in parse_saved_model
    raise IOError("SavedModel file does not exist at: %s/{%s|%s}" %
OSError: SavedModel file does not exist at: /mnt/batch/tasks/shared/LS_root/jobs/datascience-ml/azureml/resnet-timeseries_1592133278_dfbeddf7/mounts/workspaceblobstore/azureml/resnet-timeseries_1592133278_dfbeddf7/results/resnet/rtpcr_itr_9/qtower/best_model.hdf5/{saved_model.pbtxt|saved_model.pb}
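For context, in TF 2.x keras.models.load_model falls back to the SavedModel loader when the given path does not exist as an HDF5 file, which is why a missing best_model.hdf5 surfaces as a SavedModel error. A small guard (the directory below is taken from the log above; adjust it for your run) makes the underlying failure explicit:

    import os
    from tensorflow import keras

    output_directory = 'results/resnet/rtpcr_itr_9/qtower/'  # adjust per run
    model_path = output_directory + 'best_model.hdf5'

    if not os.path.isfile(model_path):
        # Training never wrote the checkpoint; fail with a clear message
        # instead of the confusing SavedModel fallback error.
        raise FileNotFoundError(model_path)

    model = keras.models.load_model(model_path)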

@hfawaz Could you give us the specific versions of all the dependencies that worked for you at publication time?

nabito avatar Jun 14 '20 13:06 nabito

Has anyone been able to fix this problem?

arieell25 avatar Nov 25 '21 11:11 arieell25