tf-keras icon indicating copy to clipboard operation
tf-keras copied to clipboard

Model serializer path is incompatible with Windows

Open HJasperson opened this issue 3 years ago • 10 comments

System information.

  • OS Platform and Distribution: Windows Version 10.0.19044
  • TensorFlow version: 2.7
  • Python version: 3.9.12

Describe the problem.

Both the serializer and deserializer construct the temp_dir path using the hard coded prefix "ram://". This works for Unix-based systems, but the prefix is not valid on Windows.

Describe the current behavior.

Produces the following error, which stems from the fact that f (L75) is invalid because dest_path (L74) is not a valid memory address.

Traceback (most recent call last):
  File "C:\Users\Hope\Anaconda3\envs\Association\lib\site-packages\keras\saving\pickle_utils.py", line 77, in serialize_model_as_bytecode
    info.size = f.size()
  File "C:\Users\Hope\Anaconda3\envs\Association\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 99, in size
    return stat(self.__name).length
  File "C:\Users\Hope\Anaconda3\envs\Association\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 910, in stat
    return stat_v2(filename)
  File "C:\Users\Hope\Anaconda3\envs\Association\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 926, in stat_v2
    return _pywrap_file_io.Stat(compat.path_to_str(path))
tensorflow.python.framework.errors_impl.NotFoundError

Suggested fix.

Implement something like the _get_temp_folder() method from scikeras

HJasperson avatar Mar 25 '22 21:03 HJasperson

@HJasperson , I was able to execute the code without any errors.Please find the gist here.

tilakrayal avatar Mar 28 '22 08:03 tilakrayal

  1. the gist doesn't actually create a model and serialize/deserialize it, so you wouldn't get the error
  2. you need to run the serializer on Windows. "ram://" is not valid on Windows.

tagging @adriangb since he's worked on this file and scikeras

HJasperson avatar Mar 28 '22 15:03 HJasperson

The lack of support for ram:// on windows was a known issue at the time we wrote that module.

To be quite honest, I've somewhat lost track of the work on that in the ~ 1 year since I did the implementation. I believe there were still some issues, e.g. https://github.com/tensorflow/tensorflow/issues/48086.

But at a high level, what is your goal? Do you need to pickle your models, or can you used the SavedModel API directly?

adriangb avatar Mar 28 '22 16:03 adriangb

This method gets called by RLlib's internal serialization (for passing environments to worker processes), so unfortunately its usage is unavoidable

HJasperson avatar Mar 28 '22 18:03 HJasperson

Gotcha. Did that just not work at all before this module was implemented in TF?

adriangb avatar Mar 28 '22 18:03 adriangb

I have no idea; I'm an RLlib newbie

HJasperson avatar Mar 28 '22 18:03 HJasperson

@adriangb is this issue purely a side effect of https://github.com/tensorflow/tensorflow/issues/48086 then? Meaning that when https://github.com/tensorflow/tensorflow/issues/48086 is fixed, this issue will automatically be fixed.

hertschuh avatar Mar 31 '22 22:03 hertschuh

I believe so. I don't 100% remember though

adriangb avatar Mar 31 '22 22:03 adriangb

Some context: https://github.com/keras-team/keras/pull/14748#discussion_r654980437

adriangb avatar Mar 31 '22 23:03 adriangb

I think we could ask to @mihaimaruseac It they have someone to comment in the upstream ticket. Probably we don't want to lost all the analsys we spent to debug that bug.

bhack avatar Apr 10 '22 19:04 bhack