kedro
TensorFlowModelDataset doesn't override an existing local model when saving a new model to the same local path
Description
TensorFlowModelDataset doesn't override an existing local model when saving a new model
Context
The bug occurs when experimenting with different TensorFlow model architectures trained and saved by running a Kedro pipeline. The versioning of those experiments is done with Kedro-mlflow (@Galileo-Galilei @takikadiri), which manages the upload of the various local training artifacts (including the saved model) to a remote MLflow server.
Steps to Reproduce
- Create a toy TensorFlow model and train it by running a dedicated Kedro pipeline
- Persist the trained model as a TensorFlowModelDataset declared in the data catalog
- Change the architecture of the model (by modifying the node that creates the model architecture), retrain the model by executing the training pipeline again, and check whether the new model has replaced the old one at the local path
Expected Result
The newly trained model should override the older model that exists at the declared local path
Actual Result
The new model is silently NOT saved: the local path still contains the old model, and no warning or error is shown
Your Environment
Include as many relevant details about the environment in which you experienced the bug:
* Kedro version used (`pip show kedro` or `kedro -V`): 0.16.5
* Python version used (`python -V`): Python 3.6.8
* Operating system and version: CentOS Linux 7 (Core)
Hi @aquerkent, thanks for reporting this issue. Could you provide the catalog entry you're using for the model?
Hi @aquerkent , do you still need help resolving this issue?
Hi @aquerkent I'm closing this issue, but feel free to re-open if you still need help with this.
Discord user experiencing similar issue
What I found is that the old model still exists after saving the new one, and the new one ends up in a subdirectory. It's as if the `fsspec.put` operation copies the root tmp directory that TensorFlowModelDataset creates into the dataset's filepath.
Catalog entry:

```yaml
tf_model_deploy:
  type: kedro.extras.datasets.tensorflow.TensorFlowModelDataset
  filepath: data/09_deployment/tf_model
```
The `_save` method creates a temporary folder and then calls `fsspec.put`:

```python
def _save(self, data: tf.keras.Model) -> None:
    save_path = get_filepath_str(self._get_save_path(), self._protocol)
    # Make sure all intermediate directories are created.
    save_dir = Path(save_path).parent
    save_dir.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory(prefix=self._tmp_prefix) as path:
        if self._is_h5:
            path = str(PurePath(path) / TEMPORARY_H5_FILE)

        tf.keras.models.save_model(data, path, **self._save_args)

        # Use fsspec to take from local tempfile directory/file and
        # put in ArbitraryFileSystem
        if self._is_h5:
            self._fs.copy(path, save_path)
        else:
            self._fs.put(path, save_path, recursive=True)
```
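The failure mode can be illustrated without fsspec or TensorFlow. The stdlib sketch below (all directory and file names are made up for illustration) contrasts the two copy semantics: `put`-style copying of the source directory *itself* into an existing destination, which newer fsspec versions do and which leaves the new model nested in a subdirectory, versus `cp`-style copying of the directory's *contents* over the destination's files.

```python
import shutil
import tempfile
from pathlib import Path

# Stdlib simulation of the two copy semantics (no fsspec/TensorFlow needed).
# All names below are illustrative, not the dataset's real paths.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "tmp_model"   # stands in for the dataset's temporary dir
    dst = root / "tf_model"    # stands in for the catalog `filepath`
    src.mkdir()
    dst.mkdir()
    (src / "saved_model.pb").write_text("new model")
    (dst / "saved_model.pb").write_text("old model")

    # `put`-style semantics (fsspec > 2021.07.0): the source directory is
    # copied *into* the existing destination, nesting the new model.
    shutil.copytree(src, dst / src.name)
    put_nested = (dst / src.name / "saved_model.pb").exists()
    put_top_level = (dst / "saved_model.pb").read_text()

    # `cp`-style semantics (the proposed fix): the source *contents*
    # overwrite the files at the destination.
    for item in src.iterdir():
        shutil.copy(item, dst / item.name)
    cp_top_level = (dst / "saved_model.pb").read_text()

print(put_nested)      # True  -> new model hidden in a subdirectory
print(put_top_level)   # "old model" -> stale model still at the top level
print(cp_top_level)    # "new model" -> destination actually updated
```

With the `put`-style behaviour the top-level path keeps serving the stale model silently, which matches the symptom reported above.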
Some more insights into this: it seems to be an issue since fsspec versions > 2021.07.0, so pinning fsspec to `fsspec==2021.07.0` resolves the issue for the `put` operation.
Given https://github.com/fsspec/filesystem_spec/issues/820, however, this seems to be expected behaviour for `put`, and `cp` should be used instead. There are also other issues, e.g. https://github.com/fsspec/filesystem_spec/issues/882, that might be related, indicating that this is indeed not expected behaviour.
Following the argumentation of https://github.com/fsspec/filesystem_spec/issues/820, it might be a valid patch to TensorFlowModelDataset (and other folder-based datasets) to use `cp` instead, e.g.

```python
self._fs.copy(path, save_path, recursive=True)
```
In case there are no unexpected side effects and `put` is not used here on purpose, I am happy to open a pull request.
We would absolutely love a PR @eileen-kuehn !