kedro icon indicating copy to clipboard operation
kedro copied to clipboard

TensorFlowModelDataset doesn't overides existing local model when trying to save new model at the same local path

Open aquerkent opened this issue 4 years ago • 7 comments

Description

TensorFlowModelDataset doesn't overides existing local model when saving new model

Context

The bug occurs when experimenting with different tensorflow models architectures trained and saved by running a kedro pipeline. The versioning of those experiments is done with Kedro-mlflow (@Galileo-Galilei @takikadiri) which manages the upload of the different local training artifacts (including the saved model) in to a remote mlflow server)

Steps to Reproduce

  1. Create a toy tensorflow model and train it by running a dedicated a kedro pipeline
  2. Persist the trained model as a TensorFlowModelDataset declared in the data catalog
  3. Change the architecture of the model (by modyfing the node used to create the model archictecture), retrain the model by exectuting the training pipeline again and check to see if your new model has replaced the old model in your local path

Expected Result

The new trained model should overide the older model that exists at the declared local path

Actual Result

The new model is silently NOT saved : the local path still contains the old model (no warning or error is shown...)


## Your Environment
Include as many relevant details about the environment in which you experienced the bug:

* Kedro version used (`pip show kedro` or `kedro -V`): 0.16.5
* Python version used (`python -V`): Python 3.6.8
* Operating system and version: CentOS Linux 7 (Core)

aquerkent avatar Feb 23 '21 09:02 aquerkent

Hi @aquerkent, thanks for reporting this issue. Could you provide the catalog entry you're using for the model?

merelcht avatar Mar 03 '21 13:03 merelcht

Hi @aquerkent , do you still need help resolving this issue?

merelcht avatar Mar 22 '21 14:03 merelcht

Hi @aquerkent I'm closing this issue, but feel free to re-open if you still need help with this.

merelcht avatar Apr 06 '21 08:04 merelcht

Discord user experiencing similar issue

datajoely avatar Mar 29 '22 21:03 datajoely

What I found is that, the old model still exists after saving the new one, and the new one is in a subdirectory. It's like the fsspec.put operation copies the root tmp directory that TensorFlowModelDataset creates into the dataset's filepath.

Catalog entry:

tf_model_deploy: 
  type: kedro.extras.datasets.tensorflow.TensorFlowModelDataset
  filepath: data/09_deployment/tf_model 

The _save method that creates a temporal folder and then makes a call to fsspec.put:

def _save(self, data: tf.keras.Model) -> None:
        save_path = get_filepath_str(self._get_save_path(), self._protocol)

        # Make sure all intermediate directories are created.
        save_dir = Path(save_path).parent
        save_dir.mkdir(parents=True, exist_ok=True)

        with tempfile.TemporaryDirectory(prefix=self._tmp_prefix) as path:
            if self._is_h5:
                path = str(PurePath(path) / TEMPORARY_H5_FILE)

            tf.keras.models.save_model(data, path, **self._save_args)

            # Use fsspec to take from local tempfile directory/file and
            # put in ArbitraryFileSystem
            if self._is_h5:
                self._fs.copy(path, save_path)
            else:
                self._fs.put(path, save_path, recursive=True)

williamcaicedo avatar Mar 29 '22 21:03 williamcaicedo

Some more insights into this: This seems to be an issue since fsspec version > 2021.07.0. So pinning fsspec to fsspec==2021.07.0 solves the issue for put operation. Given https://github.com/fsspec/filesystem_spec/issues/820 however, this seems to be expected behaviour for put and instead, cp should be used. There are also other issues, e.g. https://github.com/fsspec/filesystem_spec/issues/882 that might be related indicating that is indeed not an expected behaviour.

Following the argumentation of https://github.com/fsspec/filesystem_spec/issues/820, it might be a valid patch to TensorFlowModelDataset (and other folder-based data sets) to use cp instead, e.g.

self._fs.copy(path, save_path, recursive=True)

In case there are no expected side effects and put is not used here on purpose, I am happy to open a pull request.

eileen-kuehn avatar May 23 '22 20:05 eileen-kuehn

We would absolutely love a PR @eileen-kuehn !

datajoely avatar May 24 '22 14:05 datajoely