Set up two new data folders by splitting the old .data folder
Motivation
To better manage data in a cloud environment, we want to split the .data folder according to data ownership. We identified two types of data:
- Business data produced or explicitly located by the user: input data files such as CSV files, output data files that must be placed in a specific folder to be integrated with external components, or intermediate data nodes, such as models, that users must access manually. This is the data written by the data_node.write function. The users have ownership of this data.
- Internal data managed by Taipy: the entities, including metadata about business data. Taipy has ownership of this data, meaning we need to maintain compatibility across Taipy versions and user application versions.
Description
Add the following fields to the core configuration:
- "taipy_folder": This new attribute refers to the Taipy folder. This folder will contain the internal state of taipy-core, i.e., the entities, when the filesystem (default) repository is activated. A good default name for the folder could be .taipy. Today, this data is stored in the folder corresponding to the config attribute _STORAGE_FOLDER_KEY = "storage_folder", with .data as the default value.
- "data_folder": This new attribute refers to the data folder. This folder will contain the user data files (CSV, Excel, pickles, etc.). These files can be stored either at a path explicitly provided by the user or at a unique path generated by Taipy. A good default name for the folder could be .data. Today, the data generated by Taipy is stored in the folder corresponding to the config attribute _STORAGE_FOLDER_KEY = "storage_folder", with .data as the default value.
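To make the split concrete, here is a minimal sketch of how the two attributes could drive path resolution. It is not the taipy-core API. CoreFolders, entity_path and data_path are hypothetical names, and the on-disk layout (one JSON file per entity, data files named after the data node id) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class CoreFolders:
    """Hypothetical holder for the two proposed core configuration attributes."""

    taipy_folder: str = ".taipy"  # internal state (entities), owned by Taipy
    data_folder: str = ".data"  # business data files, owned by the user

    def entity_path(self, entity_type: str, entity_id: str) -> Path:
        """Where a filesystem repository could persist one entity (assumed JSON layout)."""
        return Path(self.taipy_folder) / entity_type / f"{entity_id}.json"

    def data_path(self, data_node_id: str, extension: str,
                  user_path: Optional[str] = None) -> Path:
        """A user-provided path wins; otherwise derive one under data_folder.

        Assumption: data node ids are unique, so the derived path is unique too.
        """
        if user_path is not None:
            return Path(user_path)
        return Path(self.data_folder) / f"{data_node_id}.{extension}"
```

With the defaults, entities land under .taipy and generated data files under .data. On a cloud deployment, only the two attribute values would change, for example CoreFolders(taipy_folder="/mnt/taipy", data_folder="/mnt/data").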
Integration with Taipy Cloud
These configuration attributes will be overridden on taipy-cloud to point to persistent storage. Applications will not be able to write files or folders anywhere other than these two paths.
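The restriction to these two paths could be enforced with a containment check like the sketch below. The function name is_write_allowed and the example mount points are hypothetical. How taipy-cloud actually sandboxes writes is not specified here.

```python
from pathlib import Path


def is_write_allowed(target, allowed_roots):
    """Return True if target is one of the allowed roots or lies inside one.

    resolve() normalizes ".." segments and symlinks first, so a path such as
    "<data_folder>/../secret.txt" cannot escape the allowed folders.
    """
    resolved = Path(target).resolve()
    for root in allowed_roots:
        root_resolved = Path(root).resolve()
        # Comparing against parents avoids the string-prefix pitfall where
        # "/mnt/data_extra" would wrongly match the root "/mnt/data".
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False
```

A caller would pass the overridden taipy_folder and data_folder values as allowed_roots and reject any write whose target fails the check.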
Breaking changes
Breaking changes must be avoided as much as possible. Any breaking change must be identified, and a migration path must be found.
Note: Even though .data seems to be a good default name for the data_folder, we may want to use a different default name to ease migration or to avoid conflicts with the old folder.
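One possible migration path is sketched below. It moves the entity sub-folders out of the legacy storage folder into taipy_folder and leaves user data files where they are, so the old folder can keep serving as data_folder. This is only one option under the note above. The helper migrate_legacy_storage is hypothetical, and the entity folder names in ENTITY_DIRS are an assumption about the current filesystem repository layout that would need to be checked.

```python
import shutil
from pathlib import Path

# Assumption: entity sub-folders written by the filesystem repository under
# the legacy storage folder. Verify against the real layout before use.
ENTITY_DIRS = ("cycles", "data_nodes", "jobs", "scenarios", "tasks")


def migrate_legacy_storage(legacy_folder=".data", taipy_folder=".taipy"):
    """Move entity folders from the legacy storage folder into taipy_folder.

    Folders already present in taipy_folder are never overwritten, so running
    the migration twice is harmless. Returns the names of the moved folders.
    """
    legacy = Path(legacy_folder)
    target = Path(taipy_folder)
    moved = []
    for name in ENTITY_DIRS:
        src = legacy / name
        dst = target / name
        if src.is_dir() and not dst.exists():
            target.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
            moved.append(name)
    return moved
```

Skipping existing destinations keeps the step idempotent. If a different default name is chosen for data_folder, a similar loop could move the user files instead.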
Note
Adding a third folder for temporary data was discussed but rejected.
- REJECTED: TMP_FOLDER: This folder would contain temporary files and folders. It would be a public configuration that users could use for temporary data.
Documentation must be updated.
https://github.com/Avaiga/taipy-cloud/wiki/Data-persistency-on-Taipy%E2%80%90Cloud