azure-sdk-for-python
Intermittent dropping of files while mounting a FileDataset on a local machine.
- Package Name: azureml-core
- Package Version: 1.44.0
- Operating System: Ubuntu 18.04.6 LTS
- Python Version: Python 3.6.13 |Anaconda, Inc.| (default, Jun 4 2021, 14:25:59)
Describe the bug
The task is fairly simple. I have uploaded the Synscapes dataset (https://synscapes.on.liu.se/download.html) to Azure Blob Storage and have registered it in my AzureML Studio datastores. The dataset contains 25000 ".png" image files that need to be mounted on the local machine; a small feature extraction using OpenCV is then run on each image, and the extracted features are stored as a .pckl file. Out of the 25000 image files, some files at random are not found in the mount path, as suggested by the OpenCV warning:
[ WARN:[email protected]] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('<path/to/file/filename>.png'): can't open/read file: check file path/integrity
Refer to this code snippet for details:
import os
import pickle

import cv2
from azureml.core import Workspace, Dataset

workspace = Workspace(subscription_id, resource_group, workspace_name)
datastorage = Dataset.get_by_name(workspace, name='<azureml_datastore_name>')

# image_paths, get_feature and save_path are defined elsewhere in the script
with datastorage.mount() as mount_context:
    features = []
    mount_path = mount_context.mount_point
    for image_path in image_paths:
        # read each image directly from the mount point and extract its features
        image = cv2.imread(os.path.join(mount_path, image_path), cv2.IMREAD_UNCHANGED)
        features.append(get_feature(image))

with open(save_path, "wb") as handle:
    pickle.dump(features, handle, protocol=pickle.HIGHEST_PROTOCOL)
Output: After running successfully for a few thousand images, OpenCV throws the following warning:
[ WARN:[email protected]] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('<path/to/file/filename>.png'): can't open/read file: check file path/integrity
I have verified that the image file exists and is not corrupted. One thing that supports this claim is that a given image loads fine in some runs while throwing this error in others.
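For reference, here is a minimal sketch of the retry wrapper I can put around the read to confirm that the failure is transient (this is not part of the original script; the existence check, retry count, and delay are arbitrary, and it relies on cv2.imread returning None when it cannot open a file):

import os
import time

import cv2


def read_with_retry(path, retries=3, delay=1.0):
    # cv2.imread returns None instead of raising when it cannot open a file,
    # so check the result explicitly and retry a few times before giving up
    for attempt in range(retries):
        if os.path.exists(path):
            image = cv2.imread(path, cv2.IMREAD_UNCHANGED)
            if image is not None:
                return image
        time.sleep(delay)
    raise IOError(f"could not read {path} after {retries} attempts")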
To Reproduce
Steps to reproduce the behavior:
- Upload the Synscapes dataset to Azure Blob Storage.
- Register it as a dataset in the AzureML workspace.
- Mount the data on the local machine as shown in the code snippet above.
- Wait for the errors to be thrown.
Expected behavior
All 25000 images (in this scenario) should be accessible on the local machine after mounting the dataset from AzureML Studio.
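As an additional data point, a quick sketch (not from the original script; it assumes the datastorage object from the snippet above and that the mount exposes the same folder layout as the blob container) to count how many .png files are actually visible under the mount point:

import glob
import os

with datastorage.mount() as mount_context:
    mount_path = mount_context.mount_point
    # recursively list all .png files visible under the mount point
    visible = glob.glob(os.path.join(mount_path, "**", "*.png"), recursive=True)
    print(f"visible .png files under mount: {len(visible)} (expected 25000)")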
Thanks for the feedback, we’ll investigate asap.
@azureml-github
@deepankersingh96 As a first thing to try, please upgrade to the latest SDK version, which is currently 1.48; it includes a few bug fixes and a major rewrite of the mount logic.
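To double-check which SDK version is active after upgrading, a quick check along these lines should work (azureml.core exposes the installed version as VERSION):

import azureml.core

# should print 1.48.0 or later after upgrading
print(azureml.core.VERSION)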
Additionally, it's important to know how your file dataset was defined. Could you please share the output of:
Dataset.get_by_name(workspace, name='<azureml_datastore_name>')._dataflow._steps
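For example, something along these lines (assuming a workspace config.json is available locally; _dataflow._steps is internal, so the exact output may vary between SDK versions):

from azureml.core import Workspace, Dataset

# load the workspace from a local config.json and look up the registered dataset
ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name='<azureml_datastore_name>')
print(dataset._dataflow._steps)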
Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!