Using filesystem from tensorboard
This is a question or maybe request for documentation:
How would we use a tensorflow/io filesystem from tensorboard on the command line? Similar to how we can do it with GCS, as in tensorboard --logdir=gs://bucket/path/to/logs, I'd like to be able to do the same with the recent Azure Blob Storage file system.
I believe the GCS file system is built into the main tensorflow repo, so I assume it gets packaged up and is available when tensorboard is built as well.
Can someone help me understand if this is possible? @damienpontifex, were you able to get this working?
@ms-lolo At the moment the Azure Blob Storage file system is fully built into tensorflow-io, so you should be able to use import tensorflow_io as tfio and the file system will be ready when you run tensorflow:
import tensorflow as tf
import tensorflow_io as tfio
...
...
# using az://accountname/path/to/logs the same way as gs://bucket/path/to/logs
For tensorboard, in theory it should be similar as long as import tensorflow_io as tfio is present in your python program. Please give it a try and let us know if you run into any issues.
Also, a tutorial about azfs is available at https://www.tensorflow.org/io/tutorials/azure
Thanks for the response, @yongtang. I'm a little confused by this ticket then: https://github.com/tensorflow/tensorboard/issues/2424
I guess it's not clear to me how this support works when I am trying to run tensorboard as a terminal command and not as part of a notebook. I am specifically trying to run something like tensorboard --logdir az://accountname/path/to/logs but it sounds like the import you are referring to needs to happen in the tensorboard startup process. Is that right?
@ms-lolo Yes, the import tensorflow_io as tfio needs to happen inside the python script when tensorboard tries to run tensorflow. I am not very familiar with tensorboard, but I would assume the change will not be big (apart from finding the right place to add it).
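One way to get the import into the tensorboard startup process without patching tensorboard is a small wrapper script. The following is only a sketch under assumptions: it assumes that tensorboard.main.run_main is the console entry point in the installed tensorboard version, that it reads its flags from sys.argv, and that tensorboard uses TensorFlow's file system layer. The helper names build_tensorboard_argv and launch_tensorboard are hypothetical, not part of either library.

```python
import importlib
import sys


def build_tensorboard_argv(logdir):
    """Build the argv list that the tensorboard CLI would receive."""
    return ["tensorboard", "--logdir", logdir]


def launch_tensorboard(logdir):
    """Import tensorflow_io first, then hand control to tensorboard."""
    # Importing tensorflow_io is what registers the az:// file system
    # with TensorFlow, so it has to happen before tensorboard starts.
    importlib.import_module("tensorflow_io")

    # Assumption: run_main is tensorboard's console entry point and
    # parses its flags from sys.argv.
    from tensorboard import main as tb_main

    sys.argv = build_tensorboard_argv(logdir)
    tb_main.run_main()
```

Calling launch_tensorboard("az://accountname/path/to/logs") from a script would then stand in for the plain tensorboard command. Whether this is enough may depend on the tensorboard version, so treat it as something to try rather than a confirmed fix.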
OK, last question! The docs you linked mention TF_AZURE_STORAGE_KEY. Do you know if tensorflow-io supports using user and system managed identities for the blob operations? This should be possible if tensorflow-io is using the python blob sdk and uses something like the DefaultAzureCredential class: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/identity/azure-identity#authenticating-with-defaultazurecredential
The main requirement for us is to not have access to SAS tokens or other secrets.
@ms-lolo We use the Azure Storage CPP SDK (https://github.com/Azure/azure-storage-cpplite), so in theory it conforms to the same methods as the python SDK. Can you give it a try and, in case it is not supported in tensorflow-io yet, report back? We will fix any issues if needed.
Just tested and I am seeing 404 errors when trying to run tf.io.gfile.mkdir(pathname). I think this is the default error azure returns when you lack permissions to a location (so it doesn't expose information about the location existing). I didn't set any environment variables and just ran something like this:
# installed beforehand with: pip install tensorflow-io
import os
import tensorflow as tf
import tensorflow_io as tfio
pathname = 'az://[account]/[container]/foo'
tf.io.gfile.mkdir(pathname)
The command hangs for a dozen seconds or so before giving me a 404 error.
The docs also mention an azfs:// scheme being registered, but using that gives me an immediate error: UnimplementedError: File system scheme 'azfs' not implemented. I'm guessing the docs are out of date and az:// is the new scheme.
If it helps, this is how I would access the same location using the python blob sdk (just listing the blobs at this path):
from azure.identity import DefaultAzureCredential
from azure.storage.blob import ContainerClient
# picks up whatever identity is available on the machine; no secrets in code
credential = DefaultAzureCredential()
container_client = ContainerClient(
    account_url="https://[account].blob.core.windows.net/",
    container_name="[container]",
    credential=credential)
# list the blobs under the foo/ prefix
blobs = container_client.list_blobs(name_starts_with="foo/")
The important part is that this works without specifying any authentication secrets. If this code runs on a machine that has the appropriate access permissions, the code will run and be authenticated automatically.
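To make the comparison between the two snippets concrete, here is a rough sketch of how an az://[account]/[container]/path location lines up with the account_url, container_name and prefix used with the blob sdk. The helper split_az_path is hypothetical and only reflects the mapping implied by the two examples above. It is not how tensorflow-io parses paths internally.

```python
from urllib.parse import urlsplit


def split_az_path(path):
    """Split az://account/container/prefix into blob sdk style pieces."""
    parts = urlsplit(path)
    if parts.scheme != "az" or not parts.netloc:
        raise ValueError(f"not an az:// path: {path!r}")

    # The first path segment is the container; the rest is the blob prefix.
    container, _, prefix = parts.path.lstrip("/").partition("/")
    if not container:
        raise ValueError(f"missing container in: {path!r}")

    # Assumption: the account lives on the public blob.core.windows.net
    # endpoint, as in the ContainerClient example above.
    account_url = f"https://{parts.netloc}.blob.core.windows.net/"
    return account_url, container, prefix
```

The returned values could be passed to ContainerClient as account_url and container_name, with the prefix going to name_starts_with.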
- Has tensorboard been changed to import tensorflow_io already? @yongtang
- I think we can add support for MSI to the Azure file system bindings. @ms-lolo, maybe let’s open a separate issue for that?