Fix step logging when using GCS Artifact Store
Open Source Contributors Welcomed!
Please comment below if you would like to work on this issue!
Contact Details [Optional]
What happened?
There seems to be an issue with step logging (StepLogsStorage) when using GCS (Google Cloud Storage) as the artifact store. Specifically, only the last portion of the logs appears in the file, which suggests a problem with the log writing or saving mechanism.
Steps to Reproduce
Here's a snippet to reproduce the issue:
```python
import gcsfs

from zenml.client import Client
from zenml.logging.step_logging import StepLogsStorage

client = Client()
_ = client.active_stack

TEST_FILE = "gs://<<your_bucket>>/test_log.log"

log_storage = StepLogsStorage(logs_uri=TEST_FILE, max_messages=5)
for i in range(0, 11):
    log_storage.write(f"I'm log line #{i}")
log_storage.save_to_file()

fs = gcsfs.GCSFileSystem()
with fs.open(TEST_FILE, "r") as f:
    all_of_it = f.read()
print(all_of_it)
```
Expected Behavior
All log lines should be saved and visible in the GCS file, not just the last few.
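The symptom is consistent with a buffered writer that truncates the file on every flush. As a purely hypothetical sketch of that failure mode (a local file stands in for GCS, and this is not ZenML's actual code):

```python
# Hypothetical reconstruction of the suspected bug: messages are buffered
# up to max_messages, and every flush rewrites the file with ONLY the
# current buffer instead of appending, so earlier flushes are lost.
class BuggyLogsStorage:
    def __init__(self, path: str, max_messages: int = 5) -> None:
        self.path = path
        self.max_messages = max_messages
        self.buffer: list[str] = []

    def write(self, message: str) -> None:
        self.buffer.append(message)
        if len(self.buffer) >= self.max_messages:
            self.save_to_file()

    def save_to_file(self) -> None:
        # Mode "w" truncates the file: the overwrite-vs-append bug.
        with open(self.path, "w") as f:
            f.write("\n".join(self.buffer))
        self.buffer = []

storage = BuggyLogsStorage("/tmp/test_log.log", max_messages=5)
for i in range(0, 11):
    storage.write(f"I'm log line #{i}")
storage.save_to_file()

with open("/tmp/test_log.log") as f:
    print(f.read())  # prints only: I'm log line #10
```

Lines #0-#4 and #5-#9 are flushed and then overwritten, so only the final partial buffer survives, which would match seeing just the last few log lines in the GCS file.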
Potential Solution
Consider using the `logging.StreamHandler` facility to temporarily write logs to the remote file (GCS, S3, etc.). Here's an example:

```python
import logging

import fsspec

f = fsspec.open("gs://<<my_gcs_bucket>>/test_log.log", "w")
with f as of:
    log_handler = logging.StreamHandler(of)
    logger = logging.getLogger()  # Root logger
    logger.addHandler(log_handler)
    for i in range(0, 5000):
        logger.warning(f"I'm log line #{i}")
    logger.removeHandler(log_handler)
```
This approach could fit nicely in the StepLogsStorageContext class.
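A sketch of how that handler lifecycle could be wrapped in a context manager, in the spirit of StepLogsStorageContext. The helper name here is hypothetical, and a local `FileHandler` (a `StreamHandler` subclass) stands in for the fsspec-backed remote file:

```python
import logging
from contextlib import contextmanager


@contextmanager
def step_logs_redirect(path):
    """Hypothetical helper: attach a handler writing to `path` for the
    duration of the block, then detach and close it."""
    handler = logging.FileHandler(path, mode="w")
    logger = logging.getLogger()  # Root logger
    logger.addHandler(handler)
    try:
        yield logger
    finally:
        logger.removeHandler(handler)
        handler.close()


with step_logs_redirect("/tmp/step.log") as logger:
    for i in range(0, 100):
        logger.warning(f"I'm log line #{i}")

with open("/tmp/step.log") as f:
    lines = f.read().splitlines()
print(len(lines))  # 100 -- every line is persisted
```

The try/finally guarantees the handler is detached and the file closed even if the step raises, so no buffered lines are dropped.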
Additional Context
Proper log handling is crucial for debugging and monitoring pipeline performance, especially when dealing with large-scale data processing in cloud environments.
Code of Conduct
- [ ] I agree to follow this project's Code of Conduct
Hello @strickvl, I'm trying to reproduce this issue but can't. I made a GCS bucket and tried to run the first snippet and got the following error. Please let me know if you need the traceback.
ValueError: No file systems were found for the scheme: gs://. Please make sure that you are using the right path and the all the necessary integrations are properly installed.
The error was raised by the following line:

```python
log_storage = StepLogsStorage(logs_uri=TEST_FILE, max_messages=5)
```
Here I'd tag in @bcdurak, who I think was most involved with that particular part of the codebase. He should be able to help with this. Other things to check:
- make sure gcsfs is installed?
- try a simpler gcsfs-related example to make sure it's not a permissions issue etc.? Something like:

```python
import gcsfs

fs = gcsfs.GCSFileSystem()
with fs.open("gs://your-bucket-name/test.txt", "w") as f:
    f.write("Hello, world!")
with fs.open("gs://your-bucket-name/test.txt", "r") as f:
    print(f.read())
```
(Replace 'gs://your-bucket-name/test.txt' with a valid path in your GCS bucket.)
Thank you for the code you provided. I did have some permission issues, which I resolved after trying this code, and it now correctly prints Hello, world!. However, the previous error persists.
ValueError: No file systems were found for the scheme: gs://. Please make sure that you are using the right path and the all the necessary integrations are properly installed.
EDIT:
I think I understand the source of this error; I have attached the traceback below. The code uses fileio to open the URI, which raises the error. Instead, at this step, gcsfs needs to be used, as in the code provided earlier.
I think I see what's going on now. Are you running the code with a GCS artifact store configured in your ZenML stack? (fileio will use whatever stack you have configured and set up for ZenML, so if you have a GCS artifact store then it should work).
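For what it's worth, a toy illustration (not ZenML's actual implementation) of why fileio raises that ValueError: URI schemes are dispatched through a registry of filesystem plugins, and gs:// only resolves once the GCP integration has registered its filesystem, which configuring a GCS artifact store in the active stack ensures:

```python
# Illustrative sketch: a registry maps URI schemes to filesystem plugins;
# an unregistered scheme produces an error like the ValueError above.
REGISTRY = {"file://": "LocalFilesystem"}


def resolve_filesystem(uri: str) -> str:
    """Return the filesystem registered for the URI's scheme."""
    for scheme, fs in REGISTRY.items():
        if uri.startswith(scheme):
            return fs
    scheme = uri.split("://")[0] + "://"
    raise ValueError(f"No file systems were found for the scheme: {scheme}")


print(resolve_filesystem("file:///tmp/x"))  # LocalFilesystem

# Installing/configuring the GCP integration is what registers "gs://":
REGISTRY["gs://"] = "GCSFilesystem"
print(resolve_filesystem("gs://bucket/x"))  # GCSFilesystem
```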
I see. I tried to set up a GCS artifact store but am facing some errors. I don't understand a few of the steps and will first acquaint myself with them. Could you please assign me to this issue?
I was able to reproduce the issue. The output I get for the initial code is:

```
I'm log line #10
```
I will now work on solving the issue.
@strickvl I have fixed the issue locally and I'm now getting the expected output.
However, I'm facing an issue following the contribution guidelines. When running `mypy --install-types` I get the error `error: Can't determine which types to install with no files to check (and no cache from previous mypy run)`. Could you please help with this?
Also, while opening a pull request, I read this prerequisite: "I have added tests to cover my changes." To fix the bug I made a change to `src/zenml/logging/step_logging.py`, so I think I need to add tests, but I'm not sure how to do this. I'd appreciate some help here.
For our cloud integrations, it's enough to demonstrate that you've tested the change. We don't currently run integration tests on cloud environments, so for something like this it wouldn't be possible to test it locally. Icing on the cake would be to include in the PR instructions for how someone from the core team could reproduce your local test (a code snippet and a reminder of the stack setup), but beyond that I think you're ok.
Also, for mypy, I think you can ignore that and just make the PR; any issues will be revealed there.