huggingface_hub
Tensorboard Not Displaying
Describe the bug
For the past few days, the Hub has been unable to render any TensorBoards and instead displays the following message:
No dashboards are active for the current data set.
Probable causes:
- You haven’t written any data to your event files.
- TensorBoard can’t find your event files.
I ran the exact same training script with report_to="tensorboard", changing only the hub_repo_id to a new repo. The previous run from a week ago renders a TensorBoard page; the new repo doesn't.
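One way to rule out the "can't find your event files" cause is to confirm that the new repo actually contains event files. The sketch below is only an illustration: `group_event_files` and `check_repo` are hypothetical helper names, and it assumes event files have `tfevents` in their file name. It uses `list_repo_files` from `huggingface_hub`, imported lazily so the grouping helper can be used on its own.

```python
import posixpath
from collections import defaultdict


def group_event_files(repo_files):
    """Group TensorBoard event files by the directory (run) they live in.

    Assumption: event files have "tfevents" in their file name.
    """
    runs = defaultdict(list)
    for path in repo_files:
        if "tfevents" in posixpath.basename(path):
            # Repo paths always use "/" separators, hence posixpath
            runs[posixpath.dirname(path) or "."].append(path)
    return dict(runs)


def check_repo(repo_id):
    """Return {run_dir: [event files]} for a Hub model repo (needs network)."""
    from huggingface_hub import list_repo_files  # lazy import

    return group_event_files(list_repo_files(repo_id))
```

If `check_repo("<your repo id>")` returns a non-empty dict for a repo whose TensorBoard tab is empty, the event files are present and the problem is more likely on the rendering side.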
Reproduction
https://huggingface.co/models?library=tensorboard&sort=created
I see tensorboards working from ~3 days ago, but none render in the last 2 days.
Sorry if this is the wrong repo for a huggingface.co issue, please let me know where else I should submit it if it's the wrong place.
cc: @severo
Thanks for reporting @lapp0! This should be fixed now. Could you retry on your model and close this issue if appropriate? Thanks!
It's working now! Thanks for fixing quickly @Wauplin, great job!
Glad your problem's solved :) Kudos goes to @XciD @severo!
I'm still seeing this issue transiently. Is it possible it was reintroduced?
Hi, I'm seeing this issue again @XciD @severo
It seems that
- Tensorboards haven't been updating when new logs are pushed, as of ~48 hours ago. Example
- ~~all other Tensorboards hang with a "Loading Tensorboard" message. Example: A model from 2022~~
Any chance we could get Tensorboard on https://status.huggingface.co/ ?
Edit:
Tensorboards no longer fail to start; however, they're still not being updated with log files from the last ~48 hours.
cc @XciD ^
Appears to be resolved, thanks!
Re-occurring.
Here's the hacky script I'm using to render a hub tensorboard locally.
python3 run.py distily/distily_dataset_sweep
It retrieves all tfevent files from a repo, puts them in a temporary directory, and starts a tensorboard locally under that directory.
import sys
import tempfile
import time

from huggingface_hub import list_repo_files, hf_hub_download
from tensorboard import program


def download_tensorboard_files(model_repo_id, temp_dir):
    # List all files in the Hugging Face Hub model repository
    repo_files = list_repo_files(model_repo_id)
    # Keep only tensorboard event files (those containing "tfevents" in their name)
    tb_files = [f for f in repo_files if 'tfevents' in f]
    if not tb_files:
        print("No tensorboard files found in the repository.")
        return None
    for tb_file in tb_files:
        # hf_hub_download recreates the file's repo-relative path under local_dir,
        # so passing temp_dir mirrors the repo layout without nesting it twice
        file_path = hf_hub_download(repo_id=model_repo_id, filename=tb_file, local_dir=temp_dir)
        print(f"Downloaded: {file_path}")
    return temp_dir


def run_tensorboard(log_dir):
    tb = program.TensorBoard()
    # Start TensorBoard pointing to the log_dir
    tb.configure(argv=[None, '--logdir', log_dir])
    url = tb.launch()
    print(f"TensorBoard is running at: {url}")
    # TensorBoard runs in a background thread; keep the script alive until Ctrl+C
    try:
        while True:
            time.sleep(60)
    except KeyboardInterrupt:
        print("TensorBoard process terminated.")


def main(model_repo_id):
    # Create a temporary directory (not cleaned up automatically)
    temp_dir = tempfile.mkdtemp()
    try:
        # Download TensorBoard files to the temporary directory
        log_dir = download_tensorboard_files(model_repo_id, temp_dir)
        if log_dir:
            # Run TensorBoard on the downloaded log files
            run_tensorboard(log_dir)
        else:
            print("No TensorBoard logs to visualize.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Optionally, clean up the temporary directory after use
        # (requires `import shutil`)
        # shutil.rmtree(temp_dir)  # Uncomment to clean up
        pass


if __name__ == "__main__":
    # The repo ID is passed as the first command-line argument
    model_repo_id = sys.argv[1]
    main(model_repo_id)
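A shorter variant of the same idea, offered only as a sketch: `snapshot_download` from `huggingface_hub` accepts `allow_patterns`, so the event files can be fetched in one call, and the regular `tensorboard` CLI can then serve them. The helper names, the `*tfevents*` pattern, and the default port 6006 are assumptions made for illustration.

```python
import subprocess
import tempfile

# Assumption: event files have "tfevents" somewhere in their path
TFEVENTS_PATTERN = "*tfevents*"


def fetch_event_files(repo_id, local_dir=None):
    """Download only the event files of a Hub model repo (needs network)."""
    from huggingface_hub import snapshot_download  # lazy import

    target = local_dir or tempfile.mkdtemp(prefix="hub_tb_")
    # Only files matching allow_patterns are downloaded; repo layout is preserved
    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=[TFEVENTS_PATTERN],
        local_dir=target,
    )


def tensorboard_command(log_dir, port=6006):
    """Build the CLI invocation that serves log_dir on the given port."""
    return ["tensorboard", "--logdir", str(log_dir), "--port", str(port)]


def serve(repo_id, port=6006):
    """Fetch the logs, then block while the tensorboard CLI serves them."""
    log_dir = fetch_event_files(repo_id)
    subprocess.run(tensorboard_command(log_dir, port), check=False)
```

Calling `serve("distily/distily_dataset_sweep")` would download the logs and block until TensorBoard is stopped with Ctrl+C. Running the CLI as a subprocess avoids the keep-alive loop, at the cost of requiring `tensorboard` on the PATH.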
The error is re-occurring, starting some time in the past few days.