tensorboard Unable to Retrieve Embedding Arrays From TensorBoard Logs

Open Louagyd opened this issue 1 year ago • 1 comments

I am having trouble retrieving embedding arrays from TensorBoard logs after writing them with add_embedding: I cannot locate the actual arrays. Below is a detailed description of the issue and the steps I have taken so far.

Steps to Reproduce

Logging Embeddings:

I used add_embedding to log embeddings in TensorBoard. Example code for logging embeddings:

from torch.utils.tensorboard import SummaryWriter
import numpy as np

# Create a SummaryWriter
log_dir = 'logs/embedding_example'
writer = SummaryWriter(log_dir)

# Generate some dummy embeddings
embedding_data = np.random.randn(100, 64)  # 100 items with 64-dim embeddings
metadata = [f'Label {i}' for i in range(100)]

# Write the embeddings
writer.add_embedding(mat=embedding_data, metadata=metadata, global_step=1)

writer.close()

Attempting to Retrieve Embeddings:

I tried using EventAccumulator to load and parse the event files but was unable to locate the embedding arrays. Example code for extracting embeddings:

import os
import numpy as np
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

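def extract_embeddings_from_log(log_dir):
    # A size_guidance of 0 keeps every tensor event instead of a sample
    event_acc = EventAccumulator(log_dir, size_guidance={'tensors': 0})
    event_acc.Reload()

    embeddings = {}

    # Tags() returns a dict of tag lists keyed by data type;
    # the embeddings were expected to show up under 'tensors'
    tensor_tags = event_acc.Tags()
    print(tensor_tags)

    # Stays empty: no embedding tags could be located
    return embeddings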

I would appreciate any guidance or suggestions on how to properly retrieve the embedding arrays logged using add_embedding. Specifically, I am looking for:

Confirmation on whether add_embedding embeddings should be accessible through EventAccumulator.
Corrections to my approach or alternative methods to extract the embeddings.
Any additional information on the correct tags or structures to look for within the TensorBoard logs.

Environment Details

Framework: PyTorch
Logging Library: TensorBoard
TensorBoard Version: 2.16.2
Python Version: 3.10
Operating System: Ubuntu 22.04

Thank you for your assistance.

Jul 15 '24 03:07 Louagyd

Embeddings are treated differently from other logs because they really belong to the projector plugin. As a result, they are described in a separate file, projector_config.pbtxt, which is read only by the projector plugin.

I'm not sure exactly what you're trying to read out, but you may find success using something like this.

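import os
import tensorflow as tf
from google.protobuf import text_format
from tensorboard.plugins import projector

# Directory that was passed to SummaryWriter when the embeddings were logged
logdir = "logs/embedding_example"

# Parse the text-format projector config stored in the log directory
with tf.io.gfile.GFile(
    os.path.join(logdir, "projector_config.pbtxt")
) as f:
    config2 = projector.ProjectorConfig()
    text_format.Parse(f.read(), config2)
    print(config2)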
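
The parsed config only points at the data; it does not contain the arrays themselves. As a rough sketch of reading the arrays back, the snippet below assumes that add_embedding stored each matrix as a tab-separated tensors.tsv file somewhere under the log directory (for example in a per-step, per-tag subfolder referenced by the config's tensor_path field). That layout is an assumption and should be checked against the files actually present in the log directory. The helper name load_embeddings_from_logdir is hypothetical and not part of TensorBoard or PyTorch.

```python
import glob
import os

import numpy as np


def load_embeddings_from_logdir(log_dir):
    """Return {relative path: 2-D array} for every tensors.tsv under log_dir.

    Assumption: each embedding matrix is stored as tab-separated text,
    one row per item, in a file named tensors.tsv.
    """
    embeddings = {}
    # "**" with recursive=True also matches a tensors.tsv directly in log_dir
    pattern = os.path.join(log_dir, "**", "tensors.tsv")
    for path in sorted(glob.glob(pattern, recursive=True)):
        # ndmin=2 keeps the result two-dimensional even for a single row
        array = np.loadtxt(path, delimiter="\t", ndmin=2)
        embeddings[os.path.relpath(path, log_dir)] = array
    return embeddings
```

Calling load_embeddings_from_logdir('logs/embedding_example') on the log directory from the question would then be expected to return one array per logged step and tag, provided the assumed file layout holds. Values come back as parsed from text, so they may differ slightly in precision from the arrays originally passed to add_embedding.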

Jul 17 '24 18:07 rileyajones
