How can we track the progress of UMAP on a large dataset?
For a huge dataset, is there a way to print something or show a progress bar while UMAP runs?
um = umap.UMAP(n_neighbors=10, min_dist=0.1, n_components=120, metric='cosine')
vec_agent = um.fit_transform(embeddings.tolist(), show_progress_bar=True)
um = umap.UMAP(n_neighbors=10, min_dist=0.1, n_components=120, metric='cosine', verbose=True)
Setting verbose=True will provide some output while UMAP works, giving at least some feedback. The current main branch on GitHub also has a merged PR that adds progress bars (a release to pip that includes it hasn't happened yet).
Can we use a callback function to update a tqdm progress bar after each iteration?
If I want to access the progress programmatically (for example, to notify an external service every 100 epochs), is that possible?
As it stands, you can pass tqdm_kwds with a file argument that points to a file-like buffer; tqdm will write its progress there, and you can monitor that buffer.
If you need something more specific, you can always patch the code here: https://github.com/lmcinnes/umap/blob/27a89123bf10fcb7678afb9b4474ea59a8fa50d8/umap/layouts.py#L411-L415 and add a call to whatever you wish (n is the epoch number).
In principle I could add a callback function; I was unsure what that should look like, so I didn't. I am open to suggestions though.
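To make the callback idea concrete, here is a minimal sketch of what an epoch loop with an optional callback could look like. It is not UMAP's API: optimize_with_callback, epoch_callback and the (n, n_epochs) signature are hypothetical names chosen for illustration, and the comment inside the loop stands in for the real per-epoch optimization work.

```python
from typing import Callable, List, Optional

# Hypothetical hook signature: receives the epoch index and the total epochs.
EpochCallback = Callable[[int, int], None]


def optimize_with_callback(
    n_epochs: int,
    epoch_callback: Optional[EpochCallback] = None,
) -> None:
    """Hypothetical optimization loop; not part of umap-learn."""
    for n in range(n_epochs):
        # ... one epoch of layout optimization would happen here ...
        if epoch_callback is not None:
            # n is 0-based here because it comes from range(n_epochs)
            epoch_callback(n, n_epochs)


# Usage: record every 100th epoch (and the last one), e.g. to notify a service.
reported: List[int] = []


def every_100_epochs(n: int, total: int) -> None:
    if (n + 1) % 100 == 0 or n + 1 == total:
        reported.append(n + 1)


optimize_with_callback(250, every_100_epochs)
print(reported)  # [100, 200, 250]
```

Passing the total alongside n lets the callback compute percentages without knowing n_epochs in advance; a tqdm bar could be driven the same way by calling its update(1) method from inside the callback.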
The buffer solution worked pretty well for me. I did something like this:
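import re

import umap


class ProgressWriter:
    """File-like object that receives tqdm's output instead of stderr."""

    def write(self, text):
        # tqdm writes strings like "... 42/200 [00:03<00:12, ...]";
        # extract the current epoch count and the total.
        match = re.search(r"(\d+)/(\d+)", text)
        if match:
            n, total = map(int, match.groups())
            print("custom progress", n, total)
            # custom reporting logic here

    def flush(self):
        # tqdm calls flush(); nothing is buffered here, so do nothing.
        pass


progress_writer = ProgressWriter()
tqdm_kwds = {"file": progress_writer, "disable": False}

reducer = umap.UMAP(
    n_neighbors=25,
    min_dist=0.05,
    metric='cosine',
    tqdm_kwds=tqdm_kwds,
    verbose=True,
)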
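One caveat with the buffer approach: tqdm only redraws when enough time has passed (its mininterval setting), so a particular epoch count may never appear in the stream, and the same count may be written more than once, for example when the bar is closed. To notify an external service every 100 epochs, it is therefore safer to fire when the count crosses the next milestone than to test n % 100 == 0. The sketch below is stdlib-only and assumes a hypothetical notify(n, total) function standing in for the external service; like ProgressWriter above, it assumes the progress text contains an "n/total" pair.

```python
import re
from typing import Callable, List, Tuple


class MilestoneWriter:
    """File-like object for tqdm's file= argument (needs write and flush).

    Calls notify(n, total) once each time the count crosses a multiple of
    `every`, and once when the final count is reached.
    """

    _pattern = re.compile(r"(\d+)/(\d+)")

    def __init__(self, notify: Callable[[int, int], None], every: int = 100):
        self.notify = notify  # hypothetical hook for the external service
        self.every = every
        self.next_milestone = every
        self.done = False

    def write(self, text: str) -> None:
        match = self._pattern.search(text)
        if match is None or self.done:
            return
        n, total = map(int, match.groups())
        crossed = n >= self.next_milestone
        finished = n >= total
        if crossed or finished:
            self.notify(n, total)
            # Jump past every milestone already reached, since tqdm may skip counts.
            self.next_milestone = (n // self.every + 1) * self.every
        if finished:
            self.done = True  # ignore repeats such as the final redraw on close

    def flush(self) -> None:
        pass


# Usage with hand-written strings that mimic tqdm's output.
events: List[Tuple[int, int]] = []
writer = MilestoneWriter(lambda n, total: events.append((n, total)), every=100)
for count in ["10/250", "95/250", "130/250", "130/250", "220/250", "250/250", "250/250"]:
    writer.write(f"\rEpochs completed: | {count} [00:01<00:02]")
print(events)  # [(130, 250), (220, 250), (250, 250)]
```

It would be passed the same way as ProgressWriter, e.g. tqdm_kwds={"file": MilestoneWriter(notify), "disable": False}. How often the writer is called depends on tqdm's refresh settings (mininterval, miniters), which could also be placed in tqdm_kwds if you want finer-grained updates at some cost in overhead.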
Thank you!