How can we track the progress of UMAP on a large dataset?

Open pratikchhapolika opened this issue 4 years ago • 5 comments

For a huge dataset, is there a way to print something or show a progress bar?

um = umap.UMAP(n_neighbors=10, min_dist=0.1, n_components=120, metric='cosine')
vec_agent = um.fit_transform(embeddings.tolist(), show_progress_bar=True)

pratikchhapolika avatar Sep 14 '21 05:09 pratikchhapolika

um = umap.UMAP(n_neighbors=10, min_dist=0.1, n_components=120, metric='cosine', verbose=True)

This will print some verbose output while it works, giving at least some feedback. The current main branch on GitHub also has a merged PR that provides progress bars (a pip release including it hasn't happened yet).

lmcinnes avatar Sep 14 '21 13:09 lmcinnes

Can we use a callback function to update a tqdm progress bar after each iteration?

jackyko1991 avatar Apr 14 '23 16:04 jackyko1991

If I want to access the progress programmatically (so I can notify some external service every 100 epochs, for example), is that possible?

enjalot avatar Apr 22 '23 05:04 enjalot

As it stands you can potentially pass tqdm_kwds with a file argument that provides a buffer where tqdm will write progress to, and monitor that.

If you need something a little more specific then you can always patch the code here: https://github.com/lmcinnes/umap/blob/27a89123bf10fcb7678afb9b4474ea59a8fa50d8/umap/layouts.py#L411-L415 with a call out to whatever you wish (n is the epoch number).

In principle I could add a callback function; I was unsure what that should look like, so I didn't. I am open to suggestions though.
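
To make the discussion concrete, one possible shape for such a callback is sketched below. This is purely hypothetical: `run_epochs`, `callback`, and `every` are made-up names for illustration, none of them exist in umap, and the loop is a stand-in rather than the actual optimization code.

```python
from typing import Callable, Optional


def run_epochs(
    n_epochs: int,
    callback: Optional[Callable[[int, int], None]] = None,
    every: int = 100,
) -> None:
    """Stand-in epoch loop (hypothetical, not umap code)."""
    for n in range(n_epochs):
        # ... one epoch of layout optimization would happen here ...
        completed = n + 1
        # Call out to user code every `every` completed epochs.
        if callback is not None and completed % every == 0:
            callback(completed, n_epochs)


# Collect the reports in a list instead of notifying a real service.
reports = []
run_epochs(250, callback=lambda done, total: reports.append((done, total)))
```

A signature like `callback(completed_epochs, total_epochs)` keeps the hook independent of tqdm, which is the main design question here.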

lmcinnes avatar Apr 22 '23 13:04 lmcinnes

The buffer solution worked pretty well for me, doing something like this:

import re

import umap

class ProgressWriter:
    def write(self, text):
        match = re.search(r"(\d+)/(\d+)", text)
        if match:
            n, total = map(int, match.groups())
            print("custom progress", n, total)
            # custom reporting logic here

    def flush(self):
        pass

progress_writer = ProgressWriter()
tqdm_kwds = {"file": progress_writer, "disable": False}

reducer = umap.UMAP(
    n_neighbors=25,
    min_dist=0.05,
    metric='cosine',
    tqdm_kwds=tqdm_kwds,
    verbose=True,
)
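
For the "notify every 100 epochs" case, a throttled variant of the writer could look roughly like this. It is a minimal sketch: `ThrottledProgressWriter`, `notify`, and `every` are names I made up, and it assumes the text handed to `write` contains an `n/total` counter, as tqdm's default bar format does. The loop at the end feeds it hand-written strings rather than real tqdm output.

```python
import re


class ThrottledProgressWriter:
    """File-like object that reports parsed 'n/total' progress at intervals."""

    def __init__(self, notify, every=100):
        self.notify = notify      # called as notify(n, total)
        self.every = every        # minimum epoch gap between reports
        self.last_reported = 0

    def write(self, text):
        match = re.search(r"(\d+)/(\d+)", text)
        if not match:
            return
        n, total = map(int, match.groups())
        finished = n == total and n != self.last_reported
        # Report when enough epochs have passed, and once more at the end.
        if n - self.last_reported >= self.every or finished:
            self.last_reported = n
            self.notify(n, total)

    def flush(self):
        pass


# Simulated progress text; a real run would get this from tqdm.
sent = []
writer = ThrottledProgressWriter(lambda n, total: sent.append((n, total)))
for n in range(0, 251, 10):
    writer.write(f"progress {n}/250")
```

An instance of this class can be passed as the `file` entry of `tqdm_kwds` in the same way as `progress_writer` above.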

thank you!

enjalot avatar Apr 24 '23 22:04 enjalot