litdata icon indicating copy to clipboard operation
litdata copied to clipboard

Prints inside the worker processes mess up the progress bar

Open carmocca opened this issue 1 year ago • 2 comments

🐛 Bug

In my code, I am enabling a tqdm bar per worker with:

    global_rank = int(os.environ["DATA_OPTIMIZER_GLOBAL_RANK"])
    num_workers = int(os.environ["DATA_OPTIMIZER_NUM_WORKERS"])
    local_rank = global_rank % num_workers
    for example in tqdm(data, position=local_rank):
        tokens = tokenizer.encode(example)
        yield tokens

But litdata prints this in each rank:

Rank 3 inferred the following `['no_header_tensor:16']` data format.

Breaking the tqdm bars at the beginning.

Since this print doesn't seem very useful for users, I would suggest that it is removed or put under fast_dev_run or a similar verbose-like flag.

carmocca avatar Mar 24 '24 20:03 carmocca

Hi! thanks for your contribution!, great first issue!

github-actions[bot] avatar Mar 24 '24 20:03 github-actions[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 16 '25 06:04 stale[bot]