litdata
Prints inside the worker processes mess up the progress bar
🐛 Bug
In my code, I am enabling a tqdm bar per worker with:
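import os
from tqdm import tqdm

# Inside the generator run by each worker: one bar row per local rank,
# derived from the DATA_OPTIMIZER_* environment variables.
global_rank = int(os.environ["DATA_OPTIMIZER_GLOBAL_RANK"])
num_workers = int(os.environ["DATA_OPTIMIZER_NUM_WORKERS"])
local_rank = global_rank % num_workers
for example in tqdm(data, position=local_rank):
    tokens = tokenizer.encode(example)
    yield tokens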
But litdata prints this in each rank:
Rank 3 inferred the following `['no_header_tensor:16']` data format.
This message breaks the tqdm bars at the start of the run.
Since this print doesn't seem very useful for users, I suggest removing it or gating it behind fast_dev_run or a similar verbosity flag.
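To illustrate the gating idea, here is a minimal sketch. It is not litdata's actual code: `report_data_format`, its `verbose` parameter, and the logger name are all hypothetical, made up for this example. The message is built only when the caller opts in, and it goes through `logging` rather than a bare `print`, so it can also be filtered by log level.

```python
import logging
from typing import List, Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("data_optimizer_sketch")


def report_data_format(
    rank: int, data_format: List[str], verbose: bool = False
) -> Optional[str]:
    """Hypothetical helper: emit the inferred-format message only if verbose.

    Returns the message that was logged, or None when nothing was emitted.
    """
    if not verbose:
        # Default path: stay silent so per-worker progress bars are not disturbed.
        return None
    message = f"Rank {rank} inferred the following `{data_format}` data format."
    logger.info(message)
    return message
```

With something like this, the default run would stay quiet and the per-worker bars would render cleanly, while anyone who needs the format information could still turn it on.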
Hi! Thanks for your contribution! Great first issue!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.