Saaketh Narayan

51 comments by Saaketh Narayan

@Oktai15 are you still seeing this with the latest version of streaming?

Closing out this issue as it has been inactive for a while.

Hey @con-bren, were you able to resolve this?

@LWprogramming You can also start writing shard files to a different directory and use the [`merge_index`](https://github.com/mosaicml/streaming/blob/3ba93010ce86cba1dc4d51ab9977c8cbbbbfb2c9/streaming/base/util.py#L219) function to combine the index files from multiple directories! But to your point, starting...

Hey @cinjon, were you able to resolve this? What was your approach?

Hmm, interesting... Normally, you shouldn't need to call `clean_stale_shared_memory()` at the start of your training script. Is this causing issues during training for you?

@rishabhm12 You should also make sure to set `persistent_workers=True` in the DataLoader so that workers are not shut down after each epoch, and the workers' dataset instances will stay alive....
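As a rough illustration of the suggestion above, here is a minimal sketch of the keyword arguments one might pass to a PyTorch `DataLoader`. The helper name `dataloader_kwargs` and the numbers are hypothetical and not from the original thread. The only library behavior relied on is that `persistent_workers=True` requires `num_workers > 0`.

```python
def dataloader_kwargs(batch_size: int, num_workers: int) -> dict:
    """Build DataLoader keyword arguments that keep workers alive across epochs.

    Hypothetical helper for illustration only. PyTorch rejects
    persistent_workers=True when num_workers is 0, so we check that up front.
    """
    if num_workers <= 0:
        raise ValueError("persistent_workers=True requires num_workers > 0")
    return {
        "batch_size": batch_size,      # per-device batch size
        "num_workers": num_workers,    # worker processes per DataLoader
        "persistent_workers": True,    # do not shut workers down after each epoch
    }


# Example values (assumed, not from the thread).
kwargs = dataloader_kwargs(batch_size=64, num_workers=8)

# With PyTorch installed, these would be passed as:
#   from torch.utils.data import DataLoader
#   loader = DataLoader(dataset, **kwargs)
```

With persistent workers, each worker's dataset instance survives between epochs, which is the behavior the comment is aiming for.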

@rishabhm12 This should solve the utilization drop if the issue was re-creating the worker StreamingDatasets. As I don't have your script, I don't know the exact improvement it will give...

@rishabhm12 Ah, that's not good. Mind sending over a version of your training script that we can use to repro? Would love to get to the bottom of this. Also, I don't think...

@miguelalba96 @rishabhm12 Can you make sure that the correct `batch_size` is being passed to both `StreamingDataset` and the DataLoader? This batch size should be per-device. If that's correct, then can...
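To make the per-device point concrete, here is a small sketch of the arithmetic, assuming the global batch is split evenly across devices. The function name `per_device_batch_size` and the example numbers are hypothetical and not from the original thread.

```python
def per_device_batch_size(global_batch_size: int, world_size: int) -> int:
    """Return the batch size each device should use.

    Assumes the global batch is divided evenly across `world_size` devices.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if global_batch_size % world_size != 0:
        raise ValueError("global_batch_size must be divisible by world_size")
    return global_batch_size // world_size


# Example values (assumed): a global batch of 512 samples on 8 devices.
batch_size = per_device_batch_size(global_batch_size=512, world_size=8)

# The same per-device value would then be passed to both places, e.g.:
#   dataset = StreamingDataset(..., batch_size=batch_size)
#   loader = DataLoader(dataset, batch_size=batch_size)
```

Passing the global batch size to one of the two by mistake is an easy mismatch to introduce, so computing the value once and reusing it avoids that.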