Saaketh Narayan
@Oktai15 are you still seeing this with the latest version of streaming?
Closing out this issue as it has been inactive for a while.
Hey @con-bren, were you able to resolve this?
@LWprogramming You can also start writing shard files to a different directory and use the [`merge_index`](https://github.com/mosaicml/streaming/blob/3ba93010ce86cba1dc4d51ab9977c8cbbbbfb2c9/streaming/base/util.py#L219) function to combine the index files from multiple directories! But to your point, starting...
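For reference, here's a rough sketch of how that might look (the paths are placeholders, and I'm assuming the list-of-index-files call form of `merge_index`):

```python
from streaming.base.util import merge_index

# Hypothetical layout: each partition directory was written by its own MDSWriter
# and already contains shards plus an index.json.
partition_index_files = [
    '/data/mds/part-0/index.json',
    '/data/mds/part-1/index.json',
]

# Combine the per-partition index files into a single index.json under the parent
# directory, so everything can be read as one StreamingDataset.
merge_index(partition_index_files, '/data/mds')
```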
Hey @cinjon, were you able to resolve this? What was your approach?
Hmm, interesting... Normally, you shouldn't need to call `clean_stale_shared_memory()` at the start of your training script. Is this causing issues during training for you?
@rishabhm12 You should also make sure to set `persistent_workers=True` in the DataLoader so that workers are not shut down after each epoch and their dataset instances stay alive...
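Something like this (paths, sizes, and worker counts are just placeholders):

```python
from torch.utils.data import DataLoader
from streaming import StreamingDataset

# Illustrative remote/local paths and batch size.
dataset = StreamingDataset(remote='s3://my-bucket/my-dataset',
                           local='/tmp/my-dataset-cache',
                           shuffle=True,
                           batch_size=32)

loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=8,
    persistent_workers=True,  # keep workers (and their StreamingDataset copies) alive across epochs
)

for epoch in range(3):
    for batch in loader:
        ...  # training step; workers are not re-spawned between epochs
```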
@rishabhm12 This should fix the utilization drop if the issue was re-creating the worker StreamingDatasets. Since I don't have your script, I can't say exactly how much improvement it will give...
@rishabhm12 Ah, that's not good. Mind sending over a version of your training script that we can use to repro? Would love to get to the bottom of this. Also, I don't think...
@miguelalba96 @rishabhm12 Can you make sure that the correct `batch_size` is being passed to both `StreamingDataset` and the DataLoader? This batch size should be per-device. If that's correct, then can...
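For example, roughly (the numbers and paths are just illustrative):

```python
from torch.utils.data import DataLoader
from streaming import StreamingDataset

# Per-device batch size = global batch size / number of devices.
global_batch_size = 256
num_devices = 8
device_batch_size = global_batch_size // num_devices  # 32

# Pass the same per-device value to both StreamingDataset and the DataLoader.
dataset = StreamingDataset(remote='s3://my-bucket/my-dataset',
                           local='/tmp/my-dataset-cache',
                           batch_size=device_batch_size)
loader = DataLoader(dataset, batch_size=device_batch_size, num_workers=8)
```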