streaming
A Data Streaming Library for Efficient Neural Network Training
## Description of changes: remove part of the warning messages about v0.7.0 of streaming. ## Issue #, if available: ## Merge Checklist: _Put an `x` without space in the boxes...
## Environment - OS: Databricks runtime 15.3ML with mosaicml streaming 0.8.1. - Hardware (GPU, or instance type): g4dn.12xlarge ## To reproduce Steps to reproduce the behavior: ``` def get_dataloader_with_mosaic(path, batch_size,...
I am trying to use Streaming dataset and Streaming data loader, but it almost never works after a crashed or stopped training run. A typical scenario: 1. Start training on multiple GPUs 1. Cancel...
## 🚀 Feature Request Add sparse NumPy arrays as a supported field ## Motivation I was trying to serialize ~150 overlapping binary masks (think SAM autogenerated masks) / Image to...
I am working on a file system that loves few huge files and hates many small files. To this end, I would simply set `size_limit=None` when creating a dataset using...
ENV: - Ubuntu 22.04.4 LTS - a2-ultragpu-8g (8 A100) - torch==1.13.1 - DDP - Data resides in GCS and we are using a service account I am trying to...
Loading large dataset gives an error: "MemoryError: Unable to allocate 202. GiB for an array with shape (27084389376,) and data type int64" ``` self.num_shards 237460 self.num_samples 27084388471 ``` ## Environment...
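The size in that error message follows directly from the reported array shape: an `int64` element takes 8 bytes, so one entry per element at this scale comes to roughly 202 GiB. The short check below is only an arithmetic sketch using the numbers quoted in the report; it does not call the streaming library and makes no claim about which internal array triggers the allocation.

```python
# Values copied from the error message quoted in the report above.
num_elements = 27_084_389_376   # array shape reported in the MemoryError
bytes_per_element = 8           # an int64 value occupies 8 bytes

total_bytes = num_elements * bytes_per_element
total_gib = total_bytes / 2**30  # 1 GiB = 2**30 bytes

print(f"{total_bytes} bytes = {total_gib:.2f} GiB")
# prints "216675115008 bytes = 201.79 GiB", which NumPy rounds to "202. GiB"
```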
## Description of changes: LocalUploader can be used with a FUSE-mounted file system, which may not be as reliable. We have seen uploads finish while the shard files are missing, however...