
A Data Streaming Library for Efficient Neural Network Training

Results: 88 streaming issues

## Description of changes: Remove part of the warning messages about v0.7.0 of streaming. ## Issue #, if available: ## Merge Checklist: _Put an `x` without space in the boxes...

## Environment - OS: Databricks runtime 15.3ML with mosaicml streaming 0.8.1. - Hardware (GPU, or instance type): g4dn.12xlarge ## To reproduce Steps to reproduce the behavior: ``` def get_dataloader_with_mosaic(path, batch_size,...

bug

I am trying to use Streaming dataset and Streaming data loader, but they almost never work after a crashed or stopped training run. A typical scenario: 1. Start training on multiple GPUs 1. Cancel...

bug

## 🚀 Feature Request Add sparse NumPy arrays as a supported field ## Motivation I was trying to serialize ~150 overlapping binary masks (think SAM autogenerated masks) per image to...

enhancement

I am working on a file system that loves few huge files and hates many small files. To this end, I would simply set `size_limit=None` when creating a dataset using...

bug

ENV: - Ubuntu 22.04.4 LTS - a2-ultragpu-8g (8 A100) - torch==1.13.1 - DDP - Data resides in GCS and we are using a service account. I am trying to...

bug

Loading large dataset gives an error: "MemoryError: Unable to allocate 202. GiB for an array with shape (27084389376,) and data type int64" ``` self.num_shards 237460 self.num_samples 27084388471 ``` ## Environment...

bug
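As a quick sanity check on the figures in the MemoryError report above, the reported allocation size follows directly from the array shape and dtype. The sketch below only reproduces that arithmetic from the numbers quoted in the report; it does not call the streaming library, and the variable names are illustrative.

```python
# Figures copied from the MemoryError message in the report above.
num_elements = 27_084_389_376  # array shape reported in the error
bytes_per_element = 8          # an int64 value occupies 8 bytes

total_bytes = num_elements * bytes_per_element
total_gib = total_bytes / 2**30  # 1 GiB = 2**30 bytes

print(f"{total_bytes:,} bytes = {total_gib:.1f} GiB")
```

This gives roughly 201.8 GiB, which matches the "202. GiB" in the error message: a single int64 array with about one entry per sample is what exhausts memory at this dataset size.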

## Description of changes: LocalUploader can be used with a FUSE-mounted file system, which may not be as reliable. We have seen uploads finish while the shard files are missing; however...