James Knighton
Specifically, you could remove this bit of code: https://github.com/mosaicml/streaming/blob/cb8e872359643fa84782c4e95c496dc66e495c44/streaming/base/dataset.py#L509-L514

And replace it with something like

```
self._shm_prefix_int = 8  # I'm feeling lucky
```

If on Mac OSX, shmem paths...
Is your `/tmp` or equivalent directory world-readable and writeable? I'm thinking if we switched to files for registration and ensured 777 perms, it should be fine cross-user? > If there...
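To make the permissions question concrete, here is a minimal sketch using only the Python standard library. The helper names `is_world_read_write` and `make_shared_file` are hypothetical and are not part of Streaming; the sketch only shows one way to check whether a path is readable and writable by other users, and to create a file with 777 permissions.

```python
import os
import stat


def is_world_read_write(path: str) -> bool:
    """Return True if 'other' users have both read and write permission on path."""
    mode = os.stat(path).st_mode
    wanted = stat.S_IROTH | stat.S_IWOTH
    return (mode & wanted) == wanted


def make_shared_file(path: str) -> None:
    """Create an empty file if it is missing, then set its mode to 777."""
    with open(path, "a"):
        pass
    # An explicit chmod is used because the mode given at creation time
    # would be masked by the process umask.
    os.chmod(path, 0o777)
```

Calling `is_world_read_write(tempfile.gettempdir())` answers the question above for the current machine. This applies to POSIX systems; permission bits behave differently on Windows.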
If 1 GPU is fine but 8 hang, are you setting the env vars? https://docs.mosaicml.com/projects/streaming/en/stable/fundamentals/environments.html
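A quick way to check this from inside each process is sketched below. The variable names `RANK`, `WORLD_SIZE`, and `LOCAL_WORLD_SIZE` are my reading of the linked environments page and should be treated as an assumption; the helper `missing_dist_env` is hypothetical and not part of Streaming.

```python
import os

# Assumed names of the variables the launcher is expected to set;
# verify them against the linked environments documentation.
REQUIRED_VARS = ("RANK", "WORLD_SIZE", "LOCAL_WORLD_SIZE")


def missing_dist_env(env=None):
    """Return the required variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Printing `missing_dist_env()` at the start of each worker shows whether the launcher populated the variables before the dataset is constructed.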
Hi @universome, Something I noticed while scanning threads, sorry haven't fully read everything... > P.S. I had to also change the shuffling strategy in such a way that next_epoch is...
> It's not really clear to me what the parameter does beyond that a high number is important for the improved algorithm for some reason Canonical nodes is how many...
Cycling/interleaving/etc multiple StreamingDatasets/StreamingDataLoaders has the potential to result in complicated situations when it comes to coordination. Instead, why not just use Streams? The sample space of a StreamingDataset is the...
You could pickle them, but as I understand it, pickle will encode the images as their CHW byte arrays, which would be rather wasteful for larger images. Let me know if...
Hello from the author of the index.json and MDS, and that particular decision where we standardized all the ints needed by MDS itself to be u32 not u64. I had...
First, can you reduce shard size? Our rule of thumb is very approximately 32MB depending on a small number of mostly network factors like max concurrent connections and typical available...
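As a rough illustration of what the 32MB rule of thumb implies, the sketch below estimates how many shards a dataset would split into at a given target size. Treating 32MB as 32 MiB is an assumption, and `estimate_num_shards` is a hypothetical helper, not a Streaming API; real shard counts also depend on sample boundaries and compression.

```python
# Assumption: "32MB" is interpreted as 32 MiB.
TARGET_SHARD_BYTES = 32 * 1024 * 1024


def estimate_num_shards(total_bytes: int, shard_bytes: int = TARGET_SHARD_BYTES) -> int:
    """Estimate the shard count as ceil(total_bytes / shard_bytes)."""
    if total_bytes < 0 or shard_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and shard_bytes must be > 0")
    # Integer ceiling division avoids float rounding on very large sizes.
    return -(-total_bytes // shard_bytes)
```

For example, `estimate_num_shards(2**40)` gives the approximate shard count for 1 TiB of serialized data at the default target size.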
Thanks for reaching out! With massive datasets, our serialization format choices are critical to ultimate observed performance of the system. If we really care about performance, we must own the...