Tom

Results 170 comments of Tom

Perhaps. But that would be an incompatible change. Sometimes people make bad picks for names and we are stuck with it. I'll try to improve the documentation.

Version information is updated by `invoke newversion`. That was broken, but I have fixed it now.

Note that resampling after splitting results in slightly uneven sample probabilities. The ResampledShards implementation works great for large-scale training with fast object stores. This is the case on high...
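
A rough sketch of the idea behind shard resampling, in plain Python rather than the actual ResampledShards implementation (which also handles epochs and worker seeding); the function name and arguments here are illustrative assumptions:

```python
import random

def resampled_shards(urls, nshards=None, seed=0):
    """Yield shard URLs sampled uniformly with replacement.

    Illustrative sketch only: the real ResampledShards in webdataset
    does more (per-worker seeding, epoch handling). With nshards=None
    this is an infinite stream, which is what makes it convenient for
    large-scale training loops that run for a fixed number of steps.
    """
    rng = random.Random(seed)
    count = 0
    while nshards is None or count < nshards:
        yield rng.choice(urls)
        count += 1

shards = list(resampled_shards(["s0.tar", "s1.tar", "s2.tar"], nshards=5))
```

Because sampling is with replacement, individual shards can repeat within a pass; that is the trade-off that buys uniform sampling probabilities.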

Sorry, the FAQ was wrong. There are two methods:
- `with_length(n)` adds a `__len__` method to the pipeline so that `len(dataset)` returns `n`. It does not actually change anything about...
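
A minimal sketch of the mechanism behind `with_length(n)`, using a generic wrapper class rather than the actual webdataset pipeline code:

```python
class WithLength:
    """Wrap an iterable and report a fixed length via __len__.

    Sketch of the idea behind with_length(n): it only changes what
    len() reports, not how many samples the pipeline actually yields.
    """

    def __init__(self, source, n):
        self.source = source
        self.n = n

    def __iter__(self):
        return iter(self.source)

    def __len__(self):
        return self.n

ds = WithLength(range(10), 1000)
print(len(ds))            # reports 1000
print(sum(1 for _ in ds)) # but iteration still yields 10 samples
```

This is why the declared length and the true number of samples can disagree: `__len__` is metadata for consumers like DataLoader, nothing more.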

The value 2 should be good enough. If your dataset is so unbalanced that 2 is not good enough, you'll get an error, and that's my preferred behavior. But you can...

Sorry this took so long. I have added an mtime option to the ShardWriter class, so you can set that to any floating point value you want. I also have...
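
What setting a fixed mtime buys you is byte-for-byte reproducible shards. A sketch of the underlying mechanism using the standard-library `tarfile` module directly (the `write_shard` helper is hypothetical, not the ShardWriter API):

```python
import io
import tarfile

def write_shard(path, samples, mtime=0.0):
    """Write (name, bytes) samples to a tar shard with a fixed mtime
    on every member.

    Sketch of what an mtime option does internally (assumed): pinning
    the timestamp makes repeated writes of the same samples produce
    identical shard bytes.
    """
    with tarfile.open(path, "w") as tar:
        for name, data in samples:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
```

With the timestamp pinned, rebuilding a shard from unchanged data produces an identical file, which plays well with content-addressed storage and caching.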

Yes, adding TIFF decoding/encoding is easy for users to do. The reason it isn't done by default is because TIFF has many format variants and it is difficult to decode...
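
A sketch of the extension-dispatch pattern a user-supplied decoder plugs into. This mirrors the shape of webdataset's custom-decoder handlers but is written here from scratch; the exact signatures and the stub decoder are illustrative assumptions, and a real TIFF decoder would use something like PIL or tifffile:

```python
def handle_extension(extensions, decoder):
    """Return a handler that applies `decoder` to keys ending in one of
    the space-separated `extensions`, and returns None otherwise so
    other handlers get a chance to try.

    Illustrative sketch of the dispatch pattern, not the library API.
    """
    exts = extensions.split()

    def handler(key, data):
        if any(key == e or key.endswith("." + e) for e in exts):
            return decoder(data)
        return None

    return handler

# Hypothetical stand-in for a real TIFF decoder (which would handle
# the many TIFF format variants -- exactly why it isn't a default).
tiff_handler = handle_extension("tif tiff", lambda data: ("decoded", len(data)))
```

Keeping the decoder user-supplied means each project can pick the TIFF variant handling it actually needs instead of inheriting a default that fails on exotic files.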

You are not repeating your training data infinitely, so this won't work. If you want exactly one permutation of the training data per epoch and you want this distributed equally...

You can use RandomMix to mix sources with arbitrary probabilities, or you can use MultiShardSample to sample at the shard level.
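
A sketch of what probabilistic mixing looks like, written as a small generator rather than the actual RandomMix implementation (stopping behavior when a source runs dry is an assumption here; the library may handle exhaustion differently):

```python
import random

def random_mix(sources, probs, seed=0):
    """Yield samples from several iterables, choosing a source at each
    step with the given probabilities; stop when a chosen source is
    exhausted.

    Illustrative sketch of the RandomMix idea, not the library code.
    """
    rng = random.Random(seed)
    iters = [iter(s) for s in sources]
    while True:
        it = rng.choices(iters, weights=probs, k=1)[0]
        try:
            yield next(it)
        except StopIteration:
            return

mixed = list(random_mix([["a"] * 100, ["b"] * 100], [0.5, 0.5]))
```

Mixing at the sample level like this gives fine-grained interleaving; shard-level sampling trades that granularity for better sequential I/O.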