reader now supports .send() syntax for specifying index
reader now supports .send() syntax for specifying index, MapDataset sets up iterator and uses it with .send()
retry of https://github.com/vahidk/tfrecord/pull/96
I added .send() functionality to the reader to let it seek to a position in the index before returning a value. I added a TFRecordMapDataset that creates the generator and calls an initial next() on it to get one value, after which it can set indices.
The dataset requires that a .tfindex file has already been built.
I don't really like this because we have to call next() on the iterator once and basically throw away the first value before being able to index, but it's a pretty minimal change to reader. Maybe I can refactor things to fix this dummy call.
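For context, here is a minimal sketch of the generator pattern described above. It uses a plain list in place of a TFRecord file, and `seekable_reader` is a hypothetical name for illustration, not the library's actual reader. It shows why the priming next() call is needed: a generator has to reach its first yield before it can accept a value through .send().

```python
from typing import Generator, List, Optional


def seekable_reader(records: List[bytes]) -> Generator[bytes, Optional[int], None]:
    """Yield records in order; a value passed via .send() is the next index to read.

    Hypothetical stand-in for a record reader, operating on an in-memory list.
    """
    position = 0
    while position < len(records):
        requested = yield records[position]
        # next() delivers None, so plain iteration simply advances by one.
        position = position + 1 if requested is None else requested


records = [b"a", b"b", b"c", b"d"]
reader = seekable_reader(records)

first = next(reader)    # priming call: the generator must be started before .send()
third = reader.send(2)  # jump to index 2 and receive that record
fourth = next(reader)   # sequential access continues from the new position
```

The value returned by the priming call is the one that gets thrown away when the caller only wants indexed access.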
I added a new flag map_access everywhere. Also the TFRecordMapDataset now has a try: block for the first next(iter), though I think we don't need it because I changed the yields to not call the pb2 parsing.
This is a lot more verbose than the original commit, but it's about 150% faster than the first commit for me. But I'm also loading ~7000 TFRecordMapDatasets and indexing randomly into them, so the dummy iter overhead is real.
This was broken, but now it can actually send to the inner iterator.
I don't believe the right strategy is to hack random access into the iterator; I think the right way is to refactor the code to use a class instead of an iterator. Given the magnitude of the change, I'll close this PR. I might take a stab at it if I get a chance. Note also that random access for tfrecord is not encouraged. The whole point of this library is that it's fast because it does sequential access.
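To make the class-based idea concrete, a rough sketch of what such a reader could look like follows. Everything here is an assumption for illustration: `IndexedRecordReader` is a hypothetical name, the index is a plain list of (offset, length) pairs, and the sketch ignores the real TFRecord framing (length headers and checksums), so it is not the refactor itself.

```python
import io
from typing import BinaryIO, List, Tuple


class IndexedRecordReader:
    """Hypothetical class-based reader: random access via an (offset, length) index."""

    def __init__(self, stream: BinaryIO, index: List[Tuple[int, int]]) -> None:
        self._stream = stream
        self._index = index

    def __len__(self) -> int:
        return len(self._index)

    def __getitem__(self, i: int) -> bytes:
        # Seek directly to the record; no generator state or priming call involved.
        offset, length = self._index[i]
        self._stream.seek(offset)
        return self._stream.read(length)


# Three raw payloads concatenated into one in-memory "file".
payloads = [b"alpha", b"be", b"gamma"]
stream = io.BytesIO(b"".join(payloads))
index = [(0, 5), (5, 2), (7, 5)]

reader = IndexedRecordReader(stream, index)
last_record = reader[2]
first_record = reader[0]
```

Because lookups go through `__getitem__`, there is no dummy first value to discard, though each lookup still pays for a seek.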
Yeah, agreed, what I had suggested was really ugly in a round peg -> square hole kind of way.
I have a situation where I am creating pairs from samples that are offset from each other in the same record. Maybe there's a more elegant queue-like system I can just implement in my own code. Anyways, thanks for following up!
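One possible shape for that queue-like approach, as a sketch only: keep a bounded window over the sequential stream and emit a pair whenever the window is full. `offset_pairs` is a hypothetical helper, and it assumes the samples arrive in order and that the offset is fixed.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def offset_pairs(samples: Iterable[T], offset: int) -> Iterator[Tuple[T, T]]:
    """Yield (sample[i], sample[i + offset]) while reading the stream sequentially."""
    if offset < 1:
        raise ValueError("offset must be at least 1")
    # The deque drops its oldest element automatically once it holds offset + 1 items.
    window: Deque[T] = deque(maxlen=offset + 1)
    for sample in samples:
        window.append(sample)
        if len(window) == offset + 1:
            yield window[0], window[-1]


# Integers stand in for decoded samples.
pairs = list(offset_pairs(range(5), offset=2))
```

This keeps access strictly sequential and holds only offset + 1 samples in memory at a time.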