Ohad Raviv comments

Results: 13 comments of Ohad Raviv

IndexedSeq instead of Iterator in NearestNeighborIterator [Priority Queue Serialization Error]

Hi, we just came across the need to run this package. Spark's native `BucketedRandomProjectionLSH` wasn't good enough for us (mainly because of the bucket skew issue), and this library worked perfectly....

IndexedSeq instead of Iterator in NearestNeighborIterator [Priority Queue Serialization Error]

Well, in my company (PayPal), we work with our private accounts in the public GitHub space, so we would still have permissions in the future. Maybe you still know someone...

No need for caching in sorted-iterator

I was also surprised that it just worked, but what happens is just that the parent blocks are saved in memory, and every time you need to read the next...

No need for caching in sorted-iterator

good point. I actually thought about that and forgot to check. but it looks like we're good. the sorted-iterator only has one pointer to [Node next](https://github.com/paypal/dione/blob/b102569cad81c4bc4e735e6e558cf472e2bfd27f/dione-hadoop/src/main/java/com/paypal/dione/avro/hadoop/file/AvroBtreeFile.java#L223) and the Node object...

No need for caching in sorted-iterator

ok.. so after looking at `get(key)`, I saw that we can hop backwards there if we call `get()` multiple times. So I added an assertion to allow only bigger...
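
A minimal sketch of the idea in the comment above: a lookup that keeps a single forward-moving cursor over sorted records and rejects any call that would force it to move backwards. This is not the code from the repository; `ForwardOnlyReader`, its fields, and the choice to allow a repeated key are all assumptions made for illustration, and the comment itself is truncated before it says exactly what the assertion permits.

```python
class ForwardOnlyReader:
    """Hypothetical forward-only lookup over records sorted by key."""

    def __init__(self, sorted_records):
        # sorted_records: list of (key, value) pairs in ascending key order
        self._records = sorted_records
        self._pos = 0          # single cursor, never moves backwards
        self._last_key = None  # key passed to the previous get() call

    def get(self, key):
        # Reject a key smaller than the previous one: serving it would
        # require moving the cursor backwards (assumed policy).
        if self._last_key is not None and key < self._last_key:
            raise AssertionError(
                f"keys must be non-decreasing: got {key!r} after {self._last_key!r}"
            )
        self._last_key = key
        # Advance the cursor past all records with smaller keys.
        while self._pos < len(self._records) and self._records[self._pos][0] < key:
            self._pos += 1
        # Return the value on an exact match, otherwise None.
        if self._pos < len(self._records) and self._records[self._pos][0] == key:
            return self._records[self._pos][1]
        return None
```

With this guard, a caller that asks for keys in ascending order reads each record at most once, and an out-of-order call fails loudly instead of silently returning a wrong or missing result.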

change to writing Avro B-tree blocks in pre-order

yeah.. it turned out to be relatively easy to implement. this one is a prerequisite for the cache in #72 to work. and I tested #72 locally against s3...
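
For readers unfamiliar with the term in the thread title, here is a generic sketch of pre-order traversal, where every node is emitted before any of its descendants. It only illustrates the ordering; `Node` and `preorder_blocks` are hypothetical names, and nothing here reflects how the Avro B-tree file actually lays out or serializes its blocks.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: some keys plus zero or more child nodes."""
    keys: list
    children: list = field(default_factory=list)


def preorder_blocks(root):
    """Yield nodes in pre-order: each node before any of its descendants."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
```

A writer that emits blocks in this order places each parent ahead of its whole subtree, with subtrees following one another left to right.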

change to writing Avro B-tree blocks in pre-order

not sure, currently the only place it is used (caching) is in `joinWithIndex`. are you using it? I mainly want to verify that we're not getting OOM during index file...

Reproducing strange bug

@eyala / @shay1bz - if you want to look at it.. we get this error `org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt` when I run the test...

Occasional corruption of index

@eyala - any update here?

Occasional corruption of index

@eyala - can we close this issue?