
Need a way to access data from a bulk that just failed

Open radu-gheorghe opened this issue 12 years ago • 3 comments

The issue I'm facing is that if a flush_bulk() fails for some reason, I basically lose the data I tried to index. At least I couldn't find any way to do things like:

  • write the failed bulk to a file
  • try to index a smaller bulk, containing a part of the failed bulk

As far as I can tell, this happens because of this code:

        with self.bulk_lock:
            if forced or len(self.bulk_data) >= self.bulk_size:
                batch = self.bulk_data
                self.bulk_data = []
            else:
                return None

which empties self.bulk_data before the outcome of the bulk request is known. And I don't see how to access "batch" from my script, because it's a local variable.

So maybe a nice fix would be to replace "batch" with something like "self.last_batch", which I could then access later?
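
For illustration, here is a rough sketch of what I have in mind, reusing the names from the snippet above (self.last_batch is hypothetical, it doesn't exist in pyes today):

    with self.bulk_lock:
        if forced or len(self.bulk_data) >= self.bulk_size:
            batch = self.bulk_data
            self.bulk_data = []
            # new: keep a reference so the caller can recover the data later
            self.last_batch = batch
        else:
            return None

Then, on my side, I could do something along these lines (conn being my ES connection, and assuming flush_bulk() raises on failure and that the batch is a list of already-serialized bulk lines):

    try:
        conn.flush_bulk(forced=True)
    except Exception:
        failed = conn.last_batch                # hypothetical attribute
        with open("failed_bulk.txt", "w") as out:
            out.writelines(failed)              # dump the failed bulk to a file
        # ...or split `failed` in two and try to index each half separately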

There might be other options, and I'd gladly submit a patch. I've never done that before, but I think I'll figure it out :)

Thanks in advance.

radu-gheorghe · Apr 09 '12 07:04

I'm studying the best way to provide different bulk helpers for the bulk insert operation (TimedBulker, ...)

So you'll be able to implement your own Bulker with your own strategy.

I'll try to implement it tonight.

aparo · Apr 10 '12 19:04

That's really cool, thanks a lot!

What's TimedBulker about? Flushing the bulk if a certain amount of time has passed since the last flush? I'm asking because I'm currently doing this in my application by watching the value of "self.bulk_size".

radu-gheorghe · Apr 11 '12 08:04

Yes, that's the idea. I'd like to have several different bulkers for common use cases. A timed one (with a size limit) is very useful for low-latency update/insert tasks.
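
Roughly, the idea is something like this (just a sketch, not the final API; the send argument is a placeholder for whatever will actually perform the bulk request in pyes):

    import time
    import threading

    class TimedBulker(object):
        """Collect bulk lines and flush them when either a size limit
        or a time limit is reached (sketch only, names are not final)."""

        def __init__(self, send, bulk_size=400, flush_interval=5.0):
            self.send = send                  # callable that performs the bulk request
            self.bulk_size = bulk_size
            self.flush_interval = flush_interval
            self.bulk_data = []
            self.last_flush = time.time()
            self.bulk_lock = threading.Lock()

        def add(self, line):
            with self.bulk_lock:
                self.bulk_data.append(line)
            return self.flush_bulk()

        def flush_bulk(self, forced=False):
            with self.bulk_lock:
                timed_out = time.time() - self.last_flush >= self.flush_interval
                if forced or timed_out or len(self.bulk_data) >= self.bulk_size:
                    batch, self.bulk_data = self.bulk_data, []
                    self.last_flush = time.time()
                else:
                    return None
            return self.send(batch) if batch else None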

aparo · Apr 11 '12 08:04