pyes
Need a way to access data from a bulk that just failed
The issue I'm facing is that if a flush_bulk() fails for some reason, I basically lose the data I was trying to index. At least I couldn't find any way to do things like:
- write the failed bulk to a file
- try to index a smaller bulk, containing a part of the failed bulk
The way I see it, this happens because of this code:
with self.bulk_lock:
    if forced or len(self.bulk_data) >= self.bulk_size:
        batch = self.bulk_data
        self.bulk_data = []
    else:
        return None
which empties self.bulk_data before the result of the bulk request is known. At the same time, I can't access "batch" from my script because it's a local variable.
So maybe a nice fix to this would be to replace "batch" with something like "self.last_batch" that I could access later?
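Something like this, for example (just a sketch of what I mean; "last_batch" is a name I made up, and the flush_bulk() body is simplified from the snippet above):

# Hypothetical change inside flush_bulk(): keep a reference to the batch
# before it is sent, so it can still be inspected if the request fails.
with self.bulk_lock:
    if forced or len(self.bulk_data) >= self.bulk_size:
        batch = self.bulk_data
        self.bulk_data = []
        self.last_batch = batch  # hypothetical attribute, not in pyes today
    else:
        return None

Then from my script I could do something like:

# "conn" is my pyes connection; "last_batch" is the hypothetical attribute above.
try:
    conn.flush_bulk(forced=True)
except Exception:
    failed = getattr(conn, "last_batch", [])
    # write the failed bulk to a file...
    with open("failed_bulk.txt", "w") as f:
        for item in failed:
            f.write(repr(item) + "\n")
    # ...or retry it in smaller pieces
    for i in range(0, len(failed), 100):
        smaller_chunk = failed[i:i + 100]
        # re-index smaller_chunk however makes sense for the application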
There might be other options, and I'd gladly submit a patch. I've never done that before, but I think I'll figure it out :)
Thanks in advance.
I'm studying the best way to provide different bulk helpers for the bulk insert operation (TimedBulker, ...)
That way you'll be able to implement your own Bulker with your own strategy.
I'll try to implement it tonight
That's really cool, thanks a lot!
What's TimedBulker about? Flushing the bulk if a certain amount of time has passed since the last flush? I'm currently doing this in my application by watching the value of "self.bulk_size".
Yes, that's the idea. I'd like to have different bulkers for common use cases. A timed one (with a size limit) is very useful for low-latency update/insert tasks.
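Roughly something along these lines (just a sketch, the names and the final API may well change):

import threading
import time


class TimedBulker(object):
    """Sketch of a bulker that flushes when the batch is full OR when
    max_seconds have elapsed since the last flush (names are provisional)."""

    def __init__(self, send, bulk_size=400, max_seconds=5.0):
        self.send = send            # callable that actually submits a list of bulk items
        self.bulk_size = bulk_size
        self.max_seconds = max_seconds
        self.bulk_data = []
        self.bulk_lock = threading.RLock()
        self.last_flush = time.time()

    def add(self, content):
        with self.bulk_lock:
            self.bulk_data.append(content)
        return self.flush_bulk()

    def flush_bulk(self, forced=False):
        batch = None
        with self.bulk_lock:
            timed_out = (time.time() - self.last_flush) >= self.max_seconds
            if forced or timed_out or len(self.bulk_data) >= self.bulk_size:
                batch, self.bulk_data = self.bulk_data, []
                self.last_flush = time.time()
        if batch:
            return self.send(batch)
        return None

The same skeleton would work for other strategies (size-only, time-only, ...) by changing the condition in flush_bulk().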