ibis icon indicating copy to clipboard operation
ibis copied to clipboard

bug: race condition in Table.cache()

Open NickCrews opened this issue 1 year ago • 4 comments

What happened?

This is low-priority, but maybe it is an easy fix.

Consider the implementation of the RefCountedCache:

https://github.com/ibis-project/ibis/blob/38e7e14d8e13c49db3561d6fc251d77542252f92/ibis/common/caching.py#L68-L80

It is possible (and ive run into it a few times, like 3% of the time??) for the .populate() to succeed, which creates a table in the backend, but then a KeyboardInterrupt to happen, and the self.cache[key] = self.lookup(name) is never executed. Then, the next time table.cache() is called, the CREATE TABLE DDL is executed again (because the key isn't in the cache), and this causes a "the table already exists" error. When this has happened, the solution for me has been to just restart the whole process, not terribly painful but wastes 10 minutes to re-run everything up until then.

The easy method would be to change it to to a CREATE OR REPLACE TABLE statement. The more correct method would be to do some more clever sort of synchronization.

What version of ibis are you using?

main

What backend(s) are you using, if any?

No response

Relevant log output

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

NickCrews avatar Apr 03 '24 17:04 NickCrews

Thanks for the issue, nice find 🔍

I wonder if CREATE TABLE IF NOT EXISTS would be the right thing to do here. If the table was created, then it should be populated and so attempting to populate should take no action.

cpcloud avatar Apr 03 '24 20:04 cpcloud

that sounds like the correct thing to do. This is motivation to expose the "IF NOT EXISTS" functionality in .create_table().

Currently we have

  • overwrite=True: overwrite always
  • overwrite=False: write if not exists, raise if exists

If we were starting with no baggage I would say we should drop overwrite and instead use if_exists: Literal["error", "replace", "skip"] = "error", but IDK if we want to have that large of a migration path

I can't think of how to add an option to the current overwrite option that sounds intuitive: "if not exists"?, "never"?

NickCrews avatar Apr 04 '24 16:04 NickCrews

We had force in create_table, and got rid of it due to deeming mostly not that useful.

Since this particular bug is an internal use of create_table, I'm not sure we should use that to justify adding back the force-equvialent. That said, fixing this bug with my suggested will be a lot less code if I could do what I need to do with a flag.

cpcloud avatar Apr 04 '24 16:04 cpcloud

Playing around with this some, I think we should keep overwrite to avoid lots of breakage, and add another boolean parameter to indicate whether to skip if the table already exists.

We have overwrite in a bunch of other places (insert, attach_*) and I don't think it makes sense to force everything to be consistent with an if_* paradigm.

cpcloud avatar Apr 07 '24 12:04 cpcloud

Our implementation of this has changed significantly, and we no longer manually track refs, so I'm going to close this as stale.

cpcloud avatar Jul 16 '24 20:07 cpcloud