gitea icon indicating copy to clipboard operation
gitea copied to clipboard

Make TryInsert functions within the packages module use INSERT ... ON CONFLICT

Open zeripath opened this issue 3 years ago • 16 comments

The TryInsert* functions within the packages models make incorrect assumptions about transactional isolation within most databases. It is perfectly possible for a SELECT to return nothing but an INSERT fail with a duplicate in most DBs as it is only INSERT that the locking occurs.

This PR introduces a common InsertOnConflictDoNothing function which will attempt to INSERT provided bean but will return 0 as the number of rows affected if there is a conflict.

Fix #19586

Signed-off-by: Andrew Thornton [email protected]

zeripath avatar Sep 04 '22 20:09 zeripath

ugh looks like my worries about transactions were right - we're gonna need to use an upsert style...

zeripath avatar Sep 05 '22 00:09 zeripath

I'm going to have to mark this WIP as we actually have to use the UPSERT forms otherwise the transactions will fail.

zeripath avatar Sep 05 '22 08:09 zeripath

OK I think this is now ready for review.

I do think this kind of upsert style is a far better way of doing this than the current hacky and really quite racy style.

zeripath avatar Feb 07 '23 11:02 zeripath

Are there dedicated tests for InsertOnConflictDoNothing with all databases? Otherwise it would be very difficult to debug or improve it.

wxiaoguang avatar Feb 07 '23 12:02 wxiaoguang

Codecov Report

Merging #21063 (1a6f5df) into main (e72290f) will increase coverage by 47.12%. The diff coverage is 39.75%.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff            @@
##           main   #21063       +/-   ##
=========================================
+ Coverage      0   47.12%   +47.12%     
=========================================
  Files         0     1149     +1149     
  Lines         0   151737   +151737     
=========================================
+ Hits          0    71510    +71510     
- Misses        0    71778    +71778     
- Partials      0     8449     +8449     
Impacted Files Coverage Δ
routers/api/packages/container/blob.go 63.07% <ø> (ø)
models/db/common.go 37.78% <37.74%> (ø)
models/packages/package.go 47.16% <62.50%> (ø)
models/packages/package_version.go 85.16% <62.50%> (ø)
models/packages/package_file.go 76.42% <66.66%> (ø)

... and 1144 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov-commenter avatar Feb 07 '23 15:02 codecov-commenter

If I understand correctly, the test cases haven't cover the behavior of comment:

// This function will update the ID of the provided bean if there is an insertion

That's the only test code for asserting m.ID, but it only tests a succeeded insertion

Update: out-dated

		_, _ = e.Exec("DELETE FROM one_unique")

		has, err := db.InsertOnConflictDoNothing(ctx, &OneUnique{})
		assert.Error(t, err)
		assert.False(t, has)

		toInsert := &OneUnique{Data: "test"}

		has, err = db.InsertOnConflictDoNothing(ctx, toInsert)
		assert.NoError(t, err)
		assert.True(t, has)
		assert.NotEqual(t, 0, toInsert.ID)

wxiaoguang avatar Feb 08 '23 01:02 wxiaoguang

If I understand correctly, there are still some problems in code:

  1. the has is misleading.

    • If insertion succeeds, it returns true, nil, if insertion fails, it returns false, nil
    • So the meaning of bool in (bool, error) is: whether the insertion succeeds, not about whether the record existed before.
  2. assert.NotEqual(t, int(0), int64(0)) always succeeds, But the ID is a int64, so you must use NotEqualValues or NotEmpty, otherwise the test code like assert.NotEqual(t, 0, toInsert.ID) always succeeds.

  3. There is only one ON CONFLICT DO NOTHING for SQLite3, I do not understand why it would return the ID for the duplicated insertion.

  4. The MySQL is highly likely needed to use update id=last_insert_id(id), but not update id=id. The second one doesn't seem the proper way to update the LAST_INSERT_ID for MySQL.

wxiaoguang avatar Feb 08 '23 02:02 wxiaoguang

Looks like things become too complex to difficult to review.

lunny avatar Feb 08 '23 06:02 lunny

If I understand correctly, the test cases haven't cover the behavior of comment:

It feels a bit like you're being deliberately obstructive and I don't quite understand why you're being this way.

This might be a language issue - but there are better ways to say I think you should add a line that checks if the ID remains at 0.

To which I might reply why would we care and why are you being so bloody difficult about it? That the ID remains unchanged is irrelevant. Fine I'll add it but honestly it's just making the test less clear and less easy to read for zero real world benefit.

It feels like you're targeting me for some reason and I don't understand why. You left maintainers - was it because of me?

If I understand correctly, there are still some problems in code:

  1. the has is misleading.

    • If insertion succeeds, it returns true, nil, if insertion fails, it returns false, nil
    • So the meaning of bool in (bool, error) is: whether the insertion succeeds, not about whether the record existed before.

I was using has as in «has been inserted» but I agree inserted would be better. Frankly its minor and you should just say use inserted instead of has instead of making some major issue about it.

  1. assert.NotEqual(t, int(0), int64(0)) always succeeds, But the ID is a int64, so you must use NotEqualValues or NotEmpty, otherwise the test code like assert.NotEqual(t, 0, toInsert.ID) always succeeds.

Again just make a suggestion. It's a good catch that I forgot about it but your tone is frankly insulting.

  1. There is only one ON CONFLICT DO NOTHING for SQLite3, I do not understand why it would return the ID for the duplicated insertion.

I don't understand your question?

There are two cases:

  1. INSERT proceeds because there is no conflict. The SQLite driver will return the inserted id because SQLite always does not.
  2. INSERT fails because there is a conflict.

In the second case we do not get an ID. Nor do we attempt to use it because 0 rows are affected.

  1. The MySQL is highly likely needed to use update id=last_insert_id(id), but not update id=id. The second one doesn't seem the proper way to update the LAST_INSERT_ID for MySQL.

This code is not attempting to update the id in fact it is deliberately not attempting to cause an update to the id - without the ON DUPLICATE KEY UPDATE id = id there is no ON CONFLICT DO NOTHING.

zeripath avatar Feb 08 '23 14:02 zeripath

It feels a bit like you're being deliberately obstructive and I don't quite understand why you're being this way.

I am not obstructive, but really want to help, I have tested a lot on my side.

It depends on the ID-behavior mentioned in https://github.com/go-gitea/gitea/pull/21063#discussion_r1098526327

I have tested SQLite3 and MySQL. I do not think your code could work for some cases, and the tests were incorrect.

For SQLite, insert ... on conflict update set id=id returning id (do nothing doesn't work and won't return the id)

For MySQL, insert ... on duplicate key update id=last_insert_id(id)

And the test has to cover: insert(keyA), insert(keyB), insert(keyA) and assert getting A-id

That's all.

It feels like you're targeting me for some reason and I don't understand why. You left maintainers - was it because of me?

No, when reviewing, I only care about the code and solution are right or not. Maybe you are also too strict for some problems, so there are more likely divergences. There are also some strong objections when my PRs get reviewed.

but your tone is frankly insulting.

Sorry I didn't mean that. But just want make the problem clear. Could you suggest about how to express the problem more gently. ps: I have made some suggestions, but I didn't find that you had got the point, and I also got "Why don't you try it", maybe I also misunderstood something, but the problems are indeed there.

We can have a private chat with the SQL problems later.

This code is not attempting to update the id in fact it is deliberately not attempting to cause an update to the id - without the ON DUPLICATE KEY UPDATE id = id there is no ON CONFLICT DO NOTHING.

There is still no comment about why INSERT IGNORE doesn't work for MySQL in such case. (there was just a joking for the first question https://github.com/go-gitea/gitea/pull/21063#discussion_r1098796187 , I am not good enough to guess).


And, maybe I really misunderstood something -- about the design or about the logic (no comment nor test at the beginning), I do not know. Maybe the divergence starts here https://github.com/go-gitea/gitea/pull/21063#discussion_r1098526327: "The code auto fills the ID of the bean in the same way that xorm's Insert() does.": The bean should has the ID filled when there is no error.

So feel free to dismiss all my comments. Thank you for your work!

wxiaoguang avatar Feb 08 '23 14:02 wxiaoguang

We don't use or get the ID when the INSERT fails because there is a conflict because OnConflictDoNothing.

I have tested SQLite3 and MySQL. I do not think your code could work for some cases, and the tests were incorrect.

For SQLite, insert ... on conflict update set id=id returning id (do nothing doesn't work and won't return the id)

Clearly I need to separate out the code paths here.

There is no RETURNING in the SQLite code. The RETURNING is only used in Postgres.

SQLite will always return the last inserted ID as part of its driver so we don't need to do that.

If there is a conflict we don't use it - we do nothing.

For MySQL, insert ... on duplicate key update id=last_insert_id(id)

No. As far as I can see that would cause every conflict to cause an update of the primary key (or fail.)

The code as written: ON DUPLICATE KEY UPDATE id = id makes the UPDATE a no-op so that no row is changed i.e. NOTHING happens when there is a conflict.

Again the code is not attempting to return the ID when there is a conflict.

And the test has to cover: insert(keyA), insert(keyB), insert(keyA) and assert getting A-id

See above, the code does not promise to get the ID when there is a conflict. It promises to do nothing.

There is still no comment about why INSERT IGNORE doesn't work for MySQL in such case. (there was just a joking for the first question #21063 (comment) , I am not good enough to guess).

Using INSERT IGNORE didn't work in my testing when there was a primary key with autoincrement - and if I didn't instead use the ON DUPLICATE KEY UPDATE no-op it would error out. This may have been warnings being returned rather than errors.

It seems however now that INSERT IGNORE will just work by itself but I do wonder if the warnings that IGNORE could cause failures or complaints by users.

And, maybe I really misunderstood something -- about the design or about the logic (no comment nor test at the beginning), I do not know. Maybe the divergence starts here #21063 (comment): "The code auto fills the ID of the bean in the same way that xorm's Insert() does.": The bean should has the ID filled when there is no error.

Ah. Perhaps that is the problem.

I'm not promising to get the ID when there is no insert.

The function has comment on it that says it will update the ID of the provided bean if it is inserted. It doesn't say it will do that when it's not inserted - and in fact it will NOT do that - instead on conflict it will do nothing as the function name suggests.

// InsertOnConflictDoNothing will attempt to insert the provided bean but if there is a conflict it will not error out
// This function will update the ID of the provided bean if there is an insertion
// This does not do all of the conversions that xorm would do automatically but it does quite a number of them
// once xorm has a working InsertOnConflictDoNothing this function could be removed.

There were always tests - just not specific ones. The package repositories code has good coverage of this function. As requested though I've added some specific tests and covered and fixed some more edge cases which should make the function usable in other contexts too.

zeripath avatar Feb 09 '23 00:02 zeripath

Using INSERT IGNORE didn't work in my testing when there was a primary key with autoincrement - and if I didn't instead use the ON DUPLICATE KEY UPDATE no-op it would error out. This may have been warnings being returned rather than errors.

It seems however now that INSERT IGNORE will just work by itself but I do wonder if the warnings that IGNORE could cause failures or complaints by users.

From my knowledge, I always use INSERT IGNORE for all cases and haven't got any problem. That's why the question first came .... because in my mind:

  • INSERT IGNORE should be fine for all cases, I never thought any user would be surprised or affected by the warning. If XORM is affected by the warning, that's another story. (if there was a comment for it, I could also get the point as early as possible).
  • Then after being told that the IGNORE doesn't work for autoincrement, I thought that since the insert on dup update id=id trick is used, it must be done by purpose to get the conflicted ID according to another comment, then the discussion happens.

Thank you for letting me know more details.

Anyway, it's clear for me now. I think this PR is good to have, maybe it could help to remove the workaround mutex for the package code after it gets merged:

  • #21862
  • https://github.com/go-gitea/gitea/pull/21862/files#diff-239d7f1fa93717ce75c12a91af1cc9f9d585993f7a1da9b5a507606689df4994R39-R41

wxiaoguang avatar Feb 09 '23 03:02 wxiaoguang

Ah yes that mutex can be removed too.

Apologies for the confusion earlier - it just felt we were going round and round in circles for no good reason and I couldn't understand why.

zeripath avatar Feb 09 '23 19:02 zeripath

Thanks for your hard-working @zeripath . It really should be a part of XORM. :)

lunny avatar Mar 12 '23 12:03 lunny

How about taking this PR now, and move to XORM in the future?

The "workaround mutex" really seems fragile, if there is a chance, we could use a DB insert-ignore instead of that in-process mutex for a stable release.

wxiaoguang avatar Mar 12 '23 12:03 wxiaoguang

I think the key problem is the package property table have no unique keys. per https://github.com/go-gitea/gitea/pull/30335#issuecomment-2051982310

lunny avatar May 07 '24 06:05 lunny