phoenix icon indicating copy to clipboard operation
phoenix copied to clipboard

PHOENIX-6587 Disallow specifying explicit pre-split on salted tables

Open stoty opened this issue 3 years ago • 13 comments

stoty avatar Nov 17 '21 12:11 stoty

Do we already prevent create table on preexisting table with salt set?

dbwong avatar Nov 17 '21 19:11 dbwong

Do we already prevent create table on preexisting table with salt set?

I wouldn't want to prevent that. Copying/Loading a Phoenix data table with HBase tools into a cluster, and then running create table to re-create the Phoenix metadata is a supported and standard procedure, (even if I have seen people messing that up.)

I haven't tested it, but as we're not creating a new HBase table, we probably either error out, or just ignore the pre-split points in this case. I'm going to try out what happens in this case.

stoty avatar Nov 18 '21 04:11 stoty

Do we already prevent create table on preexisting table with salt set?

I wouldn't want to prevent that. Copying/Loading a Phoenix data table with HBase tools into a cluster, and then running create table to re-create the Phoenix metadata is a supported and standard procedure, (even if I have seen people messing that up.)

I haven't tested it, but as we're not creating a new HBase table, we probably either error out, or just ignore the pre-split points in this case. I'm going to try out what happens in this case.

I know we had a case in pre-production where we got region boundaries that overlapped the salt boundaries via restore or manual create table ontop of existing table. This led to wrong results due to phoenix's explicit assumption on the salted boundaries. Maybe a better approach for this worry is to check during create table that the table is split sanely if its salted etc.

dbwong avatar Nov 18 '21 05:11 dbwong

thank you @dbwong

Maybe a better approach for this worry is to check during create table that the table is split sanely if its salted etc.

I have started working on this, but got sidetracked by a deadline on another project. I'm about 50% done with the split verification solution, and hope to get that ready and tested by early next year.

stoty avatar Dec 05 '21 05:12 stoty

Note that new patch removes the SchemaUtil.processSplit() logic, which attempts to expend the split point to cover the whole PK.

My reasons removing it:

  • It uses arbitrary lengths when working with PKs with variable length fields, and does not (and can not, because the length is not fixed) cover the actual PK.
  • I could not figure why it was there in the first place.
  • It made adding the validation logic awkward

stoty avatar Dec 16 '21 10:12 stoty

Some more reasoning to remove SchemaUtil.processSplit() :

AFAICT it tries to emulate how HBase would automatically split a region. However, I could not find any reason why we would want to do that.

  • It is not necessary for correctness
  • It always adds an extra region to scan, as a salt key will always extend to an extra region (but this extra region is guaranteed to not have any rows)

stoty avatar Jan 04 '22 07:01 stoty

There two outstanding problems:

  • Upgrade tests fail, because the split points on the System tables use the old longer split points, and the update code errors out.
  • org.apache.phoenix.end2end.LocalIndexSplitMergeIT.testLocalIndexScanWithMergeSpecialCase() fails, but I cannot figure out why. Maybe the local index code dependes on the longer split points feature that I have removed ?

stoty avatar Jan 11 '22 09:01 stoty

Both could probably be fixed by re-adding the split point extesion code, but I'd like to understand it more first.

stoty avatar Jan 11 '22 09:01 stoty

org.apache.phoenix.end2end.LocalIndexSplitMergeIT.testLocalIndexScanWithMergeSpecialCase() doesn't even use salted tables. I must have broken something else in the patch.

stoty avatar Mar 23 '22 05:03 stoty

Turns out that split processing is not necessary for salted tables, but rather for all tables that have local indexes, which LocalIndexSplitMergeIT.testLocalIndexScanWithMergeSpecialCase() has luckily caught.

This of course means that we must keep the split processing.

This also means that we are already silently changing the manually specified split points. I think that we should just fix the split points specified for salted tables, as there is precedence for similar behaviour, and I think that this is the closet thing to doing what the user wanted.

stoty avatar Mar 23 '22 08:03 stoty

Could you please review the new version @dbwong ?

stoty avatar Mar 23 '22 12:03 stoty

However, I need to update the ticket and patch description, as the patch does kind of the opposite of what it says.

stoty avatar Apr 06 '22 03:04 stoty

Squashed and renamed the patch.

stoty avatar Apr 06 '22 04:04 stoty

PHOENIX-6910 makes this moot.

stoty avatar May 24 '23 13:05 stoty