
Update to Phoenix 5.2.0 and HBase 2.5

Open wendigo opened this issue 10 months ago • 23 comments

wendigo avatar Apr 16 '24 11:04 wendigo

[Screenshot 2024-04-16 at 15:37:22]

wendigo avatar Apr 16 '24 13:04 wendigo

Is this related? [image]

kokosing avatar Apr 17 '24 13:04 kokosing

@kokosing I tried what that test does locally with the PR applied, and also tried many more rows with parallel writers, etc. It all works here, so I suspect a problem with the testing server.

lhofhansl avatar Apr 22 '24 18:04 lhofhansl

Maybe the new version requires more resources than the CI runner provides?

kokosing avatar Apr 23 '24 15:04 kokosing

@kokosing I doubt that, it stalls on my local machine randomly as well.

wendigo avatar Apr 23 '24 15:04 wendigo

Closing. I won't invest more time in this. There is no interest in the community to keep this connector alive.

wendigo avatar Apr 23 '24 17:04 wendigo

There is no interest in the community to keep this connector alive.

@virajjasani, @stoty, is that so? You guys don't care?

lhofhansl avatar Apr 24 '24 00:04 lhofhansl

We have a meeting tomorrow; @stoty and I would like to join: https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-24-apr-2024

virajjasani avatar Apr 24 '24 00:04 virajjasani

We can definitely help with the test failures too; we just need a bit more context. @lhofhansl @wendigo, how are you running all the Phoenix tests locally? Let me also run them and confirm how things look.

virajjasani avatar Apr 24 '24 00:04 virajjasani

I am going to run TestPhoenix in the IDE and will get back here.

virajjasani avatar Apr 24 '24 04:04 virajjasani

@lhofhansl do we run PhoenixQueryRunner for running tests in the IDE? It doesn't seem to be able to bring up the mini HBase cluster.

java.lang.RuntimeException: Master not initialized after 200000ms
	at org.apache.hadoop.hbase.util.JVMClusterUtil.waitForEvent(JVMClusterUtil.java:229)
	at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:197)
	at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:413)
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:259)
	at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:116)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1169)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1213)
	at io.trino.plugin.phoenix5.TestingPhoenixServer.<init>(TestingPhoenixServer.java:81)
	at io.trino.testing.SharedResource.getInstanceLease(SharedResource.java:46)
	at io.trino.plugin.phoenix5.TestingPhoenixServer.getInstance(TestingPhoenixServer.java:44)
	at io.trino.plugin.phoenix5.PhoenixQueryRunner.main(PhoenixQueryRunner.java:137)

I can debug this further, but I am not sure about the location of the logs.

For now, I am just trying to get the test running (without upgrading Phoenix to 5.2.0) in the IDE.

virajjasani avatar Apr 24 '24 05:04 virajjasani

Minicluster startup failures are usually caused by using HBase artifacts that were not built for Hadoop 3. HBase 2.4.x needs to be rebuilt locally with Hadoop 3; for HBase 2.5 there are public artifacts available that are compiled with Hadoop 3.

stoty avatar Apr 24 '24 06:04 stoty

Oh, it might all work for me because I happen to have an HBase version built against Hadoop 3 in my local Maven repo. Anyway, let's not discuss this here. @virajjasani I'll find you today and we can talk about this.

@wendigo Have some patience with us, I know this is frustrating.

lhofhansl avatar Apr 26 '24 17:04 lhofhansl

In the product test environment I can see an exception that I think is related to a test being stuck, @stoty @lhofhansl:

2024-04-26T19:00:01,878 INFO  [ReadOnlyZKClient-127.0.0.1:2181@0x703f494c-SendThread(127.0.0.1:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, session id = 0x1001af67c2e0003, negotiated timeout = 40000
2024-04-26T19:00:01,881 INFO  [pool-100-thread-1] query.ConnectionQueryServicesImpl: er.connect(PhoenixDriver.java:229)
java.sql/java.sql.DriverManager.getConnection(DriverManager.java:677)
java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:433)
org.apache.phoenix.util.QueryUtil.getConnectionOnServer(QueryUtil.java:410)
org.apache.phoenix.util.QueryUtil.getConnectionOnServer(QueryUtil.java:391)
org.apache.phoenix.coprocessor.TaskRegionObserver$SelfHealingTask.run(TaskRegionObserver.java:164)
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:829)

2024-04-26T19:00:01,930 INFO  [pool-100-thread-1] connectionqueryservice.ConnectionQueryServicesMetricsManager: Created object for NoOp Connection query service metrics manager

wendigo avatar Apr 26 '24 19:04 wendigo

Yep, startup issue. Added a wait condition and it passed for me locally a couple of times.
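
The exact wait condition added in the PR is not shown in this thread; purely as a rough illustration, a generic startup wait could poll the Phoenix JDBC endpoint until the server accepts connections (the URL and timeout below are placeholder values):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class WaitForPhoenix
{
    // Illustrative only: retry the JDBC connection until Phoenix is up or the deadline passes.
    static void waitUntilReady(String jdbcUrl, long timeoutMillis) throws InterruptedException
    {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Connection ignored = DriverManager.getConnection(jdbcUrl)) {
                return; // the server accepted a connection, so startup is complete
            }
            catch (SQLException e) {
                Thread.sleep(1_000); // not up yet, retry
            }
        }
        throw new IllegalStateException("Phoenix did not become ready within " + timeoutMillis + " ms");
    }
}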

wendigo avatar Apr 26 '24 19:04 wendigo

@mosabua yes, the Phoenix client checks the version on the server side. I updated the Docker images some time ago; without that, the Phoenix client won't connect.

wendigo avatar Apr 26 '24 19:04 wendigo

@mosabua docs updated, old unnecessary module removed.

wendigo avatar Apr 26 '24 19:04 wendigo

Test failures are unrelated, so I think this is good to go, @wendigo.

mosabua avatar Apr 26 '24 20:04 mosabua

multinode-phoenix5 has failed :(

wendigo avatar Apr 26 '24 21:04 wendigo

multinode-phoenix5 has failed :(

Apologies for my ignorance... I do not see this in the test runs on this PR.

lhofhansl avatar Apr 27 '24 16:04 lhofhansl

@lhofhansl this is part of suite-6

wendigo avatar Apr 27 '24 18:04 wendigo

@lhofhansl @wendigo sorry, I am still occupied with something; please give me one more week to get back to you.

virajjasani avatar May 01 '24 16:05 virajjasani

IIUC, TestPhoenix is the one failing, right?

virajjasani avatar May 01 '24 16:05 virajjasani

I ran the tests locally with multinode-phoenix5 and can now reproduce the test failure:

presto-master       | 2024-05-04T04:13:13.385+0545  INFO    dispatcher-query-5  io.trino.event.QueryMonitor TIMELINE: Query 20240503_221800_00000_cjwcx :: FAILED (EXCEEDED_TIME_LIMIT) :: elapsed 612566ms :: planning 612566ms :: waiting 0ms :: scheduling 0ms :: running 0ms :: finishing 0ms :: begin 2024-05-04T04:03:00.792+05:45 :: end 2024-05-04T04:13:13.358+05:45
tests               | 2024-05-04 04:13:13 INFO: FlakyTestRetryAnalyzer not enabled: CONTINUOUS_INTEGRATION environment is not detected or system property 'io.trino.testng.services.FlakyTestRetryAnalyzer.enabled' is not set to 'true' (actual: <not set>)
tests               | 2024-05-04 04:13:13 INFO: FAILURE     /    io.trino.tests.product.phoenix.TestPhoenix.testCreateTableAsSelect (Groups: phoenix, profile_specific_tests) took 10 minutes and 13 seconds
tests               | 2024-05-04 04:13:13 SEVERE: Failure cause:
tests               | io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: Query failed (#20240503_221800_00000_cjwcx): Query exceeded the maximum planning time limit of 10.00m
tests               |   at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:119)
tests               |   at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
tests               |   at io.trino.tests.product.utils.QueryExecutors$1.lambda$executeQuery$0(QueryExecutors.java:54)
...
...
...
tests               |   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
tests               |   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
tests               |   at java.base/java.lang.Thread.run(Thread.java:1570)
tests               | Caused by: java.sql.SQLException: Query failed (#20240503_221800_00000_cjwcx): Query exceeded the maximum planning time limit of 10.00m
tests               |   at io.trino.jdbc.AbstractTrinoResultSet.resultsException(AbstractTrinoResultSet.java:1937)
tests               |   at io.trino.jdbc.TrinoResultSet$ResultsPageIterator.computeNext(TrinoResultSet.java:294)
tests               |   at io.trino.jdbc.TrinoResultSet$ResultsPageIterator.computeNext(TrinoResultSet.java:254)
...
...
...
tests               |   at io.trino.jdbc.TrinoResultSet$AsyncIterator.lambda$new$1(TrinoResultSet.java:179)
tests               |   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
tests               |   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
tests               |   ... 3 more
tests               |   Suppressed: java.lang.Exception: Query: CREATE TABLE nation AS SELECT * FROM tpch.tiny.nation
tests               |       at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:136)
tests               |       at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:112)
tests               |       at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
tests               |       at io.trino.tests.product.utils.QueryExecutors$1.lambda$executeQuery$0(QueryExecutors.java:54)
...
...
...
tests               |       at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
tests               |       at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
tests               |       at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
tests               |       ... 3 more
...
...
...
tests               | 2024-05-04 04:13:13 INFO: [2 of 2] io.trino.tests.product.TestConfiguredFeatures.selectConfiguredConnectors (Groups: configured_features)
presto-master       | 2024-05-04T04:13:14.166+0545  INFO    dispatcher-query-6  io.trino.event.QueryMonitor TIMELINE: Query 20240503_222813_00001_cjwcx :: FINISHED :: elapsed 428ms :: planning 66ms :: waiting 3ms :: scheduling 219ms :: running 98ms :: finishing 45ms :: begin 2024-05-04T04:13:13.713+05:45 :: end 2024-05-04T04:13:14.141+05:45
tests               | 2024-05-04 04:13:14 INFO: SUCCESS     /    io.trino.tests.product.TestConfiguredFeatures.selectConfiguredConnectors (Groups: configured_features) took 0.5 seconds
tests               | 2024-05-04 04:13:14 INFO: 
tests               | 2024-05-04 04:13:14 INFO: Completed 2 tests
tests               | 2024-05-04 04:13:14 INFO: 1 SUCCEEDED      /      1 FAILED      /      0 SKIPPED
tests               | 2024-05-04 04:13:14 INFO: Tests execution took 10 minutes and 14 seconds
tests               | 2024-05-04 04:13:14 INFO: ManageTestResources.onFinish: running checks
tests               | 
tests               | ===============================================
tests               | tempto-tests
tests               | Total tests run: 2, Failures: 1, Skips: 0
tests               | ===============================================

Let me see where I can check the logs for the test failure.

virajjasani avatar May 03 '24 22:05 virajjasani

@lhofhansl,

Both @vincentpoon and I tried running the tests many times, and we see the tests flapping: sometimes they pass and sometimes they fail.

tests               | 2024-05-04 08:16:19 INFO: SUCCESS     /    io.trino.tests.product.TestConfiguredFeatures.selectConfiguredConnectors (Groups: configured_features) took 0.1 seconds
tests               | 2024-05-04 08:16:19 INFO: 
tests               | 2024-05-04 08:16:19 INFO: Completed 2 tests
tests               | 2024-05-04 08:16:19 INFO: 2 SUCCEEDED      /      0 FAILED      /      0 SKIPPED
tests               | 2024-05-04 08:16:19 INFO: Tests execution took 16.9 seconds
tests               | 2024-05-04 08:16:19 INFO: ManageTestResources.onFinish: running checks
tests               | 
tests               | ===============================================
tests               | tempto-tests
tests               | Total tests run: 2, Failures: 0, Skips: 0
tests               | ===============================================
tests               | 
tests               | (exited)
tests               | (exited)

However, when we captured more thread dumps from the tests that are stuck (and eventually fail after ~10 min), we see a deadlock issue.

"task-truncated-0" #132 daemon prio=5 os_prio=0 cpu=26.49ms elapsed=92.79s tid=0x0000ffff7e211800 nid=0x298f waiting on condition  [0x0000ffff3388a000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep([email protected]/Native Method)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:1119)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:897)
    at org.apache.hadoop.hbase.client.HRegionLocator.lambda$getRegionLocation$0(HRegionLocator.java:80)
    at org.apache.hadoop.hbase.client.HRegionLocator$$Lambda$447/0x00000008007fe040.call(Unknown Source)
    at org.apache.hadoop.hbase.client.HRegionLocator.tracedLocationFuture(HRegionLocator.java:162)
    at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:79)
    at org.apache.hadoop.hbase.client.RegionLocator.getRegionLocation(RegionLocator.java:71)
    at org.apache.hadoop.hbase.client.RegionLocator.getRegionLocation(RegionLocator.java:60)
    at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:219)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:101)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:390)
    at org.apache.hadoop.hbase.client.HTable.lambda$get$0(HTable.java:363)
    at org.apache.hadoop.hbase.client.HTable$$Lambda$446/0x00000008007fec40.call(Unknown Source)
    at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:363)
    at org.apache.phoenix.util.UpgradeUtil.truncateStats(UpgradeUtil.java:2274)
    at org.apache.phoenix.coprocessor.MetaDataRegionObserver$1$1.run(MetaDataRegionObserver.java:198)
    at org.apache.phoenix.coprocessor.MetaDataRegionObserver$1$1.run(MetaDataRegionObserver.java:195)
    at java.security.AccessController.doPrivileged([email protected]/Native Method)
...

A similar deadlock issue was resolved as part of PHOENIX-7076. All we need to do is disable MetaDataRegionObserver, as we no longer need to truncate stats, nor do we need to scan SYSTEM.CATALOG for the old index framework. This can be done later; it should not be a blocker for now.

However, we found something interesting: when listing tables from the HBase shell, all system tables seem to have been created with namespace mapping enabled. We no longer use the namespace-enabled config. @lhofhansl, any idea whether we really need namespaces here?

hbase:002:0> list
TABLE
SYSTEM:CATALOG
SYSTEM:CHILD_LINK
SYSTEM:FUNCTION
SYSTEM:LOG
SYSTEM:MUTEX
SYSTEM:SEQUENCE
SYSTEM:STATS
SYSTEM:TASK
SYSTEM:TRANSFORM
NATION
10 row(s)
Took 0.0120 seconds

I saw a bunch of configs that enable namespaces. After making local changes to disable namespace mapping, when I trigger the Docker test run again, I still see system tables with namespaces. I changed every usage of phoenix.schema.isNamespaceMappingEnabled to false. While we do have coverage for some namespace tests, I don't think we should enable namespace mapping for the Trino tests (unless there is a historical reason behind this that I am not aware of).
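
For context, namespace mapping has to agree between the client and the server; a minimal sketch of passing the property on the client side (the JDBC URL is a placeholder, and the same value must also be set in the server-side hbase-site.xml):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class PhoenixNamespaceMappingExample
{
    // Illustrative only: the client-side value of this property must match the
    // server's, otherwise Phoenix rejects the connection as having inconsistent
    // namespace mapping properties.
    static Connection connectWithoutNamespaceMapping() throws SQLException
    {
        Properties props = new Properties();
        props.setProperty("phoenix.schema.isNamespaceMappingEnabled", "false");
        return DriverManager.getConnection("jdbc:phoenix:localhost:2181", props);
    }
}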

virajjasani avatar May 04 '24 03:05 virajjasani

Thanks @virajjasani and @vincentpoon .

As for the MetaDataRegionObserver, I think you are right. We might want to double check that local indexes and transactional indexes (via OMID) still work. They might still be using the old indexing framework...? How do I disable that RegionObserver so that I can try locally?

Namespacing is not needed in the test; that seems historical or incidental, although I do not see how this would lead to intermittent failures.

So the intermittent failures are due to the deadlock issue? (Probably when dropping all the system tables at the end, which is rare outside of tests?)

We also might want to decrease that insane 10-minute limit, at least for the tests.

lhofhansl avatar May 04 '24 17:05 lhofhansl

How do I disable that RegionObserver so that I can try locally?

Disabling both phoenix.stats.collection.enabled and phoenix.index.failure.handling.rebuild will make this coproc a no-op. This is what we are doing in prod as well.

Namespacing is not needed in the test; that seems historical or incidental, although I do not see how this would lead to intermittent failures.

It's surprising that, before scanning the namespace-enabled SYSTEM:CATALOG table, the scan of meta to retrieve the region location of SYSTEM:CATALOG gets completely stuck. We have never come across such a case for the SYSTEM.CATALOG region. Even more surprising, neither the logs nor the thread dumps indicate any problem with the meta scan, hence I am suspecting some issue here. It's been ages since I last saw namespace-enabled tables :)

So the intermittent failures are due to the deadlock issue? (Probably when dropping all the system tables at the end, which is rare outside of tests?)

The thread that needs to scan SYSTEM:CATALOG is stuck because it is not able to scan the meta table to retrieve the region of SYSTEM:CATALOG. Now that I think more about this, I wonder whether the splittable SysCat (SYSTEM.CATALOG) also has a role to play here.

So, to make the tests similar to the ideal production case, we need to do the following (see the configuration sketch after the list):

  1. Disable phoenix.schema.isNamespaceMappingEnabled
  2. Disable phoenix.system.catalog.splittable
  3. Disable both phoenix.stats.collection.enabled and phoenix.index.failure.handling.rebuild
  4. Reduce test timeout from 10m to 2m
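
A minimal sketch of what items 1-3 could look like when applied to the mini-cluster configuration (the property names are the ones named in this thread and should be verified against Phoenix 5.2; where TestingPhoenixServer actually wires them up is not shown here):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

final class PhoenixTestConfigSketch
{
    // Illustrative only: test-oriented settings discussed above, applied before
    // starting the mini HBase cluster.
    static Configuration createTestConfiguration()
    {
        Configuration conf = HBaseConfiguration.create();
        conf.setBoolean("phoenix.schema.isNamespaceMappingEnabled", false); // 1. no namespace mapping
        conf.setBoolean("phoenix.system.catalog.splittable", false);        // 2. keep SYSTEM.CATALOG unsplittable
        conf.setBoolean("phoenix.stats.collection.enabled", false);         // 3. make MetaDataRegionObserver a no-op...
        conf.setBoolean("phoenix.index.failure.handling.rebuild", false);   // 3. ...together with this setting
        return conf;
    }
}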

I still believe this PR can go in if others agree, because the test is quite basic and it passes locally sometimes; we can follow up on disabling the above configs in separate PRs. Thoughts?

virajjasani avatar May 04 '24 19:05 virajjasani

It seems the problem here has to do with RegionServerRegistry, which in turn relies on MetaRegionLocationCache. The idea is to set a watch on znode updates and cache all the META region location updates. The first problem I found is that it tries to call listChildrenAndWatchForNewChildren on /hbase, which in this test scenario might not exist yet (a race condition). However, even after I patch that by calling watchAndCheckExists, which can set a watch on a node that doesn't exist yet, I find that the listener events stop getting triggered after a certain point. So the /hbase/meta-region-server CREATE event never gets picked up, the MetaRegionLocationCache forever returns an empty result, and the client gets stuck on retries until the connection/test times out.
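
Roughly, the difference between the two watch-setting calls described above (a sketch using HBase's ZKUtil helpers, not the actual patch; the znode paths are the ones mentioned in this comment):

import java.util.List;
import org.apache.hadoop.hbase.zookeeper.ZKUtil;
import org.apache.hadoop.hbase.zookeeper.ZKWatcher;
import org.apache.zookeeper.KeeperException;

final class MetaWatchSketch
{
    // Per the description above, listChildrenAndWatchForNewChildren returns null and
    // cannot leave a useful watch behind when the parent znode does not exist yet,
    // while watchAndCheckExists can watch a znode that has not been created yet,
    // which closes the race.
    static void watchMetaLocation(ZKWatcher zkw) throws KeeperException
    {
        List<String> children = ZKUtil.listChildrenAndWatchForNewChildren(zkw, "/hbase");
        if (children == null) {
            ZKUtil.watchAndCheckExists(zkw, "/hbase/meta-region-server");
        }
    }
}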

vincentpoon avatar May 05 '24 00:05 vincentpoon

test is quite basic and it passes locally sometimes

I think a flapping test is pretty annoying. Maybe we can disable the test and file a follow-up PR to fix it... and then also follow up and actually do that :)

@vincentpoon is that a new 5.2.x issue? If so, let's take it up on a Phoenix PR/Jira. (I realize you also do not really work on Phoenix anymore.)

lhofhansl avatar May 05 '24 00:05 lhofhansl

@lhofhansl As far as I can tell this is an HBase issue, perhaps not handling a somewhat artificial test race condition well.

When there is a cache miss in RegionServerRegistry, it just throws IOException("no meta location available") back to the client. There is no attempt to actually re-read from ZK and repopulate the cache; I guess it assumes the watcher will do that. But the watcher isn't working here for some reason.

If I force a re-read from ZK right there after a cache miss, the test passes every time. It's a bunch of test code on my local machine for now, but I will clean it up and file an HBase JIRA when I get more time, if no one has any better ideas.
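
A rough sketch of the fallback being described (all class and method names below are hypothetical placeholders, not HBase APIs; the real change would live inside the registry/cache code):

import java.io.IOException;
import java.util.Optional;
import org.apache.hadoop.hbase.HRegionLocation;

final class MetaLocationLookupSketch
{
    interface LocationCache { Optional<HRegionLocation> cachedMetaLocation(); }                    // hypothetical
    interface ZkReader { Optional<HRegionLocation> readMetaLocationFromZk() throws IOException; }  // hypothetical

    // On a cache miss, re-read the meta location from ZooKeeper instead of failing
    // immediately; only if ZK also has nothing do we surface the original error.
    static HRegionLocation locateMeta(LocationCache cache, ZkReader zk) throws IOException
    {
        Optional<HRegionLocation> cached = cache.cachedMetaLocation();
        if (cached.isPresent()) {
            return cached.get();
        }
        return zk.readMetaLocationFromZk()
                .orElseThrow(() -> new IOException("no meta location available"));
    }
}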

vincentpoon avatar May 05 '24 00:05 vincentpoon