sparkling-water
How are resources shared when multiple requests are sent to an external Sparkling Water backend in Kubernetes (k8s)?
If I submit two requests at the same time from Sparkling Water to an external Sparkling Water backend deployed in k8s, will it process one request at a time, given that the models are fully parallelized across the resources? Please correct me if I am wrong.
Hi @gurumoorthy208524, requests to the H2O backend are processed immediately after they are received. The requests are executed in parallel and share the cluster's resources.
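To make the sharing behaviour concrete, here is a toy model in plain Python. It is only a sketch and not H2O's actual scheduler: the pool size `SHARED_WORKERS`, the request names, and the helpers `run_request` and `submit_concurrently` are all hypothetical. It shows two requests arriving at the same time, each splitting its work into tasks that run on the same fixed pool of workers, so neither request waits for the other to finish first.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the backend's fixed compute capacity.
SHARED_WORKERS = 4


def run_request(pool, name, n_tasks, log, lock):
    """Split one request into tasks and run them on the shared pool."""
    def task(i):
        # Record which request this task belongs to, then do some "work".
        with lock:
            log.append(name)
        return i * i

    futures = [pool.submit(task, i) for i in range(n_tasks)]
    return sum(f.result() for f in futures)


def submit_concurrently(n_tasks=8):
    """Start two requests at once against the same worker pool."""
    log, lock, results = [], threading.Lock(), {}
    with ThreadPoolExecutor(max_workers=SHARED_WORKERS) as pool:
        def client(name):
            results[name] = run_request(pool, name, n_tasks, log, lock)

        clients = [threading.Thread(target=client, args=(n,))
                   for n in ("request-A", "request-B")]
        for c in clients:
            c.start()
        for c in clients:
            c.join()
    return results, log


if __name__ == "__main__":
    results, log = submit_concurrently()
    print(results)  # both requests complete with their own result
    print(log)      # order in which tasks from each request were executed
```

In this model both requests always complete, but each one gets only part of the pool while the other is active, so a request can take longer than it would on an idle cluster.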
@mn-mikke But currently, when I submit two requests in parallel, one of them hangs at the point shown below, and I get the following error:
```
2022-05-23T09:23:26.784821463Z 22/05/23 09:23:26 INFO H2OContext: Trying to lock H2O cluster h2o-service-dummy.sparkling-water-dummy.svc.cluster.local:54321 - root.
2022-05-23T09:23:26.827108792Z 22/05/23 09:23:26 INFO RestApiUtils: H2O node http://h2o-service-dummy.sparkling-water-dummy.svc.cluster.local:54321/3/CloudLock successfully responded for the POST.
2022-05-23T09:23:26.921423441Z 22/05/23 09:23:26 INFO BlockManagerInfo: Removed broadcast_2_piece0 on main-py-77027e80f039367b-driver-svc.spark.svc:7079 in memory (size: 29.2 KiB, free: 1048.8 MiB)
2022-05-23T09:23:26.926948365Z 22/05/23 09:23:26 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 10.171.129.208:37273 in memory (size: 29.2 KiB, free: 1048.8 MiB)
2022-05-23T09:23:32.595447231Z 22/05/23 09:23:32 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (X.X.X.X:44352) with ID 2, ResourceProfileId 0
2022-05-23T09:23:32.762333029Z 22/05/23 09:23:32 INFO BlockManagerMasterEndpoint: Registering block manager X.X.X.X:35117 with 1048.8 MiB RAM, BlockManagerId(2, X.X.X.X, 35117, None)
2022-05-23T10:09:40.601292543Z 22/05/23 10:09:40 INFO RestApiUtils: H2O node http://h2o-service-dummy.sparkling-water-dummy.svc.cluster.local:54321/3/verifyWebOpen successfully responded for the GET.
2022-05-23T10:12:26.737441389Z 22/05/23 10:12:26 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
2022-05-23T10:12:26.737524314Z io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 81220125 (81235652)
2022-05-23T10:12:26.737532152Z at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:258)
2022-05-23T10:12:26.737537781Z at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
2022-05-23T10:12:26.737542481Z at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
2022-05-23T10:12:26.737547254Z at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
2022-05-23T10:12:26.737551856Z at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
2022-05-23T10:12:26.737556646Z at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
2022-05-23T10:12:26.737561391Z at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
2022-05-23T10:12:26.737565891Z at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
2022-05-23T10:12:26.737604382Z at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
2022-05-23T10:12:26.737610565Z at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2022-05-23T10:12:26.737614640Z at java.base/java.lang.Thread.run(Unknown Source)```
Hi @gurumoorthy208524, the error doesn't seem to originate in Sparkling Water or H2O-3. The same problem with the fabric8 library (used by Spark) is reported and discussed in the links below:
- https://issues.apache.org/jira/browse/SPARK-33349
- https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1498