
How are resources shared when multiple requests are sent to an external Sparkling Water backend in k8s?

Open KunfuPanda24 opened this issue 2 years ago • 3 comments

If I submit two requests at the same time from Sparkling Water to an external-backend Sparkling Water deployment in k8s, will it process one request at a time, since the models are fully parallelized across the resources? Please correct me if I am wrong.

KunfuPanda24, May 21 '22 11:05

Hi @gurumoorthy208524, requests to the H2O backend are processed as soon as they are received. The requests are executed in parallel and share the resources.

mn-mikke, May 23 '22 15:05

@mn-mikke But currently, when I submit two requests in parallel, one of them pauses at the state shown in this log, and I get the following error:

2022-05-23T09:23:26.784821463Z 22/05/23 09:23:26 INFO H2OContext: Trying to lock H2O cluster h2o-service-dummy.sparkling-water-dummy.svc.cluster.local:54321 - root.
2022-05-23T09:23:26.827108792Z 22/05/23 09:23:26 INFO RestApiUtils: H2O node http://h2o-service-dummy.sparkling-water-dummy.svc.cluster.local:54321/3/CloudLock successfully responded for the POST.
2022-05-23T09:23:26.921423441Z 22/05/23 09:23:26 INFO BlockManagerInfo: Removed broadcast_2_piece0 on main-py-77027e80f039367b-driver-svc.spark.svc:7079 in memory (size: 29.2 KiB, free: 1048.8 MiB)
2022-05-23T09:23:26.926948365Z 22/05/23 09:23:26 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 10.171.129.208:37273 in memory (size: 29.2 KiB, free: 1048.8 MiB)
2022-05-23T09:23:32.595447231Z 22/05/23 09:23:32 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (X.X.X.X:44352) with ID 2,  ResourceProfileId 0
2022-05-23T09:23:32.762333029Z 22/05/23 09:23:32 INFO BlockManagerMasterEndpoint: Registering block manager X.X.X.X:35117 with 1048.8 MiB RAM, BlockManagerId(2, X.X.X.X, 35117, None)
2022-05-23T10:09:40.601292543Z 22/05/23 10:09:40 INFO RestApiUtils: H2O node http://h2o-service-dummy.sparkling-water-dummy.svc.cluster.local:54321/3/verifyWebOpen successfully responded for the GET.
2022-05-23T10:12:26.737441389Z 22/05/23 10:12:26 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
2022-05-23T10:12:26.737524314Z io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 81220125 (81235652)
2022-05-23T10:12:26.737532152Z 	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:258)
2022-05-23T10:12:26.737537781Z 	at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
2022-05-23T10:12:26.737542481Z 	at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
2022-05-23T10:12:26.737547254Z 	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
2022-05-23T10:12:26.737551856Z 	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
2022-05-23T10:12:26.737556646Z 	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
2022-05-23T10:12:26.737561391Z 	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
2022-05-23T10:12:26.737565891Z 	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
2022-05-23T10:12:26.737604382Z 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
2022-05-23T10:12:26.737610565Z 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2022-05-23T10:12:26.737614640Z 	at java.base/java.lang.Thread.run(Unknown Source)
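To pin down where the request stalls, it can help to look for the longest silence between consecutive log lines. In the log above, nothing is written between 09:23:32 and 10:09:40, roughly 46 minutes. Here is a minimal Python sketch of that check. It assumes every line starts with a timestamp in the format shown above, and the helper names `parse_timestamp` and `largest_gap` are made up for this illustration. They are not part of Sparkling Water, H2O-3 or Spark.

```python
from datetime import datetime


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    token = line.split(" ", 1)[0]
    # Timestamps look like 2022-05-23T09:23:26.784821463Z (nanosecond precision).
    if not token.endswith("Z") or "." not in token:
        return None
    head, frac = token[:-1].split(".", 1)
    try:
        # datetime only supports microseconds, so keep the first 6 fraction digits.
        return datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")
    except ValueError:
        return None


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence, or None."""
    stamped = []
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            stamped.append((ts, line))
    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


# Three consecutive lines copied from the log above.
sample = [
    "2022-05-23T09:23:32.762333029Z 22/05/23 09:23:32 INFO BlockManagerMasterEndpoint: Registering block manager X.X.X.X:35117 with 1048.8 MiB RAM, BlockManagerId(2, X.X.X.X, 35117, None)",
    "2022-05-23T10:09:40.601292543Z 22/05/23 10:09:40 INFO RestApiUtils: H2O node http://h2o-service-dummy.sparkling-water-dummy.svc.cluster.local:54321/3/verifyWebOpen successfully responded for the GET.",
    "2022-05-23T10:12:26.737441389Z 22/05/23 10:12:26 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)",
]

result = largest_gap(sample)
if result is not None:
    gap, before, after = result
    print("Longest silence:", gap)
    print("Last line before it:", before[:80])
    print("First line after it:", after[:80])
```

Running the same function over the full driver log (for example the lines read from a saved log file) shows which step was the last to report before the pause, which narrows down where to look.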

KunfuPanda24, May 23 '22 19:05

Hi @gurumoorthy208524, the error doesn't seem to originate in Sparkling Water or H2O-3. The same problem with the fabric8 library (used by Spark) is reported and discussed at the links below:

  • https://issues.apache.org/jira/browse/SPARK-33349
  • https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1498

mn-mikke, May 26 '22 10:05