wave
Use k8s job for blob cache transfer
This PR changes the K8s strategy for blob transfer to use a K8s Job.
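For readers unfamiliar with the Job resource, here is a rough sketch of what a one-shot transfer Job manifest can look like, built as a plain Python dict. It is not the manifest this PR generates: the image, the command, and the `ttl_seconds` default are placeholders chosen for illustration, and only the `transfer-…` naming and the `wave-local` namespace come from the test output in this thread.

```python
import json


def transfer_job_manifest(name, namespace, image, command, ttl_seconds=300):
    """Build a batch/v1 Job manifest (as a dict) for a one-shot transfer.

    The image, command, and TTL are illustrative assumptions,
    not values taken from this PR.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Do not retry the pod on failure; let the caller decide.
            "backoffLimit": 0,
            # Ask Kubernetes to garbage-collect the finished Job after a while.
            "ttlSecondsAfterFinished": ttl_seconds,
            "template": {
                "spec": {
                    # Job pod templates must use Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "transfer",
                            "image": image,
                            "command": list(command),
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    job = transfer_job_manifest(
        name="transfer-example",             # hypothetical job name
        namespace="wave-local",
        image="example.com/transfer:latest",  # hypothetical image
        command=["echo", "transfer"],        # placeholder command
    )
    print(json.dumps(job, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -` in a scratch namespace to see the Job and its pod get created.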
I am testing it locally. The pods and job are getting created, but the blobs are not getting uploaded to the AWS S3 bucket. I am working on fixing that.
Testing locally is not working, so I will test it in dev.
Got this error while testing in dev:
2024-04-30 18:42:12.093
16:42:12.092 [io-executor-thread-6] WARN i.s.w.s.b.impl.BlobCacheServiceImpl - == Blob cache failed for object 'cr.seqera.io/v2/public/nf-jdk/blobs/sha256:4edf64cf85c039184023bdfaa7e82e8a607c7f0a55286cce0c938431af0d83d3' - cause:
io.kubernetes.client.openapi.ApiException:
at io.kubernetes.client.openapi.ApiClient.handleResponse(ApiClient.java:989)
at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:905)
at io.kubernetes.client.openapi.apis.BatchV1Api.createNamespacedJobWithHttpInfo(BatchV1Api.java:360)
at io.kubernetes.client.openapi.apis.BatchV1Api.createNamespacedJob(BatchV1Api.java:333)
at io.seqera.wave.service.k8s.K8sServiceImpl.transferJob(K8sServiceImpl.groovy:585)
at io.seqera.wave.service.blob.impl.KubeTransferStrategy.transfer(KubeTransferStrategy.groovy:53)
at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.store(BlobCacheServiceImpl.groovy:207)
at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.storeIfAbsent(BlobCacheServiceImpl.groovy:186)
at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.retrieveBlobCache(BlobCacheServiceImpl.groovy:102)
at io.seqera.wave.controller.RegistryProxyController.fromDownloadResponse(RegistryProxyController.groovy:326)
at io.seqera.wave.controller.RegistryProxyController.handleDelegate0(RegistryProxyController.groovy:231)
at io.seqera.wave.controller.RegistryProxyController.handleGet0(RegistryProxyController.groovy:200)
at io.seqera.wave.controller.RegistryProxyController.handleGet(RegistryProxyController.groovy:141)
at io.seqera.wave.controller.$RegistryProxyController$Definition$Exec.dispatch(Unknown Source)
at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:371)
at io.micronaut.context.DefaultBeanContext$4.invoke(DefaultBeanContext.java:594)
at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:303)
at io.micronaut.web.router.RouteMatch.execute(RouteMatch.java:111)
at io.micronaut.http.context.ServerRequestContext.with(ServerRequestContext.java:103)
at io.micronaut.http.server.RouteExecutor.lambda$executeRoute$14(RouteExecutor.java:659)
at reactor.core.publisher.FluxDeferContextual.subscribe(FluxDeferContextual.java:49)
at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)
at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194)
at io.micronaut.reactive.reactor.instrument.ReactorInstrumentation.lambda$init$0(ReactorInstrumentation.java:62)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
at io.micrometer.core.instrument.composite.CompositeTimer.recordCallable(CompositeTimer.java:129)
at io.micrometer.core.instrument.Timer.lambda$wrap$1(Timer.java:206)
at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedCallable.call(InvocationInstrumenterWrappedCallable.java:53)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
I think it needs permissions to create jobs in K8s.
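If missing RBAC permissions are indeed the cause (the `ApiException` above carries no message, so this is only a guess), the service account running Wave would need a rule on the `jobs` resource in the `batch` API group. A minimal sketch of such a Role and RoleBinding, built as Python dicts, is below. The role name, the service account name, and the exact verb list are assumptions for illustration; the real change may look different.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def job_role(namespace, role_name="transfer-jobs"):
    """Role granting basic access to batch/v1 Jobs (hypothetical role name)."""
    return {
        "apiVersion": RBAC_API + "/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["batch"],
                "resources": ["jobs"],
                # Assumed verb set: create the Job, poll it, clean it up.
                "verbs": ["create", "get", "list", "delete"],
            }
        ],
    }


def job_role_binding(namespace, service_account, role_name="transfer-jobs"):
    """Bind the Role above to a service account in the same namespace."""
    return {
        "apiVersion": RBAC_API + "/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": namespace,
            }
        ],
        "roleRef": {"apiGroup": RBAC_API, "kind": "Role", "name": role_name},
    }


if __name__ == "__main__":
    # "wave-sa" is a hypothetical service account name.
    for doc in (job_role("wave-local"), job_role_binding("wave-local", "wave-sa")):
        print(json.dumps(doc, indent=2))
```

Whether the permission is actually missing can be checked with `kubectl auth can-i create jobs -n <namespace> --as=system:serviceaccount:<namespace>:<service-account>`.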
Tested locally with the working dev blob cache:
- jobs and pods are created and blobs are also uploaded, but docker pull fails:
munish.chouhan@Munishs-MacBook-Pro ~ % docker pull ffb6c929b79f.ngrok.app/wt/de20f99af173/public/nf-jdk:corretto-17.0.10
corretto-17.0.10: Pulling from wt/de20f99af173/public/nf-jdk
1243323cbbce: Retrying in 4 seconds
f9a83abe90dc: Retrying in 14 seconds
1fc87daf47ad: Download complete
6d3480d9a1b8: Downloading
unknown: {"operation":"pipe","success":true,"destination":"s3://wave-cache-dev/cr.seqera.io/v2/public/nf-jdk/blobs/sha256:6d3480d9a1b8740730551c49ea4792ce68b8ee74638fd4b58f34783304e96362","object":{"type":"file"}}
working now:
munish.chouhan@Munishs-MacBook-Pro blob_cache_testing % wave -i cr.seqera.io/public/nf-jdk:corretto-17.0.10 --wave-endpoint http://localhost:9090
ffb6c929b79f.ngrok.app/wt/6e2922114717/public/nf-jdk:corretto-17.0.10
munish.chouhan@Munishs-MacBook-Pro blob_cache_testing % docker pull ffb6c929b79f.ngrok.app/wt/6e2922114717/public/nf-jdk:corretto-17.0.10
corretto-17.0.10: Pulling from wt/6e2922114717/public/nf-jdk
1243323cbbce: Pull complete
f9a83abe90dc: Pull complete
1fc87daf47ad: Pull complete
6d3480d9a1b8: Pull complete
Digest: sha256:daf635dae478659aeb86296c64cfdc4dee6e1f3bc9ab49fe87034829c62d818a
Status: Downloaded newer image for ffb6c929b79f.ngrok.app/wt/6e2922114717/public/nf-jdk:corretto-17.0.10
ffb6c929b79f.ngrok.app/wt/6e2922114717/public/nf-jdk:corretto-17.0.10
tested in dev
% wave -i cr.seqera.io/public/nf-jdk:corretto-17.0.9 --wave-endpoint https://wave.dev-tower.net
wave.dev-tower.net/wt/5ed41ad6b28a/public/nf-jdk:corretto-17.0.9
munish.chouhan@Munishs-MacBook-Pro blob_cache_testing % docker pull wave.dev-tower.net/wt/5ed41ad6b28a/public/nf-jdk:corretto-17.0.9
corretto-17.0.9: Pulling from wt/5ed41ad6b28a/public/nf-jdk
6ebddf7084e9: Pull complete
fca718df3f34: Pull complete
18d54cc063eb: Pull complete
18afe249605d: Pull complete
Digest: sha256:b03a79c48047dbeaea8eed4117fc0b069458b9093cb833b93a38ad7d49b7a11a
Status: Downloaded newer image for wave.dev-tower.net/wt/5ed41ad6b28a/public/nf-jdk:corretto-17.0.9
@pditommaso ready for review
This PR requires https://github.com/seqeralabs/platform-deployment/pull/363 to be merged first.
@munishchouhan can you please have a look at the merge conflicts?
Working on fixing the test.
Tested again: successful
(base) munish.chouhan@Munishs-MacBook-Pro ~ % kubectl get jobs -n wave-local
NAME COMPLETIONS DURATION AGE
transfer-a43a48be12e6f4aa 0/1 2s 2s
transfer-918964c839b722ad 0/1 2s 2s
transfer-f96c05fe22f18394 0/1 2s 2s
(base) munish.chouhan@Munishs-MacBook-Pro ~ % kubectl get jobs -n wave-local
NAME COMPLETIONS DURATION AGE
transfer-f96c05fe22f18394 0/1 49s 49s
(base) munish.chouhan@Munishs-MacBook-Pro ~ % kubectl get jobs -n wave-local
No resources found in wave-local namespace.
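The same check can be scripted instead of re-running `kubectl get jobs` by hand. Below is a small sketch that summarizes the JSON printed by `kubectl get jobs -n wave-local -o json`. The function name and the three state labels are my own; the fields it reads (`spec.completions`, `status.succeeded`, `status.failed`) are standard Job fields.

```python
import json


def summarize_jobs(kubectl_json):
    """Return (name, "succeeded/completions", state) for each Job in the list."""
    doc = json.loads(kubectl_json)
    rows = []
    for item in doc.get("items", []):
        spec = item.get("spec", {})
        status = item.get("status", {})
        # These fields are absent (or null) until they become relevant.
        completions = spec.get("completions") or 1
        succeeded = status.get("succeeded") or 0
        failed = status.get("failed") or 0
        if succeeded >= completions:
            state = "complete"
        elif failed > 0:
            # At least one pod failed; the Job may still retry,
            # depending on its backoffLimit.
            state = "failing"
        else:
            state = "running"
        rows.append((item["metadata"]["name"], f"{succeeded}/{completions}", state))
    return rows


if __name__ == "__main__":
    import sys

    # Usage: kubectl get jobs -n wave-local -o json | python summarize_jobs.py
    if not sys.stdin.isatty():
        for name, progress, state in summarize_jobs(sys.stdin.read()):
            print(name, progress, state)
```

An empty result corresponds to the "No resources found" output above, that is, all transfer jobs have already been removed.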
@munishchouhan I've deployed 1.10.0-B3 to stage, please make some more tests in relation to latest changes in this PR
Single image transfer tested successfully, no errors in the logs
(base) munish.chouhan@Munishs-MacBook-Pro wave_testing % docker pull wave.stage-seqera.io/wt/xxxxxxx/public/nf-jdk:corretto-17.0.9
corretto-17.0.9: Pulling from wt/xxxxxxx/public/nf-jdk
6ebddf7084e9: Pull complete
fca718df3f34: Pull complete
18d54cc063eb: Pull complete
18afe249605d: Pull complete
Digest: sha256:b03a79c48047dbeaea8eed4117fc0b069458b9093cb833b93a38ad7d49b7a11a
Status: Downloaded newer image for wave.stage-seqera.io/wt/xxxxxxx/public/nf-jdk:corretto-17.0.9
wave.stage-seqera.io/wt/xxxxxxx/public/nf-jdk:corretto-17.0.9
What's next:
View a summary of image vulnerabilities and recommendations → docker scout quickview wave.stage-seqera.io/wt/xxxxxxx/public/nf-jdk:corretto-17.0.9
Now I will test with multiple transfer requests for load testing.
Tested with 10 parallel image pulls: it was successful, with no errors in the logs.
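For anyone who wants to repeat the parallel pull test, one possible way to script it is sketched below. This is not the script used for the test above: the helper names are made up, and the image references are expected on the command line (for example, ten distinct Wave image references obtained from the `wave` CLI).

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def docker_pull(image):
    """Run `docker pull` for one image and report whether it succeeded."""
    proc = subprocess.run(
        ["docker", "pull", image], capture_output=True, text=True
    )
    return proc.returncode == 0


def pull_in_parallel(images, pull=docker_pull, workers=10):
    """Pull all images concurrently and map each image to its success flag.

    `pull` is injectable so the function can be exercised without Docker.
    Duplicate image names collapse into a single entry in the result.
    """
    images = list(images)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(pull, images))
    return dict(zip(images, results))


if __name__ == "__main__":
    # Usage: python parallel_pull.py <image> [<image> ...]
    outcome = pull_in_parallel(sys.argv[1:])
    for image, ok in outcome.items():
        print(("OK   " if ok else "FAIL ") + image)
    sys.exit(0 if all(outcome.values()) else 1)
```

Pulling distinct images (or removing the local copies first) matters here, since an image that is already present locally is not downloaded again and would not exercise the blob transfer.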
Can't see any == Blob cache entry in the stage logs
== Blob cache
Yes, 116 entries in the last 3 hours.
My fault, the CLI tool fooled me.
I cannot merge it because it states that not all commits are signed.