
savepointDir only supported gs://? we want to use s3://

Open kaohaonan6666 opened this issue 4 years ago • 8 comments

How can we use s3:// as our savepointDir? How do we solve this problem?

kaohaonan6666 avatar Jan 18 '21 03:01 kaohaonan6666

Also wondering if wasb:// would be supported as well?

guanjieshen avatar Jan 19 '21 05:01 guanjieshen

@kaohaonan6666 You can write the savepoints into an S3 bucket via the s3a:// prefix, like the following:

  job:
    jarFile: /opt/flink-job.jar
    savepointsDir: s3a://mybucket/flink/savepoints
    autoSavepointSeconds: 360
 
(snip)

  flinkProperties:
    # for s3 access
    s3.access-key: "YOUR-ACCESS-KEY"
    s3.secret-key: "YOUR-SECRET-KEY"

Also, you should make sure that your Docker image contains the jars needed to access S3 buckets; see https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/images/flink/docker/Dockerfile

I've revised the Dockerfile to include the jars needed to access AWS S3, like the following:

(snip)

# s3
ARG FLINK_S3_HADOOP_JAR_NAME=flink-s3-fs-hadoop-1.11.2.jar
ARG FLINK_S3_HADOOP_JAR_URI=https://repo1.maven.org/maven2/org/apache/flink/flink-s3-fs-hadoop/1.11.2/${FLINK_S3_HADOOP_JAR_NAME}

RUN echo "Downloading ${FLINK_S3_HADOOP_JAR_URI}" && \
  wget -q -O /opt/flink/lib/${FLINK_S3_HADOOP_JAR_NAME} ${FLINK_S3_HADOOP_JAR_URI}

Hope this helps.

youngwookim avatar Jan 19 '21 05:01 youngwookim

We want to start a job from a savepoint rather than take one. We found that fromSavepoint can help us, but it seems to support only gs://.

kaohaonan6666 avatar Jan 19 '21 05:01 kaohaonan6666

@kaohaonan6666 IMO, a Flink savepoint is just a path to a snapshot on an HCFS (Hadoop-compatible file system). I believe that if gs:// works, then other schemes like s3a:// or hdfs:// should work too.

youngwookim avatar Jan 19 '21 06:01 youngwookim

We have tried s3:// and hdfs://, but neither works. We just want to make sure whether we can use them or not.

kaohaonan6666 avatar Jan 19 '21 06:01 kaohaonan6666

@kaohaonan6666 I'm not sure this is a bug in flink-operator. Basically, Flink supports the well-known object stores for savepoints, checkpoints, etc.; see https://ci.apache.org/projects/flink/flink-docs-stable/deployment/filesystems/. So you should double-check the libraries and configuration required for the fs scheme in your particular Docker image. In other words, the default Docker image for flink-operator works fine with GCS, but if you want to use AWS or Azure, you need to customize your Docker image and Flink properties accordingly.
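To answer the earlier wasb:// question: for Azure Blob Storage, the Flink properties would carry the Hadoop-Azure account key instead of the S3 keys. A minimal sketch, assuming an account named YOUR-ACCOUNT (the account name and key are placeholders):

  flinkProperties:
    # for Azure Blob (wasb://) access; substitute your storage account name
    fs.azure.account.key.YOUR-ACCOUNT.blob.core.windows.net: "YOUR-STORAGE-KEY"

As with S3, the matching filesystem jar (flink-azure-fs-hadoop) must also be present in the image.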

youngwookim avatar Jan 19 '21 07:01 youngwookim

This is not related to the operator.

  • The Docker image you are using should have the plugin enabled (the Azure Blob plugin for wasb://, for example)
  • You should provide the config needed for your filesystem (flink-conf.yaml, core-site.xml, etc.)
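On the first point: Flink's documented convention is to load each filesystem jar from its own directory under plugins/ rather than from lib/. A hedged Dockerfile sketch of that variant of the earlier snippet (the jar version must match your Flink version; the paths follow the standard Flink image layout):

# Illustrative: enable the S3 Hadoop filesystem as a Flink plugin.
ARG FLINK_S3_HADOOP_JAR_NAME=flink-s3-fs-hadoop-1.11.2.jar
ARG FLINK_S3_HADOOP_JAR_URI=https://repo1.maven.org/maven2/org/apache/flink/flink-s3-fs-hadoop/1.11.2/${FLINK_S3_HADOOP_JAR_NAME}

RUN mkdir -p /opt/flink/plugins/s3-fs-hadoop && \
  wget -q -O /opt/flink/plugins/s3-fs-hadoop/${FLINK_S3_HADOOP_JAR_NAME} ${FLINK_S3_HADOOP_JAR_URI}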

shashken avatar Jan 20 '21 11:01 shashken

We solved the problem: the CRD field is fromSavepoint, but the doc says fromSavePoint. We checked the source code, used a lowercase 'p' instead of 'P', and then it worked. Just a typo!
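For anyone hitting the same typo, the working spec uses the lowercase field name; a minimal sketch (the bucket and savepoint path are placeholders):

  job:
    jarFile: /opt/flink-job.jar
    # CRD field is "fromSavepoint" (lowercase "p"), not "fromSavePoint"
    fromSavepoint: s3a://mybucket/flink/savepoints/savepoint-XXXX
    savepointsDir: s3a://mybucket/flink/savepoints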

kaohaonan6666 avatar Jan 21 '21 08:01 kaohaonan6666