[FLINK-11529][docs-zh] Translate the "DataStream API Tutorial" page into Chinese
What is the purpose of the change
(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)
Brief change log
(for example:)
- The TaskInfo is stored in the blob store on job creation time as a persistent artifact
- Deployments RPC transmits only the blob storage reference
- TaskManagers retrieve the TaskInfo from the blob cache
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
- Added integration tests for end-to-end deployment with large payloads (100MB)
- Extended integration test for recovery after master (JobManager) failure
- Added test that validates that TaskInfo is transferred only once across recoveries
- Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
- The serializers: (yes / no / don't know)
- The runtime per-record code paths (performance sensitive): (yes / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
- The S3 file system connector: (yes / no / don't know)
Documentation
- Does this pull request introduce a new feature? (yes / no)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
Review Progress
- ❓ 1. The [description] looks good.
- ❓ 2. There is [consensus] that the contribution should go into Flink.
- ❓ 3. Needs [attention] from.
- ❓ 4. The change fits into the overall [architecture].
- ❓ 5. Overall code [quality] is good.
Please see the Pull Request Review Guide for a full explanation of the review process.
Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until `architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
CI report:
Hi, thanks for your review, I will do that now.
@klion26 commented on this pull request.
@LakeShen thanks for your contribution. I've finished a first-pass review and left some comments.
You can preview the translation locally by executing `sh docs/build_docs.sh -p` in the Flink project and opening http://localhost:4000 in your browser.
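(For readers following along, the preview workflow in full; a minimal sketch that assumes the Jekyll-based docs build of that era:)

```bash
# build the docs and serve a local preview
sh docs/build_docs.sh -p
# then open http://localhost:4000 in a browser (e.g. `open` on macOS)
open http://localhost:4000
```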
In docs/getting-started/tutorials/datastream_api.zh.md:
@@ -26,19 +26,14 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
-In this guide we will start from scratch and go from setting up a Flink project to running
-a streaming analysis program on a Flink cluster.
+在本节指南中,我们将从零开始创建一个在 flink 集群上面进行流分析的 Flink 项目。

Suggested change:
-在本节指南中,我们将从零开始创建一个在 flink 集群上面进行流分析的 Flink 项目。
+在本节指南中,我们将在 Flink 集群上从零开始创建一个流分析项目。
In docs/getting-started/tutorials/datastream_api.zh.md:
-Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to
-read this channel in Flink and count the number of bytes that each user edits within
-a given window of time. This is easy enough to implement in a few minutes using Flink, but it will
-give you a good foundation from which to start building more complex analysis programs on your own.
+维基百科提供了一个能够记录所有对 wiki 编辑的 IRC 通道。我们将使用 Flink 读取该通道的数据,同时

Suggested change:
-维基百科提供了一个能够记录所有对 wiki 编辑的 IRC 通道。我们将使用 Flink 读取该通道的数据,同时
+维基百科提供了一个记录所有 wiki 编辑历史的 IRC 通道。我们将使用 Flink 读取该通道的数据,同时
In docs/getting-started/tutorials/datastream_api.zh.md:
-Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to
-read this channel in Flink and count the number of bytes that each user edits within
-a given window of time. This is easy enough to implement in a few minutes using Flink, but it will
-give you a good foundation from which to start building more complex analysis programs on your own.
+维基百科提供了一个能够记录所有对 wiki 编辑的 IRC 通道。我们将使用 Flink 读取该通道的数据,同时
+在给定的时间窗口,计算出每个用户在其中编辑的字节数。这使用 Flink 很容易就能实现,但它会为你提供一个良好的基础去开始构建你自己更为复杂的分析程序。
计算出每个用户在给定时间窗口内的编辑字节数?
In docs/getting-started/tutorials/datastream_api.zh.md:
-We are going to use a Flink Maven Archetype for creating our project structure. Please
-see [Java API Quickstart]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html) for more details
-about this. For our purposes, the command to run is this:
+我们准备使用 Flink Maven Archetype 创建项目结构。更多细节请查看[Java API 快速指南]({{ site.baseurl }}/zh/dev/projectsetup/java_api_quickstart.html)。项目运行命令如下:

Suggested change:
-我们准备使用 Flink Maven Archetype 创建项目结构。更多细节请查看[Java API 快速指南]({{ site.baseurl }}/zh/dev/projectsetup/java_api_quickstart.html)。项目运行命令如下:
+我们准备使用 Flink Maven Archetype 创建项目结构。更多细节请查看 [Java API 快速指南]({{ site.baseurl }}/zh/dev/projectsetup/java_api_quickstart.html)。项目运行命令如下:
do we need to translate Maven Archetype here?
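(For context, the command under discussion is the tutorial's Maven archetype invocation, roughly the following sketch; the `-DarchetypeVersion` value is an assumption, substitute the Flink version you are targeting:)

```bash
$ mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeVersion=1.8.0 \
    -DgroupId=wiki-edits \
    -DartifactId=wiki-edits \
    -Dversion=0.1 \
    -Dpackage=wikiedits \
    -DinteractiveMode=false
```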
In docs/getting-started/tutorials/datastream_api.zh.md:
@@ -59,8 +54,7 @@ $ mvn archetype:generate \
I think we need to translate the Note as well.
In docs/getting-started/tutorials/datastream_api.zh.md:
-
+
maybe we should not change the URL of the image?
In docs/getting-started/tutorials/datastream_api.zh.md:
@@ -59,8 +54,7 @@ $ mvn archetype:generate \
{% endunless %}
-You can edit the `groupId`, `artifactId` and `package` if you like. With the above parameters,
-Maven will create a project structure that looks like this:
+你可以根据自己需求编辑 `groupId`、`artifactId` 以及 `package`。对于上面的参数,Maven 将会创建一个这样的项目结构:
"你可以按需修改 groupId、artifactId 以及 package"? 对于上面的参数,Maven 将会创建一个这样的项目结构 seems a little odd to me, do you think we can make it better?
In docs/getting-started/tutorials/datastream_api.zh.md:
@@ -76,16 +70,13 @@ wiki-edits/
 └── log4j.properties
{% endhighlight %}
-There is our `pom.xml` file that already has the Flink dependencies added in the root directory and
-several example Flink programs in `src/main/java`. We can delete the example programs, since
-we are going to start from scratch:
+项目根目录下的 `pom.xml` 文件已经将 Flink 依赖添加进来,同时在 `src/main/java` 目录下也有几个 Flink 程序实例。由于我们从头开始创建,我们可以删除程序实例:
"Flink 依赖已经添加到根目录下的 pom.xml 文件中"?
Flink 程序实例 -> Flink 实例程序?
由于我们将从头开始创建,因此可以删除这些实例程序?
In docs/getting-started/tutorials/datastream_api.zh.md:
{% highlight bash %}
$ rm wiki-edits/src/main/java/wikiedits/*.java
{% endhighlight %}
-As a last step we need to add the Flink Wikipedia connector as a dependency so that we can
-use it in our program. Edit the `dependencies` section of the `pom.xml` so that it looks like this:
+作为最后一步,我们需要添加 Flink 维基百科连接器作为依赖项,这样就可以在我们的项目中进行使用。编辑 `pom.xml` 的 `dependencies` 部分,使它看起来像这样:

Suggested change:
-作为最后一步,我们需要添加 Flink 维基百科连接器作为依赖项,这样就可以在我们的项目中进行使用。编辑 `pom.xml` 的 `dependencies` 部分,使它看起来像这样:
+作为最后一步,我们需要添加 Flink 维基百科连接器的依赖,从而可以在项目中进行使用。修改 `pom.xml` 的 `dependencies` 部分,使它看起来像这样:
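(For context, the dependency this step adds looks roughly like the sketch below; the artifact name and Scala suffix are assumptions based on the English tutorial of that era, so check the tutorial for the exact values:)

```xml
<dependency>
    <!-- assumed artifact name; the Scala suffix may differ per Flink release -->
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-wikiedits_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
```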
In docs/getting-started/tutorials/datastream_api.zh.md:
-It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and
-create the file `src/main/java/wikiedits/WikipediaAnalysis.java`:
+现在是编程时间。启动你最喜欢的 IDE 并导入 Maven 项目或打开文本编辑器创建文件 `src/main/java/wikiedits/WikipediaAnalysis.java`:

Suggested change:
-现在是编程时间。启动你最喜欢的 IDE 并导入 Maven 项目或打开文本编辑器创建文件 `src/main/java/wikiedits/WikipediaAnalysis.java`:
+现在是编程时间。启动你最喜欢的 IDE 并导入 Maven 项目或打开文本编辑器,然后创建文件 `src/main/java/wikiedits/WikipediaAnalysis.java`:
In docs/getting-started/tutorials/datastream_api.zh.md:
-This concludes our little tour of Flink. If you have any questions, please don't hesitate to ask on our Mailing Lists.
+这就结束了 Flink 项目构建之旅. 如果你有任何问题, 你可以在我们的 邮件组提出.

Suggested change:
-这就结束了 Flink 项目构建之旅. 如果你有任何问题, 你可以在我们的 邮件组提出.
+这就结束了 Flink 项目构建之旅. 如果你有任何问题, 可以在我们的邮件组提出.
In docs/getting-started/tutorials/datastream_api.zh.md:
@@ -131,32 +120,24 @@ public class WikipediaAnalysis {
 }
{% endhighlight %}
-The program is very basic now, but we will fill it in as we go. Note that I'll not give
-import statements here since IDEs can add them automatically. At the end of this section I'll show
-the complete code with import statements if you simply want to skip ahead and enter that in your
-editor.
+这个程序现在很基础,但我们会边做边进行补充。注意我不会给出导入语句,因为 IDE 会自动添加它们。在本节的最后,我将展示带有导入语句的完整代码
边做边完善?
In docs/getting-started/tutorials/datastream_api.zh.md:
@@ -131,32 +120,24 @@ public class WikipediaAnalysis {
 }
{% endhighlight %}
-The program is very basic now, but we will fill it in as we go. Note that I'll not give
-import statements here since IDEs can add them automatically. At the end of this section I'll show
-the complete code with import statements if you simply want to skip ahead and enter that in your
-editor.
+这个程序现在很基础,但我们会边做边进行补充。注意我不会给出导入语句,因为 IDE 会自动添加它们。在本节的最后,我将展示带有导入语句的完整代码
+如果您只是想跳过并在您的编辑器中编辑他们。
,如果需要你可以将他们复制到你的编辑器中?
In docs/getting-started/tutorials/datastream_api.zh.md:
-The first step in a Flink program is to create a `StreamExecutionEnvironment`
-(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution
-parameters and create sources for reading from external systems. So let's go ahead and add
-this to the main method:
+在一个 Flink 程序中,首先你需要创建一个 `StreamExecutionEnvironment`(或者处理批作业环境的 `ExecutionEnvironment`)。这可以用来设置程序运行参数,同时也能够创建从外部系统读取的源。我们把这个添加到 main 方法中:
这可以用来设置程序运行参数、创建从外部系统读取的源?
In docs/getting-started/tutorials/datastream_api.zh.md:
{% highlight java %}
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
{% endhighlight %}
-Next we will create a source that reads from the Wikipedia IRC log:
+接下来我们将创建一个读取维基百科 IRC 数据源:

Suggested change:
-接下来我们将创建一个读取维基百科 IRC 数据源:
+接下来我们将创建一个读取维基百科 IRC 数据的源:
In docs/getting-started/tutorials/datastream_api.zh.md:
{% highlight java %}
DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
{% endhighlight %}
-This creates a `DataStream` of `WikipediaEditEvent` elements that we can further process. For
-the purposes of this example we are interested in determining the number of added or removed
-bytes that each user causes in a certain time window, let's say five seconds. For this we first
-have to specify that we want to key the stream on the user name, that is to say that operations
-on this stream should take the user name into account. In our case the summation of edited bytes in the windows
-should be per unique user. For keying a Stream we have to provide a `KeySelector`, like this:
+上面代码创建了一个 `WikipediaEditEvent` 事件的 `DataStream`,我们可以进一步处理它。这个代码实例的目的是为了确定每个用户在特定时间窗口中添加或删除的字节数,比如5秒一个时间窗口。首先

Suggested change:
-上面代码创建了一个 `WikipediaEditEvent` 事件的 `DataStream`,我们可以进一步处理它。这个代码实例的目的是为了确定每个用户在特定时间窗口中添加或删除的字节数,比如5秒一个时间窗口。首先
+上面代码创建了一个 `WikipediaEditEvent` 事件的 `DataStream`,我们可以进一步处理它。这个代码实例的目的是为了确定每个用户在特定时间窗口中添加或删除的字节数,比如 5 秒一个时间窗口。首先
In docs/getting-started/tutorials/datastream_api.zh.md:
{% highlight java %}
DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
{% endhighlight %}
-This creates a `DataStream` of `WikipediaEditEvent` elements that we can further process. For
-the purposes of this example we are interested in determining the number of added or removed
-bytes that each user causes in a certain time window, let's say five seconds. For this we first
-have to specify that we want to key the stream on the user name, that is to say that operations
-on this stream should take the user name into account. In our case the summation of edited bytes in the windows
-should be per unique user. For keying a Stream we have to provide a `KeySelector`, like this:
+上面代码创建了一个 `WikipediaEditEvent` 事件的 `DataStream`,我们可以进一步处理它。这个代码实例的目的是为了确定每个用户在特定时间窗口中添加或删除的字节数,比如5秒一个时间窗口。首先
+我们必须指定用户名来划分我们的数据流,也就是说这个流上的操作应该考虑用户名。
根据用户名来划分?
In docs/getting-started/tutorials/datastream_api.zh.md:
{% highlight java %}
DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
{% endhighlight %}
-This creates a `DataStream` of `WikipediaEditEvent` elements that we can further process. For
-the purposes of this example we are interested in determining the number of added or removed
-bytes that each user causes in a certain time window, let's say five seconds. For this we first
-have to specify that we want to key the stream on the user name, that is to say that operations
-on this stream should take the user name into account. In our case the summation of edited bytes in the windows
-should be per unique user. For keying a Stream we have to provide a `KeySelector`, like this:
+上面代码创建了一个 `WikipediaEditEvent` 事件的 `DataStream`,我们可以进一步处理它。这个代码实例的目的是为了确定每个用户在特定时间窗口中添加或删除的字节数,比如5秒一个时间窗口。首先
+我们必须指定用户名来划分我们的数据流,也就是说这个流上的操作应该考虑用户名。
+在我们这个统计窗口编辑的字节数的例子中,每个用户应该唯一的。对于划分一个数据流,我们必须提供一个 `KeySelector`,像这样:
I don't think this means "每个用户应该是唯一的"; it means "每个不同的用户每个窗口都应该计算一个结果".
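(For reference, the keying code this thread discusses, reconstructed from the English tutorial; treat it as a sketch, with imports omitted as in the tutorial itself:)

```java
KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
    .keyBy(new KeySelector<WikipediaEditEvent, String>() {
        @Override
        public String getKey(WikipediaEditEvent event) {
            // key the stream by user name so aggregates are computed per user
            return event.getUser();
        }
    });
```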
In docs/getting-started/tutorials/datastream_api.zh.md:
@@ -203,26 +180,20 @@ DataStream<Tuple2<String, Long>> result = keyedEdits
 });
{% endhighlight %}
-The first call, `.timeWindow()`, specifies that we want to have tumbling (non-overlapping) windows
-of five seconds. The second call specifies an Aggregate transformation on each window slice for
-each unique key. In our case we start from an initial value of `("", 0L)` and add to it the byte
-difference of every edit in that time window for a user. The resulting Stream now contains
-a `Tuple2<String, Long>` for every user which gets emitted every five seconds.
+首先调用 `.timeWindow()` 方法指定五秒翻滚(非重叠)窗口。第二个调用方法对于每一个唯一关键字指定每个窗口片聚合转换。
+在本例中,我们从 `("",0L)` 初始值开始,并将每个用户编辑的字节添加到该时间窗口中。对于每个用户来说,结果流现在包含的元素为 `Tuple2<String, Long>`,它每5秒发出一次。

Suggested change:
-在本例中,我们从 `("",0L)` 初始值开始,并将每个用户编辑的字节添加到该时间窗口中。对于每个用户来说,结果流现在包含的元素为 `Tuple2<String, Long>`,它每5秒发出一次。
+在本例中,我们从 `("",0L)` 初始值开始,并将每个用户编辑的字节添加到该时间窗口中。对于每个用户来说,结果流现在包含的元素为 `Tuple2<String, Long>`,它每 5 秒发出一次。
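(For readers without the file open, the windowing code under review is roughly the following sketch, reconstructed from the English tutorial of that era; treat the `AggregateFunction` body as an assumption:)

```java
DataStream<Tuple2<String, Long>> result = keyedEdits
    // tumbling (non-overlapping) windows of five seconds
    .timeWindow(Time.seconds(5))
    // per-key aggregation: sum the byte diff of each user's edits in the window
    .aggregate(new AggregateFunction<WikipediaEditEvent, Tuple2<String, Long>, Tuple2<String, Long>>() {
        @Override
        public Tuple2<String, Long> createAccumulator() {
            return new Tuple2<>("", 0L);
        }

        @Override
        public Tuple2<String, Long> add(WikipediaEditEvent value, Tuple2<String, Long> accumulator) {
            accumulator.f0 = value.getUser();
            accumulator.f1 += value.getByteDiff();
            return accumulator;
        }

        @Override
        public Tuple2<String, Long> getResult(Tuple2<String, Long> accumulator) {
            return accumulator;
        }

        @Override
        public Tuple2<String, Long> merge(Tuple2<String, Long> a, Tuple2<String, Long> b) {
            return new Tuple2<>(a.f0, a.f1 + b.f1);
        }
    });
```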
In docs/getting-started/tutorials/datastream_api.zh.md:
-The only thing left to do is print the stream to the console and start execution:
+唯一剩下要做的就是将打印流输出到控制台并开始执行:

Suggested change:
-唯一剩下要做的就是将打印流输出到控制台并开始执行:
+唯一剩下的就是将结果输出到控制台并开始执行:
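(The two lines the sentence describes are simply the following, as in the tutorial:)

```java
result.print();
see.execute();
```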
In docs/getting-started/tutorials/datastream_api.zh.md:
-This should get you started with writing your own Flink programs. To learn more
-you can check out our guides
-about [basic concepts]({{ site.baseurl }}/dev/api_concepts.html) and the
-[DataStream API]({{ site.baseurl }}/dev/datastream_api.html). Stick
-around for the bonus exercise if you want to learn about setting up a Flink cluster on
-your own machine and writing results to Kafka.
+这可以让你开始创建你自己的 Flink 项目。你可以查看[基本概念]({{ site.baseurl }}/zh/dev/api_concepts.html)和[DataStream API]
+({{ site.baseurl }}/zh/dev/datastream_api.html)指南。如果你想学习了解更多关于 Flink 集群安装以及写入数据到 Kafka,
`[DataStream API]` and `({{ site.baseurl }}...)` have to be on the same line.
In docs/getting-started/tutorials/datastream_api.zh.md:
-Please follow our local setup tutorial for setting up a Flink distribution
-on your machine and refer to the Kafka quickstart
-for setting up a Kafka installation before we proceed.
+请按照我们的本地安装教程在你的机器上构建一个Flink分布式环境,同时参考Kafka快速指南安装一个我们需要使用的Kafka环境。

Suggested change:
-请按照我们的本地安装教程在你的机器上构建一个Flink分布式环境,同时参考Kafka快速指南安装一个我们需要使用的Kafka环境。
+请按照我们的本地安装教程在你的机器上构建一个Flink分布式环境,同时参考 Kafka快速指南安装一个我们需要使用的Kafka环境。
In docs/getting-started/tutorials/datastream_api.zh.md:
{% highlight bash %}
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wiki-result
{% endhighlight %}
-You can also check out the Flink dashboard which should be running at http://localhost:8081.
-You get an overview of your cluster resources and running jobs:
+你还可以查看运行在http://localhost:8081上的 Flink 作业仪表盘。你可以概览集群资源以及正在运行的作业:

Suggested change:
-你还可以查看运行在http://localhost:8081上的 Flink 作业仪表盘。你可以概览集群资源以及正在运行的作业:
+你还可以查看运行在 http://localhost:8081 上的 Flink 作业仪表盘。你可以概览集群资源以及正在运行的作业:
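(For context, the bonus exercise reviewed here replaces the print sink with a Kafka sink, roughly like the sketch below; the connector class and broker address follow the English tutorial of that era, so treat them as assumptions:)

```java
result
    // the Kafka sink below expects strings, so stringify the tuples first
    .map(new MapFunction<Tuple2<String, Long>, String>() {
        @Override
        public String map(Tuple2<String, Long> tuple) {
            return tuple.toString();
        }
    })
    .addSink(new FlinkKafkaProducer011<>("localhost:9092", "wiki-result", new SimpleStringSchema()));
```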
In docs/getting-started/tutorials/datastream_api.zh.md:
@@ -168,12 +149,8 @@ KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
 });
{% endhighlight %}
-This gives us a Stream of `WikipediaEditEvent` that has a `String` key, the user name.
-We can now specify that we want to have windows imposed on this stream and compute a
-result based on elements in these windows. A window specifies a slice of a Stream
-on which to perform a computation. Windows are required when computing aggregations
-on an infinite stream of elements. In our example we will say
-that we want to aggregate the sum of edited bytes for every five seconds:
+这给了我们一个 `WikipediaEditEvent` 数据流,它有一个 `String` 键,即用户名。
maybe we can have a better translation for this paragraph.
In docs/getting-started/tutorials/datastream_api.zh.md:
-This should get you started with writing your own Flink programs. To learn more
-you can check out our guides
-about [basic concepts]({{ site.baseurl }}/dev/api_concepts.html) and the
-[DataStream API]({{ site.baseurl }}/dev/datastream_api.html). Stick
-around for the bonus exercise if you want to learn about setting up a Flink cluster on
-your own machine and writing results to Kafka.
+这可以让你开始创建你自己的 Flink 项目。你可以查看[基本概念]({{ site.baseurl }}/zh/dev/api_concepts.html)和[DataStream API]
+({{ site.baseurl }}/zh/dev/datastream_api.html)指南。如果你想学习了解更多关于 Flink 集群安装以及写入数据到 Kafka,
+你可以自己多加以练习尝试。
where is the source of this translation?
In docs/getting-started/tutorials/datastream_api.zh.md:
@@ -309,24 +279,17 @@ similar to this:
 4> (KasparBot,-245)
{% endhighlight %}
-The number in front of each line tells you on which parallel instance of the print sink the output
-was produced.
+每行数据前面的数字代表着打印接收器在哪个并行实例上产生的输出数据。
每行数据前面的数字代表着打印接收器运行的并行实例?
Hi @LakeShen, have you updated the pull request yet?
Hi @Jark, I have updated the pull request just now, thank you for reviewing it.
You need to update your pull request. For more information, see: https://www.bilibili.com/video/BV1kK411577L?from=search&seid=16883785088266459960
CI report:
- a4cd0df8a0e827e7839ac00652d56193797846d5 Unknown: FAILURE
Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot run azure` re-run the last Azure build
@LakeShen are you still working on this PR? Could you please rebase onto the latest master to resolve the conflicts?
@LakeShen it seems this PR is related to FLINK-11607; could you please correct the title of the current PR?
Hello, I'm LXT, and this is my first day in the Flink community. I want to contribute by translating docs, but the layout of this project seems to have changed and I can't find the docs that need translating. For example, I can no longer find the path of the last translation, docs/getting-started/tutorials/datastream_api.zh.md. Can anyone tell me where the new translation path is? Thank you very much!
Translated with DeepL.com (free version)
This PR is being marked as stale since it has not had any activity in the last 180 days. If you would like to keep this PR alive, please leave a comment asking for a review. If the PR has merge conflicts, update it with the latest from the base branch.
If you are having difficulty finding a reviewer, please reach out to the [community](https://flink.apache.org/what-is-flink/community/).
If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 90 days, it will be automatically closed.
This PR has been closed since it has not had any activity in 120 days. If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open the PR and ask for a review.