ErXi

Results 11 issues of ErXi

This issue is used to record HDFS-related notes.

hadoop

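The project currently has many data sources. All metrics are stored in MySQL for now and may later be synchronized to Hive, Hudi, ClickHouse, and other databases. Full synchronization of MySQL data to Hive currently uses DataX, but because DataX supports relatively few data sources, a new data integration and synchronization framework needs to be evaluated. After an initial comparison of flink_cdc and seatunnel, and taking the barrier to entry into account, flink_cdc will be evaluated first.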

flink_cdc

**FINISH:**
- [Spark-Core](https://github.com/QuakeWang/BigData-Notes/tree/main/code/SparkTutorial/spark-core): write a blog post on computing the most popular products with RDDs
- [SparkSQL](https://github.com/QuakeWang/BigData-Notes/tree/main/code/SparkTutorial/spark-sql): expand the coverage of DataSet usage, drawing on *Spark: The Definitive Guide*
- [SparkStreaming](https://github.com/QuakeWang/BigData-Notes/tree/main/code/SparkTutorial/spark-streaming)

spark

Fix the command line for running Jupyter Notebook: change `jupter notebook` to `jupyter notebook`

When #27 is ready, we can add the [Catalog](https://github.com/apache/paimon/blob/release-0.8.2/paimon-core/src/main/java/org/apache/paimon/catalog/Catalog.java) API. Maybe we can add the struct definition first; the functions and their detailed behavior can be implemented in later PRs.

https://paimon.apache.org/docs/master/concepts/specification/ After #11 has been merged, I will start the `Changelog` task.

- [ ] Doris overview
- [ ] Doris compilation and deployment

Fixed inconsistent table names for routine load, iceberg_meta linking errors, LakeHouse QuickStart linking errors, and more.

# Versions
- [ ] dev
- [ ] 3.0
- [ ] 2.1...

bug
version-2.1
version-dev
version-2.0
version-3.0