
[Discuss][RoadMap]BitSail 2023Q1 RoadMap

Open lichang-bd opened this issue 2 years ago • 2 comments

Hi everyone. The new year is coming, and we look forward to working with you this year to build a better BitSail community and bring convenience to more data developers. Here we can discuss the BitSail roadmap for 2023 Q1. Feel free to join the discussion and share your ideas.

BitSail Connector

  • [ ] More Source & Sink Connector #279
  • [ ] Complete the migration of connector to the V1 interface #278

BitSail Basic Capacity Building

  • [ ] Support Metric collection and real-time monitoring #124
  • [ ] Support K8S runtime mode #132 #266
  • [ ] Complete the technical design of the CDC solution and support real-time collection of MySQL incremental data #160
  • [ ] Improve test coverage and provide more end-to-end test cases #281

BitSail Architecture Compatibility Improvement

  • [ ] Support multiple versions of Flink #108
  • [ ] Improve compatibility with the runtime environment, Hive version, Hadoop version, etc.
  • [ ] Architecture optimization: decouple the connector and framework layers from the engine

BitSail Product Usability Optimization

  • [ ] Start to integrate with open source development platforms to provide front-end product pages
  • [ ] Explore more convenient access methods, such as an API/SDK, so that BitSail can be applied to existing systems at low cost

BitSail Multi-Engine Architecture

  • [ ] Investigate multi-engine solutions and complete technical solution design

lichang-bd, Dec 23 '22 04:12

Hi, I have some ideas, just for reference:

1. We usually use a batch job to initialize the table first, and then use a streaming job for incremental synchronization. Could a single BitSail job switch between the two phases, using batch/streaming unification or some other mechanism?

2. At present, BitSail readers and writers are one-to-one. Some requirements call for one-to-many: for example, a changelog may contain change records for multiple tables, and the writer may need to target multiple Hudi tables. To save computing resources, there are many scenarios where one job synchronizes many tables, so I think it is necessary to support this feature.

3. columns is a mandatory parameter in the reader and writer configuration. Could we generate it by querying metadata? In most cases, the field names of the data source and the target are the same, and only some field types are converted. That way, we could start tasks from a temporary configuration file and reduce the work of maintaining a large number of configuration files.
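
To make idea 3 concrete, here is a minimal sketch of deriving a columns list from table metadata instead of writing it by hand. It is only an illustration under assumptions: SQLite stands in for the real data source, the helper name columns_from_metadata and the type_overrides parameter are hypothetical, and the name/type entry shape is an assumed approximation of what a reader or writer configuration expects, not BitSail's actual API.

```python
import sqlite3


def columns_from_metadata(conn, table, type_overrides=None):
    """Build a columns list from table metadata (hypothetical helper).

    type_overrides maps a column name to the type to use instead of the
    declared one, covering the "some field types are converted" case.
    """
    type_overrides = type_overrides or {}
    # SQLite's PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    if not rows:
        raise ValueError(f"table not found or has no columns: {table}")
    return [
        {"name": name, "type": type_overrides.get(name, declared_type.lower())}
        for _cid, name, declared_type, *_rest in rows
    ]


# Stand-in source table; a real job would query the actual source or target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, created_at TEXT)")

# Same field names on both sides; only created_at is converted for the writer.
reader_columns = columns_from_metadata(conn, "orders")
writer_columns = columns_from_metadata(conn, "orders", {"created_at": "timestamp"})
```

With something like this, only the type overrides would need to be kept per table, and the rest of the configuration could be generated at submit time.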

zeliu, Dec 27 '22 11:12

Hello, I have a suggestion, just for reference. Regarding BitSail Product Usability Optimization, how about integrating with StreamPark? It is an easy-to-use stream processing application development framework and a one-stop stream processing operations platform.

Kick156, Jan 06 '23 11:01