
Release v0.3

Open • waynexia opened this issue 3 years ago • 6 comments

Description

We plan to release v0.3 at the end of August. Here is the feature list:

  • Release multi-language clients, including Java, Rust, and Python.
  • Support static cluster mode, and keep pushing toward a full-featured, dynamic distributed version (related project: Distributed CeresDB).
  • Extend SQL support (tag: A-SQL).
  • Implement the hybrid storage format, and support reading from both formats.

Feel free to suggest or discuss other features you would like to add ❤

waynexia, Jul 27 '22 08:07

Will CeresDB support multiple data sources? For example, reading records from MySQL's redo log and storing them in CeresDB's data structures.

dust1, Aug 01 '22 05:08

Will CeresDB support multiple data sources?

This sounds like data ingestion; do you mean bulk load?

jiacai2050, Aug 01 '22 06:08

Will CeresDB support multiple data sources?

This sounds like data ingestion; do you mean bulk load?

Yes, meaning that CeresDB could import data from the files of other existing commercial databases. I don't know much about this, so I'm not sure about the terminology.

dust1, Aug 01 '22 06:08

I think bulk ingestion is an important feature for easy adoption. Prometheus and InfluxDB both support it, and so will we.

jiacai2050, Aug 01 '22 06:08

This might cover three scenarios, so let's narrow down the discussion:

  • For offline data migration, our persistent format is relatively straightforward: a small amount of metadata plus data in Parquet format, all stored in OSS. We can achieve this in a few ways. Common formats such as CSV, or standard Parquet files generated by other systems, can also be supported directly.
  • Online data ingestion, on the other hand, is more complicated. We may need to add support for consuming data from streaming systems such as Kafka, Flink, or Pulsar. They have rich ecosystems, and by supporting them we can easily be integrated into various systems as a downstream warehouse.
  • The last one is querying other databases. This may be a little off-topic, but let me mention it as well. In this situation CeresDB acts only as a query frontend, and I can imagine other projects already do this. So I'll assign it a low priority.

Offline migration implementations differ case by case, so we can support the needed upstreams on demand. Online ingestion also has a few candidate upstreams, but I believe there is a common pattern among them. If we decide to work on this, we can choose one to support first. It could take a lot of effort, so we need to discuss it further.

waynexia, Aug 02 '22 03:08

Thanks for the summary, @waynexia. Here are some additional comments on these scenarios.

  1. For data migration or data initialization from an external data source, there could be dedicated tools. But as far as I know, demand for this scenario is not very frequent. This feature can be implemented as an independent binary, like the tools in the MySQL ecosystem. We can discuss it later.
  2. Online data ingestion is a much more complex topic. If we start working on it, we must consider latency, consistency, transformation, and other aspects of real-time computing. These requirements are commonly handled by stream-computing frameworks such as Apache Flink. So, in my opinion, the CeresDB project should focus on the core features of a time-series database itself.
  3. For querying other databases, there is a better choice: Presto. So we will not work in this direction.

archerny, Aug 07 '22 09:08

Released https://github.com/CeresDB/ceresdb/releases/tag/v0.3.0

jiacai2050, Aug 29 '22 12:08