hertzbeat icon indicating copy to clipboard operation
hertzbeat copied to clipboard

[Question] PromQL comparison operator implementation is inconsistent with the official Prometheus definition

Open bigcyy opened this issue 6 months ago • 16 comments

Question

While recently reviewing the official Prometheus documentation, I noticed the behavior of PromQL comparison operators (>, <, ==, >=, <=, !=) and found two main inconsistencies when comparing it with the implementation in Hertzbeat.

  1. Comparison Logic for Instant Vector vs. Scalar

    According to the official Prometheus documentation, when an instant vector is compared to a scalar, the operation acts as a filter. Specifically, PromQL iterates through each time series in the vector, performs the comparison, and any time series for which the expression is true is kept in the output. Series for which the result is false are removed.

    However, I noticed that the behavior demonstrated in the calculate1 test case in hertzbeat.warehouse.service.DataSourceServiceTest does not align with this standard.

    void calculate1() {
        List<Map<String, Object>> prometheusData = List.of(
                new HashMap<>(Map.of("__value__", 100.0, "timestamp", 1343554, "instance", "node1")),
                new HashMap<>(Map.of("__value__", 200.0, "timestamp", 1343555, "instance", "node2"))
        );
    
        QueryExecutor mockExecutor = Mockito.mock(QueryExecutor.class);
        Mockito.when(mockExecutor.support("promql")).thenReturn(true);
        Mockito.when(mockExecutor.execute(Mockito.anyString())).thenReturn(prometheusData);
        dataSourceService.setExecutors(List.of(mockExecutor));
    
        List<Map<String, Object>> result = dataSourceService.calculate("promql", "node_cpu_seconds_total > 150");
        assertEquals(2, result.size());
        assertNull(result.get(0).get("__value__"));
        assertEquals(200.0, result.get(1).get("__value__"));
    }
    

    Analysis:

    Input: An instant vector containing two time series with values 100.0 and 200.0. Query: node_cpu_seconds_total > 150. Expected PromQL Behavior: Only the series with the value 200.0 satisfies the condition. Therefore, the result should be a new vector containing only that single series, and its size should be 1. Behavior in Hertzbeat Test: The test asserts that result.size() is 2. Instead of filtering out the non-matching series (value of 100.0), it keeps the series in the result set but sets its value to null. Core Question: This "keep and nullify" behavior deviates from PromQL's standard filtering model. Is this an intentional design choice in Hertzbeat? My main question is whether the plan is to fully align with the Prometheus specification in this regard. If not, it could cause confusion for users familiar with standard PromQL.

  2. Support for Comparison Operations on Range Vectors

    The Prometheus documentation clearly states that logical and comparison operators are only defined to work between instant vectors. They do not support range vectors.

    However, I noticed that DataSourceServiceTest contains test cases involving comparisons between a range vector and a scalar, such as: node_cpu_seconds_total{mode="user"}[4m] <= 100

    This is an invalid expression in standard PromQL. This seems to suggest that Hertzbeat is extending PromQL syntax to support comparisons on range vectors.

@tomsun28 @MasamiYui @Duansg

bigcyy avatar Jun 23 '25 08:06 bigcyy

Hi, @bigcyy , thank you for the invitation

What a coincidence, I also discovered this issue and am currently refactoring the comparison logic in visitComparisonExpr.

Regarding the issue of hertzbeat supporting promql parsing, I have a few points to discuss with you:

  1. Should the parsing rules for promql and sql be separated, for example, into PromqlParser.g4 and SqlParser.g4? This way, responsibilities would be clearer, making development easier when handling or extending them.
  2. Should we support more complex expression rules? I see that what we currently support is limited.

If you have better ideas, I am happy to join you in handling this part of the logic.

Duansg avatar Jun 23 '25 09:06 Duansg

Hi, @bigcyy , thank you for the invitation

What a coincidence, I also discovered this issue and am currently refactoring the comparison logic in visitComparisonExpr.

Regarding the issue of hertzbeat supporting promql parsing, I have a few points to discuss with you:

  1. Should the parsing rules for promql and sql be separated, for example, into PromqlParser.g4 and SqlParser.g4? This way, responsibilities would be clearer, making development easier when handling or extending them.
  2. Should we support more complex expression rules? I see that what we currently support is limited.

If you have better ideas, I am happy to join you in handling this part of the logic.

Thanks for bringing this up and for your willingness to collaborate! I've also been thinking about this, and here are my thoughts on your points:

  1. On separating into PromqlParser.g4 and SqlParser.g4: I see your point about clearer responsibilities. However, I have a concern about potential code duplication. Our primary use case is for alert-triggering expressions. If we create two separate parsers, we would need to implement and maintain identical comparison logic for both. For example, promql > number would have a parallel sql > number, and promql and promql would mirror sql and sql. The current parser is designed to handle these simple, common structures for both PromQL and SQL queries in a unified way, which avoids this redundancy.

  2. On supporting more complex expression rules: That's a great question. To be honest, my own use cases have been relatively straightforward, so I haven't yet encountered a strong need for more complex rules. I'm definitely open to exploring this further. Could you perhaps share some examples of the more complex expressions you have in mind? That would help us better understand the requirements.

  3. An issue I found with the current PromQL lexer: I also wanted to mention that I've identified a separate but related issue with the current PromQL lexer. When parsing an expression like promql > promql, the lexer incorrectly tokensizes the entire string as a single PromQL entity. While this might seem logical at a high level, it prevents the expression from being processed by our intended expr > expr comparison logic. I'm working on a fix for this and will submit a PR shortly.

I'm very happy to collaborate with you on refining this part of the logic. Let's continue the discussion!

bigcyy avatar Jun 23 '25 10:06 bigcyy

Hi, here are my thoughts: Alert Expression = (PromQL) Metric Query + Alert Condition.

For example: node_cpu_seconds_total > 150, node_cpu_seconds_total is the PromQL Metric Query, and > 150 is the Alert Condition.

1.About Comparison Logic for Instant Vector vs. Scalar: Keep the series in the result set but sets its value to null, It used for alert recovery.

if (result.get(VALUE) == null) {
    // recovery the alert
    handleRecoveredAlert(fingerPrints);
    continue;
}

2.About Support for Comparison Operations on Range Vectors: Accoding to the composition of alert expressions, it only requires that node_cpu_seconds_total{mode="user"}[4m] satisfies the PromQL expression.

PS: As @bigcyy said, Alert expressions are not standard PromQL expressions,and include extensions modified by Hertzbeat, how about add support for alert expression validation when front creating alert rules.

The above are my guesses and interpretations. If there are any mistakes, please point them out and discuss.

MasamiYui avatar Jun 23 '25 11:06 MasamiYui

Hi, @bigcyy , thank you for the invitation What a coincidence, I also discovered this issue and am currently refactoring the comparison logic in visitComparisonExpr. Regarding the issue of hertzbeat supporting promql parsing, I have a few points to discuss with you:

  1. Should the parsing rules for promql and sql be separated, for example, into PromqlParser.g4 and SqlParser.g4? This way, responsibilities would be clearer, making development easier when handling or extending them.
  2. Should we support more complex expression rules? I see that what we currently support is limited.

If you have better ideas, I am happy to join you in handling this part of the logic.

Thanks for bringing this up and for your willingness to collaborate! I've also been thinking about this, and here are my thoughts on your points:

  1. On separating into PromqlParser.g4 and SqlParser.g4: I see your point about clearer responsibilities. However, I have a concern about potential code duplication. Our primary use case is for alert-triggering expressions. If we create two separate parsers, we would need to implement and maintain identical comparison logic for both. For example, promql > number would have a parallel sql > number, and promql and promql would mirror sql and sql. The current parser is designed to handle these simple, common structures for both PromQL and SQL queries in a unified way, which avoids this redundancy.
  2. On supporting more complex expression rules: That's a great question. To be honest, my own use cases have been relatively straightforward, so I haven't yet encountered a strong need for more complex rules. I'm definitely open to exploring this further. Could you perhaps share some examples of the more complex expressions you have in mind? That would help us better understand the requirements.
  3. An issue I found with the current PromQL lexer: I also wanted to mention that I've identified a separate but related issue with the current PromQL lexer. When parsing an expression like promql > promql, the lexer incorrectly tokensizes the entire string as a single PromQL entity. While this might seem logical at a high level, it prevents the expression from being processed by our intended expr > expr comparison logic. I'm working on a fix for this and will submit a PR shortly.

I'm very happy to collaborate with you on refining this part of the logic. Let's continue the discussion!

Hi, sorry just had time to see your reply, first of all thank you for your reply, I have the following thoughts in response to the points you raised:

  1. I think the parser can be 2 different sets of rules, which does not affect the alarm triggered, we are only the parser to do processing, different query rules corresponding to a different parser I feel normal, as you described there is no need to develop 2 different judgment logic, we just need to do the rectification of the entrance in the following can be
org.apache.hertzbeat.alert.service.impl.DataSourceServiceImpl#evaluate

Or sink down to QueryExecutor, so that parsing and querying can be executed as a single process. The current Sql and Promql parsing rules together affect the ease of maintenance and secondary development, and you often need to worry about whether you are breaking another parsing rule.

2.in the scenario of the example, in addition to the operation and maintenance aspects of the scenario, there are even business development to configure a thousand and one expressions, but this does not violate the rules of the promql expression, for example:

# Memory utilization
(memory_total_bytes - memory_available_bytes) / memory_total_bytes > 0.8

#Connections exceeded maximum limit
stat_activity_count / stat_activity_max_connections > 0.8 or rate(stat_database_tup_returned[5m]) / rate(stat_database_tup_fetched[5m]) < 0.5

#Overall system health score
((1 - rate(process_cpu_seconds_total[5m])) * 0.3 + (1 - (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes) * 0.3 + (1 - (node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes) * 0.2 +(1 - rate(node_network_transmit_errs_total[5m]) / rate(node_network_transmit_packets_total[5m])) * 0.2) * 100 < 80

#Customize the business class of the
a: sum(increase(item_dubbo_provider_counter_total{app_name=~"$app",instance=~"$instance"}[1m])) by (dubbo_service,dubbo_method,app_name ,type,traceName)
b: sum({__name__=~"item_thread_pool_gauge.+",app_name=~"$app",instance=~"$instance"}) by (__name__,instance)
..... A lot.

  1. You are right about the problems. The current defined rules are not friendly, normal rules parsing are mistaken for identifiers, I think it should be defined as separate tokens. secondly the processing logic of Visitor, in the official description it is that we only support vectors and scalars, like scalars and vectors, scalar and scalar we don't support it very well. scalar and scalar` we don't support very well, also compareOp is not handled very well, it's not compatible with the semantics of Promql at the moment.

Even though I've fixed that part of the logic, I've decided to let this PR go to see if the PR you submitted will open my mind to new ideas. :muscle:

Duansg avatar Jun 23 '25 14:06 Duansg

As @bigcyy said, Alert expressions are not standard PromQL expressions,and include extensions modified by Hertzbeat, how about add support for alert expression validation when front creating alert rules.

hi, understand you very well.

1, I think it should be judging whether the result is null or not, not judging the value, which breaks the semantics of promql, don't you think? Between instantaneous vectors and scalars, these operators are applied to the value of each data sample in the vector, and vector elements with a comparison result of false are deleted from the result vector, the current logic is to retain the Null value, which is extremely unfriendly to subsequent additions to the preview function.

2, the actual threshold judgment of the scene there are many, range-vector just in the implementation of Some functions necessary conditions, rather than only support for querying range-vector, but currently can be extended through the promql(xxx), this I think is not the primary focus.

3, I don't think it's about providing a calibration function, but rather providing a preview, which is often useful when writing complex expressions to determine the correctness and accuracy of the threshold, don't you think?

To summarize, I think the primary focus is as @bigcyy mentioned, fixing the semantic issues with comparison and providing preview functionality is sufficient, and subsequent issues can be expanded with real usage scenarios in additional.

Duansg avatar Jun 23 '25 14:06 Duansg

Everyone,

Thank you for the in-depth discussion. By combining everyone's viewpoints with my own thoughts, I'd like to propose a more complete solution to help us align our thinking.

My core idea is this: we should not confine ourselves to the goal of "100% PromQL compatibility." Instead, we should clearly define what we are building: a domain-specific language (DSL) tailored for Hertzbeat's alerting scenarios. As @MasamiYui pointed out, an alert expression can be clearly broken down into a Metric Query + Alert Condition. The Metric Query part can (and should) use standard PromQL or SQL. For the Alert Condition part, however, we can define our own, better-optimized semantics to cater to specific alerting features, such as alert recovery.

1. Regarding Comparison Operators

  • Core Supported Operations:

    • Instant Vector vs. Scalar: This must be supported as it's the most common type of threshold alert. This part is already supported.
    • Instant Vector vs. Instant Vector: This should be supported. It's crucial for scenarios involving dynamic thresholds or comparing two related metrics. This part is not yet supported.
  • Operations That Must Be Prohibited:

    • Direct comparison and operations on a Range Vector: This is the most ambiguous and dangerous part of the current implementation. An expression like metric[5m] > 100 has unclear semantics. I strongly recommend following PromQL's best practice: enforce that users must use aggregation functions (e.g., avg_over_time, rate, max_over_time) to convert a range vector into an instant vector before any comparison or logical operations can be performed. This will completely eliminate semantic ambiguity and ensure that the alerting behavior is entirely predictable and aligned with user expectations. We need to modify this part to prohibit it.
  • Regarding Retaining and Setting to Null for Alert Recovery:

    • I completely agree with @MasamiYui's perspective and believe this is a huge advantage for Hertzbeat as a stateful alerting system. Retaining the series and setting its value to null for alert recovery is a smart and necessary design that serves a core function.
    • In response to @Duansg's concern about this being unfriendly to the "preview" feature, I believe this is a problem that can be solved on the frontend. During a preview, we can mark series where value != null as "Triggered" and those where value == null as "Normal/Recovered." The user experience can be made very clear.

2. Regarding Logical Operators

Logical operators should be used to connect the results of two or more "alert conditions." However, we need to carefully consider how to handle null values. We can understand it this way: a non-null value on either the left or right side represents true, while null represents false. There's also the case where the left side has a value, but the corresponding instant vector cannot be found on the right side; we'll treat the missing one as false. The final result merges all instant vectors from both sides. For example:

Expression: (a > 1) and (b > 2)

Let's assume we have the following raw data:

  • Metric a
    • {instance="A", os="linux"}: 5
    • {instance="B", os="linux"}: 0 (does not satisfy a > 1)
    • {instance="C", os="windows"}: 10
  • Metric b
    • {instance="A", os="linux"}: 3
    • {instance="B", os="linux"}: 5
    • {instance="D", os="windows"}: 20 (this is a new instance)

Step 1: Calculate the left expression (a > 1)

Based on our previously defined "retain and set to null" logic, the result of this expression is a new instant vector, which we'll call Result_A:

  • {instance="A", os="linux"}: 5 > 1 is true, so the value is retained. __value__: 5
  • {instance="B", os="linux"}: 0 > 1 is false, so the value is set to null. __value__: null
  • {instance="C", os="windows"}: 10 > 1 is true, so the value is retained. __value__: 10

Step 2: Calculate the right expression (b > 2)

Similarly, we calculate (b > 2) and get the result vector Result_B:

  • {instance="A", os="linux"}: 3 > 2 is true, so the value is retained. __value__: 3
  • {instance="B", os="linux"}: 5 > 2 is true, so the value is retained. __value__: 5
  • {instance="D", os="windows"}: 20 > 2 is true, so the value is retained. __value__: 20

Step 3: Execute Result_A AND Result_B

Now we come to the most critical step. The AND operator will iterate over all instances that have appeared in either Result_A or Result_B (i.e., A, B, C, D) and then determine the final result based on the following logic:

  • For instance="A":

    • In Result_A, its value is 5 (non-null, representing True).
    • In Result_B, its value is 3 (non-null, representing True).
    • The result of True AND True is True. The final result is non-null. Retain __value__ (typically, the value from the left side is kept, which is 5).
  • For instance="B":

    • In Result_A, its value is null (representing False).
    • In Result_B, its value is 5 (non-null, representing True).
    • The result of False AND True is False. The final result is __value__: null.
  • For instance="C":

    • In Result_A, its value is 10 (non-null, representing True).
    • In Result_B, it does not exist. A non-existent series in a logical comparison should be treated as False.
    • The result of True AND False is False. The final result is __value__: null.
  • For instance="D":

    • In Result_A, it does not exist (representing False).
    • In Result_B, its value is 20 (non-null, representing True).
    • The result of False AND True is False. The final result is __value__: null.

Final Result

After executing (a > 1) and (b > 2), the final vector we get is:

  • {instance="A", os="linux"}: __value__: 5 <-- The only series that triggers an alert
  • {instance="B", os="linux"}: __value__: null
  • {instance="C", os="windows"}: __value__: null
  • {instance="D", os="windows"}: __value__: null

This result perfectly aligns with the logical expectation: an alert is triggered only when an instance satisfies both conditions simultaneously.

bigcyy avatar Jun 23 '25 16:06 bigcyy

In addition, I also support @Duansg's suggestion to split it into two .g4 files; I recommend we can add this as a future development goal. Then, regarding the complex syntax @Duansg mentioned earlier, such as arithmetic expressions, which we indeed do not currently support—should its priority be set higher than splitting the .g4 files, but lower than the solution I proposed above?

bigcyy avatar Jun 23 '25 16:06 bigcyy

In addition, I also support @Duansg's suggestion to split it into two .g4 files; I recommend we can add this as a future development goal. Then, regarding the complex syntax @Duansg mentioned earlier, such as arithmetic expressions, which we indeed do not currently support—should its priority be set higher than splitting the .g4 files, but lower than the solution I proposed above?那么,关于前面提到的复杂语法,比如算术表达式,我们目前确实不支持--它的优先级是否应高于拆分 .g4 文件,但低于我上面提出的解决方案?

I think splitting .g4 files should be done before supporting complex expressions, it will be good for extension and maintenance, at this stage we only deal with the current problem. But just make sure the syntax parsing is independent at this stage.

There's been a lot of discussion about value==null, but I don't think that's the focus of this installment. There are a lot of ways to handle it, and even with the new preview, it can still be filtered from the front and back end, so we just need to focus on parsing it correctly and accurately.

Duansg avatar Jun 24 '25 01:06 Duansg

As @bigcyy said, Alert expressions are not standard PromQL expressions,and include extensions modified by Hertzbeat, how about add support for alert expression validation when front creating alert rules.

hi, understand you very well.

1, I think it should be judging whether the result is null or not, not judging the value, which breaks the semantics of promql, don't you think? Between instantaneous vectors and scalars, these operators are applied to the value of each data sample in the vector, and vector elements with a comparison result of false are deleted from the result vector, the current logic is to retain the Null value, which is extremely unfriendly to subsequent additions to the preview function.

2, the actual threshold judgment of the scene there are many, range-vector just in the implementation of Some functions necessary conditions, rather than only support for querying range-vector, but currently can be extended through the promql(xxx), this I think is not the primary focus.

3, I don't think it's about providing a calibration function, but rather providing a preview, which is often useful when writing complex expressions to determine the correctness and accuracy of the threshold, don't you think?

To summarize, I think the primary focus is as @bigcyy mentioned, fixing the semantic issues with comparison and providing preview functionality is sufficient, and subsequent issues can be expanded with real usage scenarios in additional.

@Duansg Thanks for the suggestion.

1.Keeping null is not the most elegant solution, but it’s acceptable for now. If a better design becomes available later, we can refactor and it without affect existing configurations.

2.I agree with @bigcyy the principle that prohibiting direct comparisons and operations on Range Vectors helps eliminate semantic ambiguity. That said, disallowing this might affect some existing configurations (though I wonder—does anyone actually use this expression in production?)

3.The preview feature looks better.

MasamiYui avatar Jun 24 '25 10:06 MasamiYui

As @bigcyy said, Alert expressions are not standard PromQL expressions,and include extensions modified by Hertzbeat, how about add support for alert expression validation when front creating alert rules.

hi, understand you very well. 1, I think it should be judging whether the result is null or not, not judging the value, which breaks the semantics of promql, don't you think? Between instantaneous vectors and scalars, these operators are applied to the value of each data sample in the vector, and vector elements with a comparison result of false are deleted from the result vector, the current logic is to retain the Null value, which is extremely unfriendly to subsequent additions to the preview function. 2, the actual threshold judgment of the scene there are many, range-vector just in the implementation of Some functions necessary conditions, rather than only support for querying range-vector, but currently can be extended through the promql(xxx), this I think is not the primary focus. 3, I don't think it's about providing a calibration function, but rather providing a preview, which is often useful when writing complex expressions to determine the correctness and accuracy of the threshold, don't you think? To summarize, I think the primary focus is as @bigcyy mentioned, fixing the semantic issues with comparison and providing preview functionality is sufficient, and subsequent issues can be expanded with real usage scenarios in additional.

@Duansg Thanks for the suggestion.

1.Keeping null is not the most elegant solution, but it’s acceptable for now. If a better design becomes available later, we can refactor and it without affect existing configurations.

2.I agree with @bigcyy the principle that prohibiting direct comparisons and operations on Range Vectors helps eliminate semantic ambiguity. That said, disallowing this might affect some existing configurations (though I wonder—does anyone actually use this expression in production?)

3.The preview feature looks better.

Real business scenarios are only more complicated than that. :)

I can provide some contextualization

An example of a real business environment: Image Because of the business architecture, there are currently 10+ prometheus nodes like this one, and the rules on each node are also different

A simple metric Image

As you can see above, if you don't utilize the promql related features wisely, you are likely to get into unnecessary trouble.

However, despite the many issues that have not been dealt with, it is because of us interesting people that hertzbeat will also explore more endless possibilities in the years to come. :muscle:

Duansg avatar Jun 24 '25 11:06 Duansg

It seems that full PromQL support is necessary, then? If so, the approach of setting the value to null will need to be changed. Perhaps we can take the result of each periodic threshold expression evaluation and compare it against the metrics that are currently in an alerting state. If a metric for an active alert does not appear in the expression evaluation result, we can conclude that it has recovered. This would eliminate the need to use null for this determination.

@Duansg Additionally, regarding the minor bug mentioned yesterday, I simply modified the PromQL grammar in the .g4 file: I deleted the line promql binaryOperator promql and then regenerated the corresponding lexer and parser code. If you have already modified this part of the code, we can just use your version. I'll wait for your next PR.

bigcyy avatar Jun 24 '25 13:06 bigcyy

@Duansg Additionally, regarding the minor bug mentioned yesterday, I simply modified the PromQL grammar in the .g4 file: I deleted the line promql binaryOperator promql and then regenerated the corresponding lexer and parser code. If you have already modified this part of the code, we can just use your version. I'll wait for your next PR.

Okay, that's fine, go ahead and submit the PR then, if you change that part then I'll most likely have to add changes to your revisions.

Duansg avatar Jun 24 '25 13:06 Duansg

hi @Duansg, I have a question about splitting G4 files. For implementation, should we split and parse it into two separate grammars, and then implement distinct Visitor for each to traverse the parse tree? Or should we use Antlr's priority mechanism to make the grammar rules clearer? Since I lack experience with complex expressions, I'm not sure if parsing different grammar rules within them will cause conflicts during development. Can Antlr's precedence mechanism completely handle situations with conflicts? After all, it seems a bit redundant in code that an expression needs to be parsed into two separate parse trees.

Cyanty avatar Jun 27 '25 08:06 Cyanty

hi @Duansg, I have a question about splitting G4 files. For implementation, should we split and parse it into two separate grammars, and then implement distinct Visitor for each to traverse the parse tree? Or should we use Antlr's priority mechanism to make the grammar rules clearer? Since I lack experience with complex expressions, I'm not sure if parsing different grammar rules within them will cause conflicts during development. Can Antlr's precedence mechanism completely handle situations with conflicts? After all, it seems a bit redundant in code that an expression needs to be parsed into two separate parse trees.

Hi, sorry, I don't know why your reply was ignored, causing me to just see your message now, and in response to the question you posed, I'll tell you my experience and opinion so far.

1、 First of all: I agree with you that prioritizing ANTLR4 expressions according to their characteristics and priorities will ensure the convenience of subsequent debugging or type checking, semantic analysis, etc. But: as far as my use and understanding goes, there is no strong recommendation, either in the official instructions or in the community, to do a fusion of multiple different syntax rules.

2、 I think the prioritization rules you propose are for a single syntax, and there are serious limitations in practice when it comes to expressions in mixed syntax, or at least I think the risk of doing so is high, and I would strongly discourage it for the following reasons:

1): severe lexical conflicts, difficult to handle the processing of different structures nested together.
2): Prioritization only applies to expressions, not to syntax rule switching, I think it's sufficient to use entry rule assignment, nested parsing, which are not inside the ANTLR prioritization mechanism.
3): readability is extremely poor, in a rule mixed into two grammars, the AST syntax tree will appear mixed nested structure, not only not "clear", but more "confusing", difficult to deal with the subsequent troubleshooting and type checking, semantic analysis and so on.

3、 What are the considerations for your question about an expression parsing into two separate parse trees seeming a bit redundant? Or is it possible to expand on this, after all, my experience is limited as well, and I'd be happy to talk to and listen to your ideas.

Summarize my opinion in one sentence: ANTLR4's prioritization mechanism I think can make the expression syntax within the same syntax parsing clearer, but it is not suitable for combining two complex syntax rules (SQL and PromQL) into one .g4 file and directly mixing the parsing, or you can try to build on the existing foundation of Hertzbeat by adding a mix of standard (SQL and PromQL) syntax rules and then think about other processing logic such as subsequent maintainability in conjunction with the logical context of Hertzbeat. Or you can try to add a mix of standard (SQL and PromQL) syntax rules to hertzbeat's existing base, and then think about the maintainability and other processing logic in the context of hertzbeat's logic.

Duansg avatar Jul 04 '25 14:07 Duansg

hi @Duansg, I have a question about splitting G4 files. For implementation, should we split and parse it into two separate grammars, and then implement distinct Visitor for each to traverse the parse tree? Or should we use Antlr's priority mechanism to make the grammar rules clearer? Since I lack experience with complex expressions, I'm not sure if parsing different grammar rules within them will cause conflicts during development. Can Antlr's precedence mechanism completely handle situations with conflicts? After all, it seems a bit redundant in code that an expression needs to be parsed into two separate parse trees.

Hi, sorry, I don't know why your reply was ignored, causing me to just see your message now, and in response to the question you posed, I'll tell you my experience and opinion so far.

1、 First of all: I agree with you that prioritizing ANTLR4 expressions according to their characteristics and priorities will ensure the convenience of subsequent debugging or type checking, semantic analysis, etc. But: as far as my use and understanding goes, there is no strong recommendation, either in the official instructions or in the community, to do a fusion of multiple different syntax rules.

2、 I think the prioritization rules you propose are for a single syntax, and there are serious limitations in practice when it comes to expressions in mixed syntax, or at least I think the risk of doing so is high, and I would strongly discourage it for the following reasons:

1): severe lexical conflicts, difficult to handle the processing of different structures nested together.
2): Prioritization only applies to expressions, not to syntax rule switching, I think it's sufficient to use entry rule assignment, nested parsing, which are not inside the ANTLR prioritization mechanism.
3): readability is extremely poor, in a rule mixed into two grammars, the AST syntax tree will appear mixed nested structure, not only not "clear", but more "confusing", difficult to deal with the subsequent troubleshooting and type checking, semantic analysis and so on.

3、 What are the considerations for your question about an expression parsing into two separate parse trees seeming a bit redundant? Or is it possible to expand on this, after all, my experience is limited as well, and I'd be happy to talk to and listen to your ideas.

Summarize my opinion in one sentence: ANTLR4's prioritization mechanism I think can make the expression syntax within the same syntax parsing clearer, but it is not suitable for combining two complex syntax rules (SQL and PromQL) into one .g4 file and directly mixing the parsing, or you can try to build on the existing foundation of Hertzbeat by adding a mix of standard (SQL and PromQL) syntax rules and then think about other processing logic such as subsequent maintainability in conjunction with the logical context of Hertzbeat. Or you can try to add a mix of standard (SQL and PromQL) syntax rules to hertzbeat's existing base, and then think about the maintainability and other processing logic in the context of hertzbeat's logic.

Understood. Considering that HertzBeat has already integrated various time-series databases, which may have different SQL-like syntaxes, I can imagine the complexity and the chaotic readability of parsing with a single parse tree. As support for more data source syntaxes is added, splitting the logic is indeed necessary. Of course, the work of adapting to multiple parse trees is time and energy consuming, this will likely be a long process. In the future, HertzBeat can achieve a smooth transition in this regard to ensure compatibility for upgrades from older versions. Thank you for your answer.

Cyanty avatar Jul 04 '25 15:07 Cyanty

hi @Duansg, I have a question about splitting G4 files. For implementation, should we split and parse it into two separate grammars, and then implement distinct Visitor for each to traverse the parse tree? Or should we use Antlr's priority mechanism to make the grammar rules clearer? Since I lack experience with complex expressions, I'm not sure if parsing different grammar rules within them will cause conflicts during development. Can Antlr's precedence mechanism completely handle situations with conflicts? After all, it seems a bit redundant in code that an expression needs to be parsed into two separate parse trees.

Hi, sorry, I don't know why your reply was ignored, causing me to just see your message now, and in response to the question you posed, I'll tell you my experience and opinion so far. 1、 First of all: I agree with you that prioritizing ANTLR4 expressions according to their characteristics and priorities will ensure the convenience of subsequent debugging or type checking, semantic analysis, etc. But: as far as my use and understanding goes, there is no strong recommendation, either in the official instructions or in the community, to do a fusion of multiple different syntax rules. 2、 I think the prioritization rules you propose are for a single syntax, and there are serious limitations in practice when it comes to expressions in mixed syntax, or at least I think the risk of doing so is high, and I would strongly discourage it for the following reasons:

1): severe lexical conflicts, difficult to handle the processing of different structures nested together.
2): Prioritization only applies to expressions, not to syntax rule switching, I think it's sufficient to use entry rule assignment, nested parsing, which are not inside the ANTLR prioritization mechanism.
3): readability is extremely poor, in a rule mixed into two grammars, the AST syntax tree will appear mixed nested structure, not only not "clear", but more "confusing", difficult to deal with the subsequent troubleshooting and type checking, semantic analysis and so on.

3、 What are the considerations for your question about an expression parsing into two separate parse trees seeming a bit redundant? Or is it possible to expand on this, after all, my experience is limited as well, and I'd be happy to talk to and listen to your ideas. Summarize my opinion in one sentence: ANTLR4's prioritization mechanism I think can make the expression syntax within the same syntax parsing clearer, but it is not suitable for combining two complex syntax rules (SQL and PromQL) into one .g4 file and directly mixing the parsing, or you can try to build on the existing foundation of Hertzbeat by adding a mix of standard (SQL and PromQL) syntax rules and then think about other processing logic such as subsequent maintainability in conjunction with the logical context of Hertzbeat. Or you can try to add a mix of standard (SQL and PromQL) syntax rules to hertzbeat's existing base, and then think about the maintainability and other processing logic in the context of hertzbeat's logic.

Understood. Considering that HertzBeat has already integrated various time-series databases, which may have different SQL-like syntaxes, I can imagine the complexity and the chaotic readability of parsing with a single parse tree. As support for more data source syntaxes is added, splitting the logic is indeed necessary. Of course, the work of adapting to multiple parse trees is time and energy consuming, this will likely be a long process. In the future, HertzBeat can achieve a smooth transition in this regard to ensure compatibility for upgrades from older versions. Thank you for your answer.

@Cyanty Yes, you're right, it is indeed a long and complicated process :)

But thank you very much for your answer, you can follow the movement of the community, so that when we have a follow-up related to the implementation of the same, and also participate in our construction :muscle:

Duansg avatar Jul 04 '25 15:07 Duansg