
Executing a specific query on StarRocks causes the Query history page to report an error and fail to load data

Open · kainchow opened this issue 1 year ago · 4 comments

Bug description

Executing a specific query on StarRocks causes the Query history page to report an error and fail to load any data. Error message: "An error occurred while fetching Query historys: Fatal error" (screenshot: Snipaste_2024-08-22_15-14-36)

https://github.com/user-attachments/assets/dc4bf4da-4725-47ac-9545-ceaac1e429ae

How to reproduce the bug

  1. Create a database connection using the MySQL driver, filling in the StarRocks cluster address and account credentials.
  2. Go to the Query history page (/sqllab/history/). At this point the query records are displayed normally.
  3. Go to SQL Lab and select the StarRocks database you just created.
  4. Execute the following SQL: select date_add(current_date, -1) as yst_date
  5. Return to the Query history page. The page now reports an error, and the query history can no longer be browsed.
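
Step 4 is what triggers the failure. In MySQL syntax the second argument of DATE_ADD must be an INTERVAL expression (DATE_ADD(date, INTERVAL expr unit)), and the log in this issue shows sqlglot's MySQL dialect rejecting the bare -1 with "INTERVAL expression expected but got '-1'". The toy check below only imitates that rule with a regular expression so the failing pattern is easy to see; it is not how sqlglot parses SQL, and check_date_add is a hypothetical name.

```python
import re

# Toy approximation (NOT sqlglot's parser): find DATE_ADD(<date>, <arg>) calls
# whose arguments contain no nested parentheses.
_DATE_ADD_CALL = re.compile(
    r"\bdate_add\s*\(\s*([^,()]+?)\s*,\s*([^()]+?)\s*\)", re.IGNORECASE
)


def check_date_add(sql: str) -> None:
    """Raise ValueError if a DATE_ADD call lacks MySQL's INTERVAL argument."""
    for match in _DATE_ADD_CALL.finditer(sql):
        interval = match.group(2)
        # MySQL syntax is DATE_ADD(date, INTERVAL expr unit); a bare number
        # such as -1 does not satisfy it.
        if not interval.lower().startswith("interval"):
            raise ValueError(f"INTERVAL expression expected but got '{interval}'")
```

With this toy rule, the query from step 4 is rejected, while a form with an explicit INTERVAL (for example date_add(current_date, interval -1 day)) passes.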

Screenshots/recordings

superset_app container logs:

2024-08-22 06:53:04,605:ERROR:flask_appbuilder.api:list index out of range
Traceback (most recent call last):
  File "/app/superset/sql_parse.py", line 297, in _extract_tables_from_sql
    statements = parse(self.stripped(), dialect=self._dialect)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/__init__.py", line 87, in parse
    return Dialect.get_or_raise(read or dialect).parse(sql, **opts)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/dialect.py", line 490, in parse
    return self.parser(**opts).parse(self.tokenize(sql), sql)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 1153, in parse
    return self._parse(
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 1219, in _parse
    expressions.append(parse_method(self))
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 1427, in _parse_statement
    expression = self._parse_set_operations(expression) if expression else self._parse_select()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2486, in _parse_select
    from_ = self._parse_from()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2693, in _parse_from
    exp.From, comments=self._prev_comments, this=self._parse_table(joins=joins)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3067, in _parse_table
    subquery = self._parse_select(table=True)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2501, in _parse_select
    self._parse_table()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3067, in _parse_table
    subquery = self._parse_select(table=True)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2491, in _parse_select
    this = self._parse_query_modifiers(this)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2639, in _parse_query_modifiers
    key, expression = parser(self)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 942, in <lambda>
    TokenType.WHERE: lambda self: ("where", self._parse_where()),
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3394, in _parse_where
    exp.Where, comments=self._prev_comments, this=self._parse_conjunction()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3704, in _parse_conjunction
    return self._parse_tokens(self._parse_equality, self.CONJUNCTION)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5534, in _parse_tokens
    this = parse_method()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3707, in _parse_equality
    return self._parse_tokens(self._parse_comparison, self.EQUALITY)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5541, in _parse_tokens
    expression=parse_method(),
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3710, in _parse_comparison
    return self._parse_tokens(self._parse_range, self.COMPARISON)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5534, in _parse_tokens
    this = parse_method()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3713, in _parse_range
    this = this or self._parse_bitwise()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3832, in _parse_bitwise
    this = self._parse_term()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3864, in _parse_term
    return self._parse_tokens(self._parse_factor, self.TERM)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5534, in _parse_tokens
    this = parse_method()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3868, in _parse_factor
    this = parse_method()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3889, in _parse_unary
    return self._parse_at_time_zone(self._parse_type())
  File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/mysql.py", line 602, in _parse_type
    return super()._parse_type(parse_interval=parse_interval)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3910, in _parse_type
    data_type = self._parse_types(check_func=True, allow_identifiers=False)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4005, in _parse_types
    expressions = self._parse_csv(self._parse_type_size)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5520, in _parse_csv
    parse_result = parse_method()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3927, in _parse_type_size
    this = self._parse_type()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/mysql.py", line 602, in _parse_type
    return super()._parse_type(parse_interval=parse_interval)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3911, in _parse_type
    this = self._parse_column()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4113, in _parse_column
    this = self._parse_column_reference()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4117, in _parse_column_reference
    this = self._parse_field()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4232, in _parse_field
    or self._parse_function(anonymous=anonymous_func)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4253, in _parse_function
    func = self._parse_function_call(
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4319, in _parse_function_call
    func = function(args)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/dialect.py", line 707, in _builder
    raise ParseError(f"INTERVAL expression expected but got '{interval}'")
sqlglot.errors.ParseError: INTERVAL expression expected but got '-1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 110, in wraps
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 182, in wraps
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 1711, in get_list
    return self.get_list_headless(**kwargs)
  File "/app/superset/queries/api.py", line 340, in get_list_headless
    response[flask_appbuilder.const.API_RESULT_RES_KEY] = list_model_schema.dump(lst, many=True)
  File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 557, in dump
    result = self._serialize(processed_obj, many=many)
  File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 519, in _serialize
    return [
  File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 520, in <listcomp>
    self._serialize(d, many=False)
  File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 525, in _serialize
    value = field_obj.serialize(attr_name, obj, accessor=self.get_attribute)
  File "/usr/local/lib/python3.10/site-packages/marshmallow/fields.py", line 344, in serialize
    return self._serialize(value, attr, obj, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/marshmallow/fields.py", line 1991, in _serialize
    return self._serialize_method(obj)
  File "/app/superset/queries/schemas.py", line 76, in get_sql_tables
    return obj.sql_tables
  File "/app/superset/models/sql_lab.py", line 75, in sql_tables
    extract_tables_from_jinja_sql(
  File "/app/superset/sql_parse.py", line 1126, in extract_tables_from_jinja_sql
    ).tables
  File "/app/superset/sql_parse.py", line 287, in tables
    self._tables = self._extract_tables_from_sql()
  File "/app/superset/sql_parse.py", line 303, in _extract_tables_from_sql
    **ex.errors[0]
IndexError: list index out of range

2024-08-22 06:53:04,613:INFO:werkzeug:192.168.10.1 - - [22/Aug/2024 06:53:04] "GET /api/v1/query/?q=(filters:!((col:database,opr:rel_o_m,value:2)),order_column:start_time,order_direction:desc,page:0,page_size:25) HTTP/1.1" 500 -
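
Reading the two tracebacks together suggests why the page reports "list index out of range" rather than a parse error: the DATE_ADD builder raises ParseError with only a message, so the exception's errors list is empty, and line 303 of superset/sql_parse.py then evaluates **ex.errors[0]. The stdlib-only sketch below reproduces that interaction with a stand-in class. It assumes, based on our reading of sqlglot's source, that ParseError stores `errors or []`; the stand-in ParseError and the helpers build_date_add, error_details_unsafe and error_details_safe are hypothetical, and error_details_safe is just one possible guard, not Superset's actual fix.

```python
from typing import Any, NoReturn, Optional


class ParseError(Exception):
    """Stand-in for sqlglot.errors.ParseError: `errors` defaults to an empty list."""

    def __init__(
        self, message: str, errors: Optional[list[dict[str, Any]]] = None
    ) -> None:
        super().__init__(message)
        self.errors = errors or []


def build_date_add(interval: str) -> NoReturn:
    # Like the builder at the bottom of the first traceback, raise with a
    # message only, so no structured error details are attached.
    raise ParseError(f"INTERVAL expression expected but got '{interval}'")


def error_details_unsafe(ex: ParseError) -> dict[str, Any]:
    # Mirrors the `**ex.errors[0]` line in the second traceback.
    return {**ex.errors[0]}  # IndexError when ex.errors is empty


def error_details_safe(ex: ParseError) -> dict[str, Any]:
    # Hypothetical guard: fall back to the message when `errors` is empty.
    return {**ex.errors[0]} if ex.errors else {"description": str(ex)}
```

Calling error_details_unsafe on the exception from build_date_add("-1") raises IndexError, exactly as in the log, while error_details_safe returns a dict built from the message.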

Superset version

4.0.2

Python version

3.10

Node version

16

Browser

Firefox

Additional context

No response

Checklist

  • [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

kainchow, Aug 22 '24 07:08

To resolve the "list index out of range" error on the Query history page after executing a specific query on StarRocks in Apache Superset, check how the statements list is handled in the execute_sql_statements function. Specifically, ensure that the list of statements is not empty before attempting to access its elements.

Here is a potential fix:

  1. Add a check to ensure that statements is not empty before proceeding with the execution loop.
def execute_sql_statements(
    # pylint: disable=too-many-arguments, too-many-locals, too-many-statements, too-many-branches
    query_id: int,
    rendered_query: str,
    return_results: bool,
    store_results: bool,
    start_time: Optional[float],
    expand_data: bool,
    log_params: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Executes the sql query returns the results."""
    if store_results and start_time:
        # only asynchronous queries
        stats_logger.timing("sqllab.query.time_pending", now_as_float() - start_time)

    query = get_query(query_id)
    payload: dict[str, Any] = {"query_id": query_id}
    database = query.database
    db_engine_spec = database.db_engine_spec
    db_engine_spec.patch()

    if database.allow_run_async and not results_backend:
        raise SupersetErrorException(
            SupersetError(
                message=__("Results backend is not configured."),
                error_type=SupersetErrorType.RESULTS_BACKEND_NOT_CONFIGURED_ERROR,
                level=ErrorLevel.ERROR,
            )
        )

    # Breaking down into multiple statements
    parsed_query = ParsedQuery(
        rendered_query,
        engine=db_engine_spec.engine,
    )
    if not db_engine_spec.run_multiple_statements_as_one:
        statements = parsed_query.get_statements()
        logger.info(
            "Query %s: Executing %i statement(s)", str(query_id), len(statements)
        )
    else:
        statements = [rendered_query]
        logger.info("Query %s: Executing query as a single statement", str(query_id))

    # Check if statements list is empty
    if not statements:
        raise SupersetErrorException(
            SupersetError(
                message=__("No valid SQL statements found."),
                error_type=SupersetErrorType.INVALID_SQL_STATEMENT_ERROR,
                level=ErrorLevel.ERROR,
            )
        )

    logger.info("Query %s: Set query to 'running'", str(query_id))
    query.status = QueryStatus.RUNNING
    query.start_running_time = now_as_float()
    db.session.commit()

    # Rest of the function...

This check ensures that the statements list is not empty before attempting to execute any SQL statements, which should prevent the "list index out of range" error [1].


dosubot[bot], Aug 22 '24 07:08

Maybe @miomiocat has some insight here?

rusackas, Aug 22 '24 18:08

I made a temporary change to superset/sql_parse.py which avoids this issue, but it may cause other problems, so someone still needs to fix the bug!

def _extract_tables_from_sql(self) -> set[Table]:
    """
    Extract all table references in a query.

    Note: this uses sqlglot, since it's better at catching more edge cases.
    """
    try:
        statements = parse(self.stripped(), dialect=self._dialect)
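    except ParseError:
        # Workaround: treat SQL that sqlglot cannot parse as an empty statement
        # list instead of falling through to the SqlglotError branch below
        # (ParseError is a subclass of SqlglotError).
        statements = []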
    except SqlglotError as ex:
        ...

kainchow, Aug 23 '24 07:08
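
The workaround above turns a parse failure into an empty statement list, so serializing the query list no longer raises. The same idea can be sketched per row: if table extraction fails for one query, only that row gets an empty sql_tables list and the rest of the page still loads. This is a minimal, hypothetical sketch: toy_extract_tables is a regex placeholder for sqlglot-based extraction, and tables_or_empty and serialize_queries are illustrative names, not Superset functions.

```python
import re
from typing import Any, Callable


def toy_extract_tables(sql: str) -> set[str]:
    """Hypothetical extractor: fails on DATE_ADD(<date>, <number>), else scans FROM/JOIN."""
    if re.search(r"\bdate_add\s*\([^,()]+,\s*-?\d+\s*\)", sql, re.IGNORECASE):
        raise ValueError("INTERVAL expression expected")
    return set(re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, re.IGNORECASE))


def tables_or_empty(
    sql: str, extract: Callable[[str], set[str]] = toy_extract_tables
) -> list[str]:
    """Return the referenced tables, or [] when the SQL cannot be analysed."""
    try:
        return sorted(extract(sql))
    except Exception:  # deliberately broad: one bad row must not fail the page
        return []


def serialize_queries(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Attach `sql_tables` to each row, degrading per row instead of per request."""
    return [{**row, "sql_tables": tables_or_empty(row["sql"])} for row in rows]
```

One visible cost of this approach is that a query whose SQL cannot be parsed is reported as referencing no tables.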

Hi, has anyone fixed this? I have the same problem in version 4.0.2; it happens on the Saved Queries page.

nvn01234, Oct 15 '24 03:10

Maybe @miomiocat has some insight here?

Thanks.

We should use the dedicated StarRocks Python client to connect to StarRocks instead of a MySQL client; this issue does not occur with the StarRocks client.

We will continue to focus on developing and maintaining this StarRocks Python client.

Please refer to this usage link: https://docs.starrocks.io/docs/integrations/BI_integrations/Superset/

miomiocat, Oct 30 '24 07:10
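
Switching to the StarRocks client mainly changes the SQLAlchemy URI used for the database connection. A minimal sketch, assuming the starrocks://<user>:<password>@<host>:<port>/<catalog>.<database> format described on the linked StarRocks page, the FE query port 9030 as the default, and default_catalog as the internal catalog name; starrocks_sqlalchemy_uri is a hypothetical helper, so check the linked documentation for the exact format and the package to install.

```python
from urllib.parse import quote


def starrocks_sqlalchemy_uri(
    user: str,
    password: str,
    host: str,
    database: str,
    port: int = 9030,  # assumed default FE query port
    catalog: str = "default_catalog",  # assumed internal catalog name
) -> str:
    """Build a URI for the `starrocks` SQLAlchemy dialect (assumed format)."""
    # Percent-encode credentials so characters such as '@' or ':' survive.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"starrocks://{auth}@{host}:{port}/{catalog}.{database}"
```

The resulting string is what would go into the SQLAlchemy URI field when creating the database in Superset, in place of a mysql:// URI.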

It seems the above answer should suffice, so we can call this resolved. If anyone has further issues here, we can revisit or reopen it.

rusackas, Apr 21 '25 22:04