databricks-sdk-go icon indicating copy to clipboard operation
databricks-sdk-go copied to clipboard

[ISSUE] QueryHistory.List, ListAll spins until context canceled

Open MichaelUrman opened this issue 1 year ago • 3 comments

Description The underlying QueryHistory.List response for a page by token can return its own token. Thus its getNextReq loops on that page token.

Reproduction

statementId := os.Args[1]
w := databricks.Must(databricks.NewWorkspaceClient())
all, err := w.QueryHistory.ListAll(ctx, sql.ListQueryHistoryRequest{FilterBy: &sql.QueryFilter{StatementIds:[]string{statementId}}})

Expected behavior The API would not return the misleading token, or the client would respect ListQueriesResponse.HasNextPage=false, thus the call would terminate normally.

Is it a regression? Unknown; I've only tried github.com/databricks/databricks-sdk-go v0.29.1, specifically SHA 32643d652c60ee4ce5ec1c43e4580efa569e33f6

Debug Logs

Extracted logs. In this example I execute a query (omitted), and then attempt to query its information by a known statementId.

The code that generated these logs was more like the following:

qis := w.QueryHistory.List(ctx, sql.ListQueryHistoryRequest{FilterBy: &sql.QueryFilter{StatementIds: []string{id}}})
qi, err := qis.Next(ctx)
if err == nil {
	qi, err = qis.Next(ctx)
}
[... calling List, then its Next once works. Of note, has_next_page is false.]
2024/01/23 14:05:21 [DEBUG] GET /api/2.0/sql/history/queries?filter_by.statement_ids=01eeba2a-bba0-113c-af74-f644430fc1f2
< HTTP/2.0 200 OK
< {
<   "has_next_page": false,
<   "next_page_token": "CkoKJDAxZWViYTJhLWJiYTAtMTEzYy1hZjc0LWY2NDQ0MzBmYzFmMhCevLW/0zEYouOrxufSoQEiEDQ1Nzg3ZmRhMjY1MmNk... (64 more bytes)",
<   "res": [
<     {
<       "canSubscribeToLiveQuery": true,
<       "channel_used": {
<         "dbsql_version": "2023.50",
<         "name": "CHANNEL_NAME_CURRENT"
<       },
<       "duration": 0,
<       "endpoint_id": "45787fda2652cdc8",
<       "executed_as_user_id": 710930783203746,
<       "executed_as_user_name": "<redacted>",
<       "is_final": false,
<       "lookup_key": "CiQwMWVlYmEyYS1iYmEwLTExM2MtYWY3NC1mNjQ0NDMwZmMxZjIQnry1v9MxGhA0NTc4N2ZkYTI2NTJjZGM4IKLjq8bn0qEB",
<       "plans_state": "EMPTY",
<       "query_end_time_ms": 1706040320922,
<       "query_id": "01eeba2a-bba0-113c-af74-f644430fc1f2",
<       "query_start_time_ms": 1706040319518,
<       "query_text": "SELECT <redacted>",
<       "statement_type": "SELECT",
<       "status": "FINISHED",
<       "user_id": 710930783203746,
<       "user_name": "<redacted>",
<       "warehouse_id": "45787fda2652cdc8"
<     }
<   ]
< }
[ ... calling Next again begins an unending loop. Of note, the specified page_token repeats. ]
2024/01/23 14:05:21 [DEBUG] GET /api/2.0/sql/history/queries?filter_by.statement_ids=01eeba2a-bba0-113c-af74-f644430fc1f2&page_token=CkoKJDAxZWViYTJhLWJiYTAtMTEzYy1hZjc0LWY2NDQ0MzBmYzFmMhCevLW/0zEYouOrxufSoQEiEDQ1Nzg3ZmRhMjY1MmNkYzgoABInigEkMDFlZWJhMmEtYmJhMC0xMTNjLWFmNzQtZjY0NDQzMGZjMWYyGAE=
< HTTP/2.0 200 OK
< {
<   "has_next_page": false,
<   "next_page_token": "CkoKJDAxZWViYTJhLWJiYTAtMTEzYy1hZjc0LWY2NDQ0MzBmYzFmMhCevLW/0zEYouOrxufSoQEiEDQ1Nzg3ZmRhMjY1MmNk... (64 more bytes)"
< }
2024/01/23 14:05:21 [DEBUG] GET /api/2.0/sql/history/queries?filter_by.statement_ids=01eeba2a-bba0-113c-af74-f644430fc1f2&page_token=CkoKJDAxZWViYTJhLWJiYTAtMTEzYy1hZjc0LWY2NDQ0MzBmYzFmMhCevLW/0zEYouOrxufSoQEiEDQ1Nzg3ZmRhMjY1MmNkYzgoABInigEkMDFlZWJhMmEtYmJhMC0xMTNjLWFmNzQtZjY0NDQzMGZjMWYyGAE=
< HTTP/2.0 200 OK
< {
<   "has_next_page": false,
<   "next_page_token": "CkoKJDAxZWViYTJhLWJiYTAtMTEzYy1hZjc0LWY2NDQ0MzBmYzFmMhCevLW/0zEYouOrxufSoQEiEDQ1Nzg3ZmRhMjY1MmNk... (64 more bytes)"
< }
2024/01/23 14:05:21 [DEBUG] GET /api/2.0/sql/history/queries?filter_by.statement_ids=01eeba2a-bba0-113c-af74-f644430fc1f2&page_token=CkoKJDAxZWViYTJhLWJiYTAtMTEzYy1hZjc0LWY2NDQ0MzBmYzFmMhCevLW/0zEYouOrxufSoQEiEDQ1Nzg3ZmRhMjY1MmNkYzgoABInigEkMDFlZWJhMmEtYmJhMC0xMTNjLWFmNzQtZjY0NDQzMGZjMWYyGAE=
< HTTP/2.0 200 OK
< {
<   "has_next_page": false,
<   "next_page_token": "CkoKJDAxZWViYTJhLWJiYTAtMTEzYy1hZjc0LWY2NDQ0MzBmYzFmMhCevLW/0zEYouOrxufSoQEiEDQ1Nzg3ZmRhMjY1MmNk... (64 more bytes)"
< }
[ ... continues until canceled or, in this case, pipe breaks ]

Other Information Forking locally, using a go.mod replace, and changing the following line to if resp.NextPageToken == "" || !resp.HasNextPage { addresses the symptom. This is obviously not the correct way to implement the fix. https://github.com/databricks/databricks-sdk-go/blob/32643d652c60ee4ce5ec1c43e4580efa569e33f6/service/sql/api.go#L1188

Additional context I encountered this while exploring ways to look up row counts for a query executed via databricks-sql-go. I captured its "queryId" and used it equivalently to statementId above. Perhaps there's a better approach for that. :)

MichaelUrman avatar Jan 23 '24 20:01 MichaelUrman

Thanks for reporting this! We're treating this as a bug on the API side and work with the downstream team to not return a page token on the last page (in the spirit of https://google.aip.dev/132).

mgyucht avatar Jan 29 '24 15:01 mgyucht

@MichaelUrman Are you still seeing this issue?

mgyucht avatar Mar 22 '24 17:03 mgyucht

@mgyucht I had to go dig up my repro and make sure it wasn't using my workaround. But I believe it's good now. I no longer see a next_page_token in the traced response.

2024/03/22 13:28:24 [DEBUG] GET /api/2.0/sql/history/queries?filter_by.statement_ids=01eee879-f768-1e26-8dc8-59e5e640439f
< HTTP/2.0 200 OK
< {
<   "has_next_page": false,
<   "res": [

MichaelUrman avatar Mar 22 '24 18:03 MichaelUrman