sql icon indicating copy to clipboard operation
sql copied to clipboard

[BUG] Deprecation of OpenSearch DSL format has affected the behavior of the explain API

Open chet5619 opened this issue 3 months ago • 4 comments

What is the bug? Deprecation of OpenSearch DSL format has affected the behavior of the explain API https://github.com/opensearch-project/sql/issues/3280

Prior to OpenSearch 3.0 we could request DSL syntax using the following query:

POST /_plugins/_sql/_explain?format=json
{"query":"SELECT Shape, _id FROM railroads LIMIT 0,10000"}

result:

{
  "from": 0,
  "size": 10000,
  "_source": {
    "includes": [
      "Shape",
      "_id"
    ],
    "excludes": []
  }
}

With 3.0 we have noticed a couple of issues with our workflow.

  1. using format=json returns an error when used with explain API
{
  "error": {
    "reason": "Invalid SQL query",
    "details": "Failed to create executor due to unknown response format: json",
    "type": "IllegalArgumentException"
  },
  "status": 400
}
  1. if we dont use format=json then we run into issues with including a geoshape column
POST /_plugins/_sql/_explain
{"query":"SELECT Shape, _id FROM railroads LIMIT 0,10000"}
{
  "error": {
    "reason": "Invalid SQL query",
    "details": "can't resolve Symbol(namespace=FIELD_NAME, name=Shape) in type env",
    "type": "SemanticCheckException"
  },
  "status": 400
}

chet5619 avatar Sep 24 '25 18:09 chet5619

Hi @chet5619 , thanks for filing this issue.

Regarding to the format=json issue in _explain API

using format=json returns an error when used with explain API

{
  "error": {
    "reason": "Invalid SQL query",
    "details": "Failed to create executor due to unknown response format: json",
    "type": "IllegalArgumentException"
  },
  "status": 400
}

The behavior you're seeing is expected in OpenSearch 3.0. The format=json parameter was specifically for the query execution endpoint (/_plugins/_sql), not for the explain API. The explain API has always returned the DSL query plan without requiring a format parameter. Could you help us understand your use case better?

Regarding to the GeoShape field causing SemanticCheckException

This appears to be a separate bug. The error suggests that the SQL engine cannot resolve the Shape field (which is a geo_shape type). This is likely because:

  • GeoShape fields may not be fully supported in SQL SELECT statements
  • There may be a regression in field type resolution

Can you provide:

  • The mapping of your railroads index (specifically the Shape field definition)? (You can sanitize off some of the sensitive fields)
  • The OpenSearch version you upgraded from? Or like was this issue exist in 2.x?
  • Does the query work without the Shape field (e.g., SELECT _id FROM railroads LIMIT 10000)?

Regarding to the Explain API Deprecation

And yes in general, we do have a plan for do the enhancement of the explain API reference to issue https://github.com/opensearch-project/sql/issues/4351. However, that issue is about enhancing the API, not deprecating the DSL format. The DSL format is still the primary output of the explain API.

RyanL1997 avatar Oct 01 '25 02:10 RyanL1997

The behavior you're seeing is expected in OpenSearch 3.0. The format=json parameter was specifically for the query execution endpoint (/_plugins/_sql), not for the explain API. The explain API has always returned the DSL query plan without requiring a format parameter. Could you help us understand your use case better?

as i explained in the original comment the result of the following query:

POST /_plugins/_sql/_explain?format=json
{"query":"SELECT * FROM railroads LIMIT 0,10000"}

...was this - only the transalted DSL query:

{
  "from": 0,
  "size": 10000,
  "_source": {
    "includes": [ <array of fields>],
    "excludes": []
  }
}

but now since we cannot use the format=json we get a result like this:

{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[<array of fields>]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": "OpenSearchQueryRequest(indexName=railroads, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"array\",\"of\",\"fields\"],\"excludes\":[]}}, needClean=true, searchDone=false, pitId=null, cursorKeepAlive=null, searchAfter=null, searchResponse=null)"
        },
        "children": []
      }
    ]
  }
}

It still includes the json DSL, but we now have to parse the information from the result.

chet5619 avatar Oct 01 '25 13:10 chet5619

Can you provide:

  • The mapping of your railroads index (specifically the Shape field definition)? (You can sanitize off some of the sensitive fields)
"mappings": {
   "properties": {
      "OBJECTID": {
        "type": "long"
      },
      "Shape": {
        "type": "geo_shape"
      },
      "Shape_Length": {
        "type": "double"
      },
      "name": {
        "type": "keyword"
      }
   }
}
  • The OpenSearch version you upgraded from? Or like was this issue exist in 2.x?

The issue doesnt exist for OpenSearch 2.9.x

  • Does the query work without the Shape field (e.g., SELECT _id FROM railroads LIMIT 10000)?

Yes the query works if the geo_shape column is not included... and the query works if usig a geo_point mapping instead of geo_shape.

chet5619 avatar Oct 01 '25 13:10 chet5619

Recently tested OpenSearch 3.4 and we are still unable to use any 3.x release for our application. Issues that we are experiencing:

  1. JSON format is no longer supported for /_plugins/_sql/_explain this query:
    POST /_plugins/_sql/_explain?format=json
    {"query":"SELECT * FROM railroads LIMIT 0,10000"}
    
    returns an error
    {
      "error": {
        "reason": "Invalid SQL query",
        "details": "Failed to create executor due to unknown response format: json",
        "type": "IllegalArgumentException"
      },
      "status": 400
    }
    
    used to return before 3.x:
    {
    "from": 0,
    "size": 10000
    }
    
    without format=json we get a result that must be parsed before we can use it
    {
      "root": {
        "name": "ProjectOperator",
        "description": {
          "fields": "[OBJECTID, name, Shape_Length]"
        },
        "children": [
          {
            "name": "OpenSearchIndexScan",
            "description": {
              "request": "OpenSearchQueryRequest(indexName=railroads, sourceBuilder={\"from\":0,\"size\":10000,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"Shape_Length\",\"OBJECTID\",\"name\"],\"excludes\":[]}}, searchDone=false)"
            },
            "children": []
          }
        ]
      }
    }
    
  2. geopoint fields are supported with /_plugins/_sql/_explain but geoshape columns are not. our application uses /_plugins/_sql/_explain?format=json to retrieve the DSL query and it would work for both geoshape and geojson, but now since format=json is gone, the _explain fails on geoshape fields

Until these issues can be resolved our application will be unable to support opensearch instances 3.x or higher

chet5619 avatar Dec 18 '25 15:12 chet5619