
Druid Lookups introspect keys and values endpoints do not return valid JSON

Open teyeheimans opened this issue 1 year ago • 6 comments

Description

While analyzing the lookup features of Druid, I noticed that the keys and values endpoints for lookups do not return valid JSON.

https://druid.apache.org/docs/latest/querying/lookups#introspect-a-lookup

Example response:

"[20416, 20404, 20415, 02F440, 02F461, 20420, 02F402, 02F480, 20408, 20409, 20410, 20412, 20402, 02F421, 02F420, 20601, 02F601, 02F620, VODAFONE, CLARO]

It seems that all keys or values are just joined with , and wrapped between two square brackets.

Finally, the documentation seems incorrect on this page: https://druid.apache.org/docs/latest/querying/lookups-cached-global/#introspection

It states:

Introspection to / returns the entire map. Introspection to /version returns the version indicator for the lookup.

However, /version does not seem to work and returns a 404.

Motivation

As far as I know, all other API endpoints return valid JSON. However, the introspect keys and values endpoints do not. This is incorrect in my opinion.

teyeheimans avatar Oct 16 '24 08:10 teyeheimans

Hi @teyeheimans, What type of lookup are you creating?

Map Lookup

  • With the following configuration,
{
  "type": "map",
  "map": {
    "1": "One",
    "2": "Two",
    "3": "Three"
  }
}

I do see the key-value pairs, keys, and values returned correctly and formatted as JSON:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/    
{"1":"One","2":"Two","3":"Three"}

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/keys   
[1, 2, 3]        

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values     
[One, Two, Three]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/version
-- Does not return anything                                                                                                                                         
  • /version endpoint is not implemented in MapLookupIntrospectionHandler ; hence, we do not see the response.

cachedNamespace Lookup

  • With the following configuration
{
  "type": "cachedNamespace",
  "extractionNamespace": {
    "type": "uri",
    "uri": "file:/tmp/sampleCSV.csv",
    "namespaceParseSpec": {
      "format": "csv",
      "columns": [
        "key",
        "value"
      ],
      "skipHeaderRows": 1
    },
    "pollPeriod": "PT30S"
  },
  "firstCacheTimeout": 0
}

I see all the endpoints returning responses:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/        
{"20":"Twenty","10":"Ten","30":"Thirty"}    
                                                                                                                                    
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/keys
["20","10","30"]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/values
["Twenty","Ten","Thirty"]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/version
{"version":"1729184323236"}
  • One caveat to call out here: the /version endpoint does not return the version that was set manually when the lookup was created, but an epoch timestamp. I see the version as v1 on the Console, but 1729184323236 in the Introspect API response (screenshot attached).
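
To see that the returned value reads as an epoch timestamp in milliseconds, it can be decoded with a few lines of Python. This is only a sketch: the value is the one from the response above, and the variable names are illustrative.

```python
from datetime import datetime, timezone

# Value of "version" from the /version response above, in milliseconds.
version_ms = int("1729184323236")

# Drop the millisecond part and interpret the rest as seconds since the epoch (UTC).
loaded_at = datetime.fromtimestamp(version_ms // 1000, tz=timezone.utc)

print(loaded_at.isoformat())  # 2024-10-17T16:58:43+00:00
```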

Thanks!

ashwintumma23 avatar Oct 17 '24 18:10 ashwintumma23

I am using a map lookup, just like you. Your example shows the problem already:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/    
{"1":"One","2":"Two","3":"Three"}

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/keys   
[1, 2, 3]        

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values     
[One, Two, Three]

The values returned in your example are NOT valid JSON. The values are not quoted. The correct response would be:

["One", "Two", "Three"]

Also, to check if it is valid JSON you could use jq:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values | jq '.'

This also happens when the keys are strings. So the keys and values endpoints of the introspect API are NOT returning valid JSON.
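
For clients without jq, the same check can be sketched in Python with the standard json module. The payloads are copied from the responses earlier in this thread; the helper name is_valid_json is only illustrative.

```python
import json


def is_valid_json(payload: str) -> bool:
    """Return True if payload parses as JSON, False otherwise."""
    try:
        json.loads(payload)
        return True
    except json.JSONDecodeError:
        return False


# /values response for the map lookup, as shown above.
map_lookup_values = "[One, Two, Three]"
# /values response for the cachedNamespace lookup, as shown above.
cached_lookup_values = '["Twenty","Ten","Thirty"]'

print(is_valid_json(map_lookup_values))     # False
print(is_valid_json(cached_lookup_values))  # True
```

Note that the /keys response [1, 2, 3] happens to parse, but as numbers rather than as the string keys "1", "2", "3" that the full map returns.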

Finally, the version endpoint does not seem to work (indeed). However, it is documented that it should be there, so the documentation seems to be incorrect. See this page at the bottom: https://druid.apache.org/docs/latest/querying/lookups-cached-global/#introspection

teyeheimans avatar Oct 18 '24 07:10 teyeheimans

@teyeheimans, that does look like a bug. This is the relevant introspection code for map lookups: https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/query/lookup/MapLookupExtractorFactory.java#L156.

I think the getValues() response should just be map.values() instead of map.values().toString(), which results in a String representation of the underlying collection. The same would apply to getKeys(). If that sounds about right, please feel free to raise a PR.
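
The difference between the two forms can be illustrated with a small Python analogy. This is not Druid code: it only mimics the shape that a Java collection's toString() produces (elements joined with ", " inside square brackets) and compares it with serializing the collection itself.

```python
import json

values = ["One", "Two", "Three"]

# Mimics the string form of a Java collection:
# elements joined by ", " inside brackets, with no quoting.
string_form = "[" + ", ".join(values) + "]"

# Serializing the collection itself quotes each string element.
json_form = json.dumps(values)

print(string_form)  # [One, Two, Three]
print(json_form)    # ["One", "Two", "Three"]
```

Only the second form round-trips through a JSON parser.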

abhishekrb19 avatar Oct 18 '24 08:10 abhishekrb19

Btw, you can directly query a map lookup in SQL: SELECT "k", "v" FROM "lookup"."mapLookup". This should return the keys and values in the correct string form. The Druid web-console uses SQL instead of API to introspect values when you open the lookup modal.

abhishekrb19 avatar Oct 18 '24 08:10 abhishekrb19

Hi @abhishekrb19,

For the /version endpoint:

Documentation-wise

  • It is indeed specified on the lookups-cached-global page, but I think we should update the documentation to explicitly state that it is available only for lookups of type cachedNamespace. I can create a PR for this item.

Functionality-wise

  • The introspection endpoint returns the internal version from CacheScheduler here https://github.com/apache/druid/blob/master/extensions-core/lookups-cached-global/src/main/java/org/apache/druid/query/lookup/NamespaceLookupIntrospectHandler.java#L78 ; but it is different from the version that gets specified when creating the lookup.
  • For instance, on the console, I see that the lookup version is v3, but the /version endpoint shows: {"version":"1729180607928"}. [Attached screenshot below]. Currently, the API always returns the epoch value of the lookup creation time.
  • Similar behavior is observed when the lookup is created by specifying tier, lookup_name, version and lookupExtractorFactory via the API endpoint: /druid/coordinator/v1/lookups/config/
  • Impact: This might confuse users as to which version is the correct version of the lookup.
  • Behavior of other lookup endpoints: All other non-introspection endpoints work correctly by fetching the LookupExtractorFactoryMapContainer map that contains version and lookupExtractorFactory separately.
  • Example: https://druid.apache.org/docs/latest/api-reference/lookups-api/#get-lookup endpoint correctly fetches the version and lookupExtractorFactory from LookupExtractorFactoryMapContainer here: https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/http/LookupCoordinatorResource.java#L304
  • As far as I know, the Introspection class (https://github.com/apache/druid/blob/master/extensions-core/lookups-cached-global/src/main/java/org/apache/druid/query/lookup/NamespaceLookupIntrospectHandler.java) does not have access to the LookupExtractorFactoryMapContainer object, hence the discrepancy in the return values of the /version endpoint.
  • Discussion: What do you think can be done in this case?
  • [Attachment] Screenshot showing lookup version as v3: image

ashwintumma23 avatar Oct 18 '24 21:10 ashwintumma23

I agree with what you describe. However, I am not familiar with the Java side of Druid. We have created a PHP client for Druid, see https://github.com/level23/druid-client.

Recently I integrated support for lookup management. There I found out that the keys and values endpoints do not return valid JSON (at least for the map lookup). If I just use the introspect endpoint, it does give me valid JSON. So this is wrong, and it is the reason why I started this topic.
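
Until the keys and values endpoints are fixed, a client could derive both lists from the full-map introspect response, which is valid JSON. A minimal sketch in Python (not the PHP client's actual code), using the map lookup response shown earlier in this thread as a hard-coded string:

```python
import json

# Response of /druid/v1/lookups/introspect/mapLookup/ as shown earlier.
full_map_response = '{"1":"One","2":"Two","3":"Three"}'

lookup = json.loads(full_map_response)

# Derive the keys and values client-side instead of calling /keys and /values.
keys = list(lookup.keys())
values = list(lookup.values())

print(json.dumps(keys))    # ["1", "2", "3"]
print(json.dumps(values))  # ["One", "Two", "Three"]
```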

Also, I find it strange that it is not possible to specify for every type of lookup whether the data is injective or not. It is also strange that the same injective functionality is called oneToOne in the Kafka lookup.

teyeheimans avatar Oct 19 '24 08:10 teyeheimans

Sorry for the delay. It looks like there are at least two separate issues here:

  1. The /introspect endpoint not returning valid JSON (this issue). The fix for this should be straightforward.

  2. A separate issue with the /version endpoint. It seems like @ashwintumma23 has identified a documentation gap, which could be worth addressing. I agree that the version returned in the /version endpoint is confusing—it currently returns the cache scheduler’s internal version rather than the user-facing version. We could clarify this unambiguously in the documentation to avoid any confusion: https://druid.apache.org/docs/latest/querying/lookups-cached-global/#introspection.

If we also want to address the discrepancy between the multiple versions, we could expose the user-facing version in addition to the cache scheduler’s version by adding a new field to the response map for compatibility: https://github.com/apache/druid/blob/master/extensions-core/lookups-cached-global/src/main/java/org/apache/druid/query/lookup/NamespaceLookupIntrospectHandler.java#L79. To retrieve the user-facing lookup version, we may need to access that information from LookupExtractorFactoryContainer.

If there's more discussion required for the second issue, I'd suggest creating a separate targeted issue so it's easier to track.

Please let me know if you'd like to take a stab at it.

abhishekrb19 avatar Oct 29 '24 17:10 abhishekrb19

Thanks for your response, @abhishekrb19! It does make sense to update the documentation to clear up the ambiguity. I will create a PR for the fix and the documentation update, and log a separate issue for the discrepancy in the /version endpoint.

ashwintumma23 avatar Oct 29 '24 18:10 ashwintumma23