Druid Lookups introspect keys and values endpoints do not return valid JSON
Description
While analyzing the lookup features of Druid, I noticed that the keys and values endpoints for lookups do not return valid JSON.
https://druid.apache.org/docs/latest/querying/lookups#introspect-a-lookup
Example response:
"[20416, 20404, 20415, 02F440, 02F461, 20420, 02F402, 02F480, 20408, 20409, 20410, 20412, 20402, 02F421, 02F420, 20601, 02F601, 02F620, VODAFONE, CLARO]
It seems that all keys or values are just joined with , and wrapped between two square brackets.
Finally, the documentation seems incorrect on this page: https://druid.apache.org/docs/latest/querying/lookups-cached-global/#introspection
It states:
Introspection to / returns the entire map. Introspection to /version returns the version indicator for the lookup.
However, /version does not seem to work and returns a 404.
Motivation
As far as I know, all other API endpoints return valid JSON, but the introspect keys and values endpoints do not. This is incorrect in my opinion.
Hi @teyeheimans, What type of lookup are you creating?
Map Lookup
- With the following configuration,
{
"type": "map",
"map": {
"1": "One",
"2": "Two",
"3": "Three"
}
}
I do see the key-value pairs, keys, and values correctly, formatted as JSON:
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/
{"1":"One","2":"Two","3":"Three"}
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/keys
[1, 2, 3]
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values
[One, Two, Three]
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/version
-- Does not return anything
The /version endpoint is not implemented in MapLookupIntrospectionHandler; hence, we do not see a response.
cachedNamespace Lookup
- With the following configuration
{
"type": "cachedNamespace",
"extractionNamespace": {
"type": "uri",
"uri": "file:/tmp/sampleCSV.csv",
"namespaceParseSpec": {
"format": "csv",
"columns": [
"key",
"value"
],
"skipHeaderRows": 1
},
"pollPeriod": "PT30S"
},
"firstCacheTimeout": 0
}
I see all the endpoints returning responses:
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/
{"20":"Twenty","10":"Ten","30":"Thirty"}
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/keys
["20","10","30"]
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/values
["Twenty","Ten","Thirty"]
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/version
{"version":"1729184323236"}
- One caveat to call out here: the /version endpoint does not return the version that was set manually when the lookup was created, but an epoch time. I see the version as v1 on the Console, but 1729184323236 in the Introspect API response.
Thanks!
I am using a map lookup, just like you. Your example shows the problem already:
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/
{"1":"One","2":"Two","3":"Three"}
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/keys
[1, 2, 3]
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values
[One, Two, Three]
The values returned in your example are NOT valid JSON, because they are not quoted. The correct response would be:
["One", "Two", "Three"]
Also, to check if it is valid JSON you could use jq:
$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values | jq '.'
This also happens when the keys are strings. So the keys and values endpoints of the introspect API are NOT returning valid JSON.
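For clients without jq, the same check can be sketched in Python using only the standard library. The helper name `is_valid_json` is made up for this illustration; the sample strings are the map lookup responses quoted earlier in this thread.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Responses copied from the map lookup example in this thread.
full_map = '{"1":"One","2":"Two","3":"Three"}'
keys = "[1, 2, 3]"
values = "[One, Two, Three]"

print(is_valid_json(full_map))  # True
print(is_valid_json(keys))      # True, but only because these keys look like numbers
print(is_valid_json(values))    # False: the strings are not quoted
```

Note that the keys response parses only by accident: the string keys "1", "2", "3" come back as JSON numbers, so the original type is lost even when parsing succeeds.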
Finally, the version endpoint does not seem to work (indeed). However, it is documented that it should be there, so the documentation seems to be incorrect. See this page at the bottom: https://druid.apache.org/docs/latest/querying/lookups-cached-global/#introspection
@teyeheimans, that does look like a bug. This is the relevant introspection code for map lookups: https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/query/lookup/MapLookupExtractorFactory.java#L156.
I think the getValues() response should just be map.values() instead of map.values().toString(), which results in a String representation of the underlying collection. The same would apply to getKeys(). If that sounds about right, please feel free to raise a PR.
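To illustrate the difference, here is a rough Python analogy rather than Druid code. It assumes that the Java collection's toString() joins elements with ", " inside square brackets, which matches the responses shown in this thread; `java_style_to_string` is a hypothetical helper that mimics that behavior.

```python
import json

lookup = {"1": "One", "2": "Two", "3": "Three"}


def java_style_to_string(items) -> str:
    """Mimic a Java collection's toString(): '[a, b, c]' with no quoting."""
    return "[" + ", ".join(str(item) for item in items) + "]"


# What the endpoint returns today, per the examples in this thread.
print(java_style_to_string(lookup.values()))  # [One, Two, Three]

# What a JSON serializer produces when handed the collection itself.
print(json.dumps(list(lookup.values())))  # ["One", "Two", "Three"]
```

The first form cannot be parsed back reliably, since a value containing ", " would be indistinguishable from two separate values.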
Btw, you can directly query a map lookup in SQL: SELECT "k", "v" FROM "lookup"."mapLookup". This should return the keys and values in the correct string form. The Druid web-console uses SQL instead of API to introspect values when you open the lookup modal.
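A minimal sketch of issuing that query from a client, assuming the Router from the earlier curl examples is reachable at http://localhost:8888 and that the standard SQL API path /druid/v2/sql is enabled. The function names are made up for this example, and the lookup name is interpolated without escaping, so it should only be used with trusted names.

```python
import json
import urllib.request


def build_lookup_sql_request(base_url: str, lookup_name: str) -> urllib.request.Request:
    """Build a POST request for Druid's SQL API that selects all rows of a lookup."""
    query = f'SELECT "k", "v" FROM "lookup"."{lookup_name}"'
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_lookup_rows(base_url: str, lookup_name: str) -> list:
    """Send the request and decode the JSON rows (needs a running cluster)."""
    request = build_lookup_sql_request(base_url, lookup_name)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network, so it is safe to inspect.
request = build_lookup_sql_request("http://localhost:8888", "mapLookup")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Calling `fetch_lookup_rows("http://localhost:8888", "mapLookup")` against a running cluster should then return the rows as properly quoted JSON.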
Hi @abhishekrb19,
For the /version endpoint:
Documentation-wise
- It is indeed specified on the lookups-cached-global page, but I think we should update the documentation to explicitly state that it is available only for lookups of type cachedNamespace. I can create a PR for this item.
Functionality-wise
- The introspection endpoint returns the internal version from CacheScheduler here https://github.com/apache/druid/blob/master/extensions-core/lookups-cached-global/src/main/java/org/apache/druid/query/lookup/NamespaceLookupIntrospectHandler.java#L78 ; but it is different from the version that gets specified when creating the lookup.
- For instance, on the console, I see that the lookup version is v3, but the /version endpoint shows {"version":"1729180607928"}. [Attached screenshot below]. Currently, the API always returns the epoch value of the lookup creation time.
- Similar behavior is observed when the lookup is created by specifying tier, lookup_name, version and lookupExtractorFactory via the API endpoint: /druid/coordinator/v1/lookups/config/
- Impact: This might confuse users as to which version is the correct version of the lookup.
- Behavior of other lookup endpoints: All other non-introspection endpoints work correctly by fetching the LookupExtractorFactoryMapContainer map that contains version and lookupExtractorFactory separately.
- Example: The https://druid.apache.org/docs/latest/api-reference/lookups-api/#get-lookup endpoint correctly fetches the version and lookupExtractorFactory from LookupExtractorFactoryMapContainer here: https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/http/LookupCoordinatorResource.java#L304
- As far as I know, the Introspection class (https://github.com/apache/druid/blob/master/extensions-core/lookups-cached-global/src/main/java/org/apache/druid/query/lookup/NamespaceLookupIntrospectHandler.java) does not have access to the LookupExtractorFactoryMapContainer object, hence the discrepancy in the return values of the /version endpoint.
- Discussion: What do you think can be done in this case?
- [Attachment] Screenshot showing lookup version as v3:
I agree with what you describe. However, I am not familiar with the Java side of Druid. We have created a PHP client for Druid, see https://github.com/level23/druid-client.
Recently I integrated support for lookup management. That is where I found out that the keys and values endpoints do not return valid JSON (at least for the map lookup). If I just use the introspect endpoint, it does give me valid JSON. So this is wrong and is the reason why I started this topic.
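Until the endpoints are fixed, a client can derive both lists from the full-map introspect response, which is valid JSON. The sketch below shows the idea in Python (not the PHP client's actual code); `keys_and_values` is a name made up for this example, and the sample body is the response quoted earlier in the thread.

```python
import json


def keys_and_values(introspect_body: str):
    """Split the full-map introspect response into a key list and a value list."""
    mapping = json.loads(introspect_body)
    return list(mapping.keys()), list(mapping.values())


body = '{"1":"One","2":"Two","3":"Three"}'
keys, values = keys_and_values(body)
print(keys)    # ['1', '2', '3']
print(values)  # ['One', 'Two', 'Three']
```

Unlike the keys endpoint, this keeps the keys as strings.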
Also, I find it strange that not every lookup type lets you specify whether the data is injective. It is also strange that the same injective functionality is called oneToOne in the Kafka lookup.
Sorry for the delay. It looks like there are at least two separate issues here:
- The /introspect endpoint not returning valid JSON (this issue). The fix for this should be straightforward.
- A separate issue with the /version endpoint. It seems like @ashwintumma23 has identified a documentation gap, which could be worth addressing. I agree that the version returned by the /version endpoint is confusing: it currently returns the cache scheduler’s internal version rather than the user-facing version. We could clarify this unambiguously in the documentation to avoid any confusion: https://druid.apache.org/docs/latest/querying/lookups-cached-global/#introspection.
If we also want to address the discrepancy between the multiple versions, we could expose the user-facing version in addition to the cache scheduler’s version by adding a new field to the response map for compatibility: https://github.com/apache/druid/blob/master/extensions-core/lookups-cached-global/src/main/java/org/apache/druid/query/lookup/NamespaceLookupIntrospectHandler.java#L79. To retrieve the user-facing lookup version, we may need to access that information from LookupExtractorFactoryContainer.
If there's more discussion required for the second issue, I'd suggest creating a separate targeted issue so it's easier to track.
Please let me know if you'd like to take a stab at it.
Thanks for your response, @abhishekrb19! It does make sense to update the documentation to clear up the ambiguity. I will create a PR for the fix and the documentation update, and log a separate issue for the discrepancy in the /version endpoint.