fix(elasticsearch_index): create datahub_usage_event index where `datahub_analytics_enabled` set to `false`
Checklist
- [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [ ] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
Setting `global.datahub_analytics_enabled` to `false` means the DataHub Usage Analytics feature is not used. However, when `global.datahub_analytics_enabled` is set to `false`, the `elasticsearch-setup` job doesn't create any index. Without an index you lose not only DataHub Usage Analytics but also Data Landscape Summary, because when you click the Analytics tab, GMS checks whether Elasticsearch has an index named `datahub_usage_event` (more precisely, a data_stream index).
So developers who want to turn off only the DataHub Usage Analytics feature and keep Data Landscape Summary have no choice: they must turn both off or both on.
If the `elasticsearch-setup` job created the `datahub_usage_event` index (as a plain index, not a data_stream index) when `global.datahub_analytics_enabled` is set to `false`, we could keep Data Landscape Summary alive.
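To make the proposal concrete, here is a rough sketch of the decision the setup job would make, written in Python for illustration only. It is not the actual `elasticsearch-setup` script: the function name and the template name are hypothetical, and the two requests in the enabled branch are an assumption about what the job does today. Only the REST path shapes (`PUT /_index_template/<name>`, `PUT /_data_stream/<name>`, `PUT /<index>`) are standard Elasticsearch endpoints.

```python
from typing import List, Tuple

USAGE_EVENT_INDEX = "datahub_usage_event"
# Hypothetical template name, used only for this illustration.
USAGE_EVENT_TEMPLATE = "datahub_usage_event_index_template"


def plan_usage_event_setup(analytics_enabled: bool) -> List[Tuple[str, str]]:
    """Return the (HTTP method, path) requests the setup job would send."""
    if analytics_enabled:
        # Assumed current behaviour: index template plus data stream.
        return [
            ("PUT", f"/_index_template/{USAGE_EVENT_TEMPLATE}"),
            ("PUT", f"/_data_stream/{USAGE_EVENT_INDEX}"),
        ]
    # Proposed behaviour: a plain index, so the Analytics tab still finds
    # datahub_usage_event even though no usage events are written to it.
    return [("PUT", f"/{USAGE_EVENT_INDEX}")]


if __name__ == "__main__":
    for enabled in (True, False):
        print(enabled, plan_usage_event_setup(enabled))
```

The point of the second branch is that the index only has to exist; it does not need the template or data_stream machinery that Usage Analytics relies on.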
Thank you.
Unit Test Results (build & test)
584 tests ±0 580 :heavy_check_mark: ±0 13m 18s :stopwatch: +25s 143 suites ±0 4 :zzz: ±0 143 files ±0 0 :x: ±0
Results for commit 19263ba2. ± Comparison against base commit 325b959e.
:recycle: This comment has been updated with latest results.
@jjoyce0510 Please chime in here for next steps
Hello @GyuhoonK
Could you explain the reasoning behind the PR? Why would you want Data Landscape Summary but not DataHub Analytics?
@pedro93 In my case, the Elasticsearch cluster connected to DataHub cannot use the data_stream feature. X-Pack is not installed on my Elasticsearch, so I have to turn off DataHub Analytics.
Can you clarify what you mean by:
ElasticSearch connected to DataHub cannot use data_stream feature in my case.
X-pack is not installed into my ElasticSearch. So I have to turn off DataHub Analytics.
X-Pack is a security and monitoring module for Elastic. Why does not having it present mean you have to turn off DataHub's Analytics page?
Sorry for the confusion. I thought Elasticsearch could use data_stream only if X-Pack is installed. As you said, data_stream is a basic feature in the current version (8.2). My Elasticsearch is version 7.10, and that version doesn't include data_stream as a basic feature; X-Pack is needed (Elasticsearch Guide [7.10]).
How does wanting to use Data Landscape Summary and not DataHub Analytics relate to wanting to use ElasticSearch's `data_stream`?
Actually, it is related to the `datahub_usage_event` index.
Data Landscape Summary and DataHub Analytics are both part of the Analytics tab, which forces me to turn both of them off.
I agree that if `global.datahub_analytics_enabled` is set to `false`, the `datahub_usage_event` index is not created and DataHub Analytics is disabled.
I think Data Landscape Summary is unrelated to `global.datahub_analytics_enabled`. It just shows a summary, not users' logs.
However, when I click Analytics while `global.datahub_analytics_enabled` is set to `false`, which means Elasticsearch doesn't have `datahub_usage_event`, the web UI shows an error.
I want to turn off only DataHub Analytics, but then I cannot use Data Landscape Summary either. This is the problem.
I found this error log in the GMS pods.
│ java.lang.RuntimeException: Search query failed:
│ at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)
│ at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getTimeseriesChart(AnalyticsService.java:99)
│ at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.getProductAnalyticsCharts(GetChartsResolver.java:77)
│ at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:50)
│ at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:37)
│ at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
│ at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
│ at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
│ at graphql.execution.Execution.executeOperation(Execution.java:165)
│ at graphql.execution.Execution.execute(Execution.java:104)
│ at graphql.GraphQL.execute(GraphQL.java:557)
│ at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
│ at graphql.GraphQL.executeAsync(GraphQL.java:446)
│ at graphql.GraphQL.execute(GraphQL.java:377)
│ at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:90)
│ at com.datahub.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:94)
│ at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
│ at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
│ at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
│ at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
│ at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
│ at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
│ Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
│ at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
│ at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
│ at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
│ at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
│ at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
│ at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
│ at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
│ at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:260)
│ ... 21 common frames omitted
│ Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://elasticsearch-master:9200], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query
│ Warnings: [Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security., [ignore_throttled] parameter is deprecated because froz
│ {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"}],"type":"index_not_found_exception","reason":"no such index [datahub_usage_even
│ at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
│ at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
│ at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
│ at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
│ ... 25 common frames omitted
│ 15:32:49.579 [ForkJoinPool.commonPool-worker-10] ERROR c.datahub.graphql.GraphQLController:98 - Errors while executing graphQL query: "query getAnalyticsCharts {\n getAnalyticsCharts {\n groupId\n title\n charts {\n ...analyticsChart\n __typename\n }\n __typename\n }\n}\n\nfragm
It happens because Elasticsearch doesn't have the `datahub_usage_event` index (because `elasticsearch-setup` didn't create the template). If Elasticsearch has the `datahub_usage_event` index, I can see Data Landscape Summary, so I suggest creating the `datahub_usage_event` index (a plain index, not one created from the template using `data_stream`) when `global.datahub_analytics_enabled` is set to `false`.
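For anyone who wants to confirm they are hitting the same failure, the following is a minimal Python sketch that checks whether an Elasticsearch error body reports a missing index. The function name is hypothetical, and the sample body is a cut-down version containing only the `root_cause` fields that are visible in the log above, not the full response.

```python
import json
from typing import Optional


def missing_index(error_body: str) -> Optional[str]:
    """Return the missing index name if the body reports
    index_not_found_exception, otherwise None."""
    try:
        payload = json.loads(error_body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    # root_cause is a list of cause objects in Elasticsearch error responses.
    for cause in error.get("root_cause") or []:
        if isinstance(cause, dict) and cause.get("type") == "index_not_found_exception":
            return cause.get("index")
    return None


# Cut-down sample built from the root_cause fields shown in the GMS log.
SAMPLE = json.dumps({
    "error": {
        "root_cause": [{
            "type": "index_not_found_exception",
            "reason": "no such index [datahub_usage_event]",
            "index": "datahub_usage_event",
        }]
    }
})

if __name__ == "__main__":
    print(missing_index(SAMPLE))
```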
`data_stream` is also the answer to this question:
Q. Why do you not want to use DataHub Analytics?
A. I cannot use `data_stream` on my Elasticsearch because its version is 7.10 and X-Pack is not installed. I have no choice.
So if I understand you correctly, you cannot use DataHub Analytics because your Elastic cluster does not support X-Pack, but you still want to have Data Landscape Summary by changing the way the datahub_usage_event index works?
That is my interpretation. I think this PR makes sense. Landscape is indeed separate from usage tracking.
Going to approve!
@pedro93 Yes, that is exactly what I mean.
@shirshanka I added insecure mode. Please check it!
@pedro93 It didn't pass the smoke test.
Unable to run quickstart - the following issues were detected:
- kafka-setup is still running
If you think something went wrong, please file an issue at https://github.com/datahub-project/datahub/issues
or send a message in our Slack https://slack.datahubproject.io/
Be sure to attach the logs from /tmp/tmpqri39__g.log
Error: Process completed with exit code 1.
Is there an issue with the kafka setup?