fix(elasticsearch_index): create datahub_usage_event index where `datahub_analytics_enabled` set to `false`

Open GyuhoonK opened this issue 2 years ago • 9 comments

Checklist

  • [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [ ] Links to related issues (if applicable)
  • [ ] Tests for the changes have been added/updated (if applicable)
  • [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Setting global.datahub_analytics_enabled to false disables the DataHub Usage Analytics feature. However, when global.datahub_analytics_enabled is false, the elasticsearch-setup job doesn't create any index. Without that index, you lose not only DataHub Usage Analytics but also Data Landscape Summary. This is because when you click the Analytics tab, GMS checks whether Elasticsearch has an index named datahub_usage_event (more precisely, a data_stream index).

So developers who want to turn off only DataHub Usage Analytics and keep Data Landscape Summary have no way to do so. They must turn both off or both on.

If the elasticsearch-setup job created a plain datahub_usage_event index (not a data_stream index) when global.datahub_analytics_enabled is set to false, Data Landscape Summary would keep working.

Thank you.

GyuhoonK avatar Sep 18 '22 06:09 GyuhoonK
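
The proposal above can be sketched as a small decision function. This is a minimal illustration in Python, not the actual elasticsearch-setup script: the function name and the template name are hypothetical, and only the Elasticsearch REST endpoints themselves (`PUT /_index_template/<name>`, `PUT /_data_stream/<name>`, `PUT /<index>`) are real.

```python
from typing import List, Tuple

# Index name taken from the GMS error discussed in this thread.
USAGE_EVENT_INDEX = "datahub_usage_event"
# Hypothetical template name, used only for illustration.
TEMPLATE_NAME = "datahub_usage_event_index_template"


def plan_usage_event_setup(analytics_enabled: bool) -> List[Tuple[str, str]]:
    """Return the (HTTP method, path) calls a setup job could issue."""
    if analytics_enabled:
        # Full usage analytics: register a template, then create the data stream.
        return [
            ("PUT", f"/_index_template/{TEMPLATE_NAME}"),
            ("PUT", f"/_data_stream/{USAGE_EVENT_INDEX}"),
        ]
    # Analytics disabled: create only a plain, empty index so that searches
    # against the name succeed (with zero hits) instead of failing.
    return [("PUT", f"/{USAGE_EVENT_INDEX}")]


if __name__ == "__main__":
    for enabled in (True, False):
        print(enabled, plan_usage_event_setup(enabled))
```

With the flag off, the plan contains a single plain-index creation call, which needs no data_stream support on the cluster.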

Unit Test Results (build & test)

584 tests ±0 · 580 passed ±0 · 4 skipped ±0 · 0 failed ±0 · 143 suites ±0 · 143 files ±0 · 13m 18s (+25s)

Results for commit 19263ba2. ± Comparison against base commit 325b959e.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Sep 18 '22 07:09 github-actions[bot]

@jjoyce0510 Please chime in here for next steps

swaroopjagadish avatar Sep 20 '22 17:09 swaroopjagadish

Hello @GyuhoonK

Could you explain the reasoning behind the PR? Why would you want Data Landscape Summary but not DataHub Analytics?

pedro93 avatar Sep 20 '22 17:09 pedro93

@pedro93 In my case, the Elasticsearch cluster connected to DataHub cannot use the data_stream feature, because X-Pack is not installed on it. So I have to turn off DataHub Analytics.

GyuhoonK avatar Sep 21 '22 10:09 GyuhoonK

Can you clarify what you mean by: "ElasticSearch connected to DataHub cannot use data_stream feature in my case."

"X-pack is not installed into my ElasticSearch. So I have to turn off DataHub Analytics." X-Pack is a security and monitoring module for Elastic. Why does not having it mean you have to turn off DataHub's Analytics page?

pedro93 avatar Sep 21 '22 10:09 pedro93

Sorry for the confusion. I thought Elasticsearch could use data_stream only if X-Pack was installed. As you said, data_stream is a basic feature in the current version (8.2). My Elasticsearch is version 7.10, and that version doesn't include data_stream as a basic feature; X-Pack is needed (Elasticsearch Guide [7.10]).

GyuhoonK avatar Sep 21 '22 11:09 GyuhoonK
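
The version constraint described above could be checked programmatically. The sketch below rests on two assumptions that are not stated in this thread: that data streams first shipped in Elasticsearch 7.9 as part of X-Pack, and that the cluster's root endpoint (`GET /`) reports `version.number` and `version.build_flavor`, where `"oss"` marks a build without X-Pack. The function name is hypothetical.

```python
def supports_data_streams(cluster_info: dict) -> bool:
    """Guess data_stream support from the JSON body returned by `GET /`."""
    version = cluster_info["version"]
    # "7.10.2" -> (7, 10); only major and minor matter for this check.
    major, minor = (int(part) for part in version["number"].split(".")[:2])
    # Assumption: "oss" builds ship without X-Pack, so they lack data streams.
    flavor = version.get("build_flavor", "default")
    return (major, minor) >= (7, 9) and flavor != "oss"


if __name__ == "__main__":
    oss_7_10 = {"version": {"number": "7.10.2", "build_flavor": "oss"}}
    default_8_2 = {"version": {"number": "8.2.0", "build_flavor": "default"}}
    print(supports_data_streams(oss_7_10))
    print(supports_data_streams(default_8_2))
```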

How does wanting to use Data Landscape Summary and not DataHub Analytics relate to wanting to use ElasticSearch's data_stream?

pedro93 avatar Sep 21 '22 14:09 pedro93

Actually, it is related to the datahub_usage_event index. Data Landscape Summary and DataHub Analytics are both part of the Analytics tab, which forces me to turn both of them off. I agree that when global.datahub_analytics_enabled is set to false, the datahub_usage_event index is not created and DataHub Analytics is disabled. But I think Data Landscape Summary is unrelated to global.datahub_analytics_enabled: it just shows a summary, not users' logs.

However, when I click Analytics while global.datahub_analytics_enabled is set to false, which means Elasticsearch has no datahub_usage_event index, the web UI shows an error (screenshot). I want to turn off only DataHub Analytics, but then I cannot use Data Landscape Summary either. That is the problem. I found this error log in the GMS pods.

│ java.lang.RuntimeException: Search query failed:                                                                                                                                                                                                                                                                    
│     at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)                                                                                                                                                                                                 
│     at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getTimeseriesChart(AnalyticsService.java:99)                                                                                                                                                                                                 
│     at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.getProductAnalyticsCharts(GetChartsResolver.java:77)                                                                                                                                                                                       
│     at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:50)                                                                                                                                                                                                             
│     at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:37)                                                                                                                                                                                                             
│     at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)                                                                                                                                                                                                                                   
│     at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)                                                                                                                                                                                                                         
│     at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)                                                                                                                                                                                                                             
│     at graphql.execution.Execution.executeOperation(Execution.java:165)                                                                                                                                                                                                                                             
│     at graphql.execution.Execution.execute(Execution.java:104)                                                                                                                                                                                                                                                      
│     at graphql.GraphQL.execute(GraphQL.java:557)                                                                                                                                                                                                                                                                    
│     at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)                                                                                                                                                                                                                                                    
│     at graphql.GraphQL.executeAsync(GraphQL.java:446)                                                                                                                                                                                                                                                               
│     at graphql.GraphQL.execute(GraphQL.java:377)                                                                                                                                                                                                                                                                    
│     at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:90)                                                                                                                                                                                                                                    
│     at com.datahub.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:94)                                                                                                                                                                                                                        
│     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)                                                                                                                                                                                                                          
│     at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)                                                                                                                                                                                                                         
│     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)                                                                                                                                                                                                                                              
│     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)                                                                                                                                                                                                                                  
│     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)                                                                                                                                                                                                                                          
│     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)                                                                                                                                                                                                                                 
│ Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]                                                                                                                                                     
│     at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)                                                                                                                                                                                                                       
│     at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)                                                                                                                                                                                                                      
│     at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)                                                                                                                                                                                                           
│     at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)                                                                                                                                                                                                           
│     at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)                                                                                                                                                                                                                   
│     at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)                                                                                                                                                                                                     
│     at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)                                                                                                                                                                                                                           
│     at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:260)                                                                                                                                                                                                 
│     ... 21 common frames omitted                                                                                                                                                                                                                                                                                    
│     Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://elasticsearch-master:9200], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query  
│ Warnings: [Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security., [ignore_throttled] parameter is deprecated because froz  
│ {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"}],"type":"index_not_found_exception","reason":"no such index [datahub_usage_even  
│         at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)                                                                                                                                                                                                                                 
│         at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)                                                                                                                                                                                                                                  
│         at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)                                                                                                                                                                                                                                  
│         at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)                                                                                                                                                                                                       
│         ... 25 common frames omitted                                                                                                                                                                                                                                                                                
│ 15:32:49.579 [ForkJoinPool.commonPool-worker-10] ERROR c.datahub.graphql.GraphQLController:98 - Errors while executing graphQL query: "query getAnalyticsCharts {\n  getAnalyticsCharts {\n    groupId\n    title\n    charts {\n      ...analyticsChart\n      __typename\n    }\n    __typename\n  }\n}\n\nfragm

This happens because Elasticsearch doesn't have the datahub_usage_event index (because elasticsearch-setup didn't create the template). If Elasticsearch has a datahub_usage_event index, I can see Data Landscape Summary. So I suggest creating a datahub_usage_event index (a plain index, not one created from the data_stream template) when global.datahub_analytics_enabled is set to false.

And data_stream is related to your question. Q: Why do you not want to use DataHub Analytics? A: I cannot use data_stream on my Elasticsearch, because its version is 7.10 and X-Pack is not installed. I have no choice.

GyuhoonK avatar Sep 21 '22 15:09 GyuhoonK
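
To confirm from a log that the failure is a missing index and not some other search error, the Elasticsearch error body can be inspected. The sketch below is a hypothetical helper, not part of GMS. Its sample body reuses only the `root_cause` entry that is fully visible in the log above; the rest of that log line is truncated and is not reconstructed here.

```python
import json
from typing import List


def missing_indices(error_body: str) -> List[str]:
    """Return index names reported as not found in an Elasticsearch error body."""
    error = json.loads(error_body).get("error")
    if not isinstance(error, dict):
        # Some error responses carry a plain string instead of an object.
        return []
    return [
        cause["index"]
        for cause in error.get("root_cause", [])
        if cause.get("type") == "index_not_found_exception" and "index" in cause
    ]


# Sample built from the root_cause entry shown in the GMS log.
SAMPLE_BODY = json.dumps({
    "error": {
        "root_cause": [{
            "type": "index_not_found_exception",
            "reason": "no such index [datahub_usage_event]",
            "resource.type": "index_or_alias",
            "resource.id": "datahub_usage_event",
            "index_uuid": "_na_",
            "index": "datahub_usage_event",
        }]
    }
})

if __name__ == "__main__":
    print(missing_indices(SAMPLE_BODY))
```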

So if I understand you correctly: you cannot use DataHub Analytics because your Elastic cluster does not support X-Pack, but you still want to have Data Landscape Summary by changing the way the datahub_usage_event index works?

pedro93 avatar Sep 21 '22 17:09 pedro93

That is my interpretation. I think this PR makes sense. Landscape is indeed separate from usage tracking.

Going to approve!

jjoyce0510 avatar Sep 21 '22 21:09 jjoyce0510

@pedro93 Yes, that is exactly what I mean.

GyuhoonK avatar Sep 22 '22 01:09 GyuhoonK

@shirshanka I added insecure mode. Please check it!

GyuhoonK avatar Sep 23 '22 12:09 GyuhoonK

@pedro93 It didn't pass the smoke test.

Unable to run quickstart - the following issues were detected:
- kafka-setup is still running

If you think something went wrong, please file an issue at https://github.com/datahub-project/datahub/issues
or send a message in our Slack https://slack.datahubproject.io/
Be sure to attach the logs from /tmp/tmpqri39__g.log
Error: Process completed with exit code 1.

Is there an issue with kafka-setup?

GyuhoonK avatar Sep 24 '22 06:09 GyuhoonK