datahub
Analytics tab fails with "Charts failed to load" and "An unknown error occurred"
Hi, I have just set up DataHub for the very first time and I am running into some issues.
The first thing I noticed is: just after the pod for "datahub-elasticsearch-setup-job" finished execution, its log contained the following message:
datahub_usage_event_policy exists
creating datahub_usage_event_index_template
{
"index_patterns": ["*datahub_usage_event*"],
"data_stream": { },
"priority": 500,
"template": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"type": {
"type": "keyword"
},
"timestamp": {
"type": "date"
},
"userAgent": {
"type": "keyword"
},
"browserId": {
"type": "keyword"
}
}
},
"settings": {
"index.lifecycle.name": "datahub_usage_event_policy"
}
}
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1295 100 775 100 520 3395 2278 --:--:-- --:--:-- --:--:-- 5679
2022/08/26 15:48:04 Command finished successfully.
}{"error":{"root_cause":[{"type":"invalid_index_template_exception","reason":"index_template [datahub_usage_event_index_template] invalid, cause [Validation Failed: 1: unknown setting [index.lifecycle.name] please check that any required plugins are installed, or check the breaking changes documentation for removed settings;2: expected [index.lifecycle.name] to be private but it was not;]"}],"type":"invalid_index_template_exception","reason":"index_template [datahub_usage_event_index_template] invalid, cause [Validation Failed: 1: unknown setting [index.lifecycle.name] please check that any required plugins are installed, or check the breaking changes documentation for removed settings;2: expected [index.lifecycle.name] to be private but it was not;]"},"status":400}
Notice that the pod execution ended with success.
As I am totally new to DataHub, I don't know if it's safe to ignore this message. I worry that it will impair the system when I start ingesting data.
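One way to catch this kind of silent failure is to look at the body returned by the template request rather than at the job's exit status. The sketch below is only an illustration: `template_request_failed` is a hypothetical helper that is not part of DataHub, and the sample body is a shortened copy of the response from the log above.

```python
import json


def template_request_failed(response_body: str) -> bool:
    """Return True if an Elasticsearch/OpenSearch response body reports an error.

    Hypothetical helper for illustration; it is not part of DataHub.
    """
    try:
        parsed = json.loads(response_body)
    except json.JSONDecodeError:
        # A body that is not JSON cannot be confirmed as a success.
        return True
    if not isinstance(parsed, dict):
        return True
    status = parsed.get("status", 200)
    return "error" in parsed or (isinstance(status, int) and status >= 400)


# Shortened copy of the response body from the setup job log.
rejected = (
    '{"error":{"type":"invalid_index_template_exception",'
    '"reason":"unknown setting [index.lifecycle.name]"},"status":400}'
)
accepted = '{"acknowledged":true}'

print(template_request_failed(rejected))  # True
print(template_request_failed(accepted))  # False
```

A check like this would flag the run above as failed even though the pod itself reported success.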
Also, when I try to open the Analytics tab, it displays errors in the form of "Charts failed to load" and "An unknown error occurred." and the gms log says:
16:00:42.402 [Thread-160] ERROR c.l.d.g.a.service.AnalyticsService:264 - Search query failed: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
16:00:42.403 [Thread-158] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
java.lang.RuntimeException: Search query failed:
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getTimeseriesChart(AnalyticsService.java:99)
at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.getProductAnalyticsCharts(GetChartsResolver.java:77)
at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:50)
at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:37)
at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
at graphql.execution.Execution.executeOperation(Execution.java:165)
at graphql.execution.Execution.execute(Execution.java:104)
at graphql.GraphQL.execute(GraphQL.java:557)
at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
at graphql.GraphQL.executeAsync(GraphQL.java:446)
at graphql.GraphQL.execute(GraphQL.java:377)
at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:90)
at com.datahub.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:94)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:260)
... 17 common frames omitted
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [MY-HOST:443], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahub_usage_event","node":"my_LArYFT729cetNwpLiVw","reason":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.","caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}},"status":400}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
... 21 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179)
... 24 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
... 28 common frames omitted
16:00:42.403 [Thread-160] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
java.lang.RuntimeException: Search query failed:
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getHighlights(AnalyticsService.java:236)
at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.getHighlights(GetHighlightsResolver.java:50)
at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:29)
at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:22)
at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
at graphql.execution.Execution.executeOperation(Execution.java:165)
at graphql.execution.Execution.execute(Execution.java:104)
at graphql.GraphQL.execute(GraphQL.java:557)
at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
at graphql.GraphQL.executeAsync(GraphQL.java:446)
at graphql.GraphQL.execute(GraphQL.java:377)
at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:90)
at com.datahub.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:94)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:260)
... 17 common frames omitted
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://MY-HOST:443], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahub_usage_event","node":"my_LArYFT729cetNwpLiVw","reason":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.","caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}},"status":400}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
... 21 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179)
... 24 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
... 28 common frames omitted
16:00:42.404 [Thread-160] ERROR c.datahub.graphql.GraphQLController:98 - Errors while executing graphQL query: "query getHighlights {\n getHighlights {\n value\n title\n body\n __typename\n }\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getHighlights], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}], data=null, extensions={tracing={version=1, startTime=2022-08-26T16:00:42.323Z, endTime=2022-08-26T16:00:42.403Z, duration=80287287, parsing={startOffset=252816, duration=221068}, validation={startOffset=380983, duration=97001}, execution={resolvers=[{path=[getHighlights], parentType=Query, returnType=[Highlight!]!, fieldName=getHighlights, startOffset=468743, duration=78851298}]}}}}, errors: [DataHubGraphQLError{path=[getHighlights], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
16:00:42.404 [Thread-158] ERROR c.datahub.graphql.GraphQLController:98 - Errors while executing graphQL query: "query getAnalyticsCharts {\n getAnalyticsCharts {\n groupId\n title\n charts {\n ...analyticsChart\n __typename\n }\n __typename\n }\n}\n\nfragment analyticsChart on AnalyticsChart {\n ... on TimeSeriesChart {\n title\n lines {\n name\n data {\n x\n y\n __typename\n }\n __typename\n }\n dateRange {\n start\n end\n __typename\n }\n interval\n __typename\n }\n ... on BarChart {\n title\n bars {\n name\n segments {\n label\n value\n __typename\n }\n __typename\n }\n __typename\n }\n ... on TableChart {\n title\n columns\n rows {\n values\n cells {\n value\n linkParams {\n searchParams {\n types\n query\n filters {\n field\n value\n __typename\n }\n __typename\n }\n entityProfileParams {\n urn\n type\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getAnalyticsCharts], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}], data=null, extensions={tracing={version=1, startTime=2022-08-26T16:00:42.329Z, endTime=2022-08-26T16:00:42.403Z, duration=74142630, parsing={startOffset=598442, duration=569182}, validation={startOffset=1080652, duration=455337}, execution={resolvers=[{path=[getAnalyticsCharts], parentType=Query, returnType=[AnalyticsChartGroup!]!, fieldName=getAnalyticsCharts, startOffset=1193263, duration=72027568}]}}}}, errors: [DataHubGraphQLError{path=[getAnalyticsCharts], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
16:00:42.417 [I/O dispatcher 1] INFO c.l.m.k.e.ElasticsearchConnector:41 - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
16:19:22.433 [pool-11-thread-1] INFO c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 5ms
16:19:22.504 [I/O dispatcher 1] INFO c.l.m.k.e.ElasticsearchConnector:41 - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
16:19:42.680 [pool-11-thread-1] INFO c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 5ms
16:19:42.713 [I/O dispatcher 1] INFO c.l.m.k.e.ElasticsearchConnector:41 - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
I think these two issues (the error during the setup job and the errors when opening the "Analytics" tab) may be related.
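A possible link between the two, stated as an assumption rather than something confirmed by the logs: if the index template was rejected, the `datahub_usage_event` index may have been created with dynamic mappings, in which case string fields such as `browserId` default to `text` instead of the `keyword` type the template asks for. The sketch below checks a mapping response for that mismatch. `fields_not_keyword` is a hypothetical helper, and the sample dictionary only imitates the shape of a `GET /datahub_usage_event/_mapping` response.

```python
def fields_not_keyword(mapping_response, index, expected_keyword_fields):
    """Return {field: actual_type} for fields that are not mapped as keyword."""
    properties = mapping_response[index]["mappings"].get("properties", {})
    mismatches = {}
    for field in expected_keyword_fields:
        actual = properties.get(field, {}).get("type")  # None if the field is unmapped
        if actual != "keyword":
            mismatches[field] = actual
    return mismatches


# Illustrative response (made up): browserId was mapped dynamically as text.
sample = {
    "datahub_usage_event": {
        "mappings": {
            "properties": {
                "type": {"type": "keyword"},
                "browserId": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
            }
        }
    }
}

print(fields_not_keyword(sample, "datahub_usage_event", ["type", "browserId"]))
```

An empty result would mean the fields match what the template declares; any entry points at a field that aggregations are likely to reject.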
Some details about my setup:
- OpenSearch Service on AWS with OpenSearch_1_3_R20220715-P1
- MySQL hosted on AWS Aurora
- DataHub runs on EKS cluster with Kubernetes 1.21.
- Access to the service is through a private (internal) AWS ALB
Setup for prerequisites:
elasticsearch:
  enabled: false
neo4j:
  enabled: false
neo4j-community:
  enabled: false
mysql:
  enabled: false
cp-helm-charts:
  # Schema registry is under the community license
  cp-schema-registry:
    enabled: true
    kafka:
      bootstrapServers: "datahub-dependency-kafka:9092"
  cp-kafka:
    enabled: false
  cp-zookeeper:
    enabled: false
  cp-kafka-rest:
    enabled: false
  cp-kafka-connect:
    enabled: false
  cp-ksql-server:
    enabled: false
  cp-control-center:
    enabled: false
# Bitnami version of Kafka that deploys open source Kafka https://artifacthub.io/packages/helm/bitnami/kafka
kafka:
  enabled: true
Setup for DataHub:
# Values to start up datahub after starting up the datahub-prerequisites chart with "prerequisites" release name
# Copy this chart and change configuration as needed.
datahub-gms:
  enabled: true
  image:
    repository: linkedin/datahub-gms
    tag: "v0.8.43"
  service:
    type: ClusterIP
datahub-frontend:
  enabled: true
  image:
    repository: linkedin/datahub-frontend-react
    tag: "v0.8.43"
  service:
    type: NodePort
    port: 80
    targetPort: http
    protocol: TCP
    name: http
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internal
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/subnets: <VPC SUBNETS>
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
    hosts:
      - paths:
          - /*
acryl-datahub-actions:
  enabled: true
  image:
    repository: acryldata/datahub-actions
    tag: "v0.0.4"
  resources:
    limits:
      memory: 512Mi
    requests:
      cpu: 300m
      memory: 256Mi
datahub-mae-consumer:
  image:
    repository: linkedin/datahub-mae-consumer
    tag: "v0.8.43"
datahub-mce-consumer:
  image:
    repository: linkedin/datahub-mce-consumer
    tag: "v0.8.43"
datahub-ingestion-cron:
  enabled: false
  image:
    repository: acryldata/datahub-ingestion
    tag: "v0.8.43"
elasticsearchSetupJob:
  enabled: true
  image:
    repository: linkedin/datahub-elasticsearch-setup
    tag: "v0.8.43"
  podSecurityContext:
    fsGroup: 1000
  securityContext:
    runAsUser: 1000
  podAnnotations: {}
kafkaSetupJob:
  enabled: true
  image:
    repository: linkedin/datahub-kafka-setup
    tag: "v0.8.43"
  podSecurityContext:
    fsGroup: 1000
  securityContext:
    runAsUser: 1000
  podAnnotations: {}
mysqlSetupJob:
  enabled: true
  image:
    repository: acryldata/datahub-mysql-setup
    tag: "v0.8.43"
  podSecurityContext:
    fsGroup: 1000
  securityContext:
    runAsUser: 1000
  podAnnotations: {}
postgresqlSetupJob:
  enabled: false
  image:
    repository: acryldata/datahub-postgres-setup
    tag: "v0.8.43"
  podSecurityContext:
    fsGroup: 1000
  securityContext:
    runAsUser: 1000
  podAnnotations: {}
datahubUpgrade:
  enabled: false
  image:
    repository: acryldata/datahub-upgrade
    tag: "v0.8.43"
  noCodeDataMigration:
    sqlDbType: "MYSQL"
  podSecurityContext: {}
    # fsGroup: 1000
  securityContext: {}
    # runAsUser: 1000
  podAnnotations: {}
global:
  graph_service_impl: elasticsearch
  datahub_analytics_enabled: true
  datahub_standalone_consumers_enabled: false
  elasticsearch:
    host: <MY-OPENSEARCH-HOST>
    port: 443
    useSSL: true
    auth:
      username: admin
      password:
        secretRef: elasticsearch-secrets
        secretKey: elasticsearch-root-password
  kafka:
    bootstrap:
      server: "datahub-dependency-kafka:9092"
    zookeeper:
      server: "datahub-dependency-zookeeper:2181"
    schemaregistry:
      url: "http://datahub-dependency-cp-schema-registry:8081"
      # type: AWS_GLUE
      # glue:
      #   region: us-east-1
      #   registry: datahub
  sql:
    datasource:
      host: "<MY-AURORA-HOST>:3306"
      hostForMysqlClient: "<MY-AURORA-HOST>"
      port: "3306"
      url: "jdbc:mysql://<MY-AURORA-HOST>:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2"
      driver: "com.mysql.cj.jdbc.Driver"
      username: "admin"
      password:
        secretRef: mysql-secrets
        secretKey: mysql-root-password
  datahub:
    gms:
      port: "8080"
      nodePort: "30001"
    mae_consumer:
      port: "9091"
      nodePort: "30002"
    appVersion: "1.0"
    encryptionKey:
      secretRef: "datahub-encryption-secrets"
      secretKey: "encryption_key_secret"
      # Set to false if you'd like to provide your own secret.
      provisionSecret: true
    managed_ingestion:
      enabled: true
      defaultCliVersion: "0.8.43"
    metadata_service_authentication:
      enabled: false
      systemClientId: "__datahub_system"
      systemClientSecret:
        secretRef: "datahub-auth-secrets"
        secretKey: "token_service_signing_key"
      tokenService:
        signingKey:
          secretRef: "datahub-auth-secrets"
          secretKey: "token_service_signing_key"
        salt:
          secretRef: "datahub-auth-secrets"
          secretKey: "token_service_salt"
      # Set to false if you'd like to provide your own auth secrets
      provisionSecrets: true
I would be grateful for any tips and hints. Even though I am having these issues, I must admit that for such a complex system the installation process is very well automated, so: well done!
I am also seeing the same issue.
I managed to fix the issue with errors in "Analytics" tab.
First, I managed to replicate the issue with a bit of Python code:
from opensearchpy import OpenSearch

host = '<YOUR_OPEN_SEARCH_HOST>'
port = 443
auth = ('admin', '<YOUR_ADMIN_PASSWORD>')

client = OpenSearch(
    hosts=[{'host': host, 'port': port}],
    http_compress=True,
    http_auth=auth,
    use_ssl=True
)

# Terms aggregation on browserId, the field named in the gms error.
response = client.search(
    body={
        "aggs": {
            "by_browserId": {
                "terms": {
                    "field": "browserId",
                },
            }
        }
    },
    index='datahub_usage_event'
)
print(response)
It returned exactly the same error as reported, and the error message itself suggested the fix:
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.
The OpenSearch documentation is clear about it:
By default, OpenSearch doesn’t support aggregations on a text field. Because text fields are tokenized, an aggregation on a text field has to reverse the tokenization process back to its original string and then formulate an aggregation based on that. This kind of an operation consumes significant memory and degrades cluster performance.
While you can enable aggregations on text fields by setting the fielddata parameter to true in the mapping, the aggregations are still based on the tokenized words and not on the raw text.
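To make that last point concrete, here is a small self-contained simulation in plain Python; it is not an OpenSearch call. It assumes a simplified analyzer that lowercases and splits on non-alphanumeric characters (the real default analyzer is more elaborate), and the sample values are made up.

```python
import re
from collections import Counter


def keyword_buckets(values):
    """Count documents per exact value, as a terms aggregation on a keyword field would."""
    return Counter(values)


def text_buckets(values):
    """Count documents per token, as a terms aggregation on a text field with fielddata would."""
    counts = Counter()
    for value in values:
        # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
        tokens = {t for t in re.split(r"[^0-9a-z]+", value.lower()) if t}
        counts.update(tokens)  # each document counts once per distinct token
    return counts


docs = ["Chrome Desktop", "Chrome Mobile"]
print(dict(keyword_buckets(docs)))
print(dict(text_buckets(docs)))
```

The keyword version yields one bucket per original string, while the text version yields buckets for the individual words, which is why fielddata on a text field can give different numbers than a keyword field.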
So, I decided to set fielddata to true for browserId in the following way:
curl -XPUT --insecure -u 'admin:<PASSWORD>' '<OPENSEARCH_HOST_AND_PORT>/datahub_usage_event/_mapping' -H 'Content-Type: application/json' -d '{ "properties": { "browserId": { "type": "text", "fielddata": true } } }'
The response was:
{"acknowledged":true}
and after this action the "Analytics" tab started to work without errors.
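For anyone who prefers to stay in Python, the same mapping update could be sketched as follows. It assumes an opensearch-py client like the one created in the reproduction snippet and relies on its `indices.put_mapping` method; `fielddata_mapping` and `enable_fielddata` are hypothetical helper names. Keep the memory warning from the error message in mind before enabling fielddata on a large index.

```python
def fielddata_mapping(field):
    """Build the mapping body that enables fielddata on a text field."""
    return {"properties": {field: {"type": "text", "fielddata": True}}}


def enable_fielddata(client, index, field):
    """Apply the mapping update through an opensearch-py style client.

    `client` is assumed to expose `indices.put_mapping(index=..., body=...)`.
    """
    return client.indices.put_mapping(index=index, body=fielddata_mapping(field))


# Usage with the client from the reproduction snippet (requires a live cluster):
# print(enable_fielddata(client, "datahub_usage_event", "browserId"))
```

The body built by `fielddata_mapping("browserId")` is the same JSON document that the curl command sends.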
The only question now is: do we want to fix it in DataHub, so it works for OpenSearch users out of the box?
datahub-elasticsearch-setup-job worked for me after the following steps:
- I deleted the index from AWS OpenSearch:
$ curl -X DELETE "https://<opensearch-url>/datahub_usage_event?pretty"
{
  "acknowledged" : true
}
- I set the following in the DataHub Helm chart values.yaml and reinstalled the chart:
elasticsearchSetupJob:
  extraEnvs:
    - name: USE_AWS_ELASTICSEARCH
      value: "true"
However, the Analytics tab was still failing to load.
I then also set fielddata to true, and after that the Analytics tab started working:
$ curl -XPUT 'https://<opensearch-url>/datahub_usage_event/_mapping' -H 'Content-Type: application/json' -d '{ "properties": { "browserId": { "type": "text", "fielddata": true } } }'
{"acknowledged":true}
Hello @wojtekwalczak
Hopefully this PR: https://github.com/datahub-project/datahub/pull/5502 will solve the situation in the future. We will be merging it soon!
Related to https://github.com/datahub-project/datahub/issues/5376
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
This issue was closed because it has been inactive for 30 days since being marked as stale.