
Analytics tab fails with "Charts failed to load" and "An unknown error occurred"

Open wojtekwalczak opened this issue 3 years ago • 4 comments

Hi, I have just set up DataHub for the very first time and I am having some issues.

The first thing I noticed is: just after the pod for "datahub-elasticsearch-setup-job" finished execution, its log contained the following message:

datahub_usage_event_policy exists

creating datahub_usage_event_index_template
{
  "index_patterns": ["*datahub_usage_event*"],
  "data_stream": { },
  "priority": 500,
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "type": {
          "type": "keyword"
        },
        "timestamp": {
          "type": "date"
        },
        "userAgent": {
          "type": "keyword"
        },
        "browserId": {
          "type": "keyword"
        }
      }
    },
    "settings": {
      "index.lifecycle.name": "datahub_usage_event_policy"
    }
  }
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1295  100   775  100   520   3395   2278 --:--:-- --:--:-- --:--:--  5679
2022/08/26 15:48:04 Command finished successfully.
}{"error":{"root_cause":[{"type":"invalid_index_template_exception","reason":"index_template [datahub_usage_event_index_template] invalid, cause [Validation Failed: 1: unknown setting [index.lifecycle.name] please check that any required plugins are installed, or check the breaking changes documentation for removed settings;2: expected [index.lifecycle.name] to be private but it was not;]"}],"type":"invalid_index_template_exception","reason":"index_template [datahub_usage_event_index_template] invalid, cause [Validation Failed: 1: unknown setting [index.lifecycle.name] please check that any required plugins are installed, or check the breaking changes documentation for removed settings;2: expected [index.lifecycle.name] to be private but it was not;]"},"status":400}

Notice that the pod nevertheless reported successful execution.

As I am totally new to DataHub, I don't know whether it is safe to ignore this message. I worry that it will impair the system once I start ingesting data.
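For context on why the job "succeeds" despite the HTTP 400: the rejected setting, index.lifecycle.name, belongs to Elasticsearch's Index Lifecycle Management, which OpenSearch does not ship (it uses Index State Management instead), so the template creation genuinely failed; meanwhile the setup script's curl invocation exits 0 because the HTTP request itself completed. A hedged sketch of how one could detect this kind of silent failure by inspecting the response body (the response strings below are illustrative, modeled on the log above, not taken from a live cluster):

```python
import json

def response_indicates_error(body: str) -> bool:
    """Return True when an Elasticsearch/OpenSearch JSON response reports an error."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return False  # not JSON; nothing we can conclude from the body alone
    # Error responses carry an "error" object and/or a non-2xx "status" field
    return "error" in parsed or parsed.get("status", 200) >= 400

# Illustrative response bodies modeled on the setup-job log output
ok = '{"acknowledged": true}'
bad = ('{"error": {"root_cause": [{"type": "invalid_index_template_exception"}]},'
       ' "status": 400}')

print(response_indicates_error(ok))   # False
print(response_indicates_error(bad))  # True
```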

Also, when I try to open the Analytics tab, it shows "Charts failed to load" and "An unknown error occurred.", and the gms log says:

16:00:42.402 [Thread-160] ERROR c.l.d.g.a.service.AnalyticsService:264 - Search query failed: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
16:00:42.403 [Thread-158] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
java.lang.RuntimeException: Search query failed:
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getTimeseriesChart(AnalyticsService.java:99)
        at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.getProductAnalyticsCharts(GetChartsResolver.java:77)
        at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:50)
        at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:37)
        at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
        at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
        at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
        at graphql.execution.Execution.executeOperation(Execution.java:165)
        at graphql.execution.Execution.execute(Execution.java:104)
        at graphql.GraphQL.execute(GraphQL.java:557)
        at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
        at graphql.GraphQL.executeAsync(GraphQL.java:446)
        at graphql.GraphQL.execute(GraphQL.java:377)
        at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:90)
        at com.datahub.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:94)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:260)
        ... 17 common frames omitted
        Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [MY-HOST:443], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahub_usage_event","node":"my_LArYFT729cetNwpLiVw","reason":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.","caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}},"status":400}
                at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
                at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
                at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
                at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
                ... 21 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
        at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179)
        ... 24 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
        ... 28 common frames omitted
16:00:42.403 [Thread-160] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
java.lang.RuntimeException: Search query failed:
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getHighlights(AnalyticsService.java:236)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.getHighlights(GetHighlightsResolver.java:50)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:29)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:22)
        at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
        at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
        at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
        at graphql.execution.Execution.executeOperation(Execution.java:165)
        at graphql.execution.Execution.execute(Execution.java:104)
        at graphql.GraphQL.execute(GraphQL.java:557)
        at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
        at graphql.GraphQL.executeAsync(GraphQL.java:446)
        at graphql.GraphQL.execute(GraphQL.java:377)
        at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:90)
        at com.datahub.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:94)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:260)
        ... 17 common frames omitted
        Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://MY-HOST:443], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahub_usage_event","node":"my_LArYFT729cetNwpLiVw","reason":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.","caused_by":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}}},"status":400}
                at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
                at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
                at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
                at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
                ... 21 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
        at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179)
        ... 24 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
        ... 28 common frames omitted
16:00:42.404 [Thread-160] ERROR c.datahub.graphql.GraphQLController:98 - Errors while executing graphQL query: "query getHighlights {\n  getHighlights {\n    value\n    title\n    body\n    __typename\n  }\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getHighlights], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}], data=null, extensions={tracing={version=1, startTime=2022-08-26T16:00:42.323Z, endTime=2022-08-26T16:00:42.403Z, duration=80287287, parsing={startOffset=252816, duration=221068}, validation={startOffset=380983, duration=97001}, execution={resolvers=[{path=[getHighlights], parentType=Query, returnType=[Highlight!]!, fieldName=getHighlights, startOffset=468743, duration=78851298}]}}}}, errors: [DataHubGraphQLError{path=[getHighlights], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
16:00:42.404 [Thread-158] ERROR c.datahub.graphql.GraphQLController:98 - Errors while executing graphQL query: "query getAnalyticsCharts {\n  getAnalyticsCharts {\n    groupId\n    title\n    charts {\n      ...analyticsChart\n      __typename\n    }\n    __typename\n  }\n}\n\nfragment analyticsChart on AnalyticsChart {\n  ... on TimeSeriesChart {\n    title\n    lines {\n      name\n      data {\n        x\n        y\n        __typename\n      }\n      __typename\n    }\n    dateRange {\n      start\n      end\n      __typename\n    }\n    interval\n    __typename\n  }\n  ... on BarChart {\n    title\n    bars {\n      name\n      segments {\n        label\n        value\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n  ... on TableChart {\n    title\n    columns\n    rows {\n      values\n      cells {\n        value\n        linkParams {\n          searchParams {\n            types\n            query\n            filters {\n              field\n              value\n              __typename\n            }\n            __typename\n          }\n          entityProfileParams {\n            urn\n            type\n            __typename\n          }\n          __typename\n        }\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n  __typename\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getAnalyticsCharts], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}], data=null, extensions={tracing={version=1, startTime=2022-08-26T16:00:42.329Z, endTime=2022-08-26T16:00:42.403Z, duration=74142630, parsing={startOffset=598442, duration=569182}, validation={startOffset=1080652, duration=455337}, execution={resolvers=[{path=[getAnalyticsCharts], parentType=Query, returnType=[AnalyticsChartGroup!]!, fieldName=getAnalyticsCharts, startOffset=1193263, duration=72027568}]}}}}, errors: [DataHubGraphQLError{path=[getAnalyticsCharts], 
code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
16:00:42.417 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector:41 - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
16:19:22.433 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 5ms
16:19:22.504 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector:41 - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
16:19:42.680 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 5ms
16:19:42.713 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector:41 - Successfully feeded bulk request. Number of events: 1 Took time ms: -1

I think these two issues (the errors during the setup job and the errors when accessing the "Analytics" tab) may be related.

Some details about my setup:

  • OpenSearch Service on AWS with OpenSearch_1_3_R20220715-P1
  • MySQL hosted on AWS Aurora
  • DataHub runs on an EKS cluster with Kubernetes 1.21
  • Access to the service goes through a private AWS ALB

Setup for prerequisites:

elasticsearch:
  enabled: false

neo4j:
  enabled: false

neo4j-community:
  enabled: false

mysql:
  enabled: false

cp-helm-charts:
  # Schema registry is under the community license
  cp-schema-registry:
    enabled: true
    kafka:
      bootstrapServers: "datahub-dependency-kafka:9092"
  cp-kafka:
    enabled: false
  cp-zookeeper:
    enabled: false
  cp-kafka-rest:
    enabled: false
  cp-kafka-connect:
    enabled: false
  cp-ksql-server:
    enabled: false
  cp-control-center:
    enabled: false

# Bitnami version of Kafka that deploys open source Kafka https://artifacthub.io/packages/helm/bitnami/kafka
kafka:
  enabled: true

Setup for DataHub:

# Values to start up datahub after starting up the datahub-prerequisites chart with "prerequisites" release name
# Copy this chart and change configuration as needed.
datahub-gms:
  enabled: true
  image:
    repository: linkedin/datahub-gms
    tag: "v0.8.43"
  service:
    type: ClusterIP

datahub-frontend:
  enabled: true
  image:
    repository: linkedin/datahub-frontend-react
    tag: "v0.8.43"
  service:
    type: NodePort
    port: 80
    targetPort: http
    protocol: TCP
    name: http
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"

  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internal
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/subnets: <VPC SUBNETS>
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
    hosts:
      - paths:
          - /*

acryl-datahub-actions:
  enabled: true
  image:
    repository: acryldata/datahub-actions
    tag: "v0.0.4"
  resources:
    limits:
      memory: 512Mi
    requests:
      cpu: 300m
      memory: 256Mi

datahub-mae-consumer:
  image:
    repository: linkedin/datahub-mae-consumer
    tag: "v0.8.43"

datahub-mce-consumer:
  image:
    repository: linkedin/datahub-mce-consumer
    tag: "v0.8.43"

datahub-ingestion-cron:
  enabled: false
  image:
    repository: acryldata/datahub-ingestion
    tag: "v0.8.43"

elasticsearchSetupJob:
  enabled: true
  image:
    repository: linkedin/datahub-elasticsearch-setup
    tag: "v0.8.43"
  podSecurityContext:
    fsGroup: 1000
  securityContext:
    runAsUser: 1000
  podAnnotations: {}

kafkaSetupJob:
  enabled: true
  image:
    repository: linkedin/datahub-kafka-setup
    tag: "v0.8.43"
  podSecurityContext:
    fsGroup: 1000
  securityContext:
    runAsUser: 1000
  podAnnotations: {}

mysqlSetupJob:
  enabled: true
  image:
    repository: acryldata/datahub-mysql-setup
    tag: "v0.8.43"
  podSecurityContext:
    fsGroup: 1000
  securityContext:
    runAsUser: 1000
  podAnnotations: {}

postgresqlSetupJob:
  enabled: false
  image:
    repository: acryldata/datahub-postgres-setup
    tag: "v0.8.43"
  podSecurityContext:
    fsGroup: 1000
  securityContext:
    runAsUser: 1000
  podAnnotations: {}

datahubUpgrade:
  enabled: false
  image:
    repository: acryldata/datahub-upgrade
    tag: "v0.8.43"
  noCodeDataMigration:
    sqlDbType: "MYSQL"
  podSecurityContext: {}
    # fsGroup: 1000
  securityContext: {}
    # runAsUser: 1000
  podAnnotations: {}

global:
  graph_service_impl: elasticsearch
  datahub_analytics_enabled: true
  datahub_standalone_consumers_enabled: false

  elasticsearch:
    host: <MY-OPENSEARCH-HOST>
    port: 443
    useSSL: true
    auth:
      username: admin
      password:
        secretRef: elasticsearch-secrets
        secretKey: elasticsearch-root-password

  kafka:
    bootstrap:
      server: "datahub-dependency-kafka:9092"
    zookeeper:
      server: "datahub-dependency-zookeeper:2181"
    schemaregistry:
      url: "http://datahub-dependency-cp-schema-registry:8081"
      # type: AWS_GLUE
      # glue:
      #   region: us-east-1
      #   registry: datahub

  sql:
    datasource:
      host: "<MY-AURORA-HOST>:3306"
      hostForMysqlClient: "<MY-AURORA-HOST>"
      port: "3306"
      url: "jdbc:mysql://<MY-AURORA-HOST>:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2"
      driver: "com.mysql.cj.jdbc.Driver"
      username: "admin"
      password:
        secretRef: mysql-secrets
        secretKey: mysql-root-password

  datahub:
    gms:
      port: "8080"
      nodePort: "30001"
    mae_consumer:
      port: "9091"
      nodePort: "30002"
    appVersion: "1.0"

    encryptionKey:
      secretRef: "datahub-encryption-secrets"
      secretKey: "encryption_key_secret"
      # Set to false if you'd like to provide your own secret.
      provisionSecret: true

    managed_ingestion:
      enabled: true
      defaultCliVersion: "0.8.43"

    metadata_service_authentication:
      enabled: false
      systemClientId: "__datahub_system"
      systemClientSecret:
        secretRef: "datahub-auth-secrets"
        secretKey: "token_service_signing_key"
      tokenService:
        signingKey:
          secretRef: "datahub-auth-secrets"
          secretKey: "token_service_signing_key"
        salt:
          secretRef: "datahub-auth-secrets"
          secretKey: "token_service_salt"
      # Set to false if you'd like to provide your own auth secrets
      provisionSecrets: true

I would be grateful for any tips and hints. Even though I am having these issues, I must admit that for such a complex system the installation process is very well automated, so: well done!

wojtekwalczak avatar Aug 26 '22 16:08 wojtekwalczak

I am also seeing the same issue.

atul-chegg avatar Aug 29 '22 18:08 atul-chegg

I managed to fix the issue with errors in "Analytics" tab.

First, I managed to replicate the issue with a bit of Python code:

from opensearchpy import OpenSearch

host = '<YOUR_OPEN_SEARCH_HOST>'
port = 443
auth = ('admin', '<YOUR_ADMIN_PASSWORD>')
client = OpenSearch(
    hosts=[{'host': host, 'port': port}],
    http_compress=True,
    http_auth=auth,
    use_ssl=True
)
# Reproduce the failing query: a terms aggregation on browserId
response = client.search(
    body={
        "aggs": {
            "by_browserId": {
                "terms": {
                    "field": "browserId",
                },
            }
        }
    },
    index='datahub_usage_event'
)
print(response)

It returned exactly the same error as reported above, and the error message suggested the answer:

Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.

The OpenSearch documentation is clear about this:

By default, OpenSearch doesn’t support aggregations on a text field. Because text fields are tokenized, an aggregation on a text field has to reverse the tokenization process back to its original string and then formulate an aggregation based on that. This kind of an operation consumes significant memory and degrades cluster performance.

While you can enable aggregations on text fields by setting the fielddata parameter to true in the mapping, the aggregations are still based on the tokenized words and not on the raw text.
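Notably, the index template shown earlier already maps browserId as keyword; the fielddata workaround instead makes an existing text field aggregatable. A hedged sketch encoding the rule from the docs (mapping fragments only, no live calls):

```python
def is_aggregatable(field_mapping: dict) -> bool:
    """Text fields need fielddata=true to support aggregations; keyword
    fields (and most non-text types) support them out of the box via
    doc_values."""
    if field_mapping.get("type") == "text":
        return field_mapping.get("fielddata", False)
    return True

print(is_aggregatable({"type": "text"}))                     # False -> the error above
print(is_aggregatable({"type": "keyword"}))                  # True  -> what the template intended
print(is_aggregatable({"type": "text", "fielddata": True}))  # True  -> the workaround
```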

So, I decided to set fielddata to true for browserId in the following way:

curl -XPUT --insecure -u 'admin:<PASSWORD>' '<OPENSEARCH_HOST_AND_PORT>/datahub_usage_event/_mapping' -H 'Content-Type: application/json' -d '{ "properties": { "browserId": { "type": "text", "fielddata": true } } }'

The response was:

{"acknowledged":true}

and after this change the "Analytics" tab started working without errors.
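To confirm the change stuck, one can fetch the mapping back (GET /datahub_usage_event/_mapping, e.g. via client.indices.get_mapping in opensearch-py) and check the browserId entry. A hedged sketch against an illustrative response dict rather than a live cluster:

```python
# Illustrative GET _mapping response after the PUT above; in practice this
# dict would come from client.indices.get_mapping(index='datahub_usage_event')
mapping_response = {
    "datahub_usage_event": {
        "mappings": {
            "properties": {
                "browserId": {"type": "text", "fielddata": True}
            }
        }
    }
}

props = mapping_response["datahub_usage_event"]["mappings"]["properties"]
browser_id = props["browserId"]
print(browser_id["type"], browser_id.get("fielddata"))  # text True
```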

The only question now is: do we want to fix this in DataHub so that it works out of the box for OpenSearch users?

wojtekwalczak avatar Aug 30 '22 14:08 wojtekwalczak

datahub-elasticsearch-setup-job worked for me.

  1. I deleted the index from AWS OpenSearch:

curl -X DELETE "https://<opensearch-url>/datahub_usage_event?pretty"
{
  "acknowledged" : true
}

  2. I set the following in the DataHub Helm chart values.yaml and reinstalled the chart:
elasticsearchSetupJob:
  extraEnvs:
    - name: USE_AWS_ELASTICSEARCH
      value: "true"

However, the Analytics tab was still failing to load.

I also set fielddata to true, and after that the Analytics tab started working:

curl -XPUT 'https://<opensearch-url>/datahub_usage_event/_mapping' -H 'Content-Type: application/json' -d '{ "properties": { "browserId": { "type": "text", "fielddata": true } } }'
{"acknowledged":true}

atul-chegg avatar Aug 31 '22 08:08 atul-chegg

Hello @wojtekwalczak

Hopefully this PR will address the situation going forward: https://github.com/datahub-project/datahub/pull/5502. We will be merging it soon!

Related to https://github.com/datahub-project/datahub/issues/5376

pedro93 avatar Sep 19 '22 15:09 pedro93

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] avatar Oct 20 '22 02:10 github-actions[bot]

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] avatar Nov 20 '22 02:11 github-actions[bot]