"total" value does not match the actual number of objects in GraphQL query result

Open Stefs-2142 opened this issue 1 year ago • 8 comments

Describe the bug A clear and concise description of what the bug is.

To Reproduce Steps to reproduce the behavior:

  1. Go to 'https://datahub-instance/api/graphiql'
  2. Run the following query:
query listIngestionSources($input: ListIngestionSourcesInput!) {
  listIngestionSources(input: $input) {
    start
    count
    total
    ingestionSources {
      urn
      name
      schedule {
        interval
        timezone
      }
      platform {
        name
      }
      type
      config {
        version
        executorId
        recipe
      }
    }
  }
}

with variables: {"input": {"start": 0, "count": 1000}}

  3. Check the length of the resulting ingestionSources collection and compare it with the value of total.
  4. In my case, total is 1 greater than the actual number of elements: total is 808, but only 807 elements are returned.
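
The check in step 3 can be sketched in Python as follows. This is only an illustration: the helper name `count_mismatch` and the sample response are hypothetical, and the response is assumed to be an already-parsed dictionary shaped like the query result above.

```python
from typing import Any


def count_mismatch(response: dict[str, Any]) -> int:
    """Return `total` minus the number of ingestionSources actually returned."""
    result = response["listIngestionSources"]
    returned = len(result["ingestionSources"])
    return result["total"] - returned


# Hypothetical response: the server reports 3 sources but returns only 2.
sample = {
    "listIngestionSources": {
        "start": 0,
        "count": 2,
        "total": 3,
        "ingestionSources": [
            {"urn": "urn:li:dataHubIngestionSource:a"},
            {"urn": "urn:li:dataHubIngestionSource:b"},
        ],
    }
}

print(count_mismatch(sample))  # 1
```

A non-zero result reproduces the bug, provided the requested count is large enough to cover every source in one page.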

Expected behavior total == actual

Screenshots ingestions

Desktop (please complete the following information):

  • Version DataHub 0.12

Stefs-2142 avatar Feb 05 '24 10:02 Stefs-2142

+1

YuriyGavrilov avatar Feb 05 '24 11:02 YuriyGavrilov

I also noticed a mismatch in the result with the following query parameters:

result = datahub_prod._get_ingestion_sources(start=0, count=753)
len(result['listIngestionSources']['ingestionSources'])
752

Stefs-2142 avatar Feb 05 '24 11:02 Stefs-2142

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] avatar Mar 07 '24 01:03 github-actions[bot]

It is still relevant.

YuriyGavrilov avatar Mar 07 '24 07:03 YuriyGavrilov

@Stefs-2142 @YuriyGavrilov Your problem may be due to a data inconsistency between Elasticsearch and Ebean. Compare the document count in the datahubingestionsourceindex_v2 index with the result of this select:

select count(*)
from metadata_aspect_v2
where aspect = 'dataHubIngestionSourceKey' and version = 0;

If select count(*) returns a smaller number, it means that a few dataHubIngestionSource entities have been deleted from your database.
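
If the counts differ, the same comparison can be done per URN to see which documents are the odd ones out. The sketch below is not part of DataHub: `diff_urns` is a hypothetical helper, and it assumes you have already exported the URNs from the index and from metadata_aspect_v2 into two collections of strings.

```python
from typing import Iterable


def diff_urns(
    index_urns: Iterable[str], db_urns: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Return (only_in_index, only_in_db), each as a sorted list."""
    index_set, db_set = set(index_urns), set(db_urns)
    return sorted(index_set - db_set), sorted(db_set - index_set)


# Hypothetical URNs: "c" exists in the search index but not in the database.
in_index = [
    "urn:li:dataHubIngestionSource:a",
    "urn:li:dataHubIngestionSource:b",
    "urn:li:dataHubIngestionSource:c",
]
in_db = [
    "urn:li:dataHubIngestionSource:a",
    "urn:li:dataHubIngestionSource:b",
]

only_in_index, only_in_db = diff_urns(in_index, in_db)
print(only_in_index)  # ['urn:li:dataHubIngestionSource:c']
```

Entries in `only_in_index` are the candidates for excess documents.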

trialiya avatar Mar 26 '24 06:03 trialiya

Thx 🙏 @trialiya

How can I fix it if the values do not match?

YuriyGavrilov avatar Mar 26 '24 06:03 YuriyGavrilov

@YuriyGavrilov There are two options:

  1. If you can identify the excess documents in datahubingestionsourceindex_v2, you can simply delete them, for example using the Delete API.
  2. Clear datahubingestionsourceindex_v2 and rebuild it. Warning: before using this option, make sure you know how to use Restore Indices!

For the second option, clear datahubingestionsourceindex_v2 with the Delete by query API:

POST datahubingestionsourceindex_v2/_delete_by_query
{
  "query": {
     "match_all" : {}
  }
}

and then use Restore Indices with this body:

{
	"urnLike": "urn:li:dataHubIngestionSource:%",
	"start": 0,
	"batchSize": 10000
}
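
For anyone scripting the second option, here is a minimal sketch that only builds the two request bodies shown above so they can be reviewed before anything is sent. The function names are hypothetical, and no HTTP call is made because the endpoint and authentication details depend on your deployment.

```python
import json

INDEX = "datahubingestionsourceindex_v2"


def delete_by_query_request(index: str = INDEX) -> tuple[str, dict]:
    """Return the request line and body that clear every document in the index."""
    return f"POST {index}/_delete_by_query", {"query": {"match_all": {}}}


def restore_indices_body(
    urn_like: str = "urn:li:dataHubIngestionSource:%",
    start: int = 0,
    batch_size: int = 10000,
) -> dict:
    """Return the Restore Indices body, limited to ingestion source URNs."""
    return {"urnLike": urn_like, "start": start, "batchSize": batch_size}


request_line, delete_body = delete_by_query_request()
print(request_line)
print(json.dumps(delete_body, indent=2))
print(json.dumps(restore_indices_body(), indent=2))
```

Printing the payloads first makes it easier to confirm that the delete targets only the ingestion source index.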

trialiya avatar Mar 26 '24 09:03 trialiya

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] avatar Apr 26 '24 01:04 github-actions[bot]

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] avatar May 26 '24 01:05 github-actions[bot]