datahub
"total" value does not match the actual number of objects in GraphQL query result
Describe the bug The "total" value returned by listIngestionSources is larger than the number of objects actually returned in ingestionSources.
To Reproduce Steps to reproduce the behavior:
- Go to 'https://datahub-instance/api/graphiql'
- Run the following query:
query listIngestionSources($input: ListIngestionSourcesInput!) {
  listIngestionSources(input: $input) {
    start
    count
    total
    ingestionSources {
      urn
      name
      schedule {
        interval
        timezone
      }
      platform {
        name
      }
      type
      config {
        version
        executorId
        recipe
      }
    }
  }
}
with variables:
{"input": {"start": 0, "count": 1000}}
- Check the length of the resulting ingestionSources collection and compare it with the value of total.
- In my case, total is 1 greater than the actual number of elements: total is 808, but only 807 elements are returned.
Expected behavior total == actual
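The comparison in the last two steps can be scripted. This is a minimal sketch, assuming the GraphQL response has already been parsed into a Python dict shaped like the query above. `total_mismatch` is a hypothetical helper name, not part of DataHub, and the sample data is hand-made.

```python
from typing import Any, Dict


def total_mismatch(data: Dict[str, Any]) -> int:
    """Return the reported total minus the number of sources actually returned.

    `data` is assumed to be the "data" object of the GraphQL response.
    The result is only meaningful when the requested count is >= total,
    so that a single page should contain every ingestion source.
    """
    result = data["listIngestionSources"]
    return result["total"] - len(result["ingestionSources"])


# Hand-made example response, not real output from a DataHub instance.
sample = {
    "listIngestionSources": {
        "start": 0,
        "count": 2,
        "total": 3,
        "ingestionSources": [
            {"urn": "urn:li:dataHubIngestionSource:a"},
            {"urn": "urn:li:dataHubIngestionSource:b"},
        ],
    }
}

print(total_mismatch(sample))  # 1
```

A non-zero result reproduces the bug; with the numbers reported above it would be 808 - 807 = 1.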
Screenshots
Desktop (please complete the following information):
- Version DataHub 0.12
+1
I also noticed a mismatch in the result with the following query parameters:
result = datahub_prod._get_ingestion_sources(start=0, count=753)
len(result['listIngestionSources']['ingestionSources'])
752
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
This is still an issue.
@Stefs-2142 @YuriyGavrilov Your problem may be due to data inconsistency between Elasticsearch and Ebean. You need to compare the document count in the datahubingestionsourceindex_v2 index with the result of this select:
select count(*)
from metadata_aspect_v2
where aspect = 'dataHubIngestionSourceKey' and version = 0;
If select count(*) returns a smaller number, it means that a few dataHubIngestionSource entities have been deleted from your database.
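The check described above can be sketched as follows. The Elasticsearch `_count` endpoint returns a JSON body with a `count` field; everything else here is an assumption: `fetch_index_count` and `compare_counts` are hypothetical helper names, the Elasticsearch URL is yours to supply, authentication is not handled, and the SQL count is expected to come from running the select above yourself.

```python
import json
from urllib.request import urlopen


def fetch_index_count(es_url: str,
                      index: str = "datahubingestionsourceindex_v2") -> int:
    """Ask Elasticsearch how many documents the index holds (no auth handled)."""
    with urlopen(f"{es_url.rstrip('/')}/{index}/_count") as resp:
        return json.load(resp)["count"]


def compare_counts(index_count: int, sql_count: int) -> str:
    """Describe how the index document count relates to the SQL row count."""
    if sql_count < index_count:
        return f"index has {index_count - sql_count} excess document(s)"
    if sql_count > index_count:
        return f"index is missing {sql_count - index_count} document(s)"
    return "counts match"


# Illustrative numbers only; they mirror the 808 vs 807 mismatch in the report.
print(compare_counts(index_count=808, sql_count=807))
```

In practice `index_count` would come from `fetch_index_count("http://<your-elasticsearch-host>:9200")` and `sql_count` from the select on metadata_aspect_v2.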
Thx 🙏 @trialiya
how can I fix it later if the values do not match?
@YuriyGavrilov There are two options:
- If you can identify the excess documents in datahubingestionsourceindex_v2, you can simply delete them, for example using the Delete API.
- !!!Before using this option, make sure you know how to use Restore Indices!!! Clear datahubingestionsourceindex_v2 with the Delete by query API:
POST datahubingestionsourceindex_v2/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}
and then use Restore Indices with this body:
{
  "urnLike": "urn:li:dataHubIngestionSource:%",
  "start": 0,
  "batchSize": 10000
}
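For the first option, identifying the excess documents comes down to a set difference between the URNs present in the index and the URNs present in the database. This is a minimal sketch, assuming both URN lists have already been exported (for example from the index documents and from the urn column of metadata_aspect_v2 with the same filter as the select above). `excess_urns` is a hypothetical helper and the URNs are made up.

```python
from typing import Iterable, Set


def excess_urns(index_urns: Iterable[str], db_urns: Iterable[str]) -> Set[str]:
    """Return URNs that exist in the search index but not in the database."""
    return set(index_urns) - set(db_urns)


# Made-up URNs for illustration only.
in_index = [
    "urn:li:dataHubIngestionSource:a",
    "urn:li:dataHubIngestionSource:b",
    "urn:li:dataHubIngestionSource:c",
]
in_db = [
    "urn:li:dataHubIngestionSource:a",
    "urn:li:dataHubIngestionSource:c",
]

for urn in sorted(excess_urns(in_index, in_db)):
    print(urn)  # urn:li:dataHubIngestionSource:b
```

The URNs printed here are the candidates to delete from the index; if the list is large or unclear, the second option (clear and restore) may be the simpler route.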
This issue was closed because it has been inactive for 30 days since being marked as stale.