dgraph icon indicating copy to clipboard operation
dgraph copied to clipboard

fix(ApolloFederation): '_entity' query now returns correct result for nonexistent entities

Open missbach opened this issue 2 years ago • 15 comments

Problem

The previous implementation of the '_entity' query resolver didn't properly follow the Apollo Federation Subgraph Specification (neither v2, nor v1, for that matter).

In particular, after receiving the intermediate results from the executor, it would:

  1. Sort the entity representations, i.e. the query arguments, by their keys in ascending order and discard any duplicates, so as to match the order in which the results are presumably returned by the executor. (However not properly handling hex ids, it seems. See #8120.)
  2. Check if we have the same number of results as unique entity representations and abort if not, assuming that the query would lead to an error at the Apollo gateway anyway. This assumption is false.
  3. Match the results to their entity representations and generate a properly ordered result list, with duplicates if so requested.

There are few issues with that, but the primary problem that I encountered was with handling entities that don't exist in Dgraph.

As an example, imagine two subgraphs, one being ultimately responsible for storing products, the other one, implemented via Dgraph, storing an additional string field on these products. It's quite possible, and explicitly allowed by the spec, that there could be a product that Dgraph doesn't know about. In this case, Dgraph should be returning a list with null in place of the missing elements.

Since the previous implementation didn't do that and instead would just return a shorter list, the results would be merged wrongly by the Apollo gateway, offset by the missing elements – e.g. the additional field for product A would get merged onto product B. That's quite catastrophic for my application.

This PR also fixes #8120.

Solution

For context, let me give you a quick overview of how I arrived at the current solution:

First I tried to fix the function by skipping the key sorting step and instead building the result list by creating a hash map of the results by their keys and then iterating over the entity representations argument and mapping them to their respective entities. This seemed to fix all issues.

However, it quickly fell apart when tested with the Apollo gateway, because it doesn't work when the key itself is not included in the selection set, and that's precisely what the gateway does. For example, when requesting a product with id=3, it wouldn't include the id in the selection set again.

The solution then is, of course, rather simple: Just add the entity key itself to the selection set. However, implementing it wasn't as straightforward – that's why there are so many changes.

I also tried cleaning up the code as I was touching it, according to the project contribution guidelines. For example, I started moving all the code relating to Federation queries to a new file.

Remarks

I implemented these changes about 1-2 months ago, and have been using them so far successfully. Before proceeding with the PR and spending more time on it, I wanted to check with you guys if this is the right direction and if it has any chance of getting merged soon (rebasing took some time today).

There are of course still some tasks left, namely adding regression tests, addressing "TODO WIP" comments and doing some final cleanup.

And lastly, this is my first time working with Golang, so please be lenient :) Feedback would definitely be appreciated!

missbach avatar Feb 01 '23 10:02 missbach

Hello, Dgraph team!

Have you seen my PR? Could you please let me know if have any plans to merge it?

Thank you for your time!

missbach avatar Mar 06 '23 09:03 missbach

Hi @missbach. Thank you for the PR, please provide us with a few more weeks. We are busy with our upcoming release and should get to this PR as soon as we can.

mangalaman93 avatar Mar 18 '23 16:03 mangalaman93

Hi @mangalaman93, thank you for your reply!

Alright, I'll ping you after the release (if that's okay with you).

Good luck with the upcoming release :)

missbach avatar Mar 20 '23 11:03 missbach

Any updates? :)

missbach avatar Jun 12 '23 09:06 missbach

I will review it as soon as I can now. We should be able to merge this change since v23 is out as long as it is not a breaking change. Give me a few more days. Thanks

mangalaman93 avatar Jun 17 '23 01:06 mangalaman93

Just found this. I was originally going to try to revive my PR that fixed #8120 found here: https://github.com/dgraph-io/dgraph/pull/8123

I assume though that if this fixes it along with other things I probably shouldn't bother? I noticed the approach appears very different to mine (and probably much better).

EDIT: I tested the PR. It solved the ordering issue described in #8120 and also correctly returns null for IDs that do not exist in dgraph.

JLaferri avatar Jul 15 '23 04:07 JLaferri

Both @mangalaman93 and @JLaferri: Good to hear, thank you :)

I just wish the PR process was a little bit faster... Otherwise, what's the point of the project being open-source?

missbach avatar Jul 17 '23 08:07 missbach

Both @mangalaman93 and @JLaferri: Good to hear, thank you :)

I just wish the PR process was a little bit faster... Otherwise, what's the point of the project being open-source?

Well... I expect they'll get to it eventually.

But to answer your question directly from my side: the nice thing about it being open source is I can see and find your fix, pull it into my own branch, and deploy a version of 23.0.1 that includes your fix to my production environment without having to wait!

So even if it isn't being merged quickly for everyone, it's still helping me in the immediate term. Thanks!

JLaferri avatar Jul 17 '23 16:07 JLaferri

I am looking into this now. This is my top priority, please gimme a few days to understand the issue and the suggested a solution. If you get a chance to rebase this on latest main, that'd be great. I can see that it's already pretty up to date and rebase seems straight forward.

mangalaman93 avatar Jul 19 '23 19:07 mangalaman93

@JLaferri: Glad it helped you :)

@mangalaman93: Great news, thanks!

Rebase: Sure, but unfortunately I'm gonna be super busy the next 14 days. I also wanted to add some regression tests and resolve the "TODO WIP" comments that I left behind...

Would it be possible for you to do a (preliminary) review to see if it generally looks fine? And then I'll get back to it in one or two weeks.

missbach avatar Jul 19 '23 21:07 missbach

@mangalaman93 Did you get a chance to take a preliminary look at it? I've got more time now – as soon as you review it, I'll rebase it, add some tests, and address any comments you may have.

missbach avatar Aug 01 '23 10:08 missbach

@missbach will it be possible for you to get on a call with me? Either email me at [email protected] or ping me on community slack.

mangalaman93 avatar Aug 03 '23 14:08 mangalaman93

Sure, @aman-bansal :) I just emailed you from my personal email address jp******@gmail.com.

Thank you for your attention.

missbach avatar Aug 03 '23 15:08 missbach

Hi @aman-bansal, thanks again for the nice talk today :)

As discussed, I created a little example repo for you: https://github.com/missbach/dgraph-bug-example

missbach avatar Aug 08 '23 19:08 missbach

Hi @aman-bansal, any updates?

missbach avatar Aug 25 '23 08:08 missbach

This PR has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.

github-actions[bot] avatar Jul 25 '24 14:07 github-actions[bot]