Migrate to AWS AppSync

Open anmolsgandhi opened this issue 1 year ago • 3 comments

Description:

The proposal entails migrating data.all's GraphQL endpoint from the current implementation via API Gateway to AWS AppSync, a managed service specifically designed for serverless GraphQL. The current reliance on API Gateway, intended primarily for REST protocols, necessitates the implementation of custom logic deviating from standard GraphQL definitions. This departure poses challenges for GraphQL developers working within the data.all environment, as it diverges from conventional GraphQL practices. The migration to AppSync is to align with standard GraphQL conventions, providing a more seamless experience for developers and addressing troubleshooting difficulties associated with a custom wrapper. This change aims to enhance the compatibility and user-friendliness of data.all for GraphQL developers.

Details:

  • Schema: Define GraphQL schemas as standard GraphQL schema files. Use a single schema file in AppSync.
  • Resolvers: Migrate the GraphQL resolvers to AppSync pipeline resolvers.
  • Datasources: Associate the datasources (e.g. Aurora Serverless, OpenSearch...) with the AppSync endpoint. Simplify the datasource when possible (e.g. leverage the AWS Data API for Aurora or deprecate Lambda proxies).
  • Out of the box features: Take advantage of the out of the box features offered by AppSync including caching, authentication/authorization options, logging and monitoring.
  • Infrastructure: Replace the current API Gateway infrastructure with AppSync resources while maintaining data.all's modular principles.
  • Testing: Modify current integration tests to ensure API calls are tested.
  • Local development: Implement the necessary changes to preserve the local developer experience that allows developers to run data.all on Docker containers.

Benefits:

  • Simplified developer experience: AppSync is a managed AWS service that is purpose-built for GraphQL API hosting. Adopting it would simplify the overall developer experience, as responsibilities are delegated to the service instead of being self-managed. Examples include out of the box features such as caching.
  • Standardization: Our current contributors are surprised by the API Gateway setup, which they find unconventional and convoluted. This migration will ensure that our backend follows a standardized, best-practice approach adopted by most other web app projects.
  • Improved debugging/testing: With this migration developers would be able to tap into a vast array of debugging/testing tools that were previously unavailable to them.

@dlpzx @petrkalos

anmolsgandhi avatar Jan 09 '24 19:01 anmolsgandhi

For the migration to AppSync we are proposing a 2-phase approach to reduce risks and complexity of the changes.

Phase 1: Lift and Shift + Code First (Planned)

Phase 1 Description

Migrate to AppSync, leaving backend logic as-is. This phase represents a "lift and shift" strategy, transitioning from data.all's current API Gateway REST endpoints to AppSync GraphQL endpoints without significant changes to the backend logic. It involves limited refactoring of the backend/frontend code to make it compatible with AppSync endpoints and removing custom wrapper pieces.

Scope of Refactoring

  • API handler must remove all logic around HTTP errors/authentication, as they are no longer required for AppSync.
    • also rework the ReAuth logic
  • API handler will no longer need to build GQL schema and AST; those are handled and passed as Lambda payload by AppSync.
  • Decorate all the resolvers as required by the AppSync Direct Lambda Resolver library (a mostly automated process; see the migration script that generates a patch from the runtime).
  • Drop the custom gql module library.
  • Adjustments in CDK
    • Write the schema using CodeFirst library (mostly manual work but we can automate parts of it)
    • In this first approach we will only need a single Direct Lambda resolver, which we will bind to all resolvable fields of the schema.

Code Sample Example

There is a PoC branch where you can see some of the changes required:

  • Using Lambda PowerTools to dispatch AppSync requests to the resolvers, see here
  • Register a resolver for Query.listOrganizations with a decorator here
  • Encode resolver responses to dict (they can be scalars, json, SQA Objects) using fastapi’s jsonable encoder here
  • Defining an Object in CDK using CodeFirst here
  • Defining the Query.listOrganizations using CodeFirst and pointing to the Direct Lambda resolver here
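
To illustrate the decorator-based registration listed above, here is a minimal standard-library sketch of how a single Direct Lambda resolver could route AppSync events to Python functions. It is neither the Lambda PowerTools API nor the PoC code: `ResolverRegistry`, `list_organizations`, and the sample data are hypothetical. The only assumption about AppSync is that the event carries `info.parentTypeName`, `info.fieldName`, and `arguments`.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Resolver = Callable[..., Any]


class ResolverRegistry:
    """Hypothetical stand-in for a resolver router (not the PowerTools API)."""

    def __init__(self) -> None:
        self._resolvers: Dict[Tuple[str, str], Resolver] = {}

    def resolver(self, type_name: str, field_name: str) -> Callable[[Resolver], Resolver]:
        """Decorator that binds a function to a GraphQL Type.field pair."""
        def register(func: Resolver) -> Resolver:
            self._resolvers[(type_name, field_name)] = func
            return func
        return register

    def resolve(self, event: Dict[str, Any]) -> Any:
        """Find the resolver named in the event and call it with the field arguments."""
        info = event["info"]
        key = (info["parentTypeName"], info["fieldName"])
        if key not in self._resolvers:
            raise ValueError(f"No resolver registered for {key[0]}.{key[1]}")
        return self._resolvers[key](**(event.get("arguments") or {}))


app = ResolverRegistry()


@app.resolver("Query", "listOrganizations")
def list_organizations(filter: Optional[dict] = None) -> dict:
    # Hypothetical data; a real resolver would query the database.
    nodes = [{"organizationUri": "org-1", "label": "Example Org"}]
    return {"count": len(nodes), "nodes": nodes}


def handler(event: Dict[str, Any], context: Any) -> Any:
    """Lambda entry point: returns plain data, with no HTTP status or headers."""
    return app.resolve(event)
```

Because the schema and the query parsing are handled by AppSync, the handler in this sketch only has to map a field name to a function, which is the part of the custom wrapper that Phase 1 aims to remove.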

Phase 2: AppSync Optimization (Not Planned)

Phase 2 Description

Optimize Data.all Backend Code using AppSync Native Components. This phase aims at "rearchitecting and optimization" requiring heavy refactoring to leverage AppSync's native components for backend logic improvement, readability, risk minimization, and performance enhancement.

Scope of Optimization

  • Breakdown of single backend lambda unit resolver into its functional components using AppSync Pipeline Resolvers.
  • Improved User Authentication/Authorization using AppSync-native Auth patterns.

Example

  • Breaking down the CreateDataset API from a Unit Resolver (Lambda) to a Pipeline Resolver with steps for various operations and tasks.
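
The idea of splitting one unit resolver into pipeline steps can be modelled in plain Python. This is a conceptual sketch only: real AppSync pipeline functions are written in JavaScript or VTL, and the step names (`check_permissions`, `create_record`, `index_in_catalog`) and the permission rule are hypothetical, not data.all's actual CreateDataset logic. Each step reads and writes a shared `stash`, mirroring how pipeline functions pass state to one another.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Any]


def check_permissions(ctx: Context) -> None:
    # Hypothetical rule: only members of the "Engineers" group may create datasets.
    if "Engineers" not in ctx["identity"]["groups"]:
        raise PermissionError("caller may not create datasets")


def create_record(ctx: Context) -> Dict[str, Any]:
    # Stand-in for a database write; stores the new record in the shared stash.
    dataset = {"datasetUri": "ds-1", "label": ctx["arguments"]["label"]}
    ctx["stash"]["dataset"] = dataset
    return dataset


def index_in_catalog(ctx: Context) -> Dict[str, Any]:
    # Stand-in for a search-index write; reads what the previous step stashed.
    dataset = ctx["stash"]["dataset"]
    ctx["stash"]["indexed"] = True
    return dataset


def run_pipeline(steps: List[Step], ctx: Context) -> Any:
    """Run the steps in order; the last step's result is the resolver result."""
    ctx.setdefault("stash", {})
    result = None
    for step in steps:
        result = step(ctx)
    return result


CREATE_DATASET_PIPELINE = [check_permissions, create_record, index_in_catalog]
```

A failing step stops the pipeline before later steps run, which is the main readability gain over one large Lambda function that interleaves all three concerns.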

Benefits

  • Risk Mitigation: Separating migration and optimization into distinct phases mitigates risk.
  • Fast Incremental Benefits: Quick realization of benefits from moving to AppSync while limiting impact on backward compatibility.
  • Flexibility: Ability to adapt to changing designs/requirements during the optimization phase.
  • Iterative Improvements: Enables iterative improvements based on feedback from the open-source community and internal team.
  • Prioritization: Focuses on addressing migration challenges first and optimization challenges in a separate phase.

Call Outs (Risks)

  • Extended Overall Development Time: Longer migration timeline due to the phased approach.
  • Potential for Obsolete Code: Risk that code developed in the initial phase may become obsolete.
  • Incompatibility with local development: Local development might become incompatible once migrated to AppSync
  • Schema and code in different places: schema will be defined in CDK and execution logic will be in the backend lambda which might lead to some confusion.

petrkalos avatar Mar 15 '24 19:03 petrkalos

@noah-paige @dlpzx @petrkalos I'm not familiar with AppSync and it is something I need to learn more about, but I'll take your word for it that this is a cleaner solution for implementing GraphQL endpoints on AWS.

My only concerns for this design are:

  1. There's no mention of pricing in the design. How is AppSync priced and how can we calculate how much operation costs would increase after switching to AppSync?

  2. A very important point for us is that we do not use Cognito; we integrate directly with OKTA without using Cognito as a proxy. The way this works today is with react-oidc and a custom authorizer on the API Gateway. The design should mention how this authentication pathway will continue to be supported.

  3. You already called this out, but local development support is critical. I would like to ask for more details in the design on how the local development experience will change with the AppSync integration.

  4. Re-auth flow is very important. At the moment design only mentions "rework the ReAuth logic". We should make sure that we have a good idea if we will be able to make that work.

zsaltys avatar Mar 28 '24 15:03 zsaltys

Hi @zsaltys and thanks for your comments

pricing

API Gateway costs $3.50 per million requests and AppSync costs $4.00 per million requests. That makes AppSync ~14% more expensive, but we will be able to close this gap (see points below) or even make it cheaper.

  • Lambda will have to do significantly less work (remember all the ApolloGL logic for initialisation but also for resolving which code to execute)
  • AppSync natively supports pub/sub, hence we can migrate away from polling and save $$$
  • In Phase 2 we will start using the integration of AppSync with RDS and OpenSearch. Currently we do ApiGateway -> Lambda -> RDS/OpenSearch; with AppSync we can do AppSync -> RDS/OpenSearch, hence we will avoid the Lambda costs altogether. For more details on this have a look here at the supported data sources.
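
As a worked check of the price gap, the sketch below compares the two request prices for a given volume. It assumes the figures quoted above are list prices per million requests, and it ignores free tiers, caching, data transfer, and Lambda costs. The monthly volume is a made-up example.

```python
API_GATEWAY_USD_PER_MILLION = 3.50  # figure quoted in this comment
APPSYNC_USD_PER_MILLION = 4.00      # figure quoted in this comment


def monthly_cost(requests: int, usd_per_million: float) -> float:
    """Request charges only, in USD."""
    return requests / 1_000_000 * usd_per_million


def relative_increase(old: float, new: float) -> float:
    """Fractional increase when moving from the old price to the new one."""
    return (new - old) / old


requests = 10_000_000  # hypothetical monthly volume
gap = (monthly_cost(requests, APPSYNC_USD_PER_MILLION)
       - monthly_cost(requests, API_GATEWAY_USD_PER_MILLION))
increase = relative_increase(API_GATEWAY_USD_PER_MILLION, APPSYNC_USD_PER_MILLION)
print(f"extra cost: ${gap:.2f}/month, increase: {increase:.1%}")
# prints: extra cost: $5.00/month, increase: 14.3%
```

At this volume the request-price difference is small in absolute terms, so the Lambda savings listed above would likely dominate the comparison.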

custom authorizer

AppSync supports OpenID custom authorizers out of the box, so I think there won't be any issue with OKTA. There is already a blog post talking about this integration. I'd appreciate it if you could take a look and let me know if it seems reasonable.

local dev support

I will dive deeper into it, but from my first few iterations it seems that we won't be able to have a local backend like we currently do with Docker. In my testing I had to point my local frontend at the AWS backend, and to minimize the time it takes to test changes I only updated specific stacks or just the Lambda.

ReAuth workflow

The reason we need to rework it is that it currently works by returning HTTP specifics from the Lambda (302 etc). If we move to AppSync the Lambda won't know anything about HTTP; it returns only a dict with the requested data. I didn't write any implementation details regarding ReAuth because there are a few ways we can achieve it and we need to take a close look to find which one is the best.
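
The difference in what the Lambda returns can be sketched as follows. All names here (`needs_reauth`, `ReAuthRequired`, the `reauth_required` flag, the `/reauth` location) are hypothetical and not taken from data.all. Raising an exception in the AppSync-style handler is only one of the possible options, shown for contrast rather than as a chosen design.

```python
import json
from typing import Any, Dict


class ReAuthRequired(Exception):
    """Hypothetical error signalling that the caller must re-authenticate."""


def needs_reauth(event: Dict[str, Any]) -> bool:
    # Hypothetical check; the real rule would live in the backend.
    return bool(event.get("reauth_required"))


def api_gateway_style_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda proxy integration: the function builds the HTTP response itself."""
    if needs_reauth(event):
        return {"statusCode": 302, "headers": {"Location": "/reauth"}, "body": ""}
    return {"statusCode": 200, "body": json.dumps({"data": {"ok": True}})}


def appsync_style_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Direct Lambda resolver: returns only the field's data, no HTTP details."""
    if needs_reauth(event):
        # One possible option among several: raise and let the error reach the client.
        raise ReAuthRequired("re-authentication required")
    return {"ok": True}
```

In the second handler there is no status code to set, so the frontend would need to recognise the re-authentication signal in the GraphQL response instead of relying on an HTTP redirect.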

petrkalos avatar Apr 02 '24 09:04 petrkalos

@petrkalos @anmolsgandhi Should we close this issue for the time being? We can always re-open and come back to all the exploration that we did

dlpzx avatar Jul 12 '24 06:07 dlpzx

For now we decided not to proceed with AppSync for the following reasons:

  • Users are used to developing in the local environment (docker compose), and with AppSync there is no easy way to continue supporting this.
  • One of the main reasons to use AppSync is to access data sources directly without going through a Lambda. data.all has quite complex logic, so very few queries can be implemented as native resolvers. As a result we would still end up with Lambdas for the complex parts alongside resolvers, which would make the logic very hard to follow (some business logic would live in CDK and some in Lambda code).

petrkalos avatar Sep 05 '24 13:09 petrkalos