ocean icon indicating copy to clipboard operation
ocean copied to clipboard

[Integration][Github] added github integration to ocean

Open Lanrey opened this issue 8 months ago • 3 comments

User description

Description

What - Added github integration to ocean

Why - To sync some github resources via the ocean framework

How - Used webhooks to listen for github events, and ocean's httpx async custom package to fetch the data from github api

Type of change

Please leave one option from the following and delete the rest:

  • [x ] New Integration (non-breaking change which adds a new integration)

All tests should be run against the port production environment(using a testing org).

Core testing checklist

  • [ x] Integration able to create all default resources from scratch
  • [x ] Resync finishes successfully
  • [x ] Resync able to create entities
  • [ x] Resync able to update entities
  • [x ] Resync able to detect and delete entities
  • [ x] Scheduled resync able to abort existing resync and start a new one
  • [x ] Tested with at least 2 integrations from scratch
  • [ x] Tested with Kafka and Polling event listeners
  • [ x] Tested deletion of entities that don't pass the selector

Integration testing checklist

  • [x ] Integration able to create all default resources from scratch
  • [x ] Resync able to create entities
  • [x ] Resync able to update entities
  • [ x] Resync able to detect and delete entities
  • [ x] Resync finishes successfully
  • [x ] If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • [ x] If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • [x ] If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • [x ] Docs PR link here

Preflight checklist

  • [x ] Handled rate limiting
  • [ x] Handled pagination
  • [x ] Implemented the code in async
  • [ x] Support Multi account

Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

Screenshot-2025-04-16-at-13-55-54-2 Screenshot-2025-04-16-at-13-56-00-2 Screenshot-2025-04-16-at-13-56-05-2

API Documentation

Provide links to the API documentation used for this integration.


PR Type

New Integration, Enhancement, Tests


Description

  • Added a new GitHub integration to Ocean for syncing GitHub resources.

  • Implemented webhook processors for GitHub events (issues, pull requests, teams, workflows, etc.).

  • Developed extensive test cases for the GitHub client and webhook processors.

  • Created configuration files for integration setup, including .env.example, blueprints.json, and port-app-config.yaml.


Changes walkthrough 📝

Relevant files
Enhancement
13 files
debug.py
Add debug entry point for GitHub integration                         
+4/-0     
client.py
Implement GitHub client for API communication                       
+298/-0 
overrides.py
Define resource configurations and selectors for GitHub   
+87/-0   
initialize_client.py
Add helper to initialize GitHub client                                     
+16/-0   
integration.py
Define base integration class for GitHub                                 
+9/-0     
kinds.py
Define enumeration for GitHub entity kinds                             
+15/-0   
main.py
Main entry point for GitHub integration                                   
+114/-0 
issue_webhook_processor.py
Add processor for GitHub issue webhook events                       
+86/-0   
pull_request_webhook_processor.py
Add processor for GitHub pull request webhook events         
+83/-0   
push_webhook_processor.py
Add processor for GitHub push webhook events                         
+150/-0 
repository_webhook_processor.py
Add processor for GitHub repository webhook events             
+95/-0   
team_webhook_processor.py
Add processor for GitHub team webhook events                         
+80/-0   
workflow_webhook_processor.py
Add processor for GitHub workflow run webhook events         
+78/-0   
Tests
3 files
test_client.py
Add test cases for GitHub client                                                 
+479/-0 
test_processor.py
Add test cases for webhook processors                                       
+249/-0 
test_sample.py
Add placeholder test example                                                         
+2/-0     
Configuration changes
6 files
.env.example
Provide example environment variables for GitHub integration
+6/-0     
blueprints.json
Define blueprints for GitHub resources in Port                     
+429/-0 
port-app-config.yaml
Add Port app configuration for GitHub integration               
+179/-0 
spec.yaml
Add specification file for GitHub integration                       
+58/-0   
Makefile
Add Makefile for GitHub integration setup                               
+1/-0     
sonar-project.properties
Add SonarQube configuration for GitHub integration             
+2/-0     
Documentation
3 files
CHANGELOG.md
Add changelog for GitHub integration                                         
+42/-0   
CONTRIBUTING.md
Add contributing guidelines for GitHub integration             
+7/-0     
README.md
Add README for GitHub integration                                               
+7/-0     
Dependencies
2 files
poetry.toml
Add Poetry configuration for virtual environments               
+3/-0     
pyproject.toml
Define project dependencies and configurations                     
+113/-0 
Additional files
3 files
__init__.py [link]   
__init__.py [link]   
__init__.py [link]   

Need help?
  • Type /help how to ... in the comments thread for any questions about Qodo Merge usage.
  • Check out the documentation for more information.
  • Lanrey avatar Apr 11 '25 06:04 Lanrey

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 Security concerns

    Sensitive information exposure:
    The GitHub webhook creation in client.py (line 123) directly uses the GITHUB_WEBHOOK_SECRET from environment variables without validation. If this secret is not properly set or is exposed in logs, it could compromise webhook security. Additionally, all webhook processors store this secret in memory after loading it from environment variables, which increases the attack surface.

    ⚡ Recommended focus areas for review

    Security Concern

    The _create_events_webhook method uses params parameter incorrectly when making a POST request. The webhook configuration is passed as params instead of as the request body, which could cause the webhook creation to fail.

    # Create the new webhook using a POST request.
    await self._request("POST", self.webhook_url, params=body)
    logger.info("GitHub webhook created")
    
    Error Handling

    The handle_event method catches all exceptions when fetching commit data but continues execution, which could lead to inconsistent or incomplete data processing without proper fallback mechanisms.

    except Exception as e:
        logger.error(f"Error fetching commit data: {str(e)}")
        # Continue with the webhook data we have
    
    Potential Bug

    The create_github_client function doesn't handle the case where required configuration keys might be missing from the integration_config, which could lead to runtime errors.

    def create_github_client() -> GitHubClient:
        config = ocean.integration_config
        return GitHubClient(
            token=config["github_token"],
            org=config["github_org"],
            repo=config["github_repo"]
        )
    

    qodo-code-review[bot] avatar Apr 11 '25 06:04 qodo-code-review[bot]

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Security
    Fix webhook signature verification

    The webhook authentication is using json.dumps(payload) to convert the payload
    to bytes, but GitHub sends the raw payload as the HMAC input. For webhook
    verification, you should use the raw request body bytes that were received, not
    the re-serialized JSON.

    integrations/github/webhook_processors/push_webhook_processor.py [133-147]

    -async def authenticate(self, payload: dict, headers: dict) -> bool:
    +async def authenticate(self, payload: dict, headers: dict, raw_body: bytes = None) -> bool:
         secret = os.getenv("GITHUB_WEBHOOK_SECRET")
         if not secret:
             logger.error("GITHUB_WEBHOOK_SECRET is not set")
             return False
         received_signature = headers.get("x-hub-signature-256")
         if not received_signature:
             logger.error("Missing X-Hub-Signature-256 header")
             return False
    -    payload_bytes = json.dumps(payload, separators=(',', ':')).encode('utf-8')
    +    
    +    # Use raw_body if provided, otherwise fallback to re-serializing
    +    if raw_body:
    +        payload_bytes = raw_body
    +    else:
    +        payload_bytes = json.dumps(payload, separators=(',', ':')).encode('utf-8')
    +        
         computed_signature = "sha256=" + hmac.new(secret.encode(), payload_bytes, hashlib.sha256).hexdigest()
         if not hmac.compare_digest(received_signature, computed_signature):
             logger.error("Signature verification failed for push event")
             return False
         return True
    
    • [ ] Apply this suggestion
    Suggestion importance[1-10]: 10

    __

    Why: This is a critical security fix. The current implementation re-serializes the payload for signature verification, which would fail with GitHub's webhook signatures since GitHub calculates the signature using the raw request body. This would cause all webhook verifications to fail, potentially allowing unauthorized requests.

    High
    Remove hardcoded default URL
    Suggestion Impact:The commit removed the hardcoded ngrok URL from the default configuration as suggested

    code diff:

    -    default: "https://e459-102-89-41-50.ngrok-free.app"
    

    Remove the hardcoded default URL from the configuration specification. This URL
    appears to be a development/testing URL (ngrok) and should not be included as a
    default in production code.

    integrations/github/.port/spec.yaml [43-49]

     - name: base_url
       required: true
       type: url
       description: >
         The publicly accessible base URL where your integration is hosted.
         This URL will be used to form the webhook endpoint (appended with /integration/webhook) so that GitHub can deliver webhook events.
    -  default: "https://e459-102-89-41-50.ngrok-free.app"
    

    [Suggestion has been applied]

    Suggestion importance[1-10]: 9

    __

    Why: Removing the hardcoded ngrok URL from the default configuration prevents potential security issues and confusion for users. Development/testing URLs should never be included in production configuration defaults.

    High
    Remove hardcoded URL
    Suggestion Impact:The commit removed the hardcoded URL and commented-out code exactly as suggested, eliminating the security risk of exposing a potentially sensitive URL in the codebase

    code diff:

    -    base_url = ocean.integration_config["base_url"]#ocean.app.base_url ##'https://e459-102-89-41-50.ngrok-free.app'
    +    base_url = ocean.integration_config["base_url"]
    

    Remove the commented-out code and hardcoded URL. This is a security risk as it
    exposes a potentially sensitive URL in the codebase. Use only the configuration
    value from ocean.integration_config.

    integrations/github/main.py [33]

    -base_url = ocean.integration_config["base_url"]#ocean.app.base_url ##'https://e459-102-89-41-50.ngrok-free.app'
    +base_url = ocean.integration_config["base_url"]
    

    [Suggestion has been applied]

    Suggestion importance[1-10]: 8

    __

    Why: Removing the hardcoded URL and commented code improves security by eliminating potential exposure of sensitive information. It also makes the code cleaner and more maintainable.

    Medium
    Use configuration over environment variables
    Suggestion Impact:The commit implemented the suggestion by changing the code to use ocean.integration_config.get('github_webhook_secret') instead of os.getenv('GITHUB_WEBHOOK_SECRET') for accessing the webhook secret.

    code diff:

    +    async def authenticate(self, payload: EventPayload, headers: EventHeaders) -> bool:
    +        secret = ocean.integration_config.get("github_webhook_secret")
    

    Use the webhook secret from the integration configuration instead of directly
    accessing an environment variable. This ensures consistency with how other
    configuration values are accessed in the integration.

    integrations/github/webhook_processors/repository_webhook_processor.py [73-77]

     async def authenticate(self, payload: EventPayload, headers: dict) -> bool:
    -    secret = os.getenv("GITHUB_WEBHOOK_SECRET")
    +    from port_ocean.context.ocean import ocean
    +    secret = ocean.integration_config.get("github_webhook_secret")
         if not secret:
    -        logger.error("GITHUB_WEBHOOK_SECRET is not set")
    +        logger.error("github_webhook_secret is not set in integration configuration")
             return False
    

    [Suggestion has been applied]

    Suggestion importance[1-10]: 7

    __

    Why: Using the integration configuration instead of environment variables ensures consistency with the rest of the codebase and centralizes configuration management, making the code more maintainable and secure.

    Medium
    Possible issue
    Fix webhook creation request
    Suggestion Impact:The commit implemented the suggestion by changing the params parameter to json parameter in the _request method and all its usages, including the webhook creation function that was specifically mentioned in the suggestion.

    code diff:

    -    async def _request(self, method: str, url: str, params: Optional[Dict[str, str]] = None) -> Any:
    +    async def _request(self, method: str, url: str, json: Optional[Dict[str, str]] = None) -> Any:
             try:
                 async with self._semaphore:
    -                response = await self.client.request(method, url, params=params)
    +                response = await self.client.request(method, url, json=json)
             except httpx.HTTPError as exc:
                 logger.error(f"HTTP request failed: {method} {url} - {exc}")
                 raise
    @@ -56,7 +56,7 @@
             if wait_time:
                 await asyncio.sleep(wait_time)
                 # After waiting, retry the request recursively.
    -            return await self._request(method, url, params)
    +            return await self._request(method, url, json)
     
             try:
                 response.raise_for_status()
    @@ -125,7 +125,7 @@
             }
     
             # Create the new webhook using a POST request.
    -        await self._request("POST", self.webhook_url, params=body)
    +        await self._request("POST", self.webhook_url, json=body)
             logger.info("GitHub webhook created")
    

    The webhook creation is using params parameter for the POST request body, but
    the _request method expects the request body to be passed as a separate
    parameter, not as params. The params parameter is typically used for query
    parameters in URLs, not for the request body.

    integrations/github/github/client.py [100-129]

     async def _create_events_webhook(self, app_host: str) -> None:
         # The target URL for the webhook endpoint.
         webhook_target = f"{app_host}/integration/webhook"
     
         # Retrieve the list of existing webhooks from GitHub.
         webhooks = await self._request("GET", url=self.webhook_url)
     
         # Check if a webhook with the same target URL already exists.
         for hook in webhooks.json():
             if hook.get("config", {}).get("url") == webhook_target:
                 logger.info("GitHub webhook already exists")
                 return
     
         # Prepare the request payload according to GitHub's API.
         # See: https://docs.github.com/en/rest/webhooks/repos?apiVersion=2022-11-28#create-a-repository-webhook
         body = {
             "name": "web",
             "active": True,
             "events": WEBHOOK_EVENTS,  # e.g., ["push", "pull_request"]; define this constant accordingly
             "config": {
                 "url": webhook_target,
                 "content_type": "json",
                 "insecure_ssl": "0",
                 "secret": os.getenv("GITHUB_WEBHOOK_SECRET")
             }
         }
     
         # Create the new webhook using a POST request.
    -    await self._request("POST", self.webhook_url, params=body)
    +    await self._request("POST", self.webhook_url, json=body)
         logger.info("GitHub webhook created")
    

    [Suggestion has been applied]

    Suggestion importance[1-10]: 9

    __

    Why: The suggestion correctly identifies a critical issue where the webhook creation is using the params parameter for the POST request body, which is incorrect. This would cause webhook creation to fail as the body data would be sent as URL query parameters instead of as a JSON body.

    High
    • [ ] Update

    qodo-code-review[bot] avatar Apr 11 '25 06:04 qodo-code-review[bot]

    This pull request is automatically being deployed by Amplify Hosting (learn more).

    Access this pull request here: https://pr-1557.d1ftd8v2gowp8w.amplifyapp.com