## User description

### Description

What -

Why -

How -
### Type of change
Please leave one option from the following and delete the rest:
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] New Integration (non-breaking change which adds a new integration)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Non-breaking change (fix of existing functionality that will not change current behavior)
- [ ] Documentation (added/updated documentation)
All tests should be run against the Port production environment (using a testing org).
### Core testing checklist
- [ ] Integration able to create all default resources from scratch
- [ ] Resync finishes successfully
- [ ] Resync able to create entities
- [ ] Resync able to update entities
- [ ] Resync able to detect and delete entities
- [ ] Scheduled resync able to abort existing resync and start a new one
- [ ] Tested with at least 2 integrations from scratch
- [ ] Tested with Kafka and Polling event listeners
- [ ] Tested deletion of entities that don't pass the selector
### Integration testing checklist
- [ ] Integration able to create all default resources from scratch
- [ ] Resync able to create entities
- [ ] Resync able to update entities
- [ ] Resync able to detect and delete entities
- [ ] Resync finishes successfully
- [ ] If a new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the `examples` folder in the integration directory.
- [ ] If a resource kind is updated, run the integration with the example data and check that the expected result is achieved
- [ ] If a new resource kind is added or updated, validate that live events for that resource are working as expected
- [ ] Docs PR link here
### Preflight checklist
- [ ] Handled rate limiting
- [ ] Handled pagination
- [ ] Implemented the code asynchronously
- [ ] Supports multiple accounts
### Screenshots
Include screenshots from your environment showing how the resources of the integration will look.
### API Documentation
Provide links to the API documentation used for this integration.
## PR Type
Enhancement, Documentation, Tests
## Description
- Added a new GitHub integration for Port Ocean.
- Implemented async handlers for GitHub API interactions.
- Defined blueprints and configurations for GitHub resources.
- Included documentation, testing, and development setup files.
## Changes walkthrough 📝
| Category | File | Change | Lines |
|---|---|---|---|
| Enhancement (4 files) | client.py | Implemented GitHub API client with async handlers | +85/-0 |
| | debug.py | Added debug entry point for GitHub integration | +4/-0 |
| | main.py | Defined resync handlers for GitHub resources | +96/-0 |
| | blueprints.json | Defined blueprints for GitHub resources | +248/-0 |
| Tests (1 file) | test_sample.py | Added placeholder test for GitHub integration | +2/-0 |
| Configuration changes (7 files) | launch.json | Added debug configuration for GitHub integration | +14/-1 |
| | .env.example | Provided example environment variables for GitHub integration | +8/-0 |
| | port-app-config.yml | Configured mappings for GitHub resources | +86/-0 |
| | Makefile | Added Makefile for GitHub integration infrastructure | +1/-0 |
| | poetry.toml | Configured Poetry for GitHub integration | +3/-0 |
| | pyproject.toml | Defined dependencies and tools for GitHub integration | +113/-0 |
| | sonar-project.properties | Added SonarQube configuration for GitHub integration | +2/-0 |
| Documentation (4 files) | spec.yaml | Added specification for GitHub integration | +24/-0 |
| | CHANGELOG.md | Added changelog for GitHub integration | +8/-0 |
| | CONTRIBUTING.md | Added contributing guidelines for GitHub integration | +7/-0 |
| | README.md | Added README for GitHub integration | +7/-0 |
| Dependencies (1 file) | pyproject.toml | Added dotenv dependency for environment variable management | +1/-0 |
| Additional files (1 file) | | | |
## PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
- ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
- 🧪 No relevant tests
### 🔒 Security concerns

**Sensitive information exposure:** The GitHub access token is stored in the integration configuration and used directly in the `Authorization` header. While this is a common pattern, there is no validation to ensure the token is properly secured or rotated. Additionally, the token is logged along with the headers when debugging, which could expose it in log files.
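One way to address the logging exposure is to redact the token before headers are ever logged. This is a minimal sketch, not code from the PR: the `redact_headers` helper is hypothetical, and the standard-library logger setup is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the request headers that is safe to log."""
    safe = dict(headers)
    if "Authorization" in safe:
        # Keep the scheme for debugging, drop the credential itself.
        safe["Authorization"] = "Bearer ***redacted***"
    return safe


headers = {"Authorization": "Bearer ghp_example", "Accept": "application/vnd.github+json"}
logger.debug("Request headers: %s", redact_headers(headers))
```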
### ⚡ Recommended focus areas for review

#### Error Handling
The client methods don't handle pagination for GitHub API responses, which could lead to incomplete data retrieval for repositories, issues, pull requests, etc. when there are more items than fit in a single response.
```python
async def get_repositories(self):
    await self.check_rate_limit()
    url = f'{self.base_url}/user/repos'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

async def get_issues(self, username: str, repo: str):
    await self.check_rate_limit()
    url = f'{self.base_url}/repos/{username}/{repo}/issues'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

async def get_pull_requests(self, username: str, repo: str):
    await self.check_rate_limit()
    url = f'{self.base_url}/repos/{username}/{repo}/pulls'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

async def get_organizations(self):
    await self.check_rate_limit()
    url = f'{self.base_url}/user/orgs'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

async def get_teams(self, org: str):
    await self.check_rate_limit()
    url = f'{self.base_url}/orgs/{org}/teams'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

async def get_workflows(self, username: str, repo: str):
    await self.check_rate_limit()
    url = f'{self.base_url}/repos/{username}/{repo}/actions/workflows'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()
```
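Pagination could be layered under all of these methods with a single helper that follows GitHub's `Link` header. The sketch below rests on assumptions: it presumes an `httpx.AsyncClient` (as the PR appears to use), and the `_fetch_paginated` name is hypothetical — the merged commit apparently took a similar approach with a `_fetch_paginated_api` helper.

```python
from typing import Any, AsyncGenerator

import httpx


async def _fetch_paginated(
    client: httpx.AsyncClient, url: str, headers: dict[str, str]
) -> AsyncGenerator[list[dict[str, Any]], None]:
    """Yield one page of results at a time, following GitHub's Link header."""
    params = {"per_page": 100}
    while url:
        response = await client.get(url, headers=headers, params=params)
        response.raise_for_status()
        yield response.json()
        # httpx parses the RFC 5988 Link header into response.links;
        # the "next" URL already carries its own query string.
        url = response.links.get("next", {}).get("url")
        params = None
```

Each `get_*` method could then iterate with `async for page in _fetch_paginated(...)` and accumulate or yield results instead of returning only the first page.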
#### Resource Exhaustion
The resync functions fetch all repositories and then make additional API calls for each repository, which could lead to rate limiting issues or timeouts with large GitHub accounts.
```python
@ocean.on_resync('issue')
async def resync_issues(kind: str) -> list[dict[Any, Any]]:
    try:
        handler = GithubHandler()
        repos = await handler.get_repositories()
        all_issues = []
        for repo in repos:
            username = repo["owner"]["login"]
            repo_name = repo['name']
            issues = await handler.get_issues(username, repo_name)
            all_issues.extend(issues)
        logger.info('Issues: %s', all_issues)
        return all_issues
    except Exception as e:
        logger.error('Failed to resync issues: %s', e)
        return []

@ocean.on_resync('pull_request')
async def resync_pull_requests(kind: str) -> list[dict[Any, Any]]:
    try:
        handler = GithubHandler()
        repos = await handler.get_repositories()
        all_pull_requests = []
        for repo in repos:
            username = repo["owner"]["login"]
            repo_name = repo['name']
            pull_requests = await handler.get_pull_requests(username, repo_name)
            all_pull_requests.extend(pull_requests)
        logger.info('Pull Requests: %s', all_pull_requests)
        return all_pull_requests
    except Exception as e:
        logger.error('Failed to resync pull requests: %s', e)
        return []

@ocean.on_resync('team')
async def resync_teams(kind: str) -> list[dict[Any, Any]]:
    try:
        handler = GithubHandler()
        organizations = await handler.get_organizations()
        all_teams = []
        for org in organizations:
            teams = await handler.get_teams(org["login"])
            all_teams.extend(teams)
        logger.info('Teams: %s', all_teams)
        return all_teams
    except Exception as e:
        logger.error('Failed to resync teams: %s', e)
        return []

@ocean.on_resync('workflow')
async def resync_workflows(kind: str) -> list[dict[Any, Any]]:
    try:
        handler = GithubHandler()
        username = ocean.integration_config["github_username"]
        repos = await handler.get_repositories()
        all_workflows = []
        for repo in repos:
            username = repo["owner"]["login"]
            repo_name = repo['name']
            workflows = await handler.get_workflows(username, repo_name)
            all_workflows.extend(workflows)
        logger.info('Workflows: %s', all_workflows)
        return all_workflows
    except Exception as e:
        logger.error('Failed to resync workflows: %s', e)
        return []
```
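One way to bound the fan-out is to issue the per-repository calls concurrently under a semaphore, so large accounts finish faster without hammering the API. A sketch only: `GithubHandler` comes from the PR, while the helper names and the limit of 10 are assumptions.

```python
import asyncio
from typing import Any


async def _issues_for_repo(
    handler: "GithubHandler",
    repo: dict[str, Any],
    semaphore: asyncio.Semaphore,
) -> list[dict[str, Any]]:
    # Only a bounded number of requests are in flight at once.
    async with semaphore:
        return await handler.get_issues(repo["owner"]["login"], repo["name"])


async def fetch_all_issues(handler: "GithubHandler") -> list[dict[str, Any]]:
    semaphore = asyncio.Semaphore(10)  # arbitrary bound on concurrent requests
    repos = await handler.get_repositories()
    per_repo = await asyncio.gather(
        *(_issues_for_repo(handler, repo, semaphore) for repo in repos)
    )
    # Flatten the per-repository lists into a single result list.
    return [issue for issues in per_repo for issue in issues]
```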
#### Redundant Rate Limit Checks
Each API method calls check_rate_limit before making a request, which adds unnecessary API calls since GitHub includes rate limit information in every response header.
```python
async def check_rate_limit(self):
    url = f'{self.base_url}/rate_limit'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        rate_limit = response.json()
        logger.info('Rate Limit: %s', rate_limit)
        return rate_limit
    else:
        response.raise_for_status()
```
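GitHub attaches its documented `X-RateLimit-*` headers to every REST response, so the remaining quota can be read for free from the responses the client already receives. A minimal sketch (the helper name is hypothetical):

```python
from typing import Optional

import httpx


def remaining_rate_limit(response: httpx.Response) -> Optional[int]:
    """Read the remaining request quota from a response's rate-limit headers."""
    remaining = response.headers.get("X-RateLimit-Remaining")
    return int(remaining) if remaining is not None else None
```

The client could consult this value after each call and only sleep (or hit `/rate_limit`) when the quota drops near zero.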
## PR Code Suggestions ✨
Explore these optional code suggestions:
### Possible issue

#### ✅ Fix API response handling

**Suggestion impact:** The suggestion was implemented in a different way. Instead of directly returning `response.json()["workflows"]`, the commit completely refactored the client implementation to use pagination and yield individual workflow items. The new implementation (lines 236-244 of the commit) iterates through the workflows and yields each one individually, which effectively addresses the original issue of accessing the nested workflows data.
Code diff:

```diff
+    async def get_workflows(self, username: str, repo: str) -> AsyncGenerator[dict[str, Any], None]:
+        """Fetch all workflows for a specific repository."""
+        url = f"{self.base_url}/repos/{username}/{repo}/actions/workflows"
+        try:
+            async for workflows in self._fetch_paginated_api(url):
+                for workflow in workflows:
+                    yield workflow
+        except Exception as e:
+            logger.error(f"Error fetching workflows for {username}/{repo}: {e}")
```
The GitHub API returns workflows in a nested structure under a `workflows` key, but the current implementation returns the entire response. This will cause issues when trying to extend the workflows list in the `resync_workflows` function.

`integrations/github-cloud/client.py` [78-85]

```diff
 async def get_workflows(self, username: str, repo: str):
     await self.check_rate_limit()
     url = f'{self.base_url}/repos/{username}/{repo}/actions/workflows'
     response = await self.client.get(url, headers=self.headers)
     if response.status_code == 200:
-        return response.json()
+        return response.json()["workflows"]
     else:
         response.raise_for_status()
```
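For context, the workflows endpoint wraps the list in an object, so the list itself lives under the `workflows` key. A Python illustration of the documented response shape (field values are made up):

```python
# Shape of GitHub's "list repository workflows" response:
example_response = {
    "total_count": 1,
    "workflows": [
        {
            "id": 161335,
            "name": "CI",
            "path": ".github/workflows/ci.yml",
            "state": "active",
        },
    ],
}

workflows = example_response["workflows"]  # the list the resync handler should extend with
```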
[Suggestion has been applied]
**Suggestion importance [1-10]:** 9

**Why:** This suggestion addresses a critical issue with the GitHub API response handling. The current implementation doesn't account for the nested structure of the workflows data, which would cause runtime errors when trying to extend the workflows list in the `resync_workflows` function.

**Impact:** High
### General

#### ✅ Remove unused variable

**Suggestion impact:** The commit removed the unused `username` variable that was initialized from `ocean.integration_config["github_username"]`, as suggested. The entire function was refactored to use async generators, but the specific issue pointed out in the suggestion was addressed.
Code diff:

```diff
-        username = ocean.integration_config["github_username"]
-        repos = await handler.get_repositories()
```
Remove the unused `username` variable that is initialized from the integration config but then immediately overwritten in the loop. This could cause confusion and is unnecessary.

`integrations/github-cloud/main.py` [80-92]
```diff
 @ocean.on_resync('workflow')
 async def resync_workflows(kind: str) -> list[dict[Any, Any]]:
     try:
         handler = GithubHandler()
-        username = ocean.integration_config["github_username"]
         repos = await handler.get_repositories()
         all_workflows = []
         for repo in repos:
             username = repo["owner"]["login"]
             repo_name = repo['name']
             workflows = await handler.get_workflows(username, repo_name)
             all_workflows.extend(workflows)
         logger.info('Workflows: %s', all_workflows)
         return all_workflows
     except Exception as e:
         logger.error('Failed to resync workflows: %s', e)
         return []
```
[To ensure code accuracy, apply this suggestion manually]
**Suggestion importance [1-10]:** 7

**Why:** The suggestion correctly identifies an unused variable that could cause confusion. The variable `username` is initialized from the integration config but then immediately overwritten in the loop, making the initial assignment redundant and potentially misleading.

**Impact:** Medium
This pull request is automatically being deployed by Amplify Hosting (learn more).
Access this pull request here: https://pr-1510.d1ftd8v2gowp8w.amplifyapp.com