## User description

### Description

What -

Why -

How -
### Type of change
Please leave one option from the following and delete the rest:
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] New Integration (non-breaking change which adds a new integration)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Non-breaking change (fix of existing functionality that will not change current behavior)
- [ ] Documentation (added/updated documentation)
All tests should be run against the Port production environment (using a testing org).
### Core testing checklist
- [ ] Integration able to create all default resources from scratch
- [ ] Resync finishes successfully
- [ ] Resync able to create entities
- [ ] Resync able to update entities
- [ ] Resync able to detect and delete entities
- [ ] Scheduled resync able to abort existing resync and start a new one
- [ ] Tested with at least 2 integrations from scratch
- [ ] Tested with Kafka and Polling event listeners
- [ ] Tested deletion of entities that don't pass the selector
### Integration testing checklist
- [ ] Integration able to create all default resources from scratch
- [ ] Resync able to create entities
- [ ] Resync able to update entities
- [ ] Resync able to detect and delete entities
- [ ] Resync finishes successfully
- [ ] If a new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the `examples` folder in the integration directory.
- [ ] If a resource kind is updated, run the integration with the example data and check that the expected result is achieved
- [ ] If a new resource kind is added or updated, validate that live events for that resource are working as expected
- [ ] Docs PR link here
### Preflight checklist
- [ ] Handled rate limiting
- [ ] Handled pagination
- [ ] Implemented the code asynchronously
- [ ] Supports multiple accounts
### Screenshots
Include screenshots from your environment showing how the resources of the integration will look.
### API Documentation
Provide links to the API documentation used for this integration.
## PR Type
Enhancement, Documentation, Tests
## Description
- Added a new GitHub integration for Port Ocean.
- Implemented async handlers for GitHub API interactions.
- Defined blueprints and configurations for GitHub resources.
- Included documentation, testing, and development setup files.
## Changes walkthrough 📝
| Category | File | Change | Lines |
|---|---|---|---|
| Enhancement (4 files) | client.py | Implemented GitHub API client with async handlers | +85/-0 |
| | debug.py | Added debug entry point for GitHub integration | +4/-0 |
| | main.py | Defined resync handlers for GitHub resources | +96/-0 |
| | blueprints.json | Defined blueprints for GitHub resources | +248/-0 |
| Tests (1 file) | test_sample.py | Added placeholder test for GitHub integration | +2/-0 |
| Configuration changes (7 files) | launch.json | Added debug configuration for GitHub integration | +14/-1 |
| | .env.example | Provided example environment variables for GitHub integration | +8/-0 |
| | port-app-config.yml | Configured mappings for GitHub resources | +86/-0 |
| | Makefile | Added Makefile for GitHub integration infrastructure | +1/-0 |
| | poetry.toml | Configured Poetry for GitHub integration | +3/-0 |
| | pyproject.toml | Defined dependencies and tools for GitHub integration | +113/-0 |
| | sonar-project.properties | Added SonarQube configuration for GitHub integration | +2/-0 |
| Documentation (4 files) | spec.yaml | Added specification for GitHub integration | +24/-0 |
| | CHANGELOG.md | Added changelog for GitHub integration | +8/-0 |
| | CONTRIBUTING.md | Added contributing guidelines for GitHub integration | +7/-0 |
| | README.md | Added README for GitHub integration | +7/-0 |
| Dependencies (1 file) | pyproject.toml | Added dotenv dependency for environment variable management | +1/-0 |
| Additional files (1 file) | | | |
## PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
- ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
- 🧪 No relevant tests
### 🔒 Security concerns

**Sensitive information exposure:** The GitHub access token is stored in the integration configuration and used directly in the `Authorization` header. While this is a common pattern, there is no validation to ensure the token is properly secured or rotated. Additionally, the token is logged along with the headers when debugging, which could expose it in log files.
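One way to address the logging exposure is to redact the token before headers are ever logged. This is a minimal sketch, not code from the PR: the `redact_headers` helper is hypothetical, and the standard-library logger setup is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the request headers that is safe to log."""
    safe = dict(headers)
    if "Authorization" in safe:
        # Keep the scheme for debugging, drop the credential itself.
        safe["Authorization"] = "Bearer ***redacted***"
    return safe


headers = {"Authorization": "Bearer ghp_example", "Accept": "application/vnd.github+json"}
logger.debug("Request headers: %s", redact_headers(headers))
```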
### ⚡ Recommended focus areas for review

#### Error Handling
The client methods don't handle pagination for GitHub API responses, which could lead to incomplete data retrieval for repositories, issues, pull requests, etc. when there are more items than fit in a single response.
```python
async def get_repositories(self):
    await self.check_rate_limit()
    url = f'{self.base_url}/user/repos'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

async def get_issues(self, username: str, repo: str):
    await self.check_rate_limit()
    url = f'{self.base_url}/repos/{username}/{repo}/issues'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

async def get_pull_requests(self, username: str, repo: str):
    await self.check_rate_limit()
    url = f'{self.base_url}/repos/{username}/{repo}/pulls'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

async def get_organizations(self):
    await self.check_rate_limit()
    url = f'{self.base_url}/user/orgs'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

async def get_teams(self, org: str):
    await self.check_rate_limit()
    url = f'{self.base_url}/orgs/{org}/teams'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

async def get_workflows(self, username: str, repo: str):
    await self.check_rate_limit()
    url = f'{self.base_url}/repos/{username}/{repo}/actions/workflows'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()
```
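Pagination could be layered under all of these methods with a single helper that follows GitHub's `Link` header. The sketch below rests on assumptions: it presumes an `httpx.AsyncClient` (as the PR appears to use), and the `_fetch_paginated` name is hypothetical — the merged commit apparently took a similar approach with a `_fetch_paginated_api` helper.

```python
from typing import Any, AsyncGenerator

import httpx


async def _fetch_paginated(
    client: httpx.AsyncClient, url: str, headers: dict[str, str]
) -> AsyncGenerator[list[dict[str, Any]], None]:
    """Yield one page of results at a time, following GitHub's Link header."""
    params = {"per_page": 100}
    while url:
        response = await client.get(url, headers=headers, params=params)
        response.raise_for_status()
        yield response.json()
        # httpx parses the RFC 5988 Link header into response.links;
        # the "next" URL already carries its own query string.
        url = response.links.get("next", {}).get("url")
        params = None
```

Each `get_*` method could then iterate with `async for page in _fetch_paginated(...)` and accumulate or yield results instead of returning only the first page.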
#### Resource Exhaustion
The resync functions fetch all repositories and then make additional API calls for each repository, which could lead to rate limiting issues or timeouts with large GitHub accounts.
```python
@ocean.on_resync('issue')
async def resync_issues(kind: str) -> list[dict[Any, Any]]:
    try:
        handler = GithubHandler()
        repos = await handler.get_repositories()
        all_issues = []
        for repo in repos:
            username = repo["owner"]["login"]
            repo_name = repo['name']
            issues = await handler.get_issues(username, repo_name)
            all_issues.extend(issues)
        logger.info('Issues: %s', all_issues)
        return all_issues
    except Exception as e:
        logger.error('Failed to resync issues: %s', e)
        return []

@ocean.on_resync('pull_request')
async def resync_pull_requests(kind: str) -> list[dict[Any, Any]]:
    try:
        handler = GithubHandler()
        repos = await handler.get_repositories()
        all_pull_requests = []
        for repo in repos:
            username = repo["owner"]["login"]
            repo_name = repo['name']
            pull_requests = await handler.get_pull_requests(username, repo_name)
            all_pull_requests.extend(pull_requests)
        logger.info('Pull Requests: %s', all_pull_requests)
        return all_pull_requests
    except Exception as e:
        logger.error('Failed to resync pull requests: %s', e)
        return []

@ocean.on_resync('team')
async def resync_teams(kind: str) -> list[dict[Any, Any]]:
    try:
        handler = GithubHandler()
        organizations = await handler.get_organizations()
        all_teams = []
        for org in organizations:
            teams = await handler.get_teams(org["login"])
            all_teams.extend(teams)
        logger.info('Teams: %s', all_teams)
        return all_teams
    except Exception as e:
        logger.error('Failed to resync teams: %s', e)
        return []

@ocean.on_resync('workflow')
async def resync_workflows(kind: str) -> list[dict[Any, Any]]:
    try:
        handler = GithubHandler()
        username = ocean.integration_config["github_username"]
        repos = await handler.get_repositories()
        all_workflows = []
        for repo in repos:
            username = repo["owner"]["login"]
            repo_name = repo['name']
            workflows = await handler.get_workflows(username, repo_name)
            all_workflows.extend(workflows)
        logger.info('Workflows: %s', all_workflows)
        return all_workflows
    except Exception as e:
        logger.error('Failed to resync workflows: %s', e)
        return []
```
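One way to bound the fan-out is to issue the per-repository calls concurrently under a semaphore, so large accounts finish faster without hammering the API. A sketch only: `GithubHandler` comes from the PR, while the helper names and the limit of 10 are assumptions.

```python
import asyncio
from typing import Any


async def _issues_for_repo(
    handler: "GithubHandler",
    repo: dict[str, Any],
    semaphore: asyncio.Semaphore,
) -> list[dict[str, Any]]:
    # Only a bounded number of requests are in flight at once.
    async with semaphore:
        return await handler.get_issues(repo["owner"]["login"], repo["name"])


async def fetch_all_issues(handler: "GithubHandler") -> list[dict[str, Any]]:
    semaphore = asyncio.Semaphore(10)  # arbitrary bound on concurrent requests
    repos = await handler.get_repositories()
    per_repo = await asyncio.gather(
        *(_issues_for_repo(handler, repo, semaphore) for repo in repos)
    )
    # Flatten the per-repository lists into a single result list.
    return [issue for issues in per_repo for issue in issues]
```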
#### Redundant Rate Limit Checks
Each API method calls check_rate_limit before making a request, which adds unnecessary API calls since GitHub includes rate limit information in every response header.
```python
async def check_rate_limit(self):
    url = f'{self.base_url}/rate_limit'
    response = await self.client.get(url, headers=self.headers)
    if response.status_code == 200:
        rate_limit = response.json()
        logger.info('Rate Limit: %s', rate_limit)
        return rate_limit
    else:
        response.raise_for_status()
```
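GitHub attaches its documented `X-RateLimit-*` headers to every REST response, so the remaining quota can be read for free from the responses the client already receives. A minimal sketch (the helper name is hypothetical):

```python
from typing import Optional

import httpx


def remaining_rate_limit(response: httpx.Response) -> Optional[int]:
    """Read the remaining request quota from a response's rate-limit headers."""
    remaining = response.headers.get("X-RateLimit-Remaining")
    return int(remaining) if remaining is not None else None
```

The client could consult this value after each call and only sleep (or hit `/rate_limit`) when the quota drops near zero.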
## PR Code Suggestions ✨
Explore these optional code suggestions:
### Possible issue

#### ✅ Fix API response handling

**Suggestion impact:** The suggestion was implemented in a different way. Instead of directly returning `response.json()["workflows"]`, the commit completely refactored the client implementation to use pagination and yield individual workflow items. The new implementation (lines 236-244 of the commit) iterates through the workflows and yields each one individually, which effectively addresses the original issue of accessing the nested workflows data.
Code diff:

```diff
+    async def get_workflows(self, username: str, repo: str) -> AsyncGenerator[dict[str, Any], None]:
+        """Fetch all workflows for a specific repository."""
+        url = f"{self.base_url}/repos/{username}/{repo}/actions/workflows"
+        try:
+            async for workflows in self._fetch_paginated_api(url):
+                for workflow in workflows:
+                    yield workflow
+        except Exception as e:
+            logger.error(f"Error fetching workflows for {username}/{repo}: {e}")
```
The GitHub API returns workflows in a nested structure under a `workflows` key, but the current implementation returns the entire response. This will cause issues when trying to extend the workflows list in the `resync_workflows` function.

`integrations/github-cloud/client.py` [78-85]

```diff
 async def get_workflows(self, username: str, repo: str):
     await self.check_rate_limit()
     url = f'{self.base_url}/repos/{username}/{repo}/actions/workflows'
     response = await self.client.get(url, headers=self.headers)
     if response.status_code == 200:
-        return response.json()
+        return response.json()["workflows"]
     else:
         response.raise_for_status()
```
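For context, the workflows endpoint wraps the list in an object, so the list itself lives under the `workflows` key. A Python illustration of the documented response shape (field values are made up):

```python
# Shape of GitHub's "list repository workflows" response:
example_response = {
    "total_count": 1,
    "workflows": [
        {
            "id": 161335,
            "name": "CI",
            "path": ".github/workflows/ci.yml",
            "state": "active",
        },
    ],
}

workflows = example_response["workflows"]  # the list the resync handler should extend with
```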
[Suggestion has been applied]
**Suggestion importance [1-10]:** 9

**Why:** This suggestion addresses a critical issue with the GitHub API response handling. The current implementation doesn't account for the nested structure of the workflows data, which would cause runtime errors when trying to extend the workflows list in the `resync_workflows` function.

**Impact:** High
### General

#### ✅ Remove unused variable

**Suggestion impact:** The commit removed the unused `username` variable that was initialized from `ocean.integration_config["github_username"]`, as suggested. The entire function was refactored to use async generators, but the specific issue pointed out in the suggestion was addressed.
Code diff:

```diff
-        username = ocean.integration_config["github_username"]
-        repos = await handler.get_repositories()
```
Remove the unused `username` variable that is initialized from the integration config but then immediately overwritten in the loop. This could cause confusion and is unnecessary.

`integrations/github-cloud/main.py` [80-92]
```diff
 @ocean.on_resync('workflow')
 async def resync_workflows(kind: str) -> list[dict[Any, Any]]:
     try:
         handler = GithubHandler()
-        username = ocean.integration_config["github_username"]
         repos = await handler.get_repositories()
         all_workflows = []
         for repo in repos:
             username = repo["owner"]["login"]
             repo_name = repo['name']
             workflows = await handler.get_workflows(username, repo_name)
             all_workflows.extend(workflows)
         logger.info('Workflows: %s', all_workflows)
         return all_workflows
     except Exception as e:
         logger.error('Failed to resync workflows: %s', e)
         return []
```
[To ensure code accuracy, apply this suggestion manually]
**Suggestion importance [1-10]:** 7

**Why:** The suggestion correctly identifies an unused variable that could cause confusion. The variable `username` is initialized from the integration config but then immediately overwritten in the loop, making the initial assignment redundant and potentially misleading.

**Impact:** Medium
This pull request is automatically being deployed by Amplify Hosting (learn more).
Access this pull request here: https://pr-1510.d1ftd8v2gowp8w.amplifyapp.com