User description
Description
What - A new GitHub integration for Port's Ocean framework that syncs GitHub resources to Port.
Why - To allow Port users to import and track their GitHub resources (repositories, pull requests, issues, teams, and workflows) in their developer portal.
How - Using GitHub's REST API v3 with async processing, rate limiting, and webhook support.
Type of change
Please leave one option from the following and delete the rest:
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] New Integration (non-breaking change which adds a new integration)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Non-breaking change (fix of existing functionality that will not change current behavior)
- [ ] Documentation (added/updated documentation)
All tests should be run against the Port production environment (using a testing org).
Core testing checklist
- [x] Integration able to create all default resources from scratch
- [x] Resync finishes successfully
- [x] Resync able to create entities
- [x] Resync able to update entities
- [x] Resync able to detect and delete entities
- [x] Scheduled resync able to abort existing resync and start a new one
- [x] Tested with at least 2 integrations from scratch
- [x] Tested with Kafka and Polling event listeners
- [x] Tested deletion of entities that don't pass the selector
Integration testing checklist
- [x] Integration able to create all default resources from scratch
- [x] Resync able to create entities
- [x] Resync able to update entities
- [x] Resync able to detect and delete entities
- [x] Resync finishes successfully
- [x] If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
- [x] If resource kind is updated, run the integration with the example data and check if the expected result is achieved
- [x] If new resource kind is added or updated, validate that live-events for that resource are working as expected
- [x] Docs PR link here
Preflight checklist
- [x] Handled rate limiting
- [x] Handled pagination
- [x] Implemented the code in async
- [x] Support Multi account
Screenshots
Include screenshots from your environment showing how the resources of the integration will look.
API Documentation
Provide links to the API documentation used for this integration.
Additional Implementation Details:
- Rate Limiting:
  - Uses GitHub's rate limit headers (X-RateLimit-)
  - Semaphore for concurrent request limiting
  - Automatic backoff when limits are reached
  - Logging of rate limit status
- Pagination:
  - Implements GitHub's page-based pagination
  - Configurable page size (default 100)
  - Efficient async processing of pages
  - Proper handling of empty results
- Webhook Support:
  - Organization-level webhook creation
  - Event-specific processors
  - Secure webhook validation
  - Real-time entity updates
- Resource Processing:
  - Efficient batch processing
  - Proper error handling
  - Detailed logging
  - Resource relationship mapping
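For the "secure webhook validation" item above, one common approach is to check the HMAC-SHA256 signature that GitHub sends in the `X-Hub-Signature-256` header. The sketch below assumes the webhook was created with a shared secret; the function name `is_valid_signature` is hypothetical and is not taken from this PR.

```python
import hashlib
import hmac
from typing import Optional


def is_valid_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Return True if `signature_header` matches the HMAC-SHA256 of `body`.

    `body` must be the raw request bytes, not re-serialized JSON, because
    any change in whitespace or key order changes the digest.
    """
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

A handler would typically reject the request (for example with a 401 response) when this returns False, before parsing the payload.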
PR Type
Enhancement, Documentation, Tests
Description
- Added a new GitHub integration to sync repositories, pull requests, issues, teams, and workflows.
- Implemented a GitHub client with rate-limited API requests.
- Defined resource blueprints and mappings for GitHub entities in Port.
- Included example environment configuration and documentation for setup and contribution.
Changes walkthrough 📝
Relevant files

| Category | File | Description | Changes |
|---|---|---|---|
| Enhancement (6 files) | client.py | Implement GitHub client with API rate-limiting | +84/-0 |
| | debug.py | Add debug entry point for GitHub integration | +4/-0 |
| | integration.py | Define GitHub integration logic and resource handling | +97/-0 |
| | main.py | Add main entry point for GitHub integration | +86/-0 |
| | blueprints.json | Define resource blueprints for GitHub entities | +228/-0 |
| | port-app-config.yml | Add Port app configuration for GitHub integration | +90/-0 |
| Tests (1 file) | test_sample.py | Add placeholder test for GitHub integration | +2/-0 |
| Configuration changes (4 files) | launch.json | Add VSCode debug configuration for GitHub integration | +14/-1 |
| | poetry.toml | Configure Poetry virtual environment for GitHub integration | +3/-0 |
| | pyproject.toml | Define project dependencies and tools for GitHub integration | +113/-0 |
| | sonar-project.properties | Add SonarQube configuration for GitHub integration | +2/-0 |
| Documentation (5 files) | .env.example | Provide example environment configuration for GitHub integration | +11/-0 |
| | spec.yaml | Specify GitHub integration features and configurations | +26/-0 |
| | CHANGELOG.md | Add changelog for GitHub integration | +8/-0 |
| | CONTRIBUTING.md | Add contributing guidelines for GitHub integration | +7/-0 |
| | README.md | Add README for GitHub integration | +7/-0 |
| Miscellaneous (1 file) | Makefile | Add Makefile for GitHub integration infrastructure | +1/-0 |
| Additional files (3 files) | | | |
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 Security concerns
Sensitive information exposure: The GitHub token is being printed to logs in integration.py line 51: print(f"Initializing GitHub integration for organization: {self.github_org} {self.github_token}"). This exposes sensitive credentials that could be used to access the GitHub account. The token should never be logged or printed.
⚡ Recommended focus areas for review
Error Handling
The error handling in the _make_request method catches all exceptions generically. This could mask specific API errors that should be handled differently (such as authentication failures versus rate limiting).
except Exception as e:
    logger.error(f"GitHub API request failed: {str(e)}")
    raise
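One way to act on this observation is to map response status codes to specific exception types before falling back to a generic error. The sketch below is an assumption about how that could be structured, not code from this PR: every class and function name is hypothetical. It relies only on documented GitHub behavior: 401 for bad credentials, and 403 or 429 with `X-RateLimit-Remaining: 0` when the primary rate limit is exhausted, with `X-RateLimit-Reset` given in UTC epoch seconds. Secondary rate limits, which use `Retry-After`, are not handled here.

```python
import time
from typing import Mapping, Optional


class GitHubAPIError(Exception):
    """Base class for failed GitHub API responses (hypothetical name)."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"GitHub API returned {status}: {message}")
        self.status = status


class GitHubAuthError(GitHubAPIError):
    """Raised for 401 responses: the token is missing, invalid, or expired."""


class GitHubRateLimitError(GitHubAPIError):
    """Raised when the primary rate limit is exhausted."""

    def __init__(self, status: int, retry_after: float) -> None:
        super().__init__(status, f"rate limited, retry in {retry_after:.0f}s")
        self.retry_after = retry_after


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait until the window in X-RateLimit-Reset reopens."""
    normalized = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    reset = normalized.get("x-ratelimit-reset")
    if reset is None:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, float(reset) - current)


def raise_for_github_status(
    status: int, headers: Mapping[str, str], now: Optional[float] = None
) -> None:
    """Raise a specific exception for known failure modes, else do nothing."""
    normalized = {k.lower(): v for k, v in headers.items()}
    if status == 401:
        raise GitHubAuthError(status, "bad or missing credentials")
    if status in (403, 429) and normalized.get("x-ratelimit-remaining") == "0":
        raise GitHubRateLimitError(status, seconds_until_reset(headers, now))
    if status >= 400:
        raise GitHubAPIError(status)
```

A caller can then catch `GitHubRateLimitError` to sleep for `retry_after` seconds and retry, while letting `GitHubAuthError` propagate, since retrying with the same token cannot succeed.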
Sensitive Data Exposure
The integration is printing the GitHub token during initialization, which could expose sensitive credentials in logs.
print(f"Initializing GitHub integration for organization: {self.github_org} {self.github_token}")
self.client = GitHubClient(token=self.github_token, org=self.github_org)
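The simplest fix is to drop the token from the message entirely. If some hint about which credential is in use is still wanted for debugging, a masking helper is one option. The sketch below is illustrative only; `mask_secret` is a hypothetical name, and even showing the last few characters is a trade-off that some teams will not accept.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks.

    Secrets no longer than `visible` characters are masked completely,
    so short values are never shown in full.
    """
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

For example, the log call could pass `mask_secret(self.github_token)` instead of the raw attribute.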
Duplicate Code
The resource fetching logic is duplicated between integration.py and main.py files, which could lead to maintenance issues if one is updated but not the other.
@ocean.on_resync()
async def on_resync(kind: str) -> List[Dict[Any, Any]]:
    """Handle resync events for different kinds of resources"""
    if not github_client:
        raise RuntimeError("GitHub client not initialized")
    if kind == "repository":
        return await github_client.get_repositories()
    elif kind == "pull-request":
        all_prs = []
        repos = await github_client.get_repositories()
        for repo in repos:
            prs = await github_client.get_pull_requests(repo["name"])
            all_prs.extend(prs)
        return all_prs
    elif kind == "issue":
        all_issues = []
        repos = await github_client.get_repositories()
        for repo in repos:
            issues = await github_client.get_issues(repo["name"])
            all_issues.extend(issues)
        return all_issues
    elif kind == "team":
        return await github_client.get_teams()
    elif kind == "workflow":
        all_workflows = []
        repos = await github_client.get_repositories()
        for repo in repos:
            workflows = await github_client.get_workflows(repo["name"])
            # Enrich workflow data with repository information
            for workflow in workflows:
                workflow["repository"] = repo
                if "latest_run" not in workflow:
                    workflow["latest_run"] = {"status": "unknown"}
            all_workflows.extend(workflows)
        return all_workflows
    return []
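One way to address the duplication is to move the repeated "list repositories, then fetch per repository" loop into a single helper that both integration.py and main.py import. The sketch below assumes the client methods return plain lists, as in the snippet above; `collect_per_repo` and `FakeClient` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable

Record = dict[str, Any]


async def collect_per_repo(
    list_repos: Callable[[], Awaitable[list[Record]]],
    fetch_for_repo: Callable[[str], Awaitable[list[Record]]],
) -> list[Record]:
    """Fetch records for every repository and return them as one flat list."""
    results: list[Record] = []
    for repo in await list_repos():
        results.extend(await fetch_for_repo(repo["name"]))
    return results


class FakeClient:
    """In-memory stand-in for the real GitHub client, for demonstration only."""

    async def get_repositories(self) -> list[Record]:
        return [{"name": "api"}, {"name": "web"}]

    async def get_issues(self, repo: str) -> list[Record]:
        return [{"repo": repo, "number": 1}]


if __name__ == "__main__":
    client = FakeClient()
    print(asyncio.run(collect_per_repo(client.get_repositories, client.get_issues)))
```

With such a helper, the pull-request and issue branches each reduce to a single call, and a change to the traversal logic only needs to be made in one place.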
PR Code Suggestions ✨
Explore these optional code suggestions:
| Category | Suggestion | Impact |
| Security |
Remove sensitive data exposure
Avoid printing sensitive information like tokens in log messages. The GitHub token is being exposed in the log, which is a security risk. Remove the token from the log message.
integrations/github/integration.py [51]
-print(f"Initializing GitHub integration for organization: {self.github_org} {self.github_token}")
+print(f"Initializing GitHub integration for organization: {self.github_org}")
- [ ] Apply this suggestion
Suggestion importance [1-10]: 10
Why: Exposing sensitive information like authentication tokens in logs is a critical security vulnerability. This could lead to unauthorized access if logs are exposed or shared.
Impact: High
| Possible issue |
✅ Fix duplicate function definition
Suggestion Impact:The commit completely refactored the file, including removing the duplicate on_start() function at the bottom of the file. The commit keeps only one on_start() function at the top of the file.
code diff:
 @ocean.on_start()
 async def on_start() -> None:
-    """Initialize the GitHub client when the integration starts"""
-    global github_client
-    if not github_token or not github_org:
-        raise ValueError("GITHUB_TOKEN and GITHUB_ORGANIZATION environment variables are required")
-
-    print(f"Starting GitHub integration for organization: {github_org}")
-    github_client = GitHubClient(token=github_token, org=github_org)
+    logger.info("Starting Port Ocean GitHub integration")
-@ocean.on_resync()
-async def on_resync(kind: str) -> List[Dict[Any, Any]]:
-    """Handle resync events for different kinds of resources"""
-    if not github_client:
-        raise RuntimeError("GitHub client not initialized")
+def init_client() -> GitHubClient:
+    return GitHubClient(
+        token=ocean.integration_config.get_secret("github_token"),
+        org=ocean.integration_config.get("organization")
+    )
-    if kind == "repository":
-        return await github_client.get_repositories()
-
-    elif kind == "pull-request":
-        all_prs = []
-        repos = await github_client.get_repositories()
-        for repo in repos:
-            prs = await github_client.get_pull_requests(repo["name"])
-            all_prs.extend(prs)
-        return all_prs
-
-    elif kind == "issue":
-        all_issues = []
-        repos = await github_client.get_repositories()
-        for repo in repos:
-            issues = await github_client.get_issues(repo["name"])
-            all_issues.extend(issues)
-        return all_issues
-
-    elif kind == "team":
-        return await github_client.get_teams()
-
-    elif kind == "workflow":
-        all_workflows = []
-        repos = await github_client.get_repositories()
-        for repo in repos:
-            workflows = await github_client.get_workflows(repo["name"])
-            # Enrich workflow data with repository information
-            for workflow in workflows:
-                workflow["repository"] = repo
-                if "latest_run" not in workflow:
-                    workflow["latest_run"] = {"status": "unknown"}
-            all_workflows.extend(workflows)
-        return all_workflows
+@ocean.on_resync(ObjectKind.REPOSITORY)
+async def resync_repositories(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
+    """Resync all repositories in the organization."""
+    client = init_client()
+    async for repositories in client.get_repositories():
+        yield repositories
-    return []
+@ocean.on_resync(ObjectKind.PULL_REQUEST)
+async def resync_pull_requests(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
+    """Resync all pull requests from all repositories."""
+    client = init_client()
+    async for repositories in client.get_repositories():
+        tasks = [
+            client.get_pull_requests(repo["name"])
+            for repo in repositories
+        ]
+        async for batch in stream_async_iterators_tasks(*tasks):
+            yield batch
+@ocean.on_resync(ObjectKind.ISSUE)
+async def resync_issues(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
+    """Resync all issues from all repositories."""
+    client = init_client()
+    async for repositories in client.get_repositories():
+        tasks = [
+            client.get_issues(repo["name"])
+            for repo in repositories
+        ]
+        async for batch in stream_async_iterators_tasks(*tasks):
+            yield batch
-# The same sync logic can be registered for one of the kinds that are available in the mapping in port.
-# @ocean.on_resync('project')
-# async def resync_project(kind: str) -> list[dict[Any, Any]]:
-# # 1. Get all projects from the source system
-# # 2. Return a list of dictionaries with the raw data of the state
-# return [{"some_project_key": "someProjectValue", ...}]
-#
-# @ocean.on_resync('issues')
-# async def resync_issues(kind: str) -> list[dict[Any, Any]]:
-# # 1. Get all issues from the source system
-# # 2. Return a list of dictionaries with the raw data of the state
-# return [{"some_issue_key": "someIssueValue", ...}]
+@ocean.on_resync(ObjectKind.TEAM)
+async def resync_teams(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
+    """Resync all teams in the organization."""
+    client = init_client()
+    async for teams in client.get_teams():
+        yield teams
+@ocean.on_resync(ObjectKind.WORKFLOW)
+async def resync_workflows(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
+    """Resync all workflows from all repositories."""
+    client = init_client()
+    async for repositories in client.get_repositories():
+        tasks = []
+        for repo in repositories:
+            async for workflows in client.get_workflows(repo["name"]):
+                # Enrich workflow data with repository information
+                for workflow in workflows:
+                    workflow["repository"] = repo
+                    runs = await client.get_workflow_runs(repo["name"], workflow["id"], per_page=1)
+                    workflow["latest_run"] = runs[0] if runs else {"status": "unknown"}
+                tasks.append(workflows)
+
+        async for batch in stream_async_iterators_tasks(*tasks):
+            yield batch
-# Optional
-# Listen to the start event of the integration. Called once when the integration starts.
-@ocean.on_start()
-async def on_start() -> None:
-    # Something to do when the integration starts
-    # For example create a client to query 3rd party services - GitHub, Jira, etc...
-    print("Starting github integration")
There are two on_start() functions defined in the file. The second one at the bottom of the file will override the first one, causing the GitHub client initialization to be skipped. Remove the duplicate function or merge their functionality.
integrations/github/main.py [12-20]
 @ocean.on_start()
 async def on_start() -> None:
     """Initialize the GitHub client when the integration starts"""
     global github_client
     if not github_token or not github_org:
         raise ValueError("GITHUB_TOKEN and GITHUB_ORGANIZATION environment variables are required")
     print(f"Starting GitHub integration for organization: {github_org}")
     github_client = GitHubClient(token=github_token, org=github_org)
+    print("Starting github integration")
[Suggestion has been applied]
Suggestion importance [1-10]: 9
Why: The duplicate on_start() function at the end of the file would override the first implementation, causing the GitHub client initialization to be skipped, which would break the integration's functionality.
Impact: High
This pull request is automatically being deployed by Amplify Hosting (learn more).
Access this pull request here: https://pr-1507.d1ftd8v2gowp8w.amplifyapp.com