feat: add GitHub Pages connector
GitHub Pages connector
Description
This PR introduces a new GitHub Pages connector and integrates it into both the backend and frontend of Onyx.
Test
- ✅ Prettier applied on web files
- ✅ Pre-commit hooks (black, reorder-python-imports, autoflake, ruff, prettier) all passed
- ✅ mypy type checks passed on modified backend files
Demo
Related Issue / Claim
Closes #2282
Creating a GitHub PAT for the GitHub Pages connector
- Generate a fine-grained personal access token.
- Configure:
  - Token name: Onyx GitHub Pages
  - Expiration: No expiration (recommended for connectors)
  - Resource owner: user/org that owns the repo
  - Repository access: All repositories (or select specific repos)
- Permissions:
  - Contents → Read-only
  - Metadata → Read-only
- Copy and store the token securely.
Using the token in Onyx
- In the GitHub Pages connector config, paste the PAT into the GitHub access token field.
- Provide:
  - repo_owner (e.g. melmathari)
  - repo_name (e.g. GitHub-pages)
- Save and validate the connector.
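Before saving, the token can be sanity-checked outside Onyx. The sketch below is not the connector's own validation code; it is a standalone check that assumes the token only needs to read repository metadata through the GitHub REST endpoint `GET /repos/{owner}/{repo}`. The function names `build_repo_request` and `token_can_read_repo` are hypothetical.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

GITHUB_API = "https://api.github.com"


def build_repo_request(repo_owner: str, repo_name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the repository's metadata."""
    url = f"{GITHUB_API}/repos/{quote(repo_owner, safe='')}/{quote(repo_name, safe='')}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def token_can_read_repo(repo_owner: str, repo_name: str, token: str) -> bool:
    """Return True if GitHub answers 200 for the repo with this token."""
    request = build_repo_request(repo_owner, repo_name, token)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # 401 (bad token), 403 (missing permission), or 404 (repo not visible).
        return False
```

Calling `token_can_read_repo("melmathari", "GitHub-pages", "<your PAT>")` should return `True` when the token has Metadata read access to that repository.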
/claim #2282
- [ ] This PR should be backported
- [x] [Optional] Override Linear Check
Summary by cubic
Adds a GitHub Pages connector that indexes HTML/Markdown from a repo’s Pages site via the GitHub API and exposes it as a load-state connector in the app. Implements the flow requested in Linear #2282.
- New Features
  - Backend GitHub Pages connector with checkpointing, rate-limit handling, and credential validation
  - Supports gh-pages, configured Pages branch, or default branch; converts repo paths to Pages URLs
  - Parses HTML/Markdown using existing file processing utilities; includes title extraction and metadata
  - New enum, factory mapping, and Slack icon for DocumentSource.GITHUB_PAGES
- Frontend
  - New connector config with fields: repo_owner, repo_name; advanced option: include_readme
  - Uses existing GitHub access token credential template
  - Added icon, source metadata, types, and inclusion in load-state and auto-sync sources
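The path-to-URL conversion mentioned in the summary can be illustrated with a minimal sketch. This is not the code from the PR; `pages_url` is a hypothetical helper that assumes the default `<owner>.github.io` domain (no custom domain), a publishing source at the branch root, and the default Jekyll behaviour of rendering Markdown files to `.html`.

```python
from posixpath import splitext
from urllib.parse import quote


def pages_url(repo_owner: str, repo_name: str, path: str) -> str:
    """Map a repository file path to its likely GitHub Pages URL."""
    owner = repo_owner.lower()
    # A repo named <owner>.github.io is served from the domain root;
    # any other repo is served under /<repo_name>/.
    if repo_name.lower() == f"{owner}.github.io":
        base = f"https://{owner}.github.io/"
    else:
        base = f"https://{owner}.github.io/{quote(repo_name)}/"

    path = path.lstrip("/")
    stem, ext = splitext(path)
    # Assumption: Markdown is rendered to .html by the default Jekyll build.
    if ext.lower() in (".md", ".markdown"):
        path = stem + ".html"
    # index.html is served at its directory URL.
    if path == "index.html":
        path = ""
    elif path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return base + quote(path)
```

With the example values from this PR, `pages_url("melmathari", "GitHub-pages", "docs/guide.md")` yields `https://melmathari.github.io/GitHub-pages/docs/guide.html`. Sites with a custom domain or a `/docs` publishing folder would need extra handling that this sketch leaves out.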
@Weves I'm open to feedback and appreciate you looking into this. I'm not sure whether this PR covers all the requirements, so I might need some assistance.
@Weves fyi, appreciate your time.
