src-fingerprint
Extract git related information (file shas, commit shas) from your hosted source version control system
- Introduction
- Installation
  - Using pre-compiled executables
  - Installing from sources
- Generate My Token
  - GitHub
  - GitLab
- Collect my code fingerprints
  - GitHub
  - GitLab
  - Bitbucket server (formerly Atlassian Stash)
  - Repository
- Performance and memory usage
- License
Introduction
The purpose of src-fingerprint is to provide an easy way to extract git-related information (namely, all file SHAs of a repository) from your hosted source version control system.
The utility's main command is collect, which gathers source code fingerprints from a version control system or from a local repository. It supports 3 main VCS:
- GitHub and GitHub Enterprise
- GitLab CE and EE
- Bitbucket
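Each file SHA in the output is a git blob object id, so identical file content yields the same SHA in every repository. As an illustration only (this is not src-fingerprint's code), the Go sketch below recomputes such an id, assuming the repository uses git's default SHA-1 object format; the helper name blobSHA is hypothetical.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// blobSHA computes the git object id of a blob the way `git hash-object`
// does in a SHA-1 repository: SHA-1 over the header "blob <size>\x00"
// followed by the raw file content.
func blobSHA(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content)) // object header
	h.Write(content)                            // file content
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(blobSHA([]byte("")))        // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
	fmt.Println(blobSHA([]byte("hello\n"))) // ce013625030ba8dba906f756967f9e9ca394464a
}
```

The printed values should match what git hash-object reports for the same content.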
Installation
Using pre-compiled executables
macOS, using Homebrew
If you're using Homebrew, you can add GitGuardian's tap and then install src-fingerprint by running the following commands:
brew tap gitguardian/tap
brew install src-fingerprint
Linux packages
Deb and RPM packages are available on Cloudsmith.
Setup instructions:
Windows
Open a PowerShell prompt and run this command:
iwr -useb https://raw.githubusercontent.com/GitGuardian/src-fingerprint/main/scripts/windows-installer.ps1 | iex
The script asks for the installation directory. To install silently, use these commands instead:
iwr -useb https://raw.githubusercontent.com/GitGuardian/src-fingerprint/main/scripts/windows-installer.ps1 -Outfile install.ps1
.\install.ps1 C:\Destination\Dir
rm install.ps1
Note that src-fingerprint requires Unix commands such as bash to be available, so it runs better from a "Git Bash" prompt.
Manual download
You can also download the archives directly from the releases page.
Installing from sources
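You need Go installed and GOBIN in your PATH. Once that is done, run the command:
$ go get -u github.com/gitguardian/src-fingerprint/cmd/src-fingerprint
Installing executables with go get is deprecated since Go 1.17, and Go 1.18 and later no longer build binaries with it. On recent Go versions, the equivalent command should be:
$ go install github.com/gitguardian/src-fingerprint/cmd/src-fingerprint@latest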
Generate My Token
GitHub
- Click on your profile picture at the top right of the screen. A dropdown menu will appear and you will be able to access your personal settings by clicking on Settings.
- On your profile, go to Developer Settings.
- Select Personal Access Tokens.
- Click on Generate a new token.
- Check the repo box. This is the only scope needed.
- Click on Generate token. The token is only displayed once, so make sure you keep it in a safe place.
GitLab
- Click on your profile picture at the top right of the screen. A dropdown menu will appear and you will be able to access your personal settings by clicking on Preferences.
- In the left sidebar, click on Access Tokens.
- Check the read_api box. This is the only scope needed. You can set an expiration date for the token if you want more security.
- Click on Create personal token. The token is only displayed once, so make sure you keep it in a safe place.
Collect my code fingerprints
General information
The output format can be chosen among jsonl, json, gzip-jsonl and gzip-json with the option --export-format.
The default format is gzip-jsonl, to minimize the size of the output file.
The default output filepath is ./fingerprints.jsonl.gz. Use --output to override it.
Also note that, by default, src-fingerprint processes at most 100 repositories per organization, which matters when downloading fingerprints for a big organization. You can override this limit with the option --limit; a limit of 0 processes all repositories of the organization.
If multiple organizations are passed, the limit is applied to each one independently.
There is no default timeout; one can be set with the option --timeout. Like the limit, it is applied to each source independently.
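The per-organization behavior of --limit described above can be summarized with a small Go sketch. selectRepos and makeRepos are hypothetical helpers, not src-fingerprint code. Which repositories are kept when the limit applies is an assumption here (the first ones in listing order), since the text does not specify it, and negative limits are simply treated like 0.

```go
package main

import "fmt"

// selectRepos applies a --limit style cap to one organization's repositories.
// A limit of 0 (or, by assumption, a negative value) means "no limit".
// Keeping the first repositories in listing order is an assumption.
func selectRepos(repos []string, limit int) []string {
	if limit <= 0 || limit >= len(repos) {
		return repos
	}
	return repos[:limit]
}

// makeRepos builds n placeholder repository names for the demo.
func makeRepos(prefix string, n int) []string {
	repos := make([]string, n)
	for i := range repos {
		repos[i] = fmt.Sprintf("%s-repo-%d", prefix, i)
	}
	return repos
}

type org struct {
	name  string
	repos []string
}

func main() {
	const defaultLimit = 100 // documented default
	orgs := []org{
		{"ORG_1_NAME", makeRepos("org1", 150)},
		{"ORG_2_NAME", makeRepos("org2", 30)},
	}
	// The limit is applied to each organization independently.
	for _, o := range orgs {
		kept := selectRepos(o.repos, defaultLimit)
		fmt.Printf("%s: %d of %d repositories processed\n", o.name, len(kept), len(o.repos))
	}
}
```

With the default limit, the first organization is capped at 100 repositories while the second keeps all 30.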
Sample output
Here are a few lines of output in the jsonl format:
{"repository_name":"src-fingerprint","private":false,"sha":"a0c16efce5e767f04ba0c6988d121147099a17df","type":"blob","filepath":".env.example","size":"31"}
{"repository_name":"src-fingerprint","private":false,"sha":"d425eb0f8af66203dbeef50c921ea5bff0f2acba","type":"blob","filepath":".github/workflows/tag.yml","size":"882"}
{"repository_name":"src-fingerprint","private":false,"sha":"c7f341033d78474b125dd56d8adaa3f0fc47faf2","type":"blob","filepath":".github/workflows/test.yml","size":"899"}
{"repository_name":"src-fingerprint","private":false,"sha":"f4409d88950abd4585d8938571864726533a7fa5","type":"blob","filepath":".gitignore","size":"356"}
{"repository_name":"src-fingerprint","private":false,"sha":"f733f951ace2e032c270d2f3cf79c2efb8187b5b","type":"blob","filepath":".gitlab-ci.yml","size":"85"}
{"repository_name":"src-fingerprint","private":false,"sha":"d17ae66a017477bc65a2f433bf23d551ffc6bd75","type":"blob","filepath":".golangci.yml","size":"1196"}
{"repository_name":"src-fingerprint","private":false,"sha":"ee08a617cfb1c63c1c55fa4cb15e8bac0095346f","type":"blob","filepath":".goreleaser.yml","size":"2127"}
Default behavior
Note that by default, src-fingerprint excludes forked repositories from the fingerprint computation. With the GitHub provider, archived repositories and public repositories are also excluded by default. Use the flags --include-forked-repos, --include-archived-repos or --include-public-repos to change this behavior.
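One reading of the default filters above, expressed as a Go predicate: forks are dropped for every provider unless --include-forked-repos is set, while archived and public repositories are dropped only for GitHub. The Repo and Options types and the keep function are hypothetical and do not reflect src-fingerprint's internal code; in particular, keeping archived and public repositories for non-GitHub providers is an assumption drawn from the wording.

```go
package main

import "fmt"

// Repo holds the attributes relevant to the default filters (hypothetical type).
type Repo struct {
	Name     string
	Fork     bool
	Archived bool
	Public   bool
}

// Options mirrors the --include-* flags (hypothetical type).
type Options struct {
	IncludeForked   bool
	IncludeArchived bool
	IncludePublic   bool
}

// keep reports whether a repository would be fingerprinted under this reading.
func keep(provider string, r Repo, o Options) bool {
	if r.Fork && !o.IncludeForked {
		return false // forks are excluded for every provider by default
	}
	if provider == "github" {
		if r.Archived && !o.IncludeArchived {
			return false // GitHub only: archived excluded by default
		}
		if r.Public && !o.IncludePublic {
			return false // GitHub only: public excluded by default
		}
	}
	return true
}

func main() {
	repos := []Repo{
		{Name: "internal-api"},
		{Name: "forked-lib", Fork: true},
		{Name: "old-service", Archived: true},
		{Name: "public-docs", Public: true},
	}
	var defaults Options // all --include-* flags unset
	for _, r := range repos {
		fmt.Printf("%-12s github=%v gitlab=%v\n",
			r.Name, keep("github", r, defaults), keep("gitlab", r, defaults))
	}
}
```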
For all the following examples, we assume that the user is able to clone repositories using an HTTP URL with basic authentication. If for any reason this is not possible within the user's organization, src-fingerprint supports SSH cloning through the dedicated option --ssh-cloning. Note, though, that this option is not the standard configuration of the tool but rather a workaround for this kind of edge case. In particular, it may cause issues when the permissions of the token used to list repositories through the API differ from those of the SSH keys used to clone them.
GitHub
- Export all fingerprints from private repositories of GitHub organizations to the default path ./fingerprints.jsonl.gz, with logs:
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider github --object ORG_1_NAME --object ORG_2_NAME
- Export all fingerprints of every repository the user can access to the default path ./fingerprints.jsonl.gz:
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider github --include-public-repos --include-forked-repos --include-archived-repos
GitLab
- Export all fingerprints from private repositories of a GitLab group to the default path ./fingerprints.jsonl.gz, with logs:
Note: if you are targeting a self-hosted GitLab instance, use --provider-url to specify its URL, including the scheme (e.g. https://).
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider gitlab --object "GitGuardian-dev-group"
- Export all fingerprints of every project the user can access to the default path ./fingerprints.jsonl.gz, with logs:
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider gitlab --include-forked-repos
Bitbucket server (formerly Atlassian Stash)
- Export all fingerprints from the private repositories of a Bitbucket project to the default path ./fingerprints.jsonl.gz, with logs:
Note: if you are targeting a self-hosted Bitbucket instance, use --provider-url to specify its URL, including the scheme (e.g. https://).
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider bitbucket --object "GitGuardian Project"
- Export all fingerprints of every repository the user can access to the default path ./fingerprints.jsonl.gz, with logs:
env VCS_TOKEN="<token>" src-fingerprint -v collect --provider bitbucket
Repository
Processes one or more repositories given their git clone URLs or local paths (-u can be repeated):
- ssh cloning
src-fingerprint collect -p repository -u 'git@github.com:GitGuardian/gg-shield.git'
- http cloning with basic authentication
src-fingerprint collect -p repository -u 'https://user:password@github.com/GitGuardian/gg-shield.git'
- http cloning without basic authentication
src-fingerprint collect -p repository -u 'https://github.com/GitGuardian/gg-shield.git'
- repositories in multiple local directories
src-fingerprint collect -p repository -u /projects/gitlab/src-fingerprint -u /projects/gitlab/internal-api
- repository in current directory
src-fingerprint collect -p repository -u .
Performance and memory usage
By default, src-fingerprint processes each object (--object/-u) one by one. When an object (e.g. a GitHub organization) contains multiple repositories, they are processed in parallel by multiple cloners, whose number is configurable with --cloners. Adding more cloners increases the memory usage of src-fingerprint. When extracting fingerprints from multiple sources (e.g. with multiple --object values), you can use the option --pool to configure the number of workers that process the objects in parallel. Each worker has --cloners cloners, so up to --pool × --cloners clones can run at the same time. Be cautious when increasing both --cloners and --pool: the memory usage may increase drastically.
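As a rough mental model of these options (a hypothetical Go sketch, not src-fingerprint's implementation), the program below feeds objects to pool workers, and each worker clones one object's repositories with cloners goroutines; the names run, pool and cloners are illustrative. It records the peak number of simultaneous "clones" (simulated with a short sleep), which can never exceed pool × cloners, which is why raising both options multiplies the work in flight.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// run processes objects with `pool` workers; each worker handles one object
// at a time using `cloners` goroutines, and the peak concurrency is returned.
func run(objects [][]string, pool, cloners int) int64 {
	var active, peak int64
	objCh := make(chan []string)
	var workers sync.WaitGroup
	for w := 0; w < pool; w++ {
		workers.Add(1)
		go func() {
			defer workers.Done()
			for repos := range objCh {
				repoCh := make(chan string)
				var cl sync.WaitGroup
				for c := 0; c < cloners; c++ {
					cl.Add(1)
					go func() {
						defer cl.Done()
						for range repoCh {
							n := atomic.AddInt64(&active, 1)
							// Record the highest concurrency seen so far.
							for {
								p := atomic.LoadInt64(&peak)
								if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
									break
								}
							}
							time.Sleep(5 * time.Millisecond) // stand-in for a clone
							atomic.AddInt64(&active, -1)
						}
					}()
				}
				for _, r := range repos {
					repoCh <- r
				}
				close(repoCh)
				cl.Wait() // the worker finishes this object before taking another
			}
		}()
	}
	for _, o := range objects {
		objCh <- o
	}
	close(objCh)
	workers.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	objects := make([][]string, 3)
	for i := range objects {
		for j := 0; j < 8; j++ {
			objects[i] = append(objects[i], fmt.Sprintf("org%d/repo%d", i, j))
		}
	}
	pool, cloners := 2, 3
	fmt.Printf("peak concurrent clones: %d (bound: %d)\n", run(objects, pool, cloners), pool*cloners)
}
```

With pool = 2 and cloners = 3, the reported peak should be at most 6; its exact value depends on scheduling.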
License
GitGuardian src-fingerprint is MIT licensed.