guac
[feature] Continuous, unattended Google Cloud Storage collection
Hi,
In PR https://github.com/guacsec/guac/pull/989 we exposed the GCS collector via the guacone CLI, which means that a user can collect SBOMs and other pieces of metadata from a GCS bucket on demand.
This issue is about being able to configure that process so that it runs periodically and unattended.
Describe the solution you'd like
I want to be able to configure Guac with tuples of bucket + credentials that the system can use to periodically fetch data from those data sources.
Describe alternatives you've considered
I've considered using guacone itself with a cron-like daemon, but I wanted to explore whether this could become a first-class feature, since some of the foundations seem to be there (oci + git datasources).
Additional context
Our goal is to let Chainloop users send SBOMs end to end automatically.
The first leg of the journey (CI -> GCS bucket) is fully automated, but the last leg (GCS -> Guac) requires manual intervention via guacone collect #989. It is this last leg that we want to automate too.
Note: this feature may already exist and I am just not able to figure out how to configure it.
Thanks!
Refs https://github.com/chainloop-dev/chainloop/issues/209
Ah yes - we have collectors that can run as daemons - which I believe should do exactly what you're asking for.
We already do this for files. Would something like this work? https://github.com/guacsec/guac/blob/main/cmd/guaccollect/cmd/files.go
$ bin/guaccollect files --help
take a folder of files and create a GUAC graph utilizing Nats pubsub

Usage:
  guaccollect files [flags] file_path

Flags:
  -h, --help   help for files

Global Flags:
      --csub-addr string   address to connect to collect-sub service (default "localhost:2782")
      --nats-addr string   address to connect to NATs Server (default "nats://127.0.0.1:4222")
      --service-poll       sets the collector or certifier to polling mode (default true)
      --use-csub           use collectsub server for datasource (default true)
The one caveat (for now) is a known issue with large document files, #731, which I will be working on in the coming weeks.
+1 to @lumjjb: the GCS collector (like all the other collectors) is already set up to poll and fetch periodically.