guac task: [processor] create cmd/processor to collect from collectors


Open · lumjjb opened this issue 3 years ago · 1 comment

Collectors that obtain documents need somewhere to emit them. The processor, which is the next stage of the pipeline, needs to gather those documents and process them.

There are a few natural options:

1. Processor runs as a gRPC server
2. Processor obtains documents from a Pub/Sub queue (e.g. kafka, nats.io, etc.)
3. Processor ingests from STDIN or file
4. Processor and Collector are part of the same process.

This boils down to how we want collectors and processors to be run in the architecture. The ingestor will most likely be tied to the assembler.

Deliberation:

Will all the collectors be run in a single executable? That is, since the processor will cache duplicate documents, it is beneficial to have an n:m relationship (where n > m) between collectors and executables. If the answer is no, this excludes options 3 and 4.
- I think the answer is likely no: collectors need credentials to access their sources, and no single account/team would hold all of those credentials.
Options 1 and 2 are similar, with a trade-off between simplicity and scale.
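The caching argument for an n:m layout can be illustrated with a small sketch. It assumes the processor deduplicates by the SHA-256 digest of each document's bytes; that choice and the `DedupProcessor` name are illustrative assumptions, not a confirmed guac design.

```python
import hashlib
from typing import Iterable, List, Set


class DedupProcessor:
    """Hypothetical processor that skips documents it has already processed.

    Duplicates are detected by the SHA-256 digest of the document bytes, so
    the cache only pays off when several collectors feed the same processor.
    """

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.processed: List[bytes] = []

    def ingest(self, docs: Iterable[bytes]) -> int:
        """Process unseen documents; return how many were skipped as duplicates."""
        skipped = 0
        for doc in docs:
            digest = hashlib.sha256(doc).hexdigest()
            if digest in self._seen:
                skipped += 1
                continue
            self._seen.add(digest)
            self.processed.append(doc)  # stand-in for the real processing step
        return skipped


if __name__ == "__main__":
    p = DedupProcessor()
    p.ingest([b"sbom-a", b"sbom-b"])          # output of collector 1
    print(p.ingest([b"sbom-b", b"sbom-c"]))   # output of collector 2: one duplicate skipped
```

If each collector instead had its own processor, each cache would only ever see one collector's output, and a document emitted by two collectors would be processed twice.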

Aug 25 '22 12:08 lumjjb

@trmiller this may be interesting to you

Aug 25 '22 16:08 lumjjb
