task: [processor] create cmd/processor to collect from collectors
Collectors that obtain documents need somewhere to emit them to. The processor, which is the next part of the pipeline, needs to gather the documents and process them.
There are a few natural options:
- Processor runs as a gRPC server
- Processor obtains documents from a Pub/Sub queue (e.g. kafka, nats.io, etc.)
- Processor ingests from STDIN or file
- Processor and Collector are part of the same process.
This boils down to how we want collectors and processors to be run in the architecture. The ingestor will most likely be tied to the assembler.
Deliberation:
- Will all the collectors be run in a single executable? The processor will cache duplicate documents, so it is beneficial to have an n:m relationship (where n > m) between collectors and executables. If the answer is no, this excludes options 3 and 4 (ingesting from STDIN or file, and running in the same process).
- I think the answer is likely no, since collectors need credentials and no single account or team would hold all of them.
- Options 1 and 2 are similar, with a trade-off between simplicity and scale.
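As a rough illustration of why several collectors feeding one processor helps with duplicates, here is a sketch of a content-hash cache. The `dedupCache` type and its `FirstSeen` method are hypothetical, and hashing the raw bytes with SHA-256 is an assumption about how duplicates might be detected, not a description of the actual design.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// dedupCache is a hypothetical in-memory set of document content hashes.
type dedupCache struct {
	mu   sync.Mutex
	seen map[[sha256.Size]byte]struct{}
}

func newDedupCache() *dedupCache {
	return &dedupCache{seen: make(map[[sha256.Size]byte]struct{})}
}

// FirstSeen records blob and reports whether it had not been observed before.
func (c *dedupCache) FirstSeen(blob []byte) bool {
	sum := sha256.Sum256(blob)
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.seen[sum]; ok {
		return false
	}
	c.seen[sum] = struct{}{}
	return true
}

func main() {
	cache := newDedupCache()
	// Two collectors emit overlapping documents to the same processor.
	collectorA := [][]byte{[]byte("sbom-1"), []byte("sbom-2")}
	collectorB := [][]byte{[]byte("sbom-2"), []byte("sbom-3")}

	processed := 0
	for _, docs := range [][][]byte{collectorA, collectorB} {
		for _, d := range docs {
			if cache.FirstSeen(d) {
				processed++
			}
		}
	}
	fmt.Println(processed) // prints 3: "sbom-2" is only processed once
}
```

The cache only pays off when multiple collectors share a processor, which is the case for options 1 and 2.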
@trmiller this may be interesting to you