Provide method for sharing state between steps

Open philidem opened this issue 5 years ago • 1 comments

In the Qualys integration, we needed to collect QID (Qualys ID) values from multiple collectors/steps and then, at the end, batch fetch all of the Qualys vulnerabilities using those accumulated QID values. That is, we need some way to collect QID values across multiple steps and share that state with another step that actually fetches the Qualys vulnerabilities.

I'm open to different ideas for how to solve this problem as well. In this specific Qualys example, it seems necessary to accumulate a set of values (to avoid duplicates) and then wait until the end to actually fetch the entities from that set.
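A minimal sketch of the accumulate-then-fetch pattern described above. Everything here is an assumption for illustration: `collectFromStep`, `toBatches`, the batch size, and `fetchVulnerabilitiesByQids` (a stand-in for the real batched Qualys call) are hypothetical names, not part of the integration or the SDK.

```typescript
// Hypothetical sketch: all names, values, and the batch size are illustrative assumptions.
type Qid = number;

// Shared accumulator; a Set drops duplicate QIDs reported by different steps.
const collectedQids = new Set<Qid>();

function collectFromStep(qids: Qid[]): void {
  qids.forEach((qid) => collectedQids.add(qid));
}

// Split the accumulated values into fixed-size batches for the final fetch.
function toBatches<T>(values: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let start = 0; start < values.length; start += batchSize) {
    batches.push(values.slice(start, start + batchSize));
  }
  return batches;
}

// Stand-in for the real batched API call (hypothetical).
function fetchVulnerabilitiesByQids(qids: Qid[]): string[] {
  return qids.map((qid) => `vuln-${qid}`);
}

// Two earlier steps report overlapping QIDs.
collectFromStep([100, 101, 102]);
collectFromStep([102, 103]);

// Final step: fetch once per batch, only after all collection is done.
const fetchedVulnerabilities: string[] = [];
toBatches(Array.from(collectedQids), 2).forEach((batch) => {
  fetchVulnerabilitiesByQids(batch).forEach((vuln) => fetchedVulnerabilities.push(vuln));
});
```

The point of the sketch is the ordering: collection steps only add to the set, and the fetch runs once over the deduplicated values.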

May 04 '20 22:05 philidem

> from multiple collectors/steps

I see there is just a single step right now. Did you do it that way because there is no way to share data across steps? Looks like await qualysVulnEntityManager.saveVulnerabilities(); is where some collected data is used to produce a final set of vulnerabilities.

It would be really nice to just collect the Finding entities and then, in a later step that loads the set of referenced Vulnerability entities, be able to say the equivalent of SELECT qid FROM entities WHERE _type IN ("qualys_web_app_finding", "qualys_host_finding"), or FIND (qualys_web_app_finding|qualys_host_finding) AS finding RETURN finding.qid. This might take the form of jobState.pickPropertyValues("qid", { _type: ["qualys_web_app_finding", "qualys_host_finding"] }). This would make it clear that we're working from existing data.
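To make the proposed query concrete, here is a rough in-memory sketch. `pickPropertyValues` is only the name proposed in this comment, not an existing SDK method; the `EntityJobState` class, the `Entity` shape, the deduplication of results, and the sample values are all assumptions.

```typescript
// Hypothetical in-memory sketch; pickPropertyValues is the proposal above, not an existing SDK method.
interface Entity {
  _type: string;
  [property: string]: unknown;
}

class EntityJobState {
  private readonly entities: Entity[] = [];

  addEntity(entity: Entity): void {
    this.entities.push(entity);
  }

  // Returns the distinct, defined values of `property` across entities whose _type matches.
  // Deduplicating here is an assumption; the proposal does not say either way.
  pickPropertyValues(property: string, filter: { _type: string[] }): unknown[] {
    const values = new Set<unknown>();
    this.entities.forEach((entity) => {
      const value = entity[property];
      if (filter._type.indexOf(entity._type) !== -1 && value !== undefined) {
        values.add(value);
      }
    });
    return Array.from(values);
  }
}

// Sample entities with made-up QID values.
const jobState = new EntityJobState();
jobState.addEntity({ _type: "qualys_web_app_finding", qid: 1 });
jobState.addEntity({ _type: "qualys_host_finding", qid: 2 });
jobState.addEntity({ _type: "qualys_host_finding", qid: 1 }); // duplicate QID
jobState.addEntity({ _type: "qualys_host", hostname: "example" }); // not a finding

const findingQids = jobState.pickPropertyValues("qid", {
  _type: ["qualys_web_app_finding", "qualys_host_finding"],
});
```

A real implementation would presumably read from wherever the SDK persists collected entities rather than from an array in memory.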

Another thing to consider is adding to a collection as the Findings are iterated: jobState.addValueToSet("findingQids", findingEntity.qid), and later jobState.getSet<string[]>("findingQids"), or jobState.iterateSet<string>("findingQids").
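One possible shape for the named-set idea, sketched in memory. The method names come from this comment, but the signatures are guesses: here the type parameter of `getSet` is the element type (so `getSet<string>` returns `string[]`), and `iterateSet` takes a callback. The `SetJobState` class and the sample values are hypothetical.

```typescript
// Hypothetical sketch of the proposed named-set API; signatures are assumptions.
class SetJobState {
  private readonly sets = new Map<string, Set<unknown>>();

  // Adds a value to the named set, creating the set on first use.
  addValueToSet(name: string, value: unknown): void {
    let set = this.sets.get(name);
    if (set === undefined) {
      set = new Set<unknown>();
      this.sets.set(name, set);
    }
    set.add(value);
  }

  // Returns the named set's values as an array (empty if the set was never written).
  getSet<T>(name: string): T[] {
    const set = this.sets.get(name);
    return set === undefined ? [] : (Array.from(set) as T[]);
  }

  // Calls `iteratee` once per distinct value in the named set.
  iterateSet<T>(name: string, iteratee: (value: T) => void): void {
    this.getSet<T>(name).forEach(iteratee);
  }
}

const state = new SetJobState();

// Earlier steps: record each finding's QID as it is iterated (made-up values).
["1001", "1002", "1001"].forEach((qid) => state.addValueToSet("findingQids", qid));

// Later step: read the deduplicated values back.
const storedQids = state.getSet<string>("findingQids");
const visitedQids: string[] = [];
state.iterateSet<string>("findingQids", (qid) => visitedQids.push(qid));
```

The cast in `getSet` means the caller is trusted to name the right element type; a stricter design could register a type per set name.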

In some cases, relationship data is not in the entities, or is not an entity in itself but is only needed for building relationships, and is collected through another API. Assuming we have to fetch and cache those association resources and process them in a later step, perhaps we have jobState.addResource("somethingNotEntityOrRelationship", rawData) so that later we can jobState.iterateResources("somethingNotEntityOrRelationship").
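A sketch of how caching raw association data might look. `addResource` and `iterateResources` are the names floated above; the `ResourceJobState` class, the callback signature, the `HostDetection` shape, and the relationship key format are assumptions made up for this example.

```typescript
// Hypothetical sketch of the proposed raw-resource cache; not an existing SDK API.
class ResourceJobState {
  private readonly resources = new Map<string, unknown[]>();

  // Stores raw data under a resource type without treating it as an entity or relationship.
  addResource(type: string, rawData: unknown): void {
    const existing = this.resources.get(type);
    if (existing === undefined) {
      this.resources.set(type, [rawData]);
    } else {
      existing.push(rawData);
    }
  }

  // Calls `iteratee` for every cached resource of the given type, in insertion order.
  iterateResources<T>(type: string, iteratee: (resource: T) => void): void {
    const cached = this.resources.get(type);
    if (cached !== undefined) {
      (cached as T[]).forEach(iteratee);
    }
  }
}

// Made-up association record that only exists to link a host to a vulnerability.
interface HostDetection {
  hostId: string;
  qid: number;
}

const resourceState = new ResourceJobState();

// Earlier step: cache association data fetched from another API.
resourceState.addResource("host_detection", { hostId: "host-1", qid: 1 });
resourceState.addResource("host_detection", { hostId: "host-2", qid: 1 });

// Later step: turn the cached records into relationship keys.
const relationshipKeys: string[] = [];
resourceState.iterateResources<HostDetection>("host_detection", (detection) => {
  relationshipKeys.push(`${detection.hostId}|has|${detection.qid}`);
});
```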

FWIW, here is the code used in larger integrations (AWS, Azure) for moving data between steps: https://bitbucket.org/lifeomic/jupiter-managed-integration-sdk/src/master/src/integration/cache/. Check out types.ts for the interface.

May 05 '20 14:05 aiwilliams