[don't merge to main yet] Don't flush after step completes

Open Gonzalo-Avalos-Ribas opened this issue 2 years ago • 1 comments

The motivation for this PR is to avoid flushing every time we complete a step. We should flush to disk/upload in only two cases:

  1. When we reach the graphObjectBufferThresholdInBytes of data in memory
  2. When all steps are done.
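
As a rough illustration of the two flush conditions above, here is a minimal sketch. It is not the SDK's actual code: the class `BufferedGraphObjectStore`, its methods, and the `upload` callback are hypothetical names, and only `graphObjectBufferThresholdInBytes` comes from this PR. Size is approximated by the length of the serialized JSON.

```typescript
// Hypothetical sketch; only graphObjectBufferThresholdInBytes is a name from the PR.
type GraphObject = { _key: string; _type: string };

class BufferedGraphObjectStore {
  private buffer: GraphObject[] = [];
  private bufferedSize = 0;
  private readonly graphObjectBufferThresholdInBytes: number;
  private readonly upload: (batch: GraphObject[]) => void;

  constructor(
    graphObjectBufferThresholdInBytes: number,
    upload: (batch: GraphObject[]) => void,
  ) {
    this.graphObjectBufferThresholdInBytes = graphObjectBufferThresholdInBytes;
    this.upload = upload;
  }

  add(obj: GraphObject): void {
    this.buffer.push(obj);
    // Approximation: serialized character count stands in for byte size.
    this.bufferedSize += JSON.stringify(obj).length;
    // Condition 1: flush once the in-memory threshold is reached.
    if (this.bufferedSize >= this.graphObjectBufferThresholdInBytes) {
      this.flush();
    }
  }

  // Completing a step deliberately does NOT flush.
  onStepComplete(): void {}

  // Condition 2: called once after all steps are done.
  flush(): void {
    if (this.buffer.length === 0) return;
    this.upload(this.buffer);
    this.buffer = [];
    this.bufferedSize = 0;
  }
}
```

With a large threshold, objects from several steps end up in a single upload instead of one upload per step.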

Also, I don't think it is optimal to split the data we upload by step; that would generate far more uploads than necessary.

Tried it on dev, on the jupiterone-integration-dev instance: we went from 432 uploads in a single job to 3.

Gonzalo-Avalos-Ribas avatar Oct 05 '23 21:10 Gonzalo-Avalos-Ribas

Looks good to me. Let's make an alpha version.

zemberdotnet avatar Dec 21 '23 15:12 zemberdotnet

@Gonzalo-Avalos-Ribas I think a lot of our changes in this PR were made to figure out which _types we should mark as partial if an upload fails. What if we just iterated over the entities/relationships after an upload fails and marked those as partial, instead of trying to infer it from the step? I think it might simplify some of the other code, since we could get rid of things like stepsInvolvedInUploads.

zemberdotnet avatar Jun 18 '24 18:06 zemberdotnet
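
For illustration only, the suggestion in the comment above could be sketched as follows. This is an assumption about the shape of the idea, not code from the PR or the SDK: `FailedUpload` and `collectPartialTypes` are hypothetical names, and the only field relied on is the `_type` property of each graph object.

```typescript
// Hypothetical sketch: derive the _types to mark as partial directly
// from the contents of a failed upload, without consulting the steps.
interface GraphObjectLike {
  _key: string;
  _type: string;
}

interface FailedUpload {
  entities: GraphObjectLike[];
  relationships: GraphObjectLike[];
}

function collectPartialTypes(failed: FailedUpload): string[] {
  const types = new Set<string>();
  for (const obj of failed.entities.concat(failed.relationships)) {
    types.add(obj._type);
  }
  // Sorted so the result is deterministic.
  return Array.from(types).sort();
}
```

Because the types come from the failed batch itself, bookkeeping such as stepsInvolvedInUploads would not be needed for this purpose.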

@zemberdotnet Actually, the changes are for failing the steps that have graphObjects in the uploads that fail, but marking them as partial could also work. How do the steps show up in the event logs if they are marked as partial? How can we mark an entity as partial during the execution?

Gonzalo-Avalos-Ribas avatar Jun 19 '24 14:06 Gonzalo-Avalos-Ribas