Update parents for all activities, not just finished ones
- Tickets addressed: N/A
- Review: By commit
- Merge strategy: Merge (no squash)
Description
@AaronPlave pointed out that unfinished spans always have null parent ids. This is undesirable - we do have all the necessary information to document a span's parent id.
The issue occurs in the PostgresResultsCell::postActivities method, which performs the following actions:
- Insert all spans, without parent ids. Use this step to collect the assigned postgres ids for these spans
- Update the spans that were just inserted to poke in the parent ids
Step 2 was taking as input only the simulatedActivityRecords (read: finished activities), when it should be operating on allActivityRecords.
Verification
No tests were added - I'm open to ideas for how to incorporate this edge case into our tests
Documentation
No documentation has been invalidated.
Future work
- The two-step span insert process seems a bit heavyweight to me - instead of having span ids always be auto-generated, we could allow the merlin worker to specify an id. This would allow us to do a single, self-consistent bulk insert, rather than an insert-then-update
For verification, a test plan with a parent activity should suffice, as long as the plan is long enough to start child (maybe even finish one child and start the other? 🤔). The test would then get the spans and check that the child spans all have their parent_id set correctly.
Regarding the future work, apparently the PK of the spans table is (dataset_id, id), which makes it surprising that we currently generate id. It should be relatively straightforward to drop the generation and have Merlin/the Scheduler decide on the span ids.
Closing in favor of #1443