Remove some deepcopy calls to speed up the workflow conductor
- remove deepcopy where we can
- where a copy is still needed, replace deepcopy with a custom copy that uses a shallow copy instead, to increase speed
- add some data classes (per TSC meeting comment)
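For illustration, a minimal sketch of the kind of change involved (hypothetical variable names, not the actual Orquesta diff):

```python
import copy

# Hypothetical context; stands in for a large workflow context object.
workflow_ctx = {"json": {"big": "payload"}, "bar": [1, 2, 3]}

# Before: a full deep copy of the (potentially large) context on every render.
deep_ctx = copy.deepcopy(workflow_ctx)

# After: a shallow copy is enough when the caller only rebinds top-level keys
# and never mutates nested values in place.
shallow_ctx = dict(workflow_ctx)
shallow_ctx["item"] = 1  # only the top-level binding changes; nested data is shared
```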
It appears the use of 'latest' is causing my tox unit tests to fail.
orquesta.exceptions.PluginFactoryError: Unable to load plugin orquesta.composers.native. No 'orquesta.composers' driver found, looking for 'native'
I also had to disable Python 3.6 for now.
@m4dcoder can you please take a look at this Python 3.8 test failure? It looks like stevedore is not able to find the native and mock plugins to run the tests on Python 3.8.
I did some testing with st2 and this definitely breaks with-items: failures are no longer reported to the parent workflow.
(virtualenv) [vagrant@stackstorm]~/st2/tools$ st2 execution get 64adc15887cff69baae67294
id: 64adc15887cff69baae67294
action.ref: test_pack.test_fail_workflow
parameters:
  number: 5
status: running (635s elapsed)
start_timestamp: Tue, 11 Jul 2023 20:53:44 UTC
end_timestamp:
log:
  - status: requested
    timestamp: '2023-07-11T20:53:44.168000Z'
  - status: scheduled
    timestamp: '2023-07-11T20:53:44.245000Z'
  - status: running
    timestamp: '2023-07-11T20:53:44.317000Z'
result:
  output: null
+--------------------------+------------------------+-----------+---------------------+-------------------------------+
| id | status | task | action | start_timestamp |
+--------------------------+------------------------+-----------+---------------------+-------------------------------+
| 64adc1581982b0a045f91e70 | succeeded (0s elapsed) | do_action | test_pack.test_fail | Tue, 11 Jul 2023 20:53:44 UTC |
| 64adc1591982b0a045f91e85 | succeeded (0s elapsed) | with_test | test_pack.test_fail | Tue, 11 Jul 2023 20:53:45 UTC |
| 64adc1591982b0a045f91e88 | succeeded (1s elapsed) | with_test | test_pack.test_fail | Tue, 11 Jul 2023 20:53:45 UTC |
| 64adc1591982b0a045f91e8b | failed (1s elapsed) | with_test | test_pack.test_fail | Tue, 11 Jul 2023 20:53:45 UTC |
| 64adc1591982b0a045f91e8f | succeeded (1s elapsed) | with_test | test_pack.test_fail | Tue, 11 Jul 2023 20:53:45 UTC |
| 64adc1591982b0a045f91e92 | succeeded (1s elapsed) | with_test | test_pack.test_fail | Tue, 11 Jul 2023 20:53:45 UTC |
+--------------------------+------------------------+-----------+---------------------+-------------------------------+
Staged tasks need to be copied, or with-items tasks will remain stuck if one sub-action fails.
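A rough sketch of the shared-reference pitfall this guards against (hypothetical names; this is not the conductor's actual staging structure):

```python
# Hypothetical staged-task structure; the real one lives in the conductor state.
staged = {"task_id": "with_test", "items": [{"status": "running"}] * 3}

# Buggy sharing: every element is the same dict object, so updating one item's
# status after a failed sub-action clobbers the others and the loop stalls.
assert staged["items"][0] is staged["items"][1]

# Copying the staged entries gives each with-items iteration its own state.
staged["items"] = [dict(item) for item in staged["items"]]
assert staged["items"][0] is not staged["items"][1]
```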
I was actually able to get json_8mb.json through this workflow with this change, combined with the zstandard compression changes in the st2 PR queue.
st2 run test_pack.test_workflow @json=/home/vagrant/json_8mb.json
action
from st2common.runners.base_action import Action


class Process(Action):
    def run(self, *args, **kwargs):
        return kwargs["json"]
workflow
version: 1.0

tasks:
  do_action:
    action: test_pack.test_action
    input:
      json: <% ctx().json %>
    next:
      - when: <% succeeded() %>
        publish:
          - bar: [1,2,3]
        do:
          - with_test
  with_test:
    with:
      items: <% ctx().bar %>
      concurrency: 1
    action: test_pack.test_action
    input:
      json: <% ctx().json %>
    next:
      - when: <% succeeded() %>
        publish:
          - with_output: <% result().result %>

input:
  - json

output:
  - with_output: <% ctx().with_output %>
output
+--------------------------+-------------------------+-----------+-----------------------+--------------------+
| id | status | task | action | start_timestamp |
+--------------------------+-------------------------+-----------+-----------------------+--------------------+
| 64af43382c572b699fca6ea5 | succeeded (68s elapsed) | do_action | test_pack.test_action | Thu, 13 Jul 2023 |
| | | | | 00:20:06 UTC |
| 64af43d42c572b699fca6eb6 | succeeded (74s elapsed) | with_test | test_pack.test_action | Thu, 13 Jul 2023 |
| | | | | 00:22:43 UTC |
| 64af44722c572b699fca6ec7 | succeeded (79s elapsed) | with_test | test_pack.test_action | Thu, 13 Jul 2023 |
| | | | | 00:25:20 UTC |
| 64af45122c572b699fca6ed8 | succeeded (68s elapsed) | with_test | test_pack.test_action | Thu, 13 Jul 2023 |
| | | | | 00:28:01 UTC |
+--------------------------+-------------------------+-----------+-----------------------+--------------------+
I was also able to get the st2 unit tests to pass. I had to leave the deep copy in place for the errors. I am testing this locally on our dev box.
Do we have any numbers on how much things are sped up by this change? Ideally, some micro + end-to-end benchmarks would be great (similar to what I've done for the DB layer changes).
When I was originally working on the database layer performance improvements and implemented fast_deepcopy(), I also had a quick look at whether we could utilize it in Orquesta, but I found some blockers, and Orquesta wasn't my priority back then, so I didn't spend more time digging in.
IIRC, it was something related to the complex types used by the state handling, since fast_deepcopy() only supports simple types, i.e. no object instances.
If we can get those sorted out without any issues and the numbers indeed indicate performance improvements, then it would be a great change :)
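Roughly, the idea behind fast_deepcopy() is a serialize/deserialize round trip; a minimal sketch of the concept (not st2's actual implementation, which uses a faster JSON library):

```python
import copy
import json


def fast_deepcopy(value):
    """Copy plain dict/list structures via a JSON round trip; typically much
    faster than copy.deepcopy, but it only handles simple types."""
    try:
        return json.loads(json.dumps(value))
    except TypeError:
        # Complex values (e.g. object instances in the conductor state) are not
        # JSON-serializable, so fall back to the regular deep copy.
        return copy.deepcopy(value)
```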
On a related note to my comment above^ - I think it would be good to add a feature flag / config option for this functionality.
This will serve multiple purposes:
- We will be able to use the feature flag to benchmark both scenarios - regular deepcopy vs fast_deepcopy
- In case a user encounters issues with fast_deepcopy() which we didn't catch here, they can easily opt out of this functionality and continue using StackStorm without needing to downgrade or similar.
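For example, on the StackStorm side such an option could look roughly like this (a sketch using oslo.config; the option name and group are made up):

```python
from oslo_config import cfg

# Hypothetical option name and group; the real ones would be decided on the st2 side.
workflow_opts = [
    cfg.BoolOpt(
        "fast_deepcopy",
        default=False,
        help="Use the faster copy path in the workflow conductor instead of "
             "copy.deepcopy. Disable to fall back to the previous behavior.",
    ),
]
cfg.CONF.register_opts(workflow_opts, group="workflow_engine")
```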
We won't be able to add a feature flag without a change in ST2. Orquesta does not have the StackStorm config object, afaik. I will put some benchmarks in the PR and also make sure we double-check that errors is also copied.
> I will put some benchmarks in the PR and also make sure we double-check that errors is also copied.
Thanks.
> We won't be able to add a feature flag without a change in ST2. Orquesta does not have the StackStorm config object, afaik.
Yeah, I assume it would probably also require a change on the StackStorm side. If you have cycles, I still think that would be a good idea.
Still much easier to flip the feature flag / config option in case something goes wrong versus needing to downgrade to a previous release.
Especially if it's combined with a backward-incompatible change in StackStorm core (the live action FK change - that PR only has a forward migration, not a backward one).
I've been looking into some slowness for workflows with a large number of items inside a with-items step. I saw this PR and I'm curious which changes we're actually looking to gate behind a feature flag here?
The task render speed gain from removing the deep copy of large parameters.
I added a micro-benchmark to show how much faster it is to NOT do a deep copy. I could add one for serialization and deserialization, but the results would be the same. Copying large objects is costly.
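For reference, a minimal sketch of that kind of micro-benchmark (not the one added in the PR):

```python
import copy
import timeit

# Hypothetical payload standing in for a large with-items context.
ctx = {"json": {str(i): list(range(50)) for i in range(1000)}}

deep = timeit.timeit(lambda: copy.deepcopy(ctx), number=100)
shallow = timeit.timeit(lambda: dict(ctx), number=100)

print(f"deepcopy: {deep:.3f}s  shallow copy: {shallow:.3f}s")
```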
All the 3.9 tests passed with these updates (minus the lint checks). The RPM is here if @rebrowning wants to test it out.
I am running this additional speed up on our internal fork of St2.
I will report back if I see any issues.
It looks like CI/CD failed because black changed with its latest release.
@Kami @guzzijones are we requiring the feature flag to switch between deepcopy and fast_deepcopy for this to move forward?
Note, this will help with "with items" speed issues.
@nzlosh @cognifloyd please take a peek at this. It adds significant speed improvements by eliminating some deep copies. This helps with larger context objects.
Anyone willing to give some feedback here?
Technically I'm OK with this being merged. As discussed in mid 2024, it should be merged as part of st2 3.10. I think we should leave a release between the pymongo updates and the deepcopy change to avoid conflating any issues that may come up in the 3.9 and 3.10 releases.