Greedy fusion breaks loop with nested SDFG

Open mcopik opened this issue 2 years ago • 7 comments

Describe the bug

Applying greedy_fuse breaks one of the loops in CloudSC, as seen in the screenshot below. In this context, ZSINKSUM is both an input and an output variable. It is initialized correctly, but the initial value is stored in an additional, unnecessary variable __tmp2 that is never passed to the actual function; the generated call passes __tmp1 instead.

Generated code:

DACE_DFI void loop_body_6_1_138_7(double* __restrict__ ZSOLQA, double* __restrict__ ZSINKSUM, int _for_it_73);

{
  double ZSINKSUM_out_1; 
  // Tasklet code (T_l4181_c4181)
  ZSINKSUM_out_1 = 0.0;
  __tmp2[0] = ZSINKSUM_out_1;
}

dace::CopyND<double, 1, false, 1>::template ConstDst<1>::Copy( __tmp2, ZSINKSUM + _for_it_69, 1); 
loop_body_6_1_138_7(&ZSOLQA[0], &__tmp1[0], (_for_it_69 + 1)); 
dace::CopyND<double, 1, false, 1>::template ConstDst<1>::Copy( __tmp1, ZSINKSUM + _for_it_69, 1);
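The data flow of the three generated statements above can be modelled in a few lines of plain Python. This is only an illustrative sketch: `loop_body` is a hypothetical stand-in that accumulates into its buffer argument (the report only states that ZSINKSUM is both read and written), and `stale` models whatever value `__tmp1` happens to hold, since the generated code never initializes it.

```python
def loop_body(zsolqa, zsinksum_buf):
    # Hypothetical stand-in for loop_body_6_1_138_7: reads the current
    # value of the buffer and accumulates into it.
    zsinksum_buf[0] += sum(zsolqa)


def generated_flow(zsolqa, stale):
    """Mirror the order of operations in the generated code."""
    zsinksum = [None]
    tmp1 = [stale]  # never initialized in the generated code
    tmp2 = [0.0]    # tasklet: ZSINKSUM_out_1 = 0.0; __tmp2[0] = ZSINKSUM_out_1

    zsinksum[0] = tmp2[0]    # CopyND(__tmp2 -> ZSINKSUM)
    loop_body(zsolqa, tmp1)  # the call receives __tmp1, not the zeroed value
    zsinksum[0] = tmp1[0]    # CopyND(__tmp1 -> ZSINKSUM) overwrites the 0.0
    return zsinksum[0]


def expected_flow(zsolqa):
    """What the loop presumably should do: initialize, then pass the same buffer."""
    zsinksum = [0.0]
    loop_body(zsolqa, zsinksum)
    return zsinksum[0]
```

Under these assumptions the result of `generated_flow` depends on `stale`, whereas `expected_flow` always starts the accumulation from zero.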

[Screenshot of the affected loop]

To Reproduce

  1. Download the following SDFG: https://polybox.ethz.ch/index.php/s/YC0ugZL7vG73doz
  2. Apply the greedy_fuse from autoopt.
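import dace
from dace.transformation.auto.auto_optimize import greedy_fuse

sdfg = dace.SDFG.from_file('CLOUDSCOUTER_unrolled.sdfg')
greedy_fuse(sdfg, False)  # second argument is validate_all
sdfg.simplify(verbose=True)
  3. Look for the state starting with _state_l4191_c4191.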
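Locating that state by hand in a large SDFG can be tedious. A small helper along the following lines may help; it is a sketch that assumes the 2023-era DaCe API, where `SDFG.all_sdfgs_recursive()` yields the SDFG and all nested SDFGs and `SDFG.nodes()` returns the states. The helper name `find_states` is hypothetical and not part of DaCe.

```python
def find_states(sdfg, prefix):
    """Return (sdfg_name, state_label) pairs for every state whose
    label starts with `prefix`, searching nested SDFGs as well."""
    matches = []
    for nested in sdfg.all_sdfgs_recursive():
        for state in nested.nodes():
            if state.label.startswith(prefix):
                matches.append((nested.name, state.label))
    return matches


# Possible usage after the steps above (requires DaCe and the SDFG file):
#   import dace
#   sdfg = dace.SDFG.from_file('CLOUDSCOUTER_unrolled.sdfg')
#   print(find_states(sdfg, '_state_l4191_c4191'))
```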

Desktop (please complete the following information):

  • Python 3.10

mcopik avatar Jun 18 '23 23:06 mcopik

@alexnick83 contributed a patch fixing this issue in #1307. Unfortunately, with the new master, it fails during simplification:

Traceback (most recent call last):
  File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/validation.py", line 363, in validate_state
    node.validate(sdfg, state, references, **context)
  File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/nodes.py", line 640, in validate
    raise NameError('Data descriptor "%s" not found in nested SDFG connectors' % dname)
NameError: Data descriptor "NSSOPT" not found in nested SDFG connectors

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/generation_gpu_with_new_master/../dace/tests/fortran/cloudsc/generated_sdfgs_new.py", line 517, in <module>
    generate_all_sdfgs(args.specialize, args.stride_transformation, args.target, args.restart_index)
  File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/generation_gpu_with_new_master/../dace/tests/fortran/cloudsc/generated_sdfgs_new.py", line 432, in generate_all_sdfgs
    sdfg = simplify_sdfg(sdfg, specialize)
  File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/generation_gpu_with_new_master/../dace/tests/fortran/cloudsc/generated_sdfgs_new.py", line 143, in simplify_sdfg
    sdfg.simplify(verbose=True)
  File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/sdfg.py", line 2471, in simplify
    return SimplifyPass(validate=validate, validate_all=validate_all, verbose=verbose).apply_pass(self, {})
  File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/transformation/passes/simplify.py", line 115, in apply_pass
    sdfg.validate()
  File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/sdfg.py", line 2440, in validate
    validate_sdfg(self, references, **context)
  File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/validation.py", line 195, in validate_sdfg
    validate_state(edge.dst, sdfg.node_id(edge.dst), sdfg, symbols, initialized_transients, references,
  File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/validation.py", line 369, in validate_state
    raise InvalidSDFGNodeError("Node validation failed: " + str(ex), sdfg, state_id, nid) from ex
dace.sdfg.validation.InvalidSDFGNodeError: Node validation failed: Data descriptor "NSSOPT" not found in nested SDFG connectors (at state stateCLOUDSC, node CLOUDSC)
Originating from source code at File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/frontend/fortran/fortran_parser.py", line 666

Here's the link to the SDFG on which the simplify fails: https://polybox.ethz.ch/index.php/s/WElGWZyB4lR8Ig3

The script generating the SDFG can be found here.

mcopik avatar Sep 13 '23 16:09 mcopik

Sorry for the wait.

I have no idea how to reproduce this issue. I used the following test case:

# Copyright 2019-2021 ETH Zurich and the DaCe authors. All rights reserved.
import dace


def test_bug_1280():
    CLOUDSCOUTER_unoptimized = dace.sdfg.SDFG.from_file("./CLOUDSCOUTER_unoptimized.sdfg")
    CLOUDSCOUTER_unoptimized.simplify()


if __name__ == '__main__':
    test_bug_1280()

I tested this both with the latest master (3e733044a07467878526ecf11db007efdad329cf) and the new_transform_2 branch (73579a7877a7f496e181d6bea87dd1c23bc5bf56).

If I try to use the tests/fortran/cloudsc/generated_sdfgs.py script directly, I get a whole lot of import errors related to fparser (my .venv contains fparser 0.1.3).

BenWeber42 avatar Oct 02 '23 16:10 BenWeber42

@BenWeber42 To run the generated_sdfgs.py script, you can downgrade fparser to 0.1.2 or merge the changes from master; we updated the code to handle the breaking API changes in this library.
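Before running the script, the installed fparser version can be checked with the standard library. The sketch below assumes plain numeric version strings such as 0.1.3; the helper names are hypothetical, and the threshold 0.1.2 is simply the version reported to work in this thread.

```python
from importlib import metadata

KNOWN_GOOD = (0, 1, 2)  # fparser version reported to work with the script


def parse_version(text):
    """Turn '0.1.3' into (0, 1, 3); assumes purely numeric components."""
    return tuple(int(part) for part in text.split(".")[:3])


def is_newer_than_known_good(installed):
    """True if `installed` is newer than the version known to work."""
    return parse_version(installed) > KNOWN_GOOD


def installed_fparser_version():
    """Return the installed fparser version string, or None if absent."""
    try:
        return metadata.version("fparser")
    except metadata.PackageNotFoundError:
        return None
```

If `is_newer_than_known_good(installed_fparser_version())` is true on a branch that predates the API update, the import errors described above are the likely outcome.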

mcopik avatar Oct 02 '23 16:10 mcopik

Downgrading fparser to 0.1.2 allows me to run the generated_sdfgs.py script, which seems to fail the same way. I guess I can reproduce the bug this way:

Traceback (most recent call last):
  File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/validation.py", line 378, in validate_state
    node.validate(sdfg, state, references, **context)
  File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/nodes.py", line 640, in validate
    raise NameError('Data descriptor "%s" not found in nested SDFG connectors' % dname)
NameError: Data descriptor "NSSOPT" not found in nested SDFG connectors

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/ben/spcl/repos/dace/new_transform_2/tests/fortran/cloudsc/generated_sdfgs.py", line 500, in <module>
    generate_all_sdfgs(args.specialize, args.stride_transformation, args.target, args.restart_index)
  File "/data/ben/spcl/repos/dace/new_transform_2/tests/fortran/cloudsc/generated_sdfgs.py", line 431, in generate_all_sdfgs
    sdfg = simplify_sdfg(sdfg, specialize)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ben/spcl/repos/dace/new_transform_2/tests/fortran/cloudsc/generated_sdfgs.py", line 143, in simplify_sdfg
    sdfg.simplify(verbose=True)
  File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/sdfg.py", line 2481, in simplify
    return SimplifyPass(validate=validate, validate_all=validate_all, verbose=verbose).apply_pass(self, {})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ben/spcl/repos/dace/new_transform_2/dace/transformation/passes/simplify.py", line 155, in apply_pass
    sdfg.validate()
  File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/sdfg.py", line 2450, in validate
    validate_sdfg(self, references, **context)
  File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/validation.py", line 195, in validate_sdfg
    validate_state(edge.dst, sdfg.node_id(edge.dst), sdfg, symbols, initialized_transients, references,
  File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/validation.py", line 384, in validate_state
    raise InvalidSDFGNodeError("Node validation failed: " + str(ex), sdfg, state_id, nid) from ex
dace.sdfg.validation.InvalidSDFGNodeError: Node validation failed: Data descriptor "NSSOPT" not found in nested SDFG connectors (at state stateCLOUDSC, node CLOUDSC)
Originating from source code at File "/data/ben/spcl/repos/dace/new_transform_2/dace/frontend/fortran/fortran_parser.py", line 666
Invalid SDFG saved for inspection in /data/ben/spcl/repos/dace/new_transform_2/_dacegraphs/invalid.sdfg

The line numbers don't quite match. But there was a commit since your 2nd comment (73579a7877a7f496e181d6bea87dd1c23bc5bf56) which likely explains the differences.

BenWeber42 avatar Oct 02 '23 16:10 BenWeber42

I looked further into this and I believe the issue occurs earlier. In particular, the SDFG validates before this section but fails validation afterwards (with the same error message):

https://github.com/spcl/dace/blob/73579a7877a7f496e181d6bea87dd1c23bc5bf56/tests/fortran/cloudsc/generated_sdfgs.py#L91-L133
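This kind of narrowing down can be automated by validating after every step. The following is a minimal sketch, not code from the linked script: `first_invalid_step` is a hypothetical helper, and the only DaCe call it relies on is `sdfg.validate()`, which appears in the tracebacks above.

```python
def first_invalid_step(sdfg, steps):
    """Apply each (name, function) step in order and validate afterwards.

    Returns (name, exception) for the first step after which validation
    fails, or None if the SDFG stays valid throughout.
    """
    for name, step in steps:
        step(sdfg)
        try:
            sdfg.validate()
        except Exception as ex:  # DaCe raises InvalidSDFGError subclasses here
            return name, ex
    return None


# Possible usage (step functions are placeholders for the actual passes):
#   result = first_invalid_step(sdfg, [("specialize", specialize_step),
#                                      ("simplify", lambda s: s.simplify())])
```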

BenWeber42 avatar Oct 03 '23 18:10 BenWeber42

@alexnick83 @acalotoiu Based on the error message posted above, do you think it's likely that we simply fail to replace all instances of NSSOPT? If we do replacement in the code snippet posted above by Ben, shouldn't this symbol disappear from the SDFG?
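For reference, the condition behind the error message can be sketched in plain Python. This assumes the check in dace/sdfg/nodes.py works roughly as the message suggests: every non-transient data descriptor of a nested SDFG must also be a connector of the nested SDFG node. The helper name and its arguments are hypothetical.

```python
def missing_connectors(transient_by_name, connectors):
    """Return the non-transient data names that have no matching connector.

    transient_by_name: maps each data descriptor name in the nested SDFG
                       to its `transient` flag.
    connectors:        names of the node's input and output connectors.
    """
    return sorted(name for name, transient in transient_by_name.items()
                  if not transient and name not in connectors)


# For a NestedSDFG node `n`, the inputs could be collected like this:
#   transient_by_name = {k: d.transient for k, d in n.sdfg.arrays.items()}
#   connectors = set(n.in_connectors) | set(n.out_connectors)
```

Under this reading, a leftover non-transient NSSOPT descriptor inside the nested SDFG would trigger the error even if every use of the symbol had been replaced.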

mcopik avatar Oct 05 '23 02:10 mcopik

This fixes the validation issue after the specialization step: #1398

BenWeber42 avatar Oct 23 '23 10:10 BenWeber42