dace
Greedy fusion breaks loop with nested SDFG
Describe the bug
An application of greedy_fuse breaks one of the loops in CloudSC, as seen in the generated code below. In this context, ZSINKSUM is both an input and an output variable. The variable is correctly initialized, but the initial value is stored in an additional, unnecessary variable __tmp2, which is never passed to the nested function; the call receives __tmp1 instead.
Generated code:
DACE_DFI void loop_body_6_1_138_7(double* __restrict__ ZSOLQA, double* __restrict__ ZSINKSUM, int _for_it_73);
{
    double ZSINKSUM_out_1;
    // Tasklet code (T_l4181_c4181)
    ZSINKSUM_out_1 = 0.0;
    __tmp2[0] = ZSINKSUM_out_1;
}
dace::CopyND<double, 1, false, 1>::template ConstDst<1>::Copy( __tmp2, ZSINKSUM + _for_it_69, 1);
loop_body_6_1_138_7(&ZSOLQA[0], &__tmp1[0], (_for_it_69 + 1));
dace::CopyND<double, 1, false, 1>::template ConstDst<1>::Copy( __tmp1, ZSINKSUM + _for_it_69, 1);
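The following pure-Python model may help trace the data flow in the generated code above. It is a sketch under stated assumptions: the body of `loop_body` is hypothetical (the report does not show it), uninitialized memory is modeled as NaN, and the names `buggy_sequence` and `intended_sequence` are illustrative only.

```python
import math


def loop_body(zsolqa, zsinksum, it):
    # Hypothetical body of the nested SDFG: accumulates into the in/out
    # argument, so its result depends on the value initialized before the call.
    zsinksum[0] += zsolqa[0] * it


def buggy_sequence(zsolqa, zsinksum, i):
    """Mirrors the emitted code: the initial value goes to __tmp2, the call gets __tmp1."""
    tmp2 = [0.0]                    # tasklet writes ZSINKSUM_out_1 = 0.0 into __tmp2
    zsinksum[i] = tmp2[0]           # CopyND: __tmp2 -> ZSINKSUM[i]
    tmp1 = [math.nan]               # __tmp1 is never initialized (modeled as NaN)
    loop_body(zsolqa, tmp1, i + 1)  # the nested call reads and writes __tmp1
    zsinksum[i] = tmp1[0]           # CopyND: __tmp1 -> ZSINKSUM[i], overwriting the init


def intended_sequence(zsolqa, zsinksum, i):
    """What the loop should do: initialize ZSINKSUM[i], then pass that same value in."""
    zsinksum[i] = 0.0
    buf = [zsinksum[i]]             # a single buffer carries the initialized value
    loop_body(zsolqa, buf, i + 1)
    zsinksum[i] = buf[0]


zsolqa = [2.0]
a, b = [1.0], [1.0]
buggy_sequence(zsolqa, a, 0)
intended_sequence(zsolqa, b, 0)
print(a[0], b[0])  # nan 2.0
```

In the model, the buggy sequence propagates garbage into ZSINKSUM because the initialized buffer and the buffer handed to the nested call are different objects.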
To Reproduce
- Download the following SDFG: https://polybox.ethz.ch/index.php/s/YC0ugZL7vG73doz
- Apply greedy_fuse from autoopt:
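import dace
from dace.transformation.auto.auto_optimize import greedy_fuse

sdfg = dace.SDFG.from_file('CLOUDSCOUTER_unrolled.sdfg')
greedy_fuse(sdfg, False)  # second argument: validate_all=False
sdfg.simplify(verbose=True)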
- Look for the state starting with _state_l4191_c4191.
Desktop:
- Python 3.10
@alexnick83 contributed a patch fixing this issue in #1307. Unfortunately, with the new master, the SDFG now fails validation during simplification:
Traceback (most recent call last):
File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/validation.py", line 363, in validate_state
node.validate(sdfg, state, references, **context)
File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/nodes.py", line 640, in validate
raise NameError('Data descriptor "%s" not found in nested SDFG connectors' % dname)
NameError: Data descriptor "NSSOPT" not found in nested SDFG connectors
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/generation_gpu_with_new_master/../dace/tests/fortran/cloudsc/generated_sdfgs_new.py", line 517, in <module>
generate_all_sdfgs(args.specialize, args.stride_transformation, args.target, args.restart_index)
File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/generation_gpu_with_new_master/../dace/tests/fortran/cloudsc/generated_sdfgs_new.py", line 432, in generate_all_sdfgs
sdfg = simplify_sdfg(sdfg, specialize)
File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/generation_gpu_with_new_master/../dace/tests/fortran/cloudsc/generated_sdfgs_new.py", line 143, in simplify_sdfg
sdfg.simplify(verbose=True)
File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/sdfg.py", line 2471, in simplify
return SimplifyPass(validate=validate, validate_all=validate_all, verbose=verbose).apply_pass(self, {})
File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/transformation/passes/simplify.py", line 115, in apply_pass
sdfg.validate()
File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/sdfg.py", line 2440, in validate
validate_sdfg(self, references, **context)
File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/validation.py", line 195, in validate_sdfg
validate_state(edge.dst, sdfg.node_id(edge.dst), sdfg, symbols, initialized_transients, references,
File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/sdfg/validation.py", line 369, in validate_state
raise InvalidSDFGNodeError("Node validation failed: " + str(ex), sdfg, state_id, nid) from ex
dace.sdfg.validation.InvalidSDFGNodeError: Node validation failed: Data descriptor "NSSOPT" not found in nested SDFG connectors (at state stateCLOUDSC, node CLOUDSC)
Originating from source code at File "/users/mcopik/projects/2023/dace_gpu/june_clean_generation/dace/dace/frontend/fortran/fortran_parser.py", line 666
Here is the link to the SDFG on which simplify fails: https://polybox.ethz.ch/index.php/s/WElGWZyB4lR8Ig3
The script generating the SDFG can be found here.
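To make the error message above concrete, here is a simplified model of the kind of check that produces it: every non-transient data descriptor inside a nested SDFG must be connected through an input or output connector of the NestedSDFG node. This is an illustrative sketch, not DaCe's actual implementation; the function name and the scenario (the outer NSSOPT connector removed while the nested descriptor remains) are hypothetical.

```python
def check_nested_connectors(nested_arrays, in_connectors, out_connectors):
    """Simplified model of a nested-SDFG connector check (not DaCe's actual code).

    nested_arrays maps each data descriptor name inside the nested SDFG to
    whether it is transient. Every non-transient descriptor must be fed
    through an input or output connector of the NestedSDFG node.
    """
    connectors = set(in_connectors) | set(out_connectors)
    for name, transient in nested_arrays.items():
        if not transient and name not in connectors:
            raise NameError(f'Data descriptor "{name}" not found in nested SDFG connectors')


# Hypothetical scenario: specialization removed the outer NSSOPT connector,
# but the nested SDFG still declares NSSOPT as a non-transient descriptor.
nested = {"ZSINKSUM": False, "NSSOPT": False, "tmp": True}
try:
    check_nested_connectors(nested, in_connectors={"ZSINKSUM"}, out_connectors={"ZSINKSUM"})
except NameError as err:
    print(err)  # Data descriptor "NSSOPT" not found in nested SDFG connectors
```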
Sorry for the wait.
I am unable to reproduce this issue. I used the following test case:
# Copyright 2019-2021 ETH Zurich and the DaCe authors. All rights reserved.
import dace


def test_bug_1280():
    CLOUDSCOUTER_unoptimized = dace.sdfg.SDFG.from_file("./CLOUDSCOUTER_unoptimized.sdfg")
    CLOUDSCOUTER_unoptimized.simplify()


if __name__ == '__main__':
    test_bug_1280()
I tested this both with the latest master (3e733044a07467878526ecf11db007efdad329cf) and the new_transform_2 branch (73579a7877a7f496e181d6bea87dd1c23bc5bf56).
If I try to use the tests/fortran/cloudsc/generated_sdfgs.py script directly, I get many import errors related to fparser (my .venv contains fparser 0.1.3).
@BenWeber42 To run the generated_sdfgs.py script, you can downgrade fparser to 0.1.2 or merge the changes from master; we updated the code to accommodate the breaking API changes in this library.
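A quick way to check which fparser a virtual environment will pick up before running the script is sketched below. The helper names are hypothetical, the version parsing is deliberately simplified, and the 0.1.2 threshold is taken from the comment above (it only applies to checkouts without the API update).

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.1.3' into (0, 1, 3); non-numeric suffixes are ignored (simplified)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def fparser_too_new(installed, last_compatible="0.1.2"):
    """True if the installed fparser is newer than the last version an old checkout supports."""
    return parse_version(installed) > parse_version(last_compatible)


try:
    installed = metadata.version("fparser")
except metadata.PackageNotFoundError:
    installed = None

if installed is not None and fparser_too_new(installed):
    print(f"fparser {installed} found; on branches without the API update, try 'pip install fparser==0.1.2'")
```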
Downgrading fparser to 0.1.2 allows me to run the generated_sdfgs.py script, which seems to fail the same way. So it seems I can reproduce the bug this way:
Traceback (most recent call last):
File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/validation.py", line 378, in validate_state
node.validate(sdfg, state, references, **context)
File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/nodes.py", line 640, in validate
raise NameError('Data descriptor "%s" not found in nested SDFG connectors' % dname)
NameError: Data descriptor "NSSOPT" not found in nested SDFG connectors
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/ben/spcl/repos/dace/new_transform_2/tests/fortran/cloudsc/generated_sdfgs.py", line 500, in <module>
generate_all_sdfgs(args.specialize, args.stride_transformation, args.target, args.restart_index)
File "/data/ben/spcl/repos/dace/new_transform_2/tests/fortran/cloudsc/generated_sdfgs.py", line 431, in generate_all_sdfgs
sdfg = simplify_sdfg(sdfg, specialize)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/ben/spcl/repos/dace/new_transform_2/tests/fortran/cloudsc/generated_sdfgs.py", line 143, in simplify_sdfg
sdfg.simplify(verbose=True)
File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/sdfg.py", line 2481, in simplify
return SimplifyPass(validate=validate, validate_all=validate_all, verbose=verbose).apply_pass(self, {})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/ben/spcl/repos/dace/new_transform_2/dace/transformation/passes/simplify.py", line 155, in apply_pass
sdfg.validate()
File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/sdfg.py", line 2450, in validate
validate_sdfg(self, references, **context)
File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/validation.py", line 195, in validate_sdfg
validate_state(edge.dst, sdfg.node_id(edge.dst), sdfg, symbols, initialized_transients, references,
File "/data/ben/spcl/repos/dace/new_transform_2/dace/sdfg/validation.py", line 384, in validate_state
raise InvalidSDFGNodeError("Node validation failed: " + str(ex), sdfg, state_id, nid) from ex
dace.sdfg.validation.InvalidSDFGNodeError: Node validation failed: Data descriptor "NSSOPT" not found in nested SDFG connectors (at state stateCLOUDSC, node CLOUDSC)
Originating from source code at File "/data/ben/spcl/repos/dace/new_transform_2/dace/frontend/fortran/fortran_parser.py", line 666
Invalid SDFG saved for inspection in /data/ben/spcl/repos/dace/new_transform_2/_dacegraphs/invalid.sdfg
The line numbers don't quite match, but there has been a commit (73579a7877a7f496e181d6bea87dd1c23bc5bf56) since your second comment, which likely explains the differences.
I looked further into this, and I believe the issue occurs earlier. In particular, the SDFG validates before this section but fails validation after it (with the same error message):
https://github.com/spcl/dace/blob/73579a7877a7f496e181d6bea87dd1c23bc5bf56/tests/fortran/cloudsc/generated_sdfgs.py#L91-L133
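Narrowing this down further can be scripted by validating after every individual step of the linked section. The helper below is a hypothetical sketch; with a real SDFG one could pass `sdfg.validate` as `validate` and wrap each specialization step from the linked lines in a callable. The toy dictionary only stands in for an SDFG so the example runs on its own.

```python
def first_invalidating_step(steps, validate):
    """Apply steps in order; return the name of the first step after which validate() raises.

    steps: list of (name, callable) pairs that mutate the object under test.
    validate: callable that raises an exception if the object is invalid.
    Returns None if the object stays valid after every step.
    """
    validate()  # the starting point must already be valid
    for name, step in steps:
        step()
        try:
            validate()
        except Exception as exc:
            print(f"validation fails after step {name!r}: {exc}")
            return name
    return None


# Toy stand-in for an SDFG: a connector set and the descriptors that require one.
toy = {"connectors": {"NSSOPT"}, "arrays": {"NSSOPT"}}


def drop_connector():
    # Hypothetical step: a replacement that removes the connector but keeps the descriptor.
    toy["connectors"].discard("NSSOPT")


def toy_validate():
    missing = toy["arrays"] - toy["connectors"]
    if missing:
        raise NameError(f"not found in connectors: {sorted(missing)}")


result = first_invalidating_step([("noop", lambda: None), ("drop_connector", drop_connector)], toy_validate)
print(result)  # drop_connector
```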
@alexnick83 @acalotoiu Based on the error message posted above, do you think it's likely that we simply fail to replace all instances of NSSOPT? If we perform the replacement in the code snippet Ben linked above, shouldn't this symbol disappear from the SDFG?
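One way to answer this empirically is to search the specialized SDFG, including all nested SDFGs, for leftover declarations of NSSOPT. The sketch below uses a hypothetical `ToySDFG` stand-in so it runs without DaCe; on a real SDFG, a similar walk could iterate over `sdfg.all_sdfgs_recursive()` and inspect each nested SDFG's `arrays` and `symbols`. The nesting shown (CLOUDSC inside CLOUDSCOUTER still declaring NSSOPT) is an assumed scenario, not a confirmed finding.

```python
from dataclasses import dataclass, field


@dataclass
class ToySDFG:
    """Hypothetical stand-in for an SDFG: names it declares plus its nested SDFGs."""
    name: str
    arrays: set = field(default_factory=set)
    symbols: set = field(default_factory=set)
    nested: list = field(default_factory=list)


def leftover_references(sdfg, target, path=()):
    """Yield (path, kind) for every place `target` is still declared, searching nested SDFGs too."""
    path = path + (sdfg.name,)
    if target in sdfg.arrays:
        yield "/".join(path), "array"
    if target in sdfg.symbols:
        yield "/".join(path), "symbol"
    for child in sdfg.nested:
        yield from leftover_references(child, target, path)


inner = ToySDFG("CLOUDSC", arrays={"NSSOPT", "ZSINKSUM"})
outer = ToySDFG("CLOUDSCOUTER", arrays={"ZSINKSUM"}, nested=[inner])
print(list(leftover_references(outer, "NSSOPT")))  # [('CLOUDSCOUTER/CLOUDSC', 'array')]
```

If the replacement were complete, the search would return an empty list at every nesting level.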
This fixes the validation issue after the specialization step: #1398