dace icon indicating copy to clipboard operation
dace copied to clipboard

Fix stream allocation scoping

Open komplexon3 opened this issue 3 years ago • 9 comments

There was an issue where streams would be allocated globally (to a state) and locally. This should not happen. The expected behaviour is to never allocate streams locally to a scope.

This PR fixes this issue by never including streams in the scope transient analysis.

komplexon3 avatar Jun 28 '21 06:06 komplexon3

Codecov Report

Merging #766 (b6e1c96) into master (ea8aeed) will decrease coverage by 21.66%. The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #766       +/-   ##
===========================================
- Coverage   80.49%   58.83%   -21.66%     
===========================================
  Files         222      224        +2     
  Lines       39796    40707      +911     
===========================================
- Hits        32031    23949     -8082     
- Misses       7765    16758     +8993     
Impacted Files Coverage Δ
dace/codegen/targets/fpga.py 13.96% <ø> (-55.28%) :arrow_down:
dace/sdfg/utils.py 66.02% <100.00%> (-17.45%) :arrow_down:
dace/libraries/pblas/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
dace/libraries/pblas/nodes/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
dace/libraries/pblas/environments/__init__.py 0.00% <0.00%> (-100.00%) :arrow_down:
dace/libraries/pblas/nodes/pgeadd.py 0.00% <0.00%> (-94.81%) :arrow_down:
dace/frontend/octave/ast_expression.py 11.39% <0.00%> (-82.67%) :arrow_down:
dace/transformation/interstate/transient_reuse.py 12.80% <0.00%> (-81.60%) :arrow_down:
dace/transformation/testing.py 0.00% <0.00%> (-79.66%) :arrow_down:
dace/codegen/targets/sve/unparse.py 12.24% <0.00%> (-76.87%) :arrow_down:
... and 147 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ea8aeed...b6e1c96. Read the comment docs.

codecov[bot] avatar Jun 28 '21 06:06 codecov[bot]

I think the approach is still not the right one. What we want is to make sure that FPGA streams are always allocated in exactly one place, which is at the beginning of kernel generation. My intuition would be that we have to:

  • Make sure that all streams are allocated when generating the FPGA kernel.
  • Never allocate local streams when inside an FPGA kernel.

definelicht avatar Jun 28 '21 08:06 definelicht

Okay - I found something interesting. The place where the stream allocation "would" be suppressed [here],(https://github.com/spcl/dace/blob/62ae0f0a43352af1c88dea4d457dac392203e790/dace/codegen/targets/fpga.py#L703) it is explicitly not suppressed because of something with the Intel FPGA backend. Can someone explain to me why this is?

komplexon3 avatar Jun 28 '21 11:06 komplexon3

Okay - I found something interesting. The place where the stream allocation "would" be suppressed [here],(

https://github.com/spcl/dace/blob/62ae0f0a43352af1c88dea4d457dac392203e790/dace/codegen/targets/fpga.py#L703

) it is explicitly not suppressed because of something with the Intel FPGA backend. Can someone explain to me why this is?

Honestly I already find it a bit weird that we are suppressing these allocations here. Have you checked the callsite of where allocate_array is being called from in the case that is causing problems for you? Could you post the call stack here?

definelicht avatar Jun 28 '21 11:06 definelicht

I think the approach is still not the right one. What we want is to make sure that FPGA streams are always allocated in exactly one place, which is at the beginning of kernel generation. My intuition would be that we have to:

  • Make sure that all streams are allocated when generating the FPGA kernel.
  • Never allocate local streams when inside an FPGA kernel.

exactly this. This PR should be changed to focus on making sure the FPGA codegen allocates all streams in one place.

tbennun avatar Jun 28 '21 11:06 tbennun

@definelicht This would be the stack trace (made it throw an exception if an already defined stream reaches this bit of code).

It is called by dispatch_allocate. That's the same code allocation "top-level local" data.

Traceback (most recent call last):
  File "tests/stream_allocation_scope_test.py", line 98, in <module>
    test_stream_allocation_scope()
  File "tests/stream_allocation_scope_test.py", line 91, in test_stream_allocation_scope
    assert sdfg.generate_code()[2].clean_code.count("dace::SetNames(AtoB") == 1
  File "/home/widmmarc/k3-dace/dace/sdfg/sdfg.py", line 2249, in generate_code
    program_code = codegen.generate_code(sdfg)
  File "/home/widmmarc/k3-dace/dace/codegen/codegen.py", line 178, in generate_code
    used_environments) = frame.generate_code(sdfg, None)
  File "/home/widmmarc/k3-dace/dace/codegen/targets/framecode.py", line 552, in generate_code
    states_generated = self.generate_states(sdfg, global_stream,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/framecode.py", line 463, in generate_states
    cft.as_cpp(self.dispatcher.defined_vars, sdfg.symbols), sdfg)
  File "/home/widmmarc/k3-dace/dace/codegen/control_flow.py", line 195, in as_cpp
    expr += elem.as_cpp(defined_vars, symbols)
  File "/home/widmmarc/k3-dace/dace/codegen/control_flow.py", line 126, in as_cpp
    expr += self.dispatch_state(self.state)
  File "/home/widmmarc/k3-dace/dace/codegen/targets/framecode.py", line 440, in dispatch_state
    self._dispatcher.dispatch_state(sdfg, state, global_stream, stream)
  File "/home/widmmarc/k3-dace/dace/codegen/dispatcher.py", line 310, in dispatch_state
    satisfied_dispatchers[0].generate_state(sdfg, state,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 211, in generate_state
    self.generate_kernel(sdfg, state, kernel_name, subgraphs,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 1399, in generate_kernel
    self.generate_kernel_internal(sdfg, state, kernel_name, subgraphs,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/xilinx.py", line 798, in generate_kernel_internal
    self.generate_modules(sdfg, state, kernel_name, subgraphs,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 1450, in generate_modules
    self.generate_module(sdfg, state, module_name, subgraph,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/xilinx.py", line 732, in generate_module
    self._dispatcher.dispatch_subgraph(sdfg,
  File "/home/widmmarc/k3-dace/dace/codegen/dispatcher.py", line 351, in dispatch_subgraph
    self.dispatch_scope(v.map.schedule, sdfg, scope_subgraph,
  File "/home/widmmarc/k3-dace/dace/codegen/dispatcher.py", line 403, in dispatch_scope
    self._map_dispatchers[map_schedule].generate_scope(
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 510, in generate_scope
    self.generate_node(sdfg, dfg_scope, state_id,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 1043, in generate_node
    getattr(self, method_name)(sdfg, dfg, state_id, node,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 1345, in _generate_MapEntry
    self._dispatcher.dispatch_allocate(sdfg, dfg, state_id, child, None,
  File "/home/widmmarc/k3-dace/dace/codegen/dispatcher.py", line 424, in dispatch_allocate
    dispatcher.allocate_array(sdfg, dfg, state_id, node, function_stream,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 543, in allocate_array
    raise Exception

I will also need to ensure that all streams are allocated once (at the top-level). At the moment, only data shared between two or more subgraphs are allocated "top-level locally". (Missed that bit before.) This should be rather straightforward - just look for all streams in the subgraphs and allocate them when it allocates the data shared between subgraphs.

komplexon3 avatar Jun 28 '21 12:06 komplexon3

As a first step, make sure that all these streams are allocated where we want them to be allocated (i.e., at the beginning of generating an FPGA kernel). Then we can see if we need to prevent them from being reallocated later, but we might not need to, because of that code you linked.

definelicht avatar Jun 28 '21 13:06 definelicht

Now all streams are allocated "top-level locally". The FPGA codegen now allocates data at that level if it shared between two subgraphs (status-quo) or if it is a stream.

There is still the issue with them being declared locally due to that explicit check mentioned before.https://github.com/spcl/dace/blob/62ae0f0a43352af1c88dea4d457dac392203e790/dace/codegen/targets/fpga.py#L703 @TizianoDeMatteis It was added by this commit. Could you explain to me why this check is needed? I am not familiar with the concept of views in DaCe.

komplexon3 avatar Jul 06 '21 18:07 komplexon3

@TizianoDeMatteis It was added by this commit. Could you explain to me why this check is needed? I am not familiar with the concept of views in DaCe.

This is not directly related to views. We need to allocate anyway streams to keep track of their names across the SDFGs hierarchy (a stream may have a certain name in the top-level scope, then another one in a nested SDFG and so on...)

TizianoDeMatteis avatar Jul 07 '21 08:07 TizianoDeMatteis