
Transfer Matrix

Open drahnr opened this issue 7 years ago • 1 comment

There is a need to handle transfers between devices more easily.

The current approach of syncing from one backend to another is not sufficient and does not scale as more backends are added.

There are two things to think of, each with a fallback:

  1. Inter-framework transfers
     - Fallback: Framework A -> Native -> Framework B
  2. Inter-device transfers (only needed if the framework does not handle them itself; CUDA does, afaik)
     - Fallback: Framework A/Device A -> Native -> Framework A/Device B

Note that the matrix is supposedly symmetrical, but the transfer functions are not identical! Read is not write, after all.

Note that this scales very well: if a particular transfer becomes a bottleneck, a specialized function can be registered for it; otherwise, as long as host memory is sufficient, the transfer defaults to the fallback path.

Note that transfers to and from Native are obviously always populated.

Note that a single big framework matrix may be best suited, with, if necessary, an inter-device matrix within each framework.
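
A minimal sketch of how such a matrix plus the Native fallback could look in Rust. The `FrameworkId`, `TransferFn`, and `TransferMatrix` names are illustrative only, not part of parenchyma's API, and the raw byte-buffer signature is an assumption:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum FrameworkId {
    Native,
    Cuda,
    OpenCl,
}

/// A registered transfer function: copies a raw buffer from a source
/// framework to a destination framework.
type TransferFn = fn(src: &[u8], dst: &mut [u8]) -> Result<(), String>;

#[derive(Default)]
struct TransferMatrix {
    /// Direct (possibly specialized) transfers keyed by (source, destination).
    /// The keys are symmetric, but the registered functions are not identical:
    /// reading from a device is not the same operation as writing to it.
    direct: HashMap<(FrameworkId, FrameworkId), TransferFn>,
}

impl TransferMatrix {
    fn register(&mut self, from: FrameworkId, to: FrameworkId, f: TransferFn) {
        self.direct.insert((from, to), f);
    }

    /// Transfer `src` into `dst`, preferring a registered direct function and
    /// otherwise falling back to Framework A -> Native -> Framework B.
    fn transfer(
        &self,
        from: FrameworkId,
        to: FrameworkId,
        src: &[u8],
        dst: &mut [u8],
    ) -> Result<(), String> {
        if let Some(f) = self.direct.get(&(from, to)) {
            return f(src, dst);
        }
        // Transfers to and from Native are always populated, so the fallback
        // through host memory always has a route.
        let to_native = self
            .direct
            .get(&(from, FrameworkId::Native))
            .ok_or_else(|| "missing transfer to Native".to_string())?;
        let from_native = self
            .direct
            .get(&(FrameworkId::Native, to))
            .ok_or_else(|| "missing transfer from Native".to_string())?;
        let mut staging = vec![0u8; src.len()];
        to_native(src, &mut staging)?;
        from_native(&staging, dst)
    }
}
```

Specialized routes would be added through `register`; anything without a direct entry falls through host memory, matching the default described above.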

drahnr avatar Mar 21 '17 19:03 drahnr

In addition:

The body of the SharedTensor::autosync method contains the logic in question.

@alexandermorozov's original comment:

Backends may define transfers asymmetrically; for example, CUDA may know how to transfer to and from the Native backend, while Native may know nothing about CUDA at all. So if the first attempt fails, we change the order and try again.
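
Roughly, that behaviour looks like the following sketch. This is an illustration of the described fallback, not the actual `SharedTensor::autosync` body; the `Device` trait and the `sync_out`/`sync_in` names are made up:

```rust
#[derive(Debug)]
enum SyncError {
    /// The backend has no transfer routine for the other device.
    NoRoute,
    Other(String),
}

trait Device {
    /// Source-driven transfer: push this device's memory to `dst`.
    fn sync_out(&self, dst: &dyn Device) -> Result<(), SyncError>;
    /// Destination-driven transfer: pull memory from `src` into this device.
    fn sync_in(&self, src: &dyn Device) -> Result<(), SyncError>;
}

/// Try the transfer in one direction first; if the source backend has no
/// route to the destination (e.g. Native knows nothing about CUDA), swap the
/// order and let the destination drive the transfer instead.
fn autosync(src: &dyn Device, dst: &dyn Device) -> Result<(), SyncError> {
    match src.sync_out(dst) {
        Err(SyncError::NoRoute) => dst.sync_in(src),
        other => other,
    }
}
```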

Removing that would require moving the logic into the Sync implementations, which could increase complexity. Although that's a disadvantage, transferring the responsibility to the frameworks would make adding new frameworks less of a hassle, as the core codebase would no longer need to be aware of individual frameworks (e.g., transferring from CUDA to OpenCL).
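
For comparison, a rough sketch of what pushing that responsibility into the frameworks could look like; the `MemorySync` trait and its `sync_to` method are hypothetical, not parenchyma's actual API:

```rust
use std::any::Any;

trait MemorySync {
    /// Copy this memory into `dst`. An implementation may downcast `dst` to
    /// the memory kinds it knows about (its own, or Native) and must report
    /// an error for anything else, letting the caller fall back through Native.
    fn sync_to(&self, dst: &mut dyn Any) -> Result<(), String>;
}

/// The core only ever dispatches through the trait object, so adding an
/// OpenCL backend means writing one more `MemorySync` impl rather than
/// touching the core codebase.
fn sync(src: &dyn MemorySync, dst: &mut dyn Any) -> Result<(), String> {
    src.sync_to(dst)
}
```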

This may be a case of over-engineering, though. Transferring from framework-x to framework-y is rarely, if ever, done.

jonysy avatar Mar 21 '17 20:03 jonysy