need docs for .assemble() scheduling directive
The .assemble() scheduling directive was added fairly recently. It seems to control the mechanism by which values are written to sparse output tensors.
This directive isn't mentioned on the scheduling page of the website, and it isn't mentioned in the taco -help=scheduling text either. If someone could write something for the web docs, I'd be happy to update the command line tool accordingly.
I also see that the web tool supports this scheduling directive, though it omits the separately_schedulable flag.
I played with this directive a bit, and it seems to work. In some cases I got a "Precondition failed: Ungrouped insertion not support for output tensors that are scattered into" error. I didn't fully understand it, but I managed to resolve it with a reorder.
It would be good if we could describe:
- what does it do?
- how does it affect performance?
- how does it affect parallelism?
- what are the restrictions on its use?
- is this more useful for some sparse formats than others, or some styles of parallelism than others?
- can we provide a practical example or two?
- what does the separately_schedulable flag mean?