pytorch_scatter
[DO NOT REVIEW] Testing integration
[wip] Proof of concept that torch ops in nightly will pass tests + identify gaps in what's in core
Gaps with no workarounds (fixes are wip):
- [x] integer support
- [ ] ~~missing arg_* variants (argmin, argmax) in core; these are not exposed by the torch_scatter documentation~~
- [ ] no support for segment_coo yet (which does not seem to be used by pytorch_scatter/pytorch_geometric)
Gaps that have workarounds:
- [ ] when using torch.scatter/torch.segment_reduce, in-place operations in composite ops break autograd (see softmax for example), so we need to do many wasteful copies; we could support logsumexp/softmax as options in the kernel to avoid this (see the first sketch after this list)
- [ ] empty segments are filled with reduction inits in core (current workaround is using torch.nan_to_num; see the second sketch after this list)
  - this is only a good workaround if the data itself contains no nans/infs/-infs, or if overwriting any that do exist in the data is acceptable
- [ ] [easy] remove data.numel() > 0 check in core for segment_reduce (will be part of porting SegmentReduce to TensorIterator)
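As a rough illustration of the copies mentioned above, here is a minimal sketch (not code from this PR) of a composite scatter softmax written only with out-of-place ops on top of torch.scatter_reduce; the helper name `scatter_softmax_1d`, the 1-D shapes, and the identity-initialised output are assumptions for the example, and it assumes a build where torch.scatter_reduce is available.

```python
import torch

# Minimal out-of-place sketch (not this PR's code) of a composite scatter softmax
# built on torch.scatter_reduce. Every intermediate is a fresh tensor, so autograd
# never sees an in-place mutation -- these temporaries are the "wasteful copies"
# referred to above; a fused softmax/logsumexp option in the kernel would avoid them.
def scatter_softmax_1d(src: torch.Tensor, index: torch.Tensor, dim_size: int) -> torch.Tensor:
    # per-segment max for numerical stability; detached because softmax is
    # invariant to a constant shift per segment, so no gradient is needed here
    init = torch.full((dim_size,), float("-inf"), dtype=src.dtype)
    seg_max = init.scatter_reduce(0, index, src, reduce="amax", include_self=False).detach()
    centered = src - seg_max.gather(0, index)                     # copy 1
    exp = centered.exp()                                          # copy 2
    denom = torch.zeros(dim_size, dtype=src.dtype).scatter_add(0, index, exp)
    return exp / denom.gather(0, index)                           # copy 3

src = torch.tensor([0.5, 1.5, 2.0, -1.0], requires_grad=True)
index = torch.tensor([0, 0, 1, 1])
scatter_softmax_1d(src, index, dim_size=2).sum().backward()      # autograd runs, no in-place errors
```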
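And a minimal sketch of the torch.nan_to_num workaround for empty segments; the tensors are made up, and the output is initialised to the amax identity (-inf) here to mimic the "empty segments are filled with reduction inits" behaviour described above.

```python
import torch

# Illustration (made-up tensors) of the nan_to_num workaround: an output
# initialised to the amax identity (-inf) keeps that identity wherever a
# segment receives no elements, and nan_to_num maps those slots back to the
# 0 that torch_scatter produces for empty segments. This is only safe when
# the real data contains no nan/inf values of its own.
src = torch.tensor([1.0, 3.0, 2.0])
index = torch.tensor([0, 0, 2])                      # segments 1 and 3 get no elements
out = torch.full((4,), float("-inf")).scatter_reduce(
    0, index, src, reduce="amax", include_self=False)
print(out)                                           # tensor([3., -inf, 2., -inf])
out = torch.nan_to_num(out, nan=0.0, posinf=0.0, neginf=0.0)
print(out)                                           # tensor([3., 0., 2., 0.])
```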
Next steps:
- [x] run benchmarking script before and after the PR: rendered benchmarking results [cuda after] [cuda before from 303] [cuda speedups]
- [x] ~~Currently blocked on this as running the benchmarking script before the PR is leading to an illegal CUDA memory access (for options other than mean and sum which call into core ops on master)~~ #303
Findings:
- Our backward implementation in core is slow; this might be because the gradient formulas in core are a bit different from those here (see the first sketch after this list):
  - we evenly distribute the gradient, as in `torch.min`/`torch.max`, when multiple elements correspond to the max
  - we handle the case for prod where an element is 0 and `gradient * result / out` will not accurately propagate the gradient for 0s
- [ ] more investigation is needed into which parts of the backward function are the bottlenecks; writing a standalone backward kernel is a potential next step that could speed this up
- the `include_self` flag in core is slow; this makes sense because when `include_self=False` there are 2 scatters being done (see the second sketch after this list)
  - [ ] This needs to be wrapped into the kernel in core
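A minimal sketch of the tie-handling difference in the first finding, assuming a nightly with autograd support for torch.scatter_reduce; the expected gradient follows from the even-distribution behaviour described above, and the tensors are made up for illustration.

```python
import torch

# Two src elements tie for the segment max. Core's backward (as described
# above) splits the incoming gradient evenly across the tied elements,
# whereas torch_scatter.scatter_max routes the whole gradient to the single
# recorded argmax. Expected here: src.grad == tensor([0.5, 0.5, 0.0]).
src = torch.tensor([2.0, 2.0, 1.0], requires_grad=True)
index = torch.tensor([0, 0, 0])
out = torch.zeros(1).scatter_reduce(0, index, src, reduce="amax", include_self=False)
out.sum().backward()
print(src.grad)
```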
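And a sketch of why `include_self=False` currently costs roughly two scatter passes; this emulates the observable behaviour with two explicit calls rather than showing the actual core kernel, and the tensors are made up.

```python
import torch

# Emulating include_self=False with two passes: first reset the indexed slots
# to the reduction identity, then reduce with include_self=True. Untouched
# slots keep their original values in both versions.
src = torch.tensor([1.0, 4.0, 2.0])
index = torch.tensor([0, 0, 1])
self_t = torch.tensor([10.0, 10.0, 10.0])

# one fused call
fused = self_t.scatter_reduce(0, index, src, reduce="amax", include_self=False)

# equivalent two-pass version
identity = torch.full_like(src, float("-inf"))                     # identity for amax
two_pass = self_t.scatter(0, index, identity)                      # pass 1: clear indexed slots
two_pass = two_pass.scatter_reduce(0, index, src, reduce="amax")   # pass 2: reduce
assert torch.equal(fused, two_pass)                                # both: tensor([4., 2., 10.])
```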
Codecov Report
Merging #301 (4bd199f) into master (847995d) will increase coverage by 0.43%. The diff coverage is 98.73%.
```diff
@@            Coverage Diff             @@
##           master     #301      +/-   ##
==========================================
+ Coverage   97.54%   97.98%   +0.43%
==========================================
  Files           9        9
  Lines         204      248      +44
==========================================
+ Hits          199      243      +44
  Misses          5        5
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| torch_scatter/utils.py | 95.65% <91.66%> (-4.35%) | :arrow_down: |
| torch_scatter/composite/logsumexp.py | 90.90% <100.00%> (ø) | |
| torch_scatter/composite/softmax.py | 100.00% <100.00%> (ø) | |
| torch_scatter/scatter.py | 100.00% <100.00%> (+2.00%) | :arrow_up: |
| torch_scatter/segment_csr.py | 100.00% <100.00%> (ø) | |
This pull request had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity.