Feat/autotune int bool ops
Pull Request Template
Checklist
WIP
- [ ] Confirmed that run-checks all script has been executed.
- [ ] Made sure the book is up to date with changes in this PR.
Related Issues/PRs
Provide links to relevant issues and dependent PRs.
#944
Changes
Summarize the problem being addressed and your solution.
Added int_random() to int tensor ops.
Testing
Describe how these changes have been tested.
@louisfd Finally got the chance to start on #944. I'm going down the long route and trying to add support at large for random() on int tensors. When you say "add a call to autotune for sum_dim and mean_dim (int versions)", I'm assuming that means creating an impl<E: IntElement, const D: usize> SumDimAutotuneOperationSet<E, D> and an associated AutotuneOperationSet impl (and a similar story for mean_dim)?
Codecov Report
Attention: Patch coverage is 61.05991% with 169 lines in your changes missing coverage. Please review.
Project coverage is 85.53%. Comparing base (cbf7550) to head (63455c4).
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1136      +/-   ##
==========================================
- Coverage   85.69%   85.53%    -0.17%
==========================================
  Files         586      586
  Lines       65339    65769      +430
==========================================
+ Hits        55993    56254      +261
- Misses       9346     9515      +169
Hi @agelas, thanks for tackling this issue!
The only problematic place was these lines in burn-wgpu/src/kernel/reduce/tune/sum_dim.rs:
fn autotunables(&self) -> Vec<Box<dyn AutotuneOperation>> {
    let random_bounds: (E, E) = ((-10.0).elem::<E>(), (10.0).elem::<E>());
    let input = random_like_uniform(&self.input, random_bounds.0, random_bounds.1);
because random_like_uniform only worked with floats, even if E was an int. I'm not sure if you can just rewrite those lines to call your new int ops instead of the lower-level random_like_uniform, or if you'll have trouble determining whether you need the float or the int version. If that's the case, then yes, you can duplicate the AutotuneOperationSet, but it should keep the same key.
In the end you will only have to change sum_dim / mean_dim implementations in burn-wgpu/src/ops/int_ops.rs to something very similar to the float ones:
fn sum_dim<const D: usize>(tensor: FloatTensor<Self, D>, dim: usize) -> FloatTensor<Self, D> {
    #[cfg(feature = "autotune")]
    {
        reduce::sum_dim_autotune(tensor, dim)
    }
    #[cfg(not(feature = "autotune"))]
    {
        let output = init_reduce_output(&tensor, dim);
        reduce::sum_dim(tensor, output, dim)
    }
}
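For reference, the int version in burn-wgpu/src/ops/int_ops.rs would presumably mirror this shape, just with int tensor types. This is only a rough sketch based on the float code above; whether reduce::sum_dim_autotune and init_reduce_output can be reused directly for int elements, or need int-specific entry points, is exactly what this PR has to figure out.
fn int_sum_dim<const D: usize>(tensor: IntTensor<Self, D>, dim: usize) -> IntTensor<Self, D> {
    // Assumed to mirror the float implementation above; the autotune path would
    // benchmark candidate reduce kernels on random int inputs instead of float ones.
    #[cfg(feature = "autotune")]
    {
        reduce::sum_dim_autotune(tensor, dim)
    }
    #[cfg(not(feature = "autotune"))]
    {
        let output = init_reduce_output(&tensor, dim);
        reduce::sum_dim(tensor, output, dim)
    }
}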
@louisfd I'm gonna take a breather and ask for feedback here, because this PR is starting to get on the large side. I think everything is working just about the way it should. Unfortunately, I had to basically copy the whole AutotuneOperationSet and its associated functions/impls, since the WgpuElement vs IntElement trait bound can't be determined at runtime.
Also, I'm not sure what's up with fusion::base::tests::aggregation::tests::test_should_mean_last_dim_int; it worked on the prior commit and locally, and the only thing I've changed is the book docs.
@louisfd Yes, sorry, I kept changing the distributions around on the backends for testing purposes and forgot to make them all the same. I think it's nice to have a default just in case, and the flexibility to choose your own distribution is provided anyway without much extra effort. What the bounds should be, though, I'm not sure. If we mimic the [0, 1] behavior of floats like you mentioned, it would be pretty easy to port this over to bool tensors, so I sort of lean towards that approach.
@louisfd Actually, on second thought, maybe it would be best for the default distribution to be [0, 255]. I think that's the standard range for 8-bit quantization, and it would also work well for most image data, so that covers a lot of use cases right there.
The casting problem is quite obvious in cast_float. This function is used in the prng shader:
fn cast_float(number: u32) -> {{ elem }} {
    return 2.3283064365387e-10 * {{ elem }}(number);
}
I think it should be:
fn cast_elem(number: u32) -> {{ elem }} {
    let tmp = 2.3283064365387e-10 * f32(number);
    return {{ elem }}(tmp);
}
It was assumed that {{ elem }} was a float, but with this PR it can also be an int (i32). @agelas @louisfd
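For intuition, the same idea restated in plain Rust (not code from the PR; the helper name here is made up): the constant is roughly 2^-32, so multiplying a u32 by it maps the full integer range onto [0.0, 1.0), and the proposed fix keeps that computation in f32 before the final cast to the element type.
fn u32_to_unit_interval(number: u32) -> f32 {
    // 2.3283064365387e-10 is approximately 2^-32, so u32::MAX maps to just under 1.0
    2.3283064365387e-10 * number as f32
}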
@nathanielsimard In burn-wgpu/src/kernel/prng/bernoulli.rs, it looks like there's already a cast_elem function, so maybe convert_elem or something similar?
@louisfd Also, for the test that's failing, is there a way to switch to using int_mean_dim(1) instead of mean_dim(1)? I think that and the next test in the aggregation.rs file (test_should_sum_last_dim_int()) now have int versions that should probably be used.
#[test]
fn test_should_mean_last_dim_int() {
    let tensor = TestTensorInt::from([[0, 1, 2], [3, 4, 5]]);
    let data_actual = tensor.mean_dim(1).to_data(); // <- use int_mean_dim(1) instead?
    assert_eq!(data_actual, Data::from([[1], [4]]));
}
Hi @agelas, sorry for the late reply.
Regarding the cast functions, since it's getting a bit confusing, I think we should be more explicit with the names: cast_X_to_Y, with X and Y changed to bool, float, or elem as appropriate.
For your second question, maybe you're confusing tensor methods and backend operations? Because it's an int tensor, the mean_dim method will call int_mean_dim underneath:
impl<B: Backend> Numeric<B> for Int {
    // ...
    fn mean_dim<const D: usize>(tensor: Self::Primitive<D>, dim: usize) -> Self::Primitive<D> {
        B::int_mean_dim(tensor, dim)
    }
    // ...
}
For the CI error, it seems to be the modulo. Are you positive that both random_u32 and range are the same type?
48 │ let random_in_range = (random_u32 % range) + low;
   │                        ^^^^^^^^^^^^^^^^^^ naga::Expression [93]
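The same rule can be seen in plain Rust (illustrative only; the actual fix belongs in the WGSL kernel): both operands of % must have the same type, so if random_u32 is a u32 and the bounds are i32, one side needs an explicit cast, e.g.:
fn random_in_range(random_u32: u32, low: i32, high: i32) -> i32 {
    // Assumes high > low. Cast the range to u32 so the modulo operands match,
    // then cast the remainder back before adding the (possibly negative) lower bound.
    let range = (high - low) as u32;
    (random_u32 % range) as i32 + low
}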
Hi @antimora @louisfd, sorry about the delay, life got in the way.
@louisfd With regard to the naming issue, I changed cast_float to cast_u32_to_float like you suggested. For the CI error, it seemed to be with the KernelSettings, but that's resolved now.
The CI is failing on codecov/patch now at 63455c4 (the latest commit). I'm not sure what to do here, since a lot of the code gets used via a somewhat complicated mixture of macros and regular Rust.
@agelas, thank you for the update. Do not worry, life is a priority.
Regarding the coverage: it's not a hard requirement to hit 80%. We can override it manually.