Feat/autotune int bool ops
Pull Request Template
Checklist
WIP
- [ ] Confirmed that run-checks all script has been executed.
- [ ] Made sure the book is up to date with changes in this PR.
Related Issues/PRs
Provide links to relevant issues and dependent PRs.
#944
Changes
Summarize the problem being addressed and your solution.
Added int_random() to int tensor ops.
Testing
Describe how these changes have been tested.
@louisfd Finally got the chance to start on #944. I'm going down the long route and trying to add support at large for random() on int tensors. When you say "add a call to autotune for sum_dim and mean_dim (int versions)", I'm assuming that means creating an impl<E: IntElement, const D: usize> SumDimAutotuneOperationSet<E, D> and an associated AutotuneOperationSet impl (and a similar story for mean_dim)?
Codecov Report
Attention: Patch coverage is 61.05991% with 169 lines in your changes missing coverage. Please review.
Project coverage is 85.53%. Comparing base (cbf7550) to head (63455c4).
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1136      +/-   ##
==========================================
- Coverage   85.69%   85.53%    -0.17%
==========================================
  Files         586      586
  Lines       65339    65769      +430
==========================================
+ Hits        55993    56254      +261
- Misses       9346     9515      +169
Hi @agelas, thanks for tackling this issue!
The only problematic place was these lines in burn-wgpu/src/kernel/reduce/tune/sum_dim.rs:
fn autotunables(&self) -> Vec<Box<dyn AutotuneOperation>> {
    let random_bounds: (E, E) = ((-10.0).elem::<E>(), (10.0).elem::<E>());
    let input = random_like_uniform(&self.input, random_bounds.0, random_bounds.1);
because random_like_uniform only worked with floats, even if E was an int. I'm not sure if you can just rewrite those lines to call your new int ops instead of the lower-level random_like_uniform, or if you'll have trouble determining whether you need the float or the int version. If that's the case, then yes, you can duplicate the AutotuneOperationSet, but it should keep the same key.
In the end you will only have to change sum_dim / mean_dim implementations in burn-wgpu/src/ops/int_ops.rs to something very similar to the float ones:
fn sum_dim<const D: usize>(tensor: FloatTensor<Self, D>, dim: usize) -> FloatTensor<Self, D> {
    #[cfg(feature = "autotune")]
    {
        reduce::sum_dim_autotune(tensor, dim)
    }
    #[cfg(not(feature = "autotune"))]
    {
        let output = init_reduce_output(&tensor, dim);
        reduce::sum_dim(tensor, output, dim)
    }
}
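For reference, the int version in burn-wgpu/src/ops/int_ops.rs would presumably mirror this shape, just with int tensor types. This is only a rough sketch based on the float code above; whether reduce::sum_dim_autotune and init_reduce_output can be reused directly for int elements, or need int-specific entry points, is exactly what this PR has to figure out.
fn int_sum_dim<const D: usize>(tensor: IntTensor<Self, D>, dim: usize) -> IntTensor<Self, D> {
    // Assumed to mirror the float implementation above; the autotune path would
    // benchmark candidate reduce kernels on random int inputs instead of float ones.
    #[cfg(feature = "autotune")]
    {
        reduce::sum_dim_autotune(tensor, dim)
    }
    #[cfg(not(feature = "autotune"))]
    {
        let output = init_reduce_output(&tensor, dim);
        reduce::sum_dim(tensor, output, dim)
    }
}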
@louisfd I'm gonna take a breather and ask for feedback here, because this PR is starting to get on the large side. I think everything is working just about the way it should. Unfortunately, I had to basically copy the whole AutotuneOperationSet and its associated functions/impls, since the WgpuElement vs IntElement trait bound can't be determined at runtime.
Also, I'm not sure what's up with fusion::base::tests::aggregation::tests::test_should_mean_last_dim_int; it worked on the prior commit and locally, and the only thing I've changed is the book docs.
@louisfd Yes, sorry, I kept changing the distributions around on the backends for testing purposes and forgot to make them all the same. I think it's nice to have a default just in case, and the flexibility to choose your own distribution is provided anyway without much extra effort. What the bounds should be, though, I'm not sure. If we mimic the [0, 1] behavior of floats like you mentioned, it would be pretty easy to port this over to bool tensors, so I sort of lean towards that approach.
@louisfd Actually, on second thought, maybe it would be best for the default distribution to be [0, 255]. I think that's the standard range for 8-bit quantization, and it would also work well for most image data, so that covers a lot of use cases right there.
The casting problem is quite obvious in cast_float. This function is used in the prng shader:
fn cast_float(number: u32) -> {{ elem }} {
    return 2.3283064365387e-10 * {{ elem }}(number);
}
I think it should be:
fn cast_elem(number: u32) -> {{ elem }} {
    let tmp = 2.3283064365387e-10 * f32(number);
    return {{ elem }}(tmp);
}
It was assumed that {{ elem }} was a float, but with this PR it can also be an int (i32). @agelas @louisfd
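For intuition, the same idea restated in plain Rust (not code from the PR; the helper name here is made up): the constant is roughly 2^-32, so multiplying a u32 by it maps the full integer range onto [0.0, 1.0), and the proposed fix keeps that computation in f32 before the final cast to the element type.
fn u32_to_unit_interval(number: u32) -> f32 {
    // 2.3283064365387e-10 is approximately 2^-32, so u32::MAX maps to just under 1.0
    2.3283064365387e-10 * number as f32
}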
@nathanielsimard In burn-wgpu/src/kernel/prng/bernoulli.rs, it looks like there's already a cast_elem function, so maybe convert_elem or something similar?
@louisfd Also, for the test that's failing, is there a way to switch to using int_mean_dim(1) instead of mean_dim(1)? I think that and the next test in the aggregation.rs file (test_should_sum_last_dim_int()) now have int versions that should probably be used.
#[test]
fn test_should_mean_last_dim_int() {
    let tensor = TestTensorInt::from([[0, 1, 2], [3, 4, 5]]);
    let data_actual = tensor.mean_dim(1).to_data(); // <- use int_mean_dim(1) instead?
    assert_eq!(data_actual, Data::from([[1], [4]]));
}
Hi @agelas, sorry for the late reply.
Regarding the cast functions, since it's getting a bit confusing, I think we should be more explicit with the names: cast_X_to_Y, with X and Y changed to bool, float, or elem as appropriate.
For your second question, maybe you're confusing tensor methods and backend operations? Because it's an int tensor, the mean_dim method will call int_mean_dim underneath:
impl<B: Backend> Numeric<B> for Int {
    // ...
    fn mean_dim<const D: usize>(tensor: Self::Primitive<D>, dim: usize) -> Self::Primitive<D> {
        B::int_mean_dim(tensor, dim)
    }
    // ...
}
For the CI error, it seems to be the modulo. Are you positive that both random_u32 and range are the same type?
48 │ let random_in_range = (random_u32 % range) + low;
   │                        ^^^^^^^^^^^^^^^^^^ naga::Expression [93]
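The same rule can be seen in plain Rust (illustrative only; the actual fix belongs in the WGSL kernel): both operands of % must have the same type, so if random_u32 is a u32 and the bounds are i32, one side needs an explicit cast, e.g.:
fn random_in_range(random_u32: u32, low: i32, high: i32) -> i32 {
    // Assumes high > low. Cast the range to u32 so the modulo operands match,
    // then cast the remainder back before adding the (possibly negative) lower bound.
    let range = (high - low) as u32;
    (random_u32 % range) as i32 + low
}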
Hi @antimora @louisfd, sorry about the delay, life got in the way.
@louisfd With regard to the naming issue, I changed cast_float to cast_u32_to_float like you suggested. For the CI error, it seemed to be with the KernelSettings, but that's resolved now.
The CI is failing on codecov/patch now at 63455c4 (the latest commit). I'm not sure what to do here, since a lot of the code gets used via a somewhat complicated mixture of macros and regular Rust.
@agelas, thank you for the update. Do not worry, life is a priority.
Regarding the coverage: it's not a hard requirement to hit 80%. We can override it manually.