Large negative input values (e.g. `-90.0`) cause the gradient of `sigmoid` to become `NaN`
Here is the code:
```rust
use burn::{
    backend::{Autodiff, NdArray},
    tensor::{activation, Data, Tensor},
};

fn main() {
    let data = Data::<f32, 1>::from([-90.0]);
    let device = Default::default();
    let tensor_1 = Tensor::<Autodiff<NdArray>, 1>::from_data(data, &device).require_grad();
    let tensor_2 = activation::sigmoid(tensor_1.clone());
    let grads = tensor_2.backward();
    let grad_1 = tensor_1.grad(&grads).unwrap();
    println!("{}", grad_1);
}
```
The result is `NaN`.
Is it possible to manually define the derivative used during the activation function's backpropagation, so that we don't have to automatically differentiate through `log` and `exp`?
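For context, here is a minimal standalone sketch (plain Rust, not the burn API; the function names are hypothetical) of what presumably happens: for `x = -90.0`, `exp(-x) = exp(90.0)` overflows `f32` to infinity, so mechanically differentiating the composed `exp`-based formula yields `inf / inf = NaN`, while the analytic derivative `sigmoid(x) * (1 - sigmoid(x))`, evaluated with a numerically stable sigmoid, stays finite.

```rust
// Naive sigmoid: 1 / (1 + exp(-x)). For very negative x, exp(-x) overflows f32.
fn sigmoid_naive(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

// Gradient obtained by mechanically differentiating the naive formula:
// d/dx [1 / (1 + exp(-x))] = exp(-x) / (1 + exp(-x))^2
fn grad_naive(x: f32) -> f32 {
    let e = (-x).exp();             // exp(90.0) overflows f32 to +inf
    e / ((1.0 + e) * (1.0 + e))     // inf / inf => NaN
}

// Numerically stable sigmoid: never exponentiates a positive argument.
fn sigmoid_stable(x: f32) -> f32 {
    if x >= 0.0 {
        1.0 / (1.0 + (-x).exp())
    } else {
        let e = x.exp();
        e / (1.0 + e)
    }
}

// Manually defined (analytic) derivative: sigmoid(x) * (1 - sigmoid(x)).
fn grad_manual(x: f32) -> f32 {
    let s = sigmoid_stable(x);
    s * (1.0 - s)
}

fn main() {
    let x = -90.0_f32;
    println!("naive grad:  {}", grad_naive(x));  // NaN
    println!("manual grad: {}", grad_manual(x)); // tiny but finite, not NaN
}
```

So the request amounts to letting the backward pass use the analytic `s * (1 - s)` form (or an equivalently stable formulation) instead of differentiating through the `log`/`exp` composition.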