Jacob Hucker
Unfortunately my C++ experience is lacking, so it's unlikely I'll be able to contribute to autocxx any time soon. I totally see the worth of moving away...
Regarding the attention layers, I had an attempt at using `f_internal_native_multi_head_attention` following the provided links. However, doing so raises the following error upon calling `backward()`:

```
Torch("derivative for aten::_native_multi_head_attention is not implemented")
```
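In case it helps anyone hitting the same wall: since the fused kernel has no registered derivative, the fallback I went with for training (my own sketch, not anything the crate documents for this) is to compose the attention from primitives that all do have derivatives, so `backward()` succeeds. A minimal single-head version, with shapes chosen purely for illustration:

```rust
use tch::{Device, Kind, Tensor};

// Scaled dot-product attention from primitives that all have registered
// derivatives (matmul, transpose, softmax), unlike the fused
// aten::_native_multi_head_attention kernel. Single head for simplicity.
fn sdpa(q: &Tensor, k: &Tensor, v: &Tensor) -> Tensor {
    let d_k = *q.size().last().unwrap() as f64;
    let scores = q.matmul(&k.transpose(-2, -1)) / d_k.sqrt();
    scores.softmax(-1, Kind::Float).matmul(v)
}

fn main() {
    let opts = (Kind::Float, Device::Cpu);
    // [batch, seq_len, embed_dim] -- illustrative shapes only.
    let q = Tensor::randn(&[2, 5, 16], opts).set_requires_grad(true);
    let k = Tensor::randn(&[2, 5, 16], opts);
    let v = Tensor::randn(&[2, 5, 16], opts);

    let out = sdpa(&q, &k, &v);
    // Every op above is differentiable, so this backward succeeds where
    // the fused kernel raised "derivative ... is not implemented".
    out.sum(Kind::Float).backward();
    println!("grad norm: {:?}", q.grad().norm());
}
```

Obviously this gives up the performance of the fused kernel, but it unblocks training.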
Gotcha. I tried a higher-level function, `internal_transformer_encoder_layer_fwd`, and got the same error. Thanks for the useful links above; I would never have figured this out without them. Will continue with the...
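If training isn't needed at all, another option (purely an assumption on my side; I haven't confirmed these fused ops are intended as inference-only) is to run the fused call inside `tch::no_grad`, so nothing is recorded on the autograd tape and `backward()` is never asked for the missing derivative. A sketch with a stand-in op, since I can't verify the exact generated signature of the fused binding:

```rust
use tch::{no_grad, Device, Kind, Tensor};

fn main() {
    let x = Tensor::randn(&[4, 8], (Kind::Float, Device::Cpu)).set_requires_grad(true);

    // Ops run inside no_grad are not recorded on the autograd tape; a fused
    // inference-only kernel called here (stand-in `* 2.0` used for
    // illustration) would never be reached by backward().
    let y = no_grad(|| &x * 2.0);
    assert!(!y.requires_grad());
}
```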