Full codegen addcmul
We're seeing some issues when trying to full codegen ops that have at::Scalar as part of their inputs, such as the addcmul op. The generated class Addcmul in LazyIr.h looks like:
Addcmul(const torch_xla::XlaValue& self, const torch_xla::XlaValue& tensor1,
        const torch_xla::XlaValue& tensor2, const torch_xla::XlaValue& value,
        std::vector<torch::lazy::Shape>&& shapes)
However, the generated MakeNode<Addcmul> call in the XLANativeFunctions.cpp file looks like:
auto node = torch::lazy::MakeNode<Addcmul>(
    lazy_self->GetIrValue(), lazy_tensor1->GetIrValue(), lazy_tensor2->GetIrValue(),
    torch::lazy::LazyGraphExecutor::Get()->GetIrValueForScalarFromCodegen(value),
    std::move(shapes));
Since the GetIrValueForScalarFromCodegen(value) call in XLANativeFunctions.cpp returns a torch::lazy::Value while the Addcmul class in LazyIr.h expects a torch_xla::XlaValue, the build fails with the following error:
/workspace/pytorch/xla/torch_xla/csrc/generated/XLANativeFunctions.cpp:111:34: note: in instantiation of function template specialization 'torch::lazy::MakeNode<torch_xla::Addcmul, torch_xla::XlaValue, torch_xla::XlaValue, torch_xla::XlaValue, torch::lazy::Value, std::vector<torch::lazy::Shape, std::allocator<torch::lazy::Shape> > >' requested here
auto node = torch::lazy::MakeNode<Addcmul>(lazy_self->GetIrValue(),
^
/workspace/pytorch/xla/torch_xla/csrc/generated/LazyIr.h:128:3: note: candidate constructor not viable: no known conversion from 'torch::lazy::Value' to 'const torch_xla::XlaValue' for 4th argument
Addcmul(const torch_xla::XlaValue& self, const torch_xla::XlaValue& tensor1, const torch_xla::XlaValue& tensor2, const torch_xla::XlaValue& value, std::vector<torch::lazy::Shape>&& shapes)
^
/workspace/pytorch/xla/torch_xla/csrc/generated/LazyIr.h:122:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 5 were provided
class Addcmul : public XlaNode {
^
/workspace/pytorch/xla/torch_xla/csrc/generated/LazyIr.h:122:7: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 5 were provided
The upstream codegen that emits the GetIrValueForScalarFromCodegen(value) call seems to happen at https://github.com/pytorch/pytorch/blob/master/torchgen/dest/lazy_ir.py#L313. @JackCaoG, should we override the upstream GenLazyNativeFuncDefinition (https://github.com/pytorch/pytorch/blob/master/torchgen/dest/lazy_ir.py#L281) so we can change how this scalar value is generated?
Generated LazyIr.h file:
class Addcmul : public XlaNode {
 public:
  static torch::lazy::OpKind ClassOpKind() {
    return torch::lazy::OpKind(at::aten::addcmul);
  }

  Addcmul(const torch_xla::XlaValue& self, const torch_xla::XlaValue& tensor1,
          const torch_xla::XlaValue& tensor2, const torch_xla::XlaValue& value,
          std::vector<torch::lazy::Shape>&& shapes)
      : XlaNode(torch::lazy::OpKind(at::aten::addcmul),
                {self, tensor1, tensor2, value}, std::move(shapes),
                [&]() { return AddcmulOutputShape(self, tensor1, tensor2, value); },
                /* num_outputs */ 1,
                torch::lazy::MHash()) {}

  std::string ToString() const override {
    std::stringstream ss;
    ss << XlaNode::ToString();
    return ss.str();
  }

  torch_xla::XlaOpVector Lower(LoweringContext* loctx) const override;
};
Generated XLANativeFunctions.cpp:
at::Tensor XLANativeFunctions::addcmul(const at::Tensor& self, const at::Tensor& tensor1,
                                       const at::Tensor& tensor2, const at::Scalar& value) {
  XLA_FN_COUNTER("xla::");
  auto common_device = torch_xla::bridge::GetXlaDevice(self, tensor1, tensor2);
  TORCH_INTERNAL_ASSERT(common_device);
  torch_xla::XLATensorPtr lazy_self =
      torch_xla::bridge::GetXlaTensorOrCreateForWrappedNumber(self, *common_device);
  torch_xla::XLATensorPtr lazy_tensor1 =
      torch_xla::bridge::GetXlaTensorOrCreateForWrappedNumber(tensor1, *common_device);
  torch_xla::XLATensorPtr lazy_tensor2 =
      torch_xla::bridge::GetXlaTensorOrCreateForWrappedNumber(tensor2, *common_device);
  auto out_meta = at::meta::addcmul(self, tensor1, tensor2, value);
  std::vector<torch::lazy::Shape> shapes{
      torch::lazy::Shape(out_meta.scalar_type(), out_meta.sizes().vec())};
  TORCH_INTERNAL_ASSERT(shapes.size() == 1);
  if (torch::lazy::symbolicShapeEnabled()) {
    std::vector<torch::jit::IValue> inputs = {self, tensor1, tensor2, value};
    char* schema_str = "aten::addcmul(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor";
    applySymbolicShapesOnLT(schema_str, inputs, shapes);
  }
  auto node = torch::lazy::MakeNode<Addcmul>(
      lazy_self->GetIrValue(), lazy_tensor1->GetIrValue(), lazy_tensor2->GetIrValue(),
      torch::lazy::LazyGraphExecutor::Get()->GetIrValueForScalarFromCodegen(value),
      std::move(shapes));
  auto result = torch_xla::bridge::AtenFromXlaTensor(
      torch_xla::XLATensor::Create(std::move(node), *common_device));
  return result;
};
Yeah, I would say we can modify the upstream codegen to call IrValueFromScalar instead of torch::lazy::LazyGraphExecutor::Get()->GetIrValueForScalarFromCodegen for now. You can open a PR to pytorch after testing your change locally.
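For reference, the generated call site would then look roughly like this. This is only a sketch: IrValueFromScalar is the name mentioned above, but its exact signature and arguments here are assumptions, not the actual API.
// Hypothetical generated code after the change; the (scalar, device) argument
// list for IrValueFromScalar is an assumption.
auto node = torch::lazy::MakeNode<Addcmul>(
    lazy_self->GetIrValue(), lazy_tensor1->GetIrValue(), lazy_tensor2->GetIrValue(),
    // Use torch_xla's own scalar helper so this argument is a torch_xla::XlaValue
    // rather than a plain torch::lazy::Value.
    IrValueFromScalar(value, *common_device),
    std::move(shapes));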
FYI @wconstab. I am slightly worried about this because XlaValue is something we need to keep as long as we keep xla::shape in our IR. I am not sure if it will introduce more issues in the phase 3 migration. I want to do a more detailed test after the migration is more complete to see if I can safely drop xla::shape from the IR, so the code base will be much cleaner.
Yeah, it should be OK to modify the codegen in the short term to call XLA's function for generating the IR for a scalar value.
Longer term, I wonder if there is a way to make it so you can use lazy::Value, and when you try to access the .shape() field (expecting an xla shape), you use a helper function that can:
- get the lazy IR node from the lazy::Value
- dynamic-cast it to XlaNode
- return the xla::shape
This is a bit ugly, but it might not be that bad if the alternative is making a lot of changes to keep XlaValue around in codegen? (Rough sketch below.)
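A minimal sketch of such a helper, assuming XlaNode exposes an xla_shape(output_index) accessor; the names follow this discussion rather than a verified API:
// Sketch only: recover the xla::Shape from a plain torch::lazy::Value by
// downcasting the wrapped node to torch_xla's XlaNode.
const xla::Shape& GetXlaShape(const torch::lazy::Value& value) {
  auto* xla_node = dynamic_cast<const XlaNode*>(value.node.get());
  XLA_CHECK(xla_node != nullptr) << "value does not wrap an XlaNode";
  return xla_node->xla_shape(value.index);
}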
@wonjoolee95 can comment, but functionality-wise this is the same as what we do in https://github.com/pytorch/xla/blob/56a52d53359b0862d0ada8d49c4e3dc52ff75d81/torch_xla/csrc/ir.cpp#L87-L90.
I remembered there were more than 50 places where we call value.xla_shape directly. I guess the alternative is to call GetXlaShape(value) at those sites (roughly the mechanical change sketched below). It might not be as bad as I thought, and it would mean we don't have to worry about XlaValue in the future. @wonjoolee95 wdyt?
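For concreteness, each call site would change roughly like this (method names as used in this thread, so treat this as a sketch):
// Before: the xla::Shape comes straight off the XlaValue.
xla::Shape shape = value.xla_shape();
// After: value is a plain torch::lazy::Value and the shape goes through the helper.
xla::Shape shape = GetXlaShape(value);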
Thanks for the ideas. I'll spend some time to see if we can call GetXlaShape(value) throughout our code and drop XlaValue if possible -- this seems like the cleaner solution going forward. If that turns out to be too heavy for now, we can go for the shorter-term solution of modifying the upstream codegen. I'll spend some time on this and post updates here.
A couple of tests are failing with RuntimeError: Lazy tensor backend not registered. This error seems to be thrown here -- https://github.com/pytorch/pytorch/blob/master/torch/csrc/lazy/backend/backend_interface.cpp#L16. I can reproduce the error locally through Python IDLE.
It is caused by torch::lazy::LazyGraphExecutor::Get()->GetIrValueForScalarFromCodegen. I might be able to enable that after https://github.com/pytorch/xla/pull/3571. Can you work on other nodes that do not involve scalars for now?
I think we can close this one in favor of https://github.com/pytorch/xla/pull/3768