Full codegen addcmul
We're seeing some issues when trying to full codegen ops that have at::Scalar as part of their inputs, such as the addcmul op. The generated class Addcmul in LazyIr.h looks like:
Addcmul(const torch_xla::XlaValue& self, const torch_xla::XlaValue& tensor1,
        const torch_xla::XlaValue& tensor2, const torch_xla::XlaValue& value,
        std::vector<torch::lazy::Shape>&& shapes)
However, the generated MakeNode<Addcmul> call in the XLANativeFunctions.cpp file looks like:
auto node = torch::lazy::MakeNode<Addcmul>(
    lazy_self->GetIrValue(), lazy_tensor1->GetIrValue(), lazy_tensor2->GetIrValue(),
    torch::lazy::LazyGraphExecutor::Get()->GetIrValueForScalarFromCodegen(value),
    std::move(shapes));
Since the GetIrValueForScalarFromCodegen(value) call in XLANativeFunctions.cpp returns a torch::lazy::Value while the Addcmul class in LazyIr.h expects a torch_xla::XlaValue, the build fails with the following error:
/workspace/pytorch/xla/torch_xla/csrc/generated/XLANativeFunctions.cpp:111:34: note: in instantiation of function template specialization 'torch::lazy::MakeNode<torch_xla::Addcmul, torch_xla::XlaValue, torch_xla::XlaValue, torch_xla::XlaValue, torch::lazy::Value, std::vector<torch::lazy::Shape, std::allocator<torch::lazy::Shape> > >' requested here
auto node = torch::lazy::MakeNode<Addcmul>(lazy_self->GetIrValue(),
^
/workspace/pytorch/xla/torch_xla/csrc/generated/LazyIr.h:128:3: note: candidate constructor not viable: no known conversion from 'torch::lazy::Value' to 'const torch_xla::XlaValue' for 4th argument
Addcmul(const torch_xla::XlaValue& self, const torch_xla::XlaValue& tensor1, const torch_xla::XlaValue& tensor2, const torch_xla::XlaValue& value, std::vector<torch::lazy::Shape>&& shapes)
^
/workspace/pytorch/xla/torch_xla/csrc/generated/LazyIr.h:122:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 5 were provided
class Addcmul : public XlaNode {
^
/workspace/pytorch/xla/torch_xla/csrc/generated/LazyIr.h:122:7: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 5 were provided
The upstream codegen that emits the GetIrValueForScalarFromCodegen(value) call seems to happen at https://github.com/pytorch/pytorch/blob/master/torchgen/dest/lazy_ir.py#L313. @JackCaoG, should we override the upstream GenLazyNativeFuncDefinition (https://github.com/pytorch/pytorch/blob/master/torchgen/dest/lazy_ir.py#L281) so we can change how this scalar value is generated?
Generated LazyIr.h file:
class Addcmul : public XlaNode {
 public:
  static torch::lazy::OpKind ClassOpKind() {
    return torch::lazy::OpKind(at::aten::addcmul);
  }

  Addcmul(const torch_xla::XlaValue& self, const torch_xla::XlaValue& tensor1,
          const torch_xla::XlaValue& tensor2, const torch_xla::XlaValue& value,
          std::vector<torch::lazy::Shape>&& shapes)
      : XlaNode(torch::lazy::OpKind(at::aten::addcmul),
                {self, tensor1, tensor2, value}, std::move(shapes),
                [&]() { return AddcmulOutputShape(self, tensor1, tensor2, value); },
                /* num_outputs */ 1,
                torch::lazy::MHash()) {}

  std::string ToString() const override {
    std::stringstream ss;
    ss << XlaNode::ToString();
    return ss.str();
  }

  torch_xla::XlaOpVector Lower(LoweringContext* loctx) const override;
};
Generated XLANativeFunctions.cpp:
at::Tensor XLANativeFunctions::addcmul(const at::Tensor& self, const at::Tensor& tensor1,
                                       const at::Tensor& tensor2, const at::Scalar& value) {
  XLA_FN_COUNTER("xla::");
  auto common_device = torch_xla::bridge::GetXlaDevice(self, tensor1, tensor2);
  TORCH_INTERNAL_ASSERT(common_device);
  torch_xla::XLATensorPtr lazy_self =
      torch_xla::bridge::GetXlaTensorOrCreateForWrappedNumber(self, *common_device);
  torch_xla::XLATensorPtr lazy_tensor1 =
      torch_xla::bridge::GetXlaTensorOrCreateForWrappedNumber(tensor1, *common_device);
  torch_xla::XLATensorPtr lazy_tensor2 =
      torch_xla::bridge::GetXlaTensorOrCreateForWrappedNumber(tensor2, *common_device);
  auto out_meta = at::meta::addcmul(self, tensor1, tensor2, value);
  std::vector<torch::lazy::Shape> shapes{
      torch::lazy::Shape(out_meta.scalar_type(), out_meta.sizes().vec())};
  TORCH_INTERNAL_ASSERT(shapes.size() == 1);
  if (torch::lazy::symbolicShapeEnabled()) {
    std::vector<torch::jit::IValue> inputs = {self, tensor1, tensor2, value};
    char* schema_str = "aten::addcmul(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor";
    applySymbolicShapesOnLT(schema_str, inputs, shapes);
  }
  auto node = torch::lazy::MakeNode<Addcmul>(
      lazy_self->GetIrValue(), lazy_tensor1->GetIrValue(), lazy_tensor2->GetIrValue(),
      torch::lazy::LazyGraphExecutor::Get()->GetIrValueForScalarFromCodegen(value),
      std::move(shapes));
  auto result = torch_xla::bridge::AtenFromXlaTensor(
      torch_xla::XLATensor::Create(std::move(node), *common_device));
  return result;
};
Yeah, I would say we can modify the upstream codegen to call IrValueFromScalar instead of torch::lazy::LazyGraphExecutor::Get()->GetIrValueForScalarFromCodegen for now. You can open a PR to pytorch after testing your change locally.
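For reference, the generated call site would then look roughly like this. This is only a sketch: IrValueFromScalar is the name mentioned above, but its exact signature and arguments here are assumptions, not the actual API.
// Hypothetical generated code after the change; the (scalar, device) argument
// list for IrValueFromScalar is an assumption.
auto node = torch::lazy::MakeNode<Addcmul>(
    lazy_self->GetIrValue(), lazy_tensor1->GetIrValue(), lazy_tensor2->GetIrValue(),
    // Use torch_xla's own scalar helper so this argument is a torch_xla::XlaValue
    // rather than a plain torch::lazy::Value.
    IrValueFromScalar(value, *common_device),
    std::move(shapes));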
FYI @wconstab. I am slightly worried about this because XlaValue is something we need to keep as long as we keep xla::shape in our IR. I am not sure if it will introduce more issues in the phase 3 migration. I want to do a more detailed test after the migration is more complete to see if I can safely drop xla::shape from the IR, so the code base will be much cleaner.
Yeah, it should be OK to modify the codegen in the short term to call XLA's function for generating the IR for a scalar value.
Longer term, I wonder if there is a way to make it so you can use lazy::Value, and when you try to access the .shape() field (expecting an xla shape), you use a helper function that can:
- get the lazy IR node from the lazy::Value
- dynamic-cast it to XlaNode
- return the xla::shape
This is a bit ugly, but it might not be that bad if the alternative is making a lot of changes to keep XlaValue around in codegen? (Rough sketch below.)
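A minimal sketch of such a helper, assuming XlaNode exposes an xla_shape(output_index) accessor; the names follow this discussion rather than a verified API:
// Sketch only: recover the xla::Shape from a plain torch::lazy::Value by
// downcasting the wrapped node to torch_xla's XlaNode.
const xla::Shape& GetXlaShape(const torch::lazy::Value& value) {
  auto* xla_node = dynamic_cast<const XlaNode*>(value.node.get());
  XLA_CHECK(xla_node != nullptr) << "value does not wrap an XlaNode";
  return xla_node->xla_shape(value.index);
}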
@wonjoolee95 can comment, but functionality-wise this is the same as what we do in https://github.com/pytorch/xla/blob/56a52d53359b0862d0ada8d49c4e3dc52ff75d81/torch_xla/csrc/ir.cpp#L87-L90.
I remembered there were more than 50 places where we call value.xla_shape directly. I guess the alternative is to call GetXlaShape(value) at those sites (roughly the mechanical change sketched below). It might not be as bad as I thought, and it would mean we don't have to worry about XlaValue in the future. @wonjoolee95 wdyt?
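For concreteness, each call site would change roughly like this (method names as used in this thread, so treat this as a sketch):
// Before: the xla::Shape comes straight off the XlaValue.
xla::Shape shape = value.xla_shape();
// After: value is a plain torch::lazy::Value and the shape goes through the helper.
xla::Shape shape = GetXlaShape(value);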
Thanks for the ideas. I'll spend some time to see if we can call GetXlaShape(value) throughout our code and drop XlaValue if possible -- this seems like the cleaner solution going forward. If that turns out to be too heavy for now, we can go for the shorter-term solution of modifying the upstream codegen. I'll spend some time on this and post updates here.
A couple of tests are failing with RuntimeError: Lazy tensor backend not registered. This error seems to be thrown here -- https://github.com/pytorch/pytorch/blob/master/torch/csrc/lazy/backend/backend_interface.cpp#L16. I can reproduce the error locally through Python IDLE.
It is caused by torch::lazy::LazyGraphExecutor::Get()->GetIrValueForScalarFromCodegen. I might be able to enable that after https://github.com/pytorch/xla/pull/3571. Can you work on other nodes that do not involve scalars for now?
I think we can close this one in favor of https://github.com/pytorch/xla/pull/3768