Eikan Wang


@Intel Shanghai, China

Results: 69 comments of Eikan Wang

TensorExpr eval: fix copying variables from pointers on big endian systems

> I propose to fix "int" and smaller integer types use-cases, and to have current values binding system reworked when longer ints (int64_t) or floating-point types would actually be used....
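For readers unfamiliar with the issue named in the thread title, here is a minimal sketch of why copying a scalar through an untyped pointer depends on byte order. It is an illustration only, not the code from the PR: the helper names `is_little_endian`, `load_scalar`, and `load_int32_into_wide_slot` are hypothetical and do not appear in PyTorch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the PR): detect the host byte order at run time.
bool is_little_endian() {
  const std::uint16_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Hypothetical helper: copy exactly sizeof(T) bytes into a T.
// The result does not depend on byte order because source and
// destination have the same width.
template <typename T>
T load_scalar(const void* src) {
  T value;
  std::memcpy(&value, src, sizeof(T));
  return value;
}

// Hypothetical fragile pattern: copy a 32-bit value into the start of a
// zeroed 64-bit slot. On little endian hosts the bytes land in the low
// half, so small non-negative values survive. On big endian hosts they
// land in the high half and the value is shifted left by 32 bits.
std::int64_t load_int32_into_wide_slot(const void* src) {
  std::int64_t wide = 0;
  std::memcpy(&wide, src, sizeof(std::int32_t));
  return wide;
}

// Small self-check of the width-matched load.
void check_typed_load() {
  const std::int32_t v = 42;
  assert(load_scalar<std::int32_t>(&v) == 42);
}
```

The width-matched `load_scalar<T>` is the safer shape under these assumptions; the wide-slot variant only appears to work on little endian hosts, and even there it does not sign-extend negative values.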

TensorExpr eval: fix copying variables from pointers on big endian systems

Then how about adding something like the following to ensure the input is neither `Double` nor `Long` on big endian systems?

```c++
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Double);
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Long);
```

With...

TensorExpr eval: fix copying variables from pointers on big endian systems

> A few lines below it's already caught in default case, and exception would be thrown:
>
> https://github.com/pytorch/pytorch/pull/96951/files#diff-33783d984927670883fec7121b94a5142e54bedf159d7b85af6800818e513d09R1311-R1312
>
> Should it still be added?

It means that we...

TensorExpr eval: fix copying variables from pointers on big endian systems

@AlekseiNikiforovIBM, I submitted a PR to fix the 32-bit issue: https://github.com/pytorch/pytorch/pull/97669.

TensorExpr eval: fix copying variables from pointers on big endian systems

@AlekseiNikiforovIBM, my PR https://github.com/pytorch/pytorch/pull/97669 has been merged. Please rebase this PR.

Add a cache mechanism to accelerate torch.compile-for-eager

> If I understand correctly this currently does: C++ (dispatcher) -> Python -> C++ (torch.compile generated operator)

Yes.

> At some point in the future, what I would want is:...

Add a cache mechanism to accelerate torch.compile-for-eager

@jansel, in terms of the C++ cache, I'm thinking of adding the cache to `PythonKernelHolder` if the kernel is torch.compile-based. The process could be as follows. - Load...

Add a cache mechanism to accelerate torch.compile-for-eager

> I don't think the final version this will look very much like `PythonKernelHolder`, nor do I think it belongs as part of that class.

Got it. We need another...

Add a cache mechanism to accelerate torch.compile-for-eager

> Also, the hermetic PyObject thing might be a red herring - I don't see a meta kernel for the operator, how are you handling that? We should have directly...

Add a cache mechanism to accelerate torch.compile-for-eager

@zou3519, we had some discussion in #115545. May I know if I have addressed all your questions?
