Less Wright issues

Results: 44 issues of Less Wright

[SPMD][Fusion] Add unit tests for fusion

After PR https://github.com/pytorch/tau/pull/631 lands, add unit testing. Simple tests would involve running fusion based on a set policy, verifying the output gradients, and inspecting the graph.

[SPMD][Fusion] tracking - move global buffer to just before first fusion

Per Alisson - we can reduce memory overhead by having the global buffer first created at first use (i.e. just before first fusion) rather than the current instantiation at the...
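The idea of deferring buffer creation until first use can be illustrated with a minimal sketch. This is not the code from the SPMD fusion pass: `LazyBuffer` and `fuse` are hypothetical names, and a plain `bytearray` stands in for the real global buffer.

```python
class LazyBuffer:
    """Hypothetical stand-in for a global buffer that is allocated on first use."""

    def __init__(self, size_bytes):
        self.size_bytes = size_bytes
        self._storage = None  # nothing is allocated at construction time

    @property
    def allocated(self):
        return self._storage is not None

    def get(self):
        # Allocate only when the buffer is first needed.
        if self._storage is None:
            self._storage = bytearray(self.size_bytes)
        return self._storage


def fuse(buffer, chunks):
    """Copy byte chunks back to back into the buffer and return the bytes used."""
    total = sum(len(chunk) for chunk in chunks)
    if total > buffer.size_bytes:
        raise ValueError("chunks do not fit in the buffer")
    storage = buffer.get()  # first call triggers the allocation
    offset = 0
    for chunk in chunks:
        storage[offset:offset + len(chunk)] = chunk
        offset += len(chunk)
    return offset
```

Until `fuse` is called for the first time, the buffer holds no memory, which is the saving the issue describes.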

[spmd] incorrect aten.expand call with nn.linear (expanded size must match existing size at dim 0)

This is to track/investigate the issue reported by Rich Zhu, where using permute to generate a transposed tensor for nn.linear results in an incorrect aten.expand call. I've found two potential...

[SPMD] Remove Gradient tensor clones added during DTensor comm collective insertion

After expansion of DTensor communication operations, fx inserts a clone operation to clone the gradient tensor. This operation will slow performance and increase memory use, but is technically...

pytests_test_gpu(0) will fail if allocated a non-4 gpu server - add guard/skip?

While running the pytests for a recent PR, I was allocated a 3-GPU server rather than a 4-GPU one (presumably a bad GPU on a 4-GPU server, but unclear...
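One possible shape for such a guard, as a sketch only: `available_gpu_count` and `requires_gpus` are hypothetical helpers, not names from the repository, and the GPU count is assumed to come from `torch.cuda.device_count()` when PyTorch is installed.

```python
import unittest


def available_gpu_count():
    """Return the number of visible CUDA devices, or 0 if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def requires_gpus(n):
    """Decorator that skips a test unless at least n GPUs are visible."""
    found = available_gpu_count()
    return unittest.skipIf(found < n, f"needs {n} GPUs, found {found}")


@requires_gpus(4)
def test_gpu():
    # Placeholder body; the real test would exercise the 4-GPU code path.
    assert available_gpu_count() >= 4
```

On a 3-GPU machine the decorated test is reported as skipped instead of failing. In a pytest-only suite, `pytest.mark.skipif` with the same condition would be the more conventional choice.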

[spmd] torch.cat (aten.cat.default) not implemented for Distributed Tensor (tracking)

Implement a graph using torch.cat. Convert it via SPMD. Receive:
raise NotImplementedError(
NotImplementedError: Operator aten.cat.default does not have a DistributedTensor rule registered.)
Code location: File "/home/ubuntu/graph/spmd/api.py", line 110, in _get_dtensor_dispatch_graph...

tb shows blank screen instead of report. Console inspection = failed to load resource: net::ERR_CONTENT_LENGTH_MISMATCH

Hi - We're trying to consolidate on using tensorboard but are sporadically hitting an issue where reports won't load and we instead just get a blank white screen. This occurs on multiple...

add doc for adding custom dataset

Per user request: we don't currently have any info on how to do this (basically, extend the hf_dataset file).

documentation

enhancement

selective compilation - norm layers only

This PR adds the option to selectively compile only the norm layers, and is mainly targeted at RMSNorm. By compiling just the norm layers when using RMSNorm, we get...

CLA Signed

grouped gemm tutorial fails on H100 hopper...core dump, Assertion `false && "FenceInsertionPass does not supported WhileOp"' failed.

Tried to run the grouped gemm tutorial on Hopper/H100: https://github.com/openai/triton/blob/main/python/tutorials/11-grouped-gemm.py. I realize this is an experimental tutorial, but I hit this same error while working on another kernel and meant...