cicirori comments

Results: 10 comments by cicirori

[Questions] Why use sub computation other than a few instructions?

Or can these subcomputations be reused?

[Questions] Why use sub computation other than a few instructions?

```
HloModule SyncTensorsGraph.60039, input_output_alias={ {0}: (7, {}, may-alias), {1}: (64, {}, may-alias), {2}: (66, {}, may-alias), {3}: (62, {}, may-alias), {4}: (68, {}, may-alias), {5}: (80, {}, may-alias), {6}: (79,...
```

[Questions] Why use sub computation other than a few instructions?

> ```
> HloModule SyncTensorsGraph.60039, input_output_alias={ {0}: (7, {}, may-alias), {1}: (64, {}, may-alias), {2}: (66, {}, may-alias), {3}: (62, {}, may-alias), {4}: (68, {}, may-alias), {5}: (80, {}, may-alias),...
> ```

[Questions] Why use sub computation other than a few instructions?

> > ```
> > HloModule SyncTensorsGraph.60039, input_output_alias={ {0}: (7, {}, may-alias), {1}: (64, {}, may-alias), {2}: (66, {}, may-alias), {3}: (62, {}, may-alias), {4}: (68, {}, may-alias), {5}: (80,...
> > ```

[Questions] Why use sub computation other than a few instructions?

Sorry, it's been a while; I just recalled some other details. There is a lot of trivial computation that will definitely bring about increased compilation overhead from the choice of...

[Questions] Why use sub computation other than a few instructions?

> @cicirori Thanks for bringing up this interesting topic! Can you share more details about what code changes you made to reduce the compilation time by about 30%? One method I...

gradient checkpoint cause bigger memory usage on GPU

[simple_gc_test.zip](https://github.com/pytorch/xla/files/8860163/simple_gc_test.zip) This zip contains the HLO dump results with GC (gradient checkpointing) enabled and disabled. The number of model substructure cycles in the run that generated this dump was set to 4 instead of...

error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’

Same errors with the pre-built PyTorch 1.10.2 and Python 3.9.

error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’

> same error for me

@Coldog2333 Using a newer version of GCC will fix this problem. I can't remember the exact version, but you might want to try 7.5.

Support overlapping NCCL collective communication with compute on GPU

I think this optimization pass solves this problem to some extent: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/async_collective_creator.cc
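The pass linked above rewrites synchronous collectives into asynchronous start/done pairs, which gives a later scheduling step room to place independent compute between them. The toy Python sketch below illustrates only that idea. The `Instr` record, the op names (loosely modeled on HLO's `all-reduce-start`/`all-reduce-done`), and the greedy "delay the done until first use" scheduler are hypothetical and are not XLA's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instr:
    """Hypothetical, minimal stand-in for an HLO instruction."""
    name: str
    op: str
    operands: List[str] = field(default_factory=list)


def make_async(program: List[Instr]) -> List[Instr]:
    """Split every synchronous all-reduce into a start/done pair (toy model)."""
    out: List[Instr] = []
    for ins in program:
        if ins.op == "all-reduce":
            start = Instr(ins.name + ".start", "all-reduce-start", list(ins.operands))
            # The done op keeps the original name, so existing consumers still refer to it.
            done = Instr(ins.name, "all-reduce-done", [start.name])
            out.extend([start, done])
        else:
            out.append(ins)
    return out


def overlap(program: List[Instr]) -> List[Instr]:
    """Greedily delay each all-reduce-done until its result is first used.

    Instructions that do not read a pending result are emitted between the
    start and the done, i.e. they could overlap with the communication.
    """
    result: List[Instr] = []
    pending: List[Instr] = []  # done ops not yet emitted
    for ins in program:
        if ins.op == "all-reduce-done":
            pending.append(ins)
            continue
        # Emit any pending done whose result this instruction reads.
        needed = [d for d in pending if d.name in ins.operands]
        for d in needed:
            result.append(d)
            pending.remove(d)
        result.append(ins)
    result.extend(pending)  # flush dones whose results are never consumed
    return result


if __name__ == "__main__":
    program = [
        Instr("grad", "multiply", ["x", "dy"]),
        Instr("ar", "all-reduce", ["grad"]),
        Instr("h", "dot", ["a", "b"]),  # independent of the all-reduce
        Instr("upd", "add", ["w", "ar"]),
    ]
    for ins in overlap(make_async(program)):
        print(ins.name, ins.op)
```

Running it prints the order `grad`, `ar.start`, `h`, `ar`, `upd`: the independent `dot` lands between the start and the done. Delaying each done until first use is the simplest possible policy; a real scheduler would typically also weigh estimated latencies and memory pressure.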