cicirori

Results 10 comments of cicirori

``` HloModule SyncTensorsGraph.60039, input_output_alias={ {0}: (7, {}, may-alias), {1}: (64, {}, may-alias), {2}: (66, {}, may-alias), {3}: (62, {}, may-alias), {4}: (68, {}, may-alias), {5}: (80, {}, may-alias), {6}: (79,...

> ``` > HloModule SyncTensorsGraph.60039, input_output_alias={ {0}: (7, {}, may-alias), {1}: (64, {}, may-alias), {2}: (66, {}, may-alias), {3}: (62, {}, may-alias), {4}: (68, {}, may-alias), {5}: (80, {}, may-alias),...

> > ``` > > HloModule SyncTensorsGraph.60039, input_output_alias={ {0}: (7, {}, may-alias), {1}: (64, {}, may-alias), {2}: (66, {}, may-alias), {3}: (62, {}, may-alias), {4}: (68, {}, may-alias), {5}: (80,...

Sorry, it's been a while; I just recalled some other details. There is a lot of trivial computation that will definitely bring about increased compilation overhead from the choice of...

> @cicirori Thanks for bringing up this interesting topic! Can you share more details about what code changes you made to reduce the compilation time by about 30%? One method I...

[simple_gc_test.zip](https://github.com/pytorch/xla/files/8860163/simple_gc_test.zip) This zip contains the HLO dump results with GC enabled/disabled. The number of model substructure cycles in the run that generated this dump was set to 4 instead of...

> same error for me @Coldog2333 Using a newer version of gcc should fix this problem. I can't remember the exact version, but you might want to try 7.5.

I think this optimization pass solves this problem to some extent: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/async_collective_creator.cc