merrymercy opened this issue 4 years ago
Incompatible changes
tensorflow/compiler/xla/python/xla_compiler.cc. The return values of spmd_output_sharding and spmd_parameters_shardings are not converted to tuples.
tensorflow/compiler/xla/service/hlo_verifier.cc. Temporarily disable verification on fused instructions because of our CommonComputationElimination pass.
tensorflow/compiler/xla/service/reduce_scatter_utils.cc. The check of ar->GetModule()->config().use_spmd_partitioning() is removed.
tensorflow/compiler/xla/service/gpu/gpu_compiler.cc. Disable DotMerger because it breaks some auto-sharding tests with small dots.
tensorflow/compiler/xla/service/gpu/ir_emitter_unnested.cc. Use EmitRngGetAndUpdateStateThunk to generate custom rng thunks.
tensorflow/compiler/xla/service/gpu/nccl_all_to_all_thunk.cc. Temporarily disable the CanImplement check on all-to-all. Some valid all-to-alls are classified as invalid because size-1 dimensions are permuted during the HLO->MLIR conversion.
tensorflow/compiler/xla/service/spmd/spmd_partitioner.cc::HandleRng. Use a different strategy for replicated rng instructions. We favor replicated computation over communication.
tensorflow/compiler/xla/service/spmd/spmd_partitioner.cc::HandleAllReduce. Call HandleElementwise for profiling purposes.
tensorflow/compiler/xla/service/optimization_barrier_expander.cc. OptimizationBarrier is converted to a bitcast.
The commit 96b8219 about AllGather is reverted due to a performance regression.
Improvements & Bug Fixes
tensorflow/core/platform/env.cc. Add a random number to file names to avoid name collisions.