tensorflow.push_pull CANNOT calculate AVERAGE, it always gives SUM
Describe the bug
In the TensorFlow push_pull function, op can never equal Average, because op is a member of an Enum class while Average is just a str.
As a result, push_pull in TensorFlow always computes the SUM of tensors instead of the AVERAGE.
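The mismatch can be shown without TensorFlow or BytePS installed. The snippet below is only a minimal sketch: the `ReduceOps` members and the module-level `Average` string are assumed stand-ins for the definitions in tensorflow.ops, not a copy of them.

```python
from enum import Enum

# Assumed stand-in for the module-level constant that push_pull compares against.
Average = "Average"


class ReduceOps(Enum):
    # Assumed stand-in for the Enum described in this issue.
    Average = "Average"
    Sum = "Sum"


op = ReduceOps.Average

# An Enum member is not equal to its underlying value ...
print(type(op), type(Average))
print(op == Average)        # False
# ... but its .value attribute is.
print(op.value == Average)  # True
```

A plain `Enum` member compares equal only to itself, so the `op == Average` branch is never taken.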
To Reproduce
Steps to reproduce the behavior:
- Add a log `print('Type of op and Average: ', type(op), type(Average))` after the line `op = handle_average_backwards_compatibility(op, average)` in the tensorflow.push_pull function
- Run any program that uses tensorflow.push_pull, e.g. code that uses DistributedOptimizer
- See the types of `op` and `Average`, which will be printed during the setup of the TensorFlow graph
- The log will show `Type of op and Average: <enum 'ReduceOps'> <class 'str'>`
Expected behavior
Here we cannot compare an enum type with a str type. There are two solutions to this problem:
- use `op.value` to compare with `Average` in the tensorflow.push_pull function
- change the `ReduceOps` class from an Enum class to a normal class
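Both proposed fixes can be sketched side by side. The class names `ReduceOpsEnum` and `ReduceOpsPlain` and the helper functions are hypothetical and exist only for this illustration; the actual change is in the linked pull request.

```python
from enum import Enum

# Assumed module-level constant, as in the comparison described above.
Average = "Average"


class ReduceOpsEnum(Enum):
    # Option 1 keeps the Enum and compares through .value instead.
    Average = "Average"
    Sum = "Sum"


def is_average_via_value(op):
    """Option 1: unwrap the Enum member before comparing with the str."""
    return op.value == Average


class ReduceOpsPlain:
    # Option 2 turns the class into a plain namespace of str constants.
    Average = "Average"
    Sum = "Sum"


def is_average_plain(op):
    """Option 2: op is already a str, so a direct comparison works."""
    return op == Average


print(is_average_via_value(ReduceOpsEnum.Average))
print(is_average_via_value(ReduceOpsEnum.Sum))
print(is_average_plain(ReduceOpsPlain.Average))
print(is_average_plain(ReduceOpsPlain.Sum))
```

Option 1 leaves the public `ReduceOps` type unchanged, while option 2 changes what callers receive when they reference `ReduceOps.Average`, which may matter for code that relies on Enum behavior.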
Screenshots
here in tensorflow.ops the ReduceOps inherits Enum

here in tensorflow.__init__ the program directly compares op (which is ReduceOps.Average) to Average (which is a str)

the function uses op == Average to decide whether to compute the average or the sum of the tensors
since that comparison is always False, it can currently only give the sum of tensors.
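The effect on the result can be illustrated with a toy reduction over plain Python floats. This is only a sketch under stated assumptions: `toy_push_pull` and its `fixed` flag are hypothetical, and the real push_pull operates on tensors across workers rather than on a list.

```python
from enum import Enum

# Assumed module-level constant, as in the comparison described above.
Average = "Average"


class ReduceOps(Enum):
    # Assumed stand-in for the Enum in tensorflow.ops.
    Average = "Average"
    Sum = "Sum"


def toy_push_pull(worker_values, op, fixed=False):
    """Toy stand-in for push_pull: reduce one scalar per worker."""
    total = sum(worker_values)
    # Buggy path compares the Enum member with a str; fixed path uses .value.
    wants_average = (op.value == Average) if fixed else (op == Average)
    return total / len(worker_values) if wants_average else total


values = [2.0, 4.0, 6.0]
buggy = toy_push_pull(values, ReduceOps.Average)
fixed = toy_push_pull(values, ReduceOps.Average, fixed=True)
print(buggy)  # 12.0, the sum, even though Average was requested
print(fixed)  # 4.0, the average
```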
Environment (please complete the following information): Whatever
- OS:
- GCC version:
- CUDA and NCCL version:
- Framework (TF, PyTorch, MXNet):
@pleasantrabbit I think we can support Average easily
Yes, we'll add Average support.
Here's the pull request to fix this bug: https://github.com/bytedance/byteps/pull/324