Martin Damgaard Nielsen issues


Results: 3 issues by Martin Damgaard Nielsen

Reduction in memory requirements: Add SplitInitializer for separate initialization

This dramatically reduces memory requirements, because an extra copy of the concatenated weight tensor is no longer kept for each timestep during backprop.
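To illustrate the idea, here is a minimal NumPy sketch of what a split initializer could look like. It is not the code from the pull request: `split_initializer`, `uniform_init`, `identity_init`, and the sizes are all hypothetical names and values chosen for the example. The assumption is that the weights live in one pre-concatenated tensor, and each slice is filled by its own initializer once at creation time, so nothing has to be concatenated inside the per-timestep loop.

```python
import numpy as np

def split_initializer(initializers, sizes, axis=0):
    """Build an initializer that fills each slice along `axis` separately.

    Hypothetical helper: `initializers` is a list of callables taking
    (shape, rng), and `sizes` gives the length of each slice along `axis`.
    """
    def init(shape, rng):
        shape = tuple(shape)
        if sum(sizes) != shape[axis]:
            raise ValueError("sizes must sum to shape[axis]")
        parts = []
        for part_init, size in zip(initializers, sizes):
            part_shape = shape[:axis] + (size,) + shape[axis + 1:]
            parts.append(part_init(part_shape, rng))
        # Concatenate once, when the variable is created,
        # instead of once per timestep in the recurrent loop.
        return np.concatenate(parts, axis=axis)
    return init

def uniform_init(shape, rng):
    # Example initializer for the input-to-hidden slice (assumed range).
    return rng.uniform(-0.1, 0.1, size=shape)

def identity_init(shape, rng):
    # Example initializer for the hidden-to-hidden slice.
    return np.eye(shape[0], shape[1])

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
init = split_initializer(
    [uniform_init, identity_init], [input_size, hidden_size], axis=0
)
weights = init((input_size + hidden_size, hidden_size), rng)
```

With this layout, a recurrent cell can multiply the concatenated `[input, state]` vector by `weights` directly, while the two slices still get different initial values.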

Consumes too much memory during training on long sequences (see pull request).

mask_1_flat and mask_2_flat applied to gates twice?

https://github.com/tensorflow/mesh/blob/6b31c0fc9daf185aae2422976487f8db08fc7369/mesh_tensorflow/transformer/moe.py#L1694

It should not cause any issues, I guess. Is it just unnecessary computation?
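A quick way to check the "no issues, just extra work" intuition, assuming the masks only contain 0s and 1s (this is an assumption, not something verified against the linked code). The arrays below are made-up NumPy stand-ins, not the Mesh TensorFlow tensors:

```python
import numpy as np

rng = np.random.default_rng(0)
gates = rng.random(6)

# Hypothetical stand-in for a flattened 0/1 mask.
mask_flat = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])

once = gates * mask_flat
twice = once * mask_flat  # second application of the same mask

# For a 0/1 mask, masking is idempotent: the result does not change.
assert np.array_equal(once, twice)
```

If the masks could hold values other than 0 and 1, the second multiplication would change the result, so the duplicate would no longer be harmless.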