Henry Tsang
Hi, thanks for trying out ManagedCollisionCollection! I'm not sure if it's a bug. The thing is, we only intend to use ManagedCollisionCollection with row-wise sharding, which would shard the table...
@fangleigit Thanks. We are still actively developing MCH/ZCH, so we don't have a clear answer so far. Let us know if you have it figured out as well!
I don't think we currently support that. On the other hand, why is adding more nodes causing a slowdown? That isn't something I would expect.
Hi, the problem is that the kernel library we use requires the embedding dim to be divisible by 4. https://github.com/pytorch/FBGEMM/blob/2117dd33d4c4fd53cfadbf037b5fb3c0824cb00e/fbgemm_gpu/fbgemm_gpu/split_table_batched_embeddings_ops_training.py#L119 We can keep this issue open as a feature request, though.
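In the meantime, one workaround is to round the embedding dim up to the next multiple of 4 yourself before building the table. A minimal sketch (the helper name is hypothetical, not a TorchRec API):

```python
# Hypothetical helper: round an embedding dim up to the nearest multiple of 4,
# since the FBGEMM training kernels require embedding_dim % 4 == 0.
def round_dim_up(dim: int, multiple: int = 4) -> int:
    return ((dim + multiple - 1) // multiple) * multiple

print(round_dim_up(10))  # -> 12 (padded up)
print(round_dim_up(16))  # -> 16 (already divisible, unchanged)
```

The extra padded columns just get trained like any other embedding dimensions; you can ignore them downstream.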
Hi, this seems like a question for core library https://github.com/pytorch/pytorch
Hi, thanks for the contribution. However, since we are planning to release 0.5.0 soon, we don't plan on reviewing this PR at this time. We may revisit next month.
* [not recommended] use padding so all inputs have the same length
* use VBE in the KJT
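To illustrate the first (not recommended) option, here is what padding variable-length inputs looks like, sketched in plain Python rather than with real KJT tensors (the helper name is made up):

```python
# Illustration only (not a TorchRec API): pad jagged per-sample feature lists
# so every sample has the same length, at the cost of wasted compute/memory
# on the pad values.
def pad_jagged(rows, pad_value=0):
    max_len = max(len(r) for r in rows)
    return [r + [pad_value] * (max_len - len(r)) for r in rows]

rows = [[1, 2, 3], [4], [5, 6]]
print(pad_jagged(rows))  # -> [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
```

This is why padding is not recommended: the pad values are fed through the embedding lookup like real ids, whereas VBE avoids them entirely.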
@tiankongdeguiji Oh, I misspoke. I meant to say you either need to pad the inputs or use VBE. iirc you shouldn't need to pad the outputs of VBE, but admittedly...
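For context, the jagged (values + lengths) layout that a KJT uses is what lets VBE skip padding. A plain-Python sketch of that representation (illustration only, not the actual KeyedJaggedTensor constructor):

```python
# Illustration only: flatten jagged per-sample lists into the
# (values, lengths) pair that a jagged tensor stores, so no padding
# is needed to represent variable-length samples.
def to_jagged(rows):
    values = [v for row in rows for v in row]
    lengths = [len(row) for row in rows]
    return values, lengths

values, lengths = to_jagged([[1, 2, 3], [4], [5, 6]])
print(values)   # -> [1, 2, 3, 4, 5, 6]
print(lengths)  # -> [3, 1, 2]
```

Each sample's slice can be recovered from the lengths (or their prefix sums, i.e. offsets), which is why the outputs don't need padding either.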
Not an expert, but can you try the sharded version of EBC? I'm not sure the unsharded EBC supports VBE very well.
cc @MaggieMoss