How could I use multi-GPU?

Open CoinCheung opened this issue 7 years ago • 8 comments

It seems that the model only runs on a single GPU no matter how many GPUs are available. If the memory the model needs exceeds the capacity of one GPU, there is an OOM error. I can train the model on a single GPU with the default configuration, but once I double the batch size and use two GPUs, I get OOM errors. How can I use multiple GPUs in this case?

CoinCheung avatar Sep 04 '18 12:09 CoinCheung

This sadly doesn't work out of the box in TensorFlow; you will need to adjust the code quite a bit to make it work. You could start by taking a look at this example. This is not something we are planning to do, though.

Pandoro avatar Sep 10 '18 11:09 Pandoro
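
For reference, the "adjust the code quite a bit" part usually means replicating the graph once per GPU and averaging the gradients, i.e. the classic TF 1.x multi-tower pattern. The sketch below is only illustrative: `build_network` and `triplet_loss` are made-up placeholders standing in for this repository's embedding network and batch-hard loss, not its actual functions.

```python
# Minimal TF 1.x multi-tower sketch (illustrative; not this repo's code).
import tensorflow as tf

NUM_GPUS = 2

def build_network(images, reuse):
    # Placeholder embedding network; the real one is a ResNet with an embedding head.
    with tf.variable_scope('emb', reuse=reuse):
        return tf.layers.dense(tf.layers.flatten(images), 128, name='fc')

def triplet_loss(embeddings, pids):
    # Placeholder loss; stands in for the batch-hard triplet loss.
    del pids
    return tf.reduce_mean(tf.reduce_sum(tf.square(embeddings), axis=1))

images = tf.placeholder(tf.float32, [None, 256, 128, 3])
pids = tf.placeholder(tf.int32, [None])
optimizer = tf.train.AdamOptimizer(3e-4)

# Build one copy of the graph per GPU on its slice of the batch.
tower_grads = []
image_splits = tf.split(images, NUM_GPUS)
pid_splits = tf.split(pids, NUM_GPUS)
for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i):
        emb = build_network(image_splits[i], reuse=(i > 0))
        loss = triplet_loss(emb, pid_splits[i])
        tower_grads.append(optimizer.compute_gradients(loss))

# Average the gradients variable-by-variable across towers and apply them once.
averaged = []
for grads_and_vars in zip(*tower_grads):
    grads = [g for g, _ in grads_and_vars]
    averaged.append((tf.reduce_mean(tf.stack(grads), axis=0), grads_and_vars[0][1]))
train_op = optimizer.apply_gradients(averaged)
```

With this pattern, doubling the total batch size keeps per-GPU memory roughly constant, since each tower only processes its own split.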

> It seems that the model only runs on a single GPU no matter how many GPUs are available. If the memory the model needs exceeds the capacity of one GPU, there is an OOM error. I can train the model on a single GPU with the default configuration, but once I double the batch size and use two GPUs, I get OOM errors. How can I use multiple GPUs in this case?

I have the same problem; did you solve it?

chris20181220 avatar Dec 20 '18 02:12 chris20181220

@chris20181220 Yes, I reimplemented it in PyTorch, and my implementation supports multi-GPU training.

CoinCheung avatar Dec 20 '18 02:12 CoinCheung

> @chris20181220 Yes, I reimplemented it in PyTorch, and my implementation supports multi-GPU training.

@CoinCheung If I need to use TF, do you know how to fix it?

chris20181220 avatar Dec 20 '18 02:12 chris20181220

@chris20181220 As the author said, it would be quite tedious and a lot of code would need to be modified; I do not think I can do it now. Sorry, I cannot help.

CoinCheung avatar Dec 20 '18 03:12 CoinCheung

> @chris20181220 As the author said, it would be quite tedious and a lot of code would need to be modified; I do not think I can do it now. Sorry, I cannot help.

@CoinCheung OK, thank you all the same. I will try to modify it myself.

chris20181220 avatar Dec 20 '18 03:12 chris20181220

You should also be aware that this raises the question of how to do the triplet mining within the batch: mine on each GPU's sub-batch independently, or gather all embeddings on one fixed GPU and mine in the full batch there. There are trade-offs and it's not clear which is best.

Note: I have linked your re-implementation in our README as it could be useful for others. Let me know if you don't want this.

lucasb-eyer avatar Dec 20 '18 19:12 lucasb-eyer
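
To make the second option concrete (gather all embeddings on one device and mine the complete batch there), here is a simplified TF 1.x sketch. The embedding network and the batch-hard formulation are written inline with assumed names purely for illustration; they are not the repository's actual loss code.

```python
# Illustrative "gather then mine" sketch in TF 1.x (not this repo's code).
import tensorflow as tf

NUM_GPUS = 2

def embed(images, reuse):
    # Placeholder embedding network.
    with tf.variable_scope('emb', reuse=reuse):
        return tf.layers.dense(tf.layers.flatten(images), 128, name='fc')

images = tf.placeholder(tf.float32, [None, 256, 128, 3])
pids = tf.placeholder(tf.int32, [None])

# Each GPU embeds only its slice of the batch.
tower_embs = []
for i, img_split in enumerate(tf.split(images, NUM_GPUS)):
    with tf.device('/gpu:%d' % i):
        tower_embs.append(embed(img_split, reuse=(i > 0)))

# Concatenate on one device and do batch-hard mining over the full P*K batch.
with tf.device('/gpu:0'):
    embs = tf.concat(tower_embs, axis=0)
    diff = embs[:, None, :] - embs[None, :, :]
    dists = tf.sqrt(tf.reduce_sum(tf.square(diff), axis=-1) + 1e-12)  # stable pairwise distances
    same = tf.equal(pids[:, None], pids[None, :])
    hardest_pos = tf.reduce_max(tf.where(same, dists, tf.zeros_like(dists)), axis=1)
    hardest_neg = tf.reduce_min(tf.where(same, tf.fill(tf.shape(dists), 1e9), dists), axis=1)
    loss = tf.reduce_mean(tf.nn.softplus(hardest_pos - hardest_neg))  # soft-margin variant

# Gradients flow back through the concat into every tower.
train_op = tf.train.AdamOptimizer(3e-4).minimize(loss)
```

The trade-off mentioned above is visible here: mining over the full batch yields harder triplets, but the pairwise distance matrix and the mining all sit on a single GPU.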

Also keep in mind what happens with batch normalization. When you split the batch, it can pay off to split it specifically into two P×(K/2) batches rather than two (P/2)×K batches, unless you explicitly synchronize your batch-normalization statistics across GPUs.


Pandoro avatar Dec 20 '18 20:12 Pandoro
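
To make the P×(K/2) split concrete: assuming the sampler lays out the batch as P consecutive blocks of K images per identity, naively cutting the batch in half would give each GPU only half of the identities. A small NumPy index sketch (illustrative only; P=18, K=4 used as example values) shows the difference:

```python
# Illustrative index bookkeeping only, assuming a batch laid out as P
# consecutive blocks of K images per identity.
import numpy as np

P, K = 18, 4                      # example layout: 18 identities, 4 images each
idx = np.arange(P * K).reshape(P, K)

# Naive split down the middle: GPU 0 sees only the first P/2 identities.
naive_gpu0 = np.arange(P * K)[: P * K // 2]

# Identity-preserving split: each GPU gets K/2 images of every identity.
gpu0 = idx[:, : K // 2].reshape(-1)
gpu1 = idx[:, K // 2:].reshape(-1)

print(gpu0[:8])                   # [ 0  1  4  5  8  9 12 13]
print(gpu1[:8])                   # [ 2  3  6  7 10 11 14 15]
```

With the identity-preserving split, each GPU's sub-batch still contains a positive for every anchor, and its unsynchronized batch-norm statistics are computed over all P identities rather than over a disjoint half of them.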