Joshua Z. Zhang

257 comments by Joshua Z. Zhang

The original paper uses batch size = 32, so that's roughly 240 epochs equivalent.
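A hedged back-of-envelope check of that conversion (the iteration count and dataset size below are my assumptions, roughly matching the SSD VOC07+12 schedule, not numbers confirmed in this thread):

```python
# convert an iteration budget into equivalent epochs;
# all three constants here are assumptions for illustration
iterations = 120_000   # assumed total SGD iterations
batch_size = 32        # batch size from the paper
num_images = 16_551    # assumed VOC07+12 trainval size
epochs = iterations * batch_size / num_images
print(epochs)          # ~232, i.e. roughly 240 epochs
```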

There's no matplotlib involved during training, so I'm kind of confused. Can you post the full log please?

Can you simply run demo with your new model?

Normally we would rescale by batch size; however, in my experiments the behavior doesn't scale up when the batch size is changed. The division by len(ctx) is a hack around the fact...

In the MakeLoss layer, gradients are assigned inside each device, so the effective batch size is divided by len(ctx).
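A minimal sketch of what I mean, assuming the symbolic MXNet API (the loss symbol and hyper-parameters here are placeholders, not the repo's actual training code):

```python
import mxnet as mx

ctx = [mx.gpu(0), mx.gpu(1)]           # data-parallel devices
batch_size = 32                        # global batch size

# placeholder loss, just for illustration
pred = mx.sym.Variable("pred")
label = mx.sym.Variable("label")
loss = mx.sym.MakeLoss(mx.sym.square(pred - label), grad_scale=1.0)

# MakeLoss emits gradients on each device separately, so every device
# effectively sees batch_size / len(ctx) samples; normalize accordingly
opt = mx.optimizer.SGD(
    learning_rate=0.004,
    rescale_grad=1.0 / (batch_size / len(ctx)),
)
```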

The original arXiv paper reports 72.1, which is almost identical. However, I checked and the authors have updated some code, specifically modifying some filter sizes and training hyper-parameters...

Just to reuse a temporary buffer without malloc'ing new space.
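It's the usual scratch-buffer pattern; a hypothetical NumPy illustration (the names here are mine, not from the code):

```python
import numpy as np

# allocate the scratch space once, up front
scratch = np.empty((32, 1024), dtype=np.float32)

def scaled_copy(x):
    # write into the preallocated buffer instead of
    # allocating a new array on every call
    out = scratch[:x.shape[0]]
    np.multiply(x, 2.0, out=out)
    return out
```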

@xioryu Do you have time to write the results out to files and use the official Matlab code to verify them?
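If it helps, a rough sketch of dumping detections in the layout the official VOCdevkit eval expects (one file per class, each line `image_id score xmin ymin xmax ymax`); the function name and file naming are my assumptions, not code from this repo:

```python
import os

def dump_voc_results(dets_by_class, out_dir="results"):
    """dets_by_class: {class_name: [(image_id, score, x1, y1, x2, y2), ...]}"""
    os.makedirs(out_dir, exist_ok=True)
    for cls, dets in dets_by_class.items():
        path = os.path.join(out_dir, "comp4_det_test_%s.txt" % cls)
        with open(path, "w") as f:
            for img_id, score, x1, y1, x2, y2 in dets:
                f.write("%s %.6f %.1f %.1f %.1f %.1f\n"
                        % (img_id, score, x1, y1, x2, y2))
```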

I am going to retrain some of the models to be consistent with recent updates. Don't worry.