Jiajin Yu

7 comments by Jiajin Yu

@alsrgv, thanks for the prompt reply. For NCCL, we use:

```
python /tensorflow_benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
  --data_format=NCHW --batch_size=64 --model=resnet50_v2 --optimizer=momentum \
  --variable_update=replicated --nodistortions --allow_growth=True --all_reduce_spec=nccl \
  --print_training_accuracy=True --num_epochs=1 --weight_decay=1e-4 \
  --num_gpus=4 \
  ...
```

The commit is `d7b68b146c82ee9b936bd196c9f1ed6d54f4a1c7` (fixed) for v2, etc. This is not intentional; we tested both, and neither of them converges the same way. I just pasted two versions. That was...

Sure. Let me rerun with the additional flag.

I am checking the code. I think you are using a different data_dir? The code looks like this:

```
# Infere dataset name from data_dir if data_name is not provided.
if...
```
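To illustrate why a different data_dir could matter, here is a minimal, hypothetical sketch of inferring a dataset name from a path. It assumes the inference is a simple substring match of known dataset names against data_dir; `infer_data_name` and `SUPPORTED_DATASETS` are illustrative names, not the actual tf_cnn_benchmarks code.

```python
# Hypothetical sketch, NOT the actual tf_cnn_benchmarks implementation.
# Assumption: the dataset name is inferred by checking whether a known
# dataset name appears as a substring of data_dir.
SUPPORTED_DATASETS = ("imagenet", "cifar10")  # assumed list, for illustration only


def infer_data_name(data_dir, data_name=None):
    """Return data_name if given; otherwise guess it from data_dir."""
    if data_name is not None:
        # An explicitly provided name always wins.
        return data_name
    for name in SUPPORTED_DATASETS:
        if name in data_dir:
            return name
    raise ValueError("Could not identify dataset name from data_dir; pass it explicitly.")


# The guess depends only on the path string, not on the files in it.
print(infer_data_name("/data/imagenet/train"))    # imagenet
print(infer_data_name("/mnt/cifar10_backup"))     # cifar10
print(infer_data_name("/data/records", "imagenet"))  # imagenet
```

Under this assumption, two runs pointed at different directories could silently pick different datasets (or fail), so passing the dataset name explicitly removes the dependence on the path.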

@alsrgv, thanks a lot for working on this so quickly. Looking forward to your solution.

@alsrgv, thanks a lot for the fix. @lcytzk and I tested it in our setup, and ResNet, VGG, etc. all work fine.

> It is possible to save a model by TGI and reuse it.

@dtlzhuangz, may I ask how to do that?