[Wait for #2607] [ Layer ] Mixed Precision support for BN Layer
In this PR
This PR modifies the conv2d, LSTM, and batch normalization layers to support mixed precision. We need to read FP16 weights and copy them into FP32 tensors to support both inference and training. In particular, the batch normalization layer needs extra care: it must compute in full precision during mixed-precision training, while half precision is sufficient for inference.
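As a rough illustration of the FP16 ↔ FP32 round trip described above, here is a minimal, self-contained sketch. It is not nntrainer's actual Tensor API; the `fp16`, `to_fp32`, and `to_fp16` names are hypothetical helpers.

```cpp
// Minimal sketch (not nntrainer's API): keep an FP32 copy of FP16 data so
// that precision-sensitive layers can do their math in full precision.
#include <algorithm>
#include <vector>

// _Float16 is a GCC/Clang extension; C++23 offers std::float16_t instead.
using fp16 = _Float16;

// Widen an FP16 buffer into FP32 before full-precision computation.
std::vector<float> to_fp32(const std::vector<fp16> &src) {
  std::vector<float> dst(src.size());
  std::transform(src.begin(), src.end(), dst.begin(),
                 [](fp16 v) { return static_cast<float>(v); });
  return dst;
}

// Narrow the FP32 result back to FP16 for storage or the next FP16 layer.
std::vector<fp16> to_fp16(const std::vector<float> &src) {
  std::vector<fp16> dst(src.size());
  std::transform(src.begin(), src.end(), dst.begin(),
                 [](float v) { return static_cast<fp16>(v); });
  return dst;
}
```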
Commits to be reviewed in this PR
[ Layer ] Update Conv2D to support Mixed Precision
This PR updates the Conv2D layer to support mixed precision (FP16). It is based on PR https://github.com/nnstreamer/nntrainer/pull/2579
Resolves:
Self evaluation:
- Build test: [X]Passed [ ]Failed [ ]Skipped
- Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon [email protected]
[ Layer ] enable Mixed Precision in LSTM Layer
This commit enables mixed precision support for the LSTM layer.
Resolves:
Self evaluation:
- Build test: [X]Passed [ ]Failed [ ]Skipped
- Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon [email protected]
[Model ] Add Execution Mode in Compile
This PR adds an execution mode parameter to compile. The default is ml::train::ExecutionMode::TRAIN. Currently we do not support compiler optimizations for inference mode, such as batch normalization fusing, but we will add more optimizations depending on the execution mode.
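A minimal usage sketch of the new option, assuming the compile() overload added in this commit accepts an ml::train::ExecutionMode argument; the exact header and signature here are assumptions, not confirmed API.

```cpp
// Sketch: compiling a model with an explicit execution mode.
#include <model.h>

int main() {
  auto model = ml::train::createModel(ml::train::ModelType::NEURAL_NET);

  // ... add layers, set properties ...

  // Default behaves as before (TRAIN); passing INFERENCE would allow the
  // compiler to apply inference-only optimizations (e.g., BN fusing) later.
  // Assumes the compile() signature this commit describes.
  model->compile(ml::train::ExecutionMode::INFERENCE);
  model->initialize();
  return 0;
}
```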
Resolves:
Self evaluation:
- Build test: [X]Passed [ ]Failed [ ]Skipped
- Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon [email protected]
[ Layer ] Mixed Precision support for BN Layer
This PR includes mixed precision support for the batch normalization layer. During training, the BN layer should run in full precision even when the weight data is FP16. Therefore, we need to read the FP16 data and convert the current weights and activations to FP32.
For inference, we need compiler optimizations such as BN fusing, so this PR also includes an execution mode parameter for compile.
Because of the complicated data conversion in the BN layer, test case generation also needs to be updated so that it takes FP16 input/output tensors and weights and converts the weights to FP32 for computation. For verification, we need to convert the FP32 results back to FP16.
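For illustration only, a standalone sketch of the BN mixed-precision path described above: FP16 input and weights are widened, the normalization math runs in FP32, and the result is narrowed back to FP16 for comparison against FP16 golden data. This is not nntrainer's BatchNormalizationLayer code; all names and shapes are hypothetical.

```cpp
#include <cmath>
#include <vector>

using fp16 = _Float16; // GCC/Clang extension; std::float16_t in C++23

std::vector<fp16> bn_forward_mixed(const std::vector<fp16> &x,
                                   fp16 gamma, fp16 beta,
                                   float epsilon = 1e-5f) {
  // 1) Widen activations and weights to FP32.
  std::vector<float> xf(x.begin(), x.end());
  const float g = static_cast<float>(gamma);
  const float b = static_cast<float>(beta);

  // 2) Full-precision mean / variance / normalization (single channel here,
  //    purely for illustration).
  float mean = 0.0f;
  for (float v : xf)
    mean += v;
  mean /= xf.size();

  float var = 0.0f;
  for (float v : xf)
    var += (v - mean) * (v - mean);
  var /= xf.size();

  const float inv_std = 1.0f / std::sqrt(var + epsilon);

  // 3) Narrow the FP32 result back to FP16 for verification.
  std::vector<fp16> y(xf.size());
  for (size_t i = 0; i < xf.size(); ++i)
    y[i] = static_cast<fp16>(g * (xf[i] - mean) * inv_std + b);
  return y;
}
```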
Self evaluation:
- Build test: [X]Passed [ ]Failed [ ]Skipped
- Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon [email protected]