
Running Adaptive learning on Jetson TX2 and Jetson Nano

Open alpha-carinae29 opened this issue 4 years ago • 6 comments

I was wondering whether I can run the adaptive learning framework on edge devices such as the Jetson TX2 and Jetson Nano. Since the teacher model is very large and the student model requires training, I think running the framework on low-resource edge devices is a real challenge. First I will try to run it on the more powerful Jetson TX2, and if that attempt succeeds I will try the Jetson Nano. I will share my journey under this issue.

alpha-carinae29 avatar Sep 15 '20 19:09 alpha-carinae29

First I tried to run a large teacher model on the Jetson TX2. For my first attempt I chose the TensorFlow NAS Faster R-CNN model as the teacher. To run inference with this model I first installed TensorFlow on the Jetson TX2 (JetPack 4.3) based on these instructions from Nvidia. Then I tried to download and run the NAS Faster R-CNN model, but the process was killed every time. I determined that the model requires more RAM than the Jetson TX2 has. In my next attempt I will convert the NAS Faster R-CNN model to a TensorRT engine with the TF-TRT tool to create a lighter model, and see whether that can run on the TX2.
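Since the failures here were out-of-memory kills, a quick check of available RAM before trying to load a heavy model can save a lot of waiting. A minimal stdlib sketch (Linux-only, since it reads /proc/meminfo, which is fine for Jetson boards):

```python
# Sketch: report available system RAM before loading a large model.
# Reads /proc/meminfo (Linux); returns None elsewhere.

def available_ram_mb(meminfo_path="/proc/meminfo"):
    """Return MemAvailable in megabytes, or None if it cannot be read."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    kb = int(line.split()[1])  # value is reported in kB
                    return kb // 1024
    except FileNotFoundError:  # not on Linux
        return None
    return None

if __name__ == "__main__":
    mb = available_ram_mb()
    print(f"Available RAM: {mb} MB")
```

If the number printed is well below what the teacher model's checkpoint plus activations need, there is no point attempting the load.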

alpha-carinae29 avatar Sep 15 '20 19:09 alpha-carinae29

I followed Nvidia's instructions to convert a TensorFlow SavedModel to a TensorRT-optimized version. However, I still ran out of RAM and the script was killed. I then tried to optimize the model on a GPU tower with plenty of hardware resources. The conversion succeeded there, but I hit the RAM problem again when running the converted model on the TX2; I believe the optimization should be executed on the final target machine. For now I will give up on running the NAS Faster R-CNN model as the teacher and try IterDet as my teacher model instead.
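For reference, the TF-TRT conversion step can be sketched with the `TrtGraphConverter` API that ships in TensorFlow 1.14+. The model paths below are hypothetical placeholders; setting `is_dynamic_op=True` defers TensorRT engine building to runtime on the target device, which matches the observation above that optimization should happen on the final machine:

```python
# Sketch: TF-TRT conversion of a TF 1.x SavedModel. Paths are hypothetical;
# edit SAVED_MODEL_DIR before running on a real model.
import os

SAVED_MODEL_DIR = "faster_rcnn_nas_saved_model"  # hypothetical input path
OUTPUT_DIR = "faster_rcnn_nas_trt"               # hypothetical output path

def max_workspace_bytes(megabytes):
    """TensorRT workspace budget in bytes from a MB figure."""
    return megabytes * (1 << 20)

def convert(saved_model_dir, output_dir, workspace_mb=2048):
    # TrtGraphConverter lives in tensorflow.python.compiler.tensorrt (TF 1.14+).
    from tensorflow.python.compiler.tensorrt import trt_convert as trt
    converter = trt.TrtGraphConverter(
        input_saved_model_dir=saved_model_dir,
        max_workspace_size_bytes=max_workspace_bytes(workspace_mb),
        precision_mode="FP16",   # FP16 roughly halves memory traffic on Jetson
        is_dynamic_op=True,      # build engines at runtime, on the target device
    )
    converter.convert()
    converter.save(output_dir)

if __name__ == "__main__":
    if os.path.isdir(SAVED_MODEL_DIR):
        convert(SAVED_MODEL_DIR, OUTPUT_DIR)
    else:
        print("SavedModel not found; set SAVED_MODEL_DIR first.")
```

Even with dynamic ops, the conversion itself still has to hold the full graph in memory, which is consistent with the script getting killed on the TX2.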

alpha-carinae29 avatar Sep 15 '20 21:09 alpha-carinae29

So I tried to install and run IterDet on the Jetson TX2. First I needed PyTorch 1.5.0, which is the backend of the IterDet model. Since my device ran JetPack 4.3, I could not use Nvidia's PyTorch container for Jetson, and the PyTorch 1.5.0 wheels for Jetson are only available for JetPack 4.4. So I fell back to an older version of IterDet (V1), which is compatible with PyTorch 1.3.0. I started from the Smart Social Distancing docker image for Jetson, which already provides some necessary libraries and dependencies such as OpenCV. Following this article, I successfully installed PyTorch 1.3.0 and TorchVision 0.4.1 on the TX2. Next I needed to install IterDet V1 itself, which I did by following this instruction. In the middle of the installation I found out that some of the dependencies had to be built from source, since the PyPI wheels are incompatible with Jetson devices. I cloned the mmcv (version 0.6.0, which is compatible with IterDet V1) and brambox repositories and built both libraries from source. At this stage all of the IterDet dependencies were installed and everything was ready to run the model and get inferences.
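After a from-source build like this, a small sanity check that each package imports and reports a version can catch a broken install early. A minimal sketch; the module names are just the ones mentioned above:

```python
# Sketch: verify that from-source builds actually import and report versions.

def report(module_name):
    """Return the module's __version__, 'unknown' if it has none,
    or None if the import fails."""
    try:
        mod = __import__(module_name)
        return getattr(mod, "__version__", "unknown")
    except ImportError:
        return None

if __name__ == "__main__":
    for name in ("torch", "torchvision", "mmcv"):
        print(name, report(name) or "not installed")
```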

alpha-carinae29 avatar Sep 18 '20 21:09 alpha-carinae29

To run the IterDet model I wrote a script inspired by the Adaptive Learning Teacher class. When I forced the model to run on the GPU I got this error: RuntimeError: cuda runtime error (7) : too many resources requested for launch at mmdet/ops/roi_align/src/roi_align_kernel.cu:139. It appears to be an mmdetection error (mmdetection is the backend library of IterDet); see this issue for more information. On the other hand, the IterDet version I was using had no CPU support, so, inspired by the newer version which does support the CPU, I changed some lines in the mmcv and mmdetection source code and created a CPU-compatible version of IterDet. Finally I successfully ran inference with the CrowdHuman checkpoint of the model. The model runs with just 1 GB of RAM, so there were no RAM issues anymore. However, it is really, really slow: for a 704x576 video frame it takes around 9.5 minutes to produce a detection. You can check one of the sample results from the SoftBio dataset below.
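To put a number on "really slow", a tiny timing wrapper around the forward pass is enough. A sketch with a hypothetical `run_inference` stand-in for the actual IterDet call:

```python
# Sketch: time a single-frame inference and derive a throughput figure.
# `run_inference` is a hypothetical placeholder for the IterDet forward pass.
import time

def time_inference(fn, *args):
    """Run fn(*args), returning (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def run_inference(frame):  # placeholder for model(frame)
    return {"boxes": []}

if __name__ == "__main__":
    _, seconds = time_inference(run_inference, None)
    print(f"inference took {seconds:.2f}s "
          f"({3600 / seconds:.1f} frames/hour)" if seconds > 0 else "instant")
```

At roughly 9.5 minutes per frame, the throughput works out to about 6 frames per hour, which makes clear why the next step has to be a faster teacher.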

iterdet_sample

alpha-carinae29 avatar Sep 18 '20 21:09 alpha-carinae29

In the next step I will look for a way to run this model faster. Stay tuned :)

alpha-carinae29 avatar Sep 18 '20 21:09 alpha-carinae29

Before optimizing the IterDet model, I decided to make sure that the TensorFlow Object Detection API, which is the training API for the student model, runs properly on the Jetson TX2. I installed TensorFlow 1.15.2 with the help of this instruction, then followed the installation guide of the TensorFlow Object Detection API and installed it without a problem. I then ran a mock training to profile the TX2's performance: with a batch size of 2, an SSD MobileNet V2 model needs about 3 GB of RAM and each training step takes around 2.5 seconds. I am not yet sure whether the model is running on the GPU or the CPU; I will update this comment after monitoring the GPU. So I concluded that training is not a problem on the Jetson TX2, and the only thing left to take care of is running the teacher model with a more reasonable inference time.
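One way to answer the GPU-vs-CPU question on a Jetson is to watch `tegrastats` while training and read its GR3D_FREQ field, which reports GPU load. A minimal parsing sketch; the sample line is illustrative only, since the exact tegrastats format varies by JetPack version:

```python
# Sketch: extract GPU utilization from a line of `tegrastats` output.
# The sample line below is illustrative, not captured from a real run.
import re

def gpu_load_percent(tegrastats_line):
    """Return the GR3D_FREQ (GPU load) percentage, or None if absent."""
    m = re.search(r"GR3D_FREQ (\d+)%", tegrastats_line)
    return int(m.group(1)) if m else None

sample = "RAM 3254/7852MB (lfb 4x2MB) CPU [12%@1995] GR3D_FREQ 63%@1300"
print(gpu_load_percent(sample))  # 63
```

If GR3D_FREQ stays near zero during a training step, the run is almost certainly CPU-bound.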

alpha-carinae29 avatar Sep 20 '20 22:09 alpha-carinae29