# Kinect_Dataset_Builder
Kinect_Dataset_Builder is a repository containing a series of programs for constructing a Kinect video dataset or a Kinect multi-view video dataset with the Kinect sensor 2.0. The term “multi-view” means setting several sensors around a scene where activities or other events happen. Figure 1 shows three RGB images, one from each of the three views of a Kinect multi-view dataset, captured at the same moment.
Building a multi-view dataset:
If what you want is a multi-view dataset, for instance three views, you have to set three Kinect sensors at three different viewpoints surrounding the scene and then start recording. But soon you will find that it is nearly impossible for the three sensors to record three videos with the same number of images, although the recorders may have promised to trigger at the same moment. For example, among three “video01”s recorded by three Kinect sensors, there is always one containing more images than the other two. It seems all we can do is select the beginning and ending frame numbers of each video by eye and feed this information into a sampling program that performs the synchronization. Luckily, this is not the worst case: I do provide a program named “MultiViewAligner” to handle it. So feel proud of me! Then we can apply “ImagesRegistrater”, “BoundingBoxer” and “AnnotationProducer” to every view, just as we would for a single-view dataset.
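As a rough illustration of the sampling idea, the sketch below uniformly resamples each view's hand-picked frame range down to a common length. The frame numbers are made up, and this is not MultiViewAligner's actual code:

```python
# Hypothetical sketch: uniformly resample each view's frame range to a
# common length so all views end up with the same number of frames.
# (Illustrative only; frame numbers are invented.)

def sample_indices(start, end, n_out):
    """Pick n_out frame indices evenly spread over [start, end]."""
    span = end - start
    return [start + round(i * span / (n_out - 1)) for i in range(n_out)]

# Beginning/ending frame numbers chosen by eye for each view:
views = {"view1": (12, 911), "view2": (7, 906), "view3": (15, 916)}

# Align every view to the length of the shortest one.
n = min(end - start + 1 for start, end in views.values())
aligned = {v: sample_indices(s, e, n) for v, (s, e) in views.items()}
```

After this step each view contributes exactly `n` frames, with the first and last sampled frames matching the frame numbers you selected by eye.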
Brief introduction to the five programs:
Now, please allow me to give a brief introduction to these five handsome programs.
- FroggyNect
- MultiViewAligner
- ImagesRegistrater
- BoundingBoxer
- AnnotationProducer
1. FroggyNect
FroggyNect, the Kinect recorder, is able to reach the following goals:
- Monitor: Fetches and shows RGB, depth and skeleton images (skeleton images are drawn from the corresponding skeleton joints) from the Kinect sensor in real time.
- Recorder: Stores RGB, depth and infrared data as formatted images (jpg, png, and so on) and skeleton data in text files in real time. (The skeleton text contains the coordinates of 25 skeleton joints in three spaces: camera, color and depth. It also contains the orientations of the bones and the floor clip plane of that frame.)
You can record data streams from the Kinect sensor(s) while keeping the monitor open to have a peek at what you are recording. The write fps of the images is displayed on the main UI of FroggyNect. It is recommended that you use an SSD and a fast processor in order to lose as little information as possible. Figure 2 shows the user interface of FroggyNect.
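For illustration only, here is a guessed reader for one joint line. The real SkeletonInfo.txt layout is whatever FroggyNect writes; this sketch only assumes 25 comma-separated (x, y, z) triples per line:

```python
# Hypothetical sketch of reading one frame's skeleton record. The actual
# SkeletonInfo.txt layout is defined by FroggyNect; here we only assume
# each joint is written as comma-separated floats, 25 joints per line.

JOINT_COUNT = 25  # Kinect v2 tracks 25 skeleton joints per body

def parse_joint_line(line):
    """Parse '25 joints x 3 floats' into a list of (x, y, z) tuples."""
    values = [float(v) for v in line.replace(",", " ").split()]
    assert len(values) == JOINT_COUNT * 3, "expected 25 joints x 3 coords"
    return [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]

# A synthetic line with 25 identical joints, just to exercise the parser:
line = " ".join("0.1,0.2,0.3" for _ in range(25))
joints = parse_joint_line(line)
```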
2. MultiViewAligner
If you have collected multiple views of videos using several Kinect sensors, each connected to a distinct computer, you are supposed to use our MultiViewAligner to synchronize these views. All you have to do is offer it a configuration text file. The contents of a sample configuration text file are listed in Figure 4:
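Since the actual format only appears in Figure 4, the configuration below is purely a guess at its shape (one line per view giving an image folder and the begin/end frames chosen by eye), together with a tiny parser:

```python
# Guessed configuration shape for a multi-view alignment job; the real
# format is the one shown in Figure 4. Paths and frame numbers invented.

sample_cfg = """\
view1 D:/dataset/video01/view1 12 911
view2 D:/dataset/video01/view2 7  906
view3 D:/dataset/video01/view3 15 916
"""

def parse_config(text):
    """Parse 'name folder begin end' lines into a dict keyed by view name."""
    views = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, folder, begin, end = line.split()
        views[name] = {"folder": folder, "begin": int(begin), "end": int(end)}
    return views

cfg = parse_config(sample_cfg)
```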
3. ImagesRegistrater
This module contains exactly two sub-programs: “GetRegisParams” and “RegisProgs”. First, I shall explain what registration is conducted on our images. If you have already learned the structure of the Kinect sensor, you may know that the perspective of the RGB sensor differs from that of the infrared sensor. RGB images are produced by the RGB sensor, while all other kinds of images (depth, infrared, long infrared and body index images) are produced by the infrared sensor. This results in a difference between the angle of view of RGB images and that of the other kinds of images. In fact, RGB images have a wider field of view horizontally, while the other kinds of images have a wider field of view vertically. What's more, there is a scaling relation between RGB images and the other images. Our programs solve these problems by finding the scale ratio and the crop ranges (cropping some columns of the RGB images and some rows of the other kinds of images).
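The mapping itself can be pictured as a single scale plus crop offsets. The sketch below is my own minimal formulation with made-up parameter names; the real output of these programs may be organized differently:

```python
# Minimal sketch of the registration idea: map a depth-image pixel into
# RGB-image coordinates via one scale ratio and two crop offsets. The
# parameter names and values here are illustrative assumptions.

def depth_to_rgb(x_d, y_d, scale, rgb_col_crop, depth_row_crop):
    """Map a depth pixel (x_d, y_d) to RGB pixel coordinates.

    rgb_col_crop:   columns cropped from the left of the RGB image
    depth_row_crop: rows cropped from the top of the depth image
    """
    x_rgb = x_d * scale + rgb_col_crop
    y_rgb = (y_d - depth_row_crop) * scale
    return x_rgb, y_rgb

# Example: with an (invented) scale of 2.0 and crops of 240 cols / 10 rows,
# the depth pixel (0, 10) lands at the RGB image's cropped origin.
mapped = depth_to_rgb(0, 10, 2.0, 240, 10)
```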
We use “GetRegisParams” to acquire the scale ratio of RGB to depth images and the crop ranges. This is done by supplying several pairs of skeleton records to the program. Each pair of skeleton records contains 25 depth skeleton joint coordinates and 25 color (RGB) skeleton joint coordinates, and belongs to a specific person in a specific frame. (These data are easy to get from the SkeletonInfo.txt inside the dataset.) It is recommended that you compute the scale ratio and crop ranges for each view of your dataset separately. That is to say, if you want to estimate the parameters for a view, you have to provide the program with several pairs of skeleton records belonging to skeletons that appear in that view's videos. The skeleton records are easy to find in ‘SkeletonInfo\SkeletonInfo.txt’; just copy, paste, and eliminate the headers (“color_skeleton_coordinates =” and “depth_skeleton_coordinates =”).
Our experiments demonstrate that these parameters differ between views, so we had better calculate a separate set of parameters for each view. Besides, it is a good idea to input as many pairs of skeleton records from the same view as possible, to improve the accuracy of the parameter estimation for that view. The data we need are listed in Figure 5.
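One plausible way to estimate such parameters from paired joint coordinates is a per-axis least-squares fit of color = scale * depth + offset. This is my own formulation on synthetic data, not necessarily what GetRegisParams actually computes:

```python
# Hedged sketch: fit a scale and offset per axis by ordinary least squares
# from paired (depth, color) skeleton joint coordinates. Using more joint
# pairs from the same view averages out per-frame noise.

def fit_scale_offset(depth_vals, color_vals):
    """Fit color = scale * depth + offset by ordinary least squares."""
    n = len(depth_vals)
    mean_d = sum(depth_vals) / n
    mean_c = sum(color_vals) / n
    cov = sum((d - mean_d) * (c - mean_c)
              for d, c in zip(depth_vals, color_vals))
    var = sum((d - mean_d) ** 2 for d in depth_vals)
    scale = cov / var
    offset = mean_c - scale * mean_d
    return scale, offset

# Synthetic joint x-coordinates obeying color_x = 2.0 * depth_x + 240:
depth_x = [100.0, 150.0, 200.0, 260.0]
color_x = [2.0 * d + 240 for d in depth_x]
scale, offset = fit_scale_offset(depth_x, color_x)
```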
4. BoundingBoxer
You can draw bounding boxes of people who appear in the RGB images in a selected video with the help of BoundingBoxer. Start the program, and you will see six buttons on the window.
Click the Append button to enter append mode. In this mode, when you click the canvas twice, a rectangle (bounding box) is drawn from the two points captured by the program. You will be asked to input the person id as soon as the rectangle is drawn, and after a valid person id is given, the bounding box is displayed with the person's name and id so that any carelessness is easy to spot. You are free to draw as many bounding boxes as you want, as Figure 10 shows, until you click the “Cancel” button.
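The geometry of append mode can be sketched as follows; the real BoundingBoxer GUI code is of course more involved, and the click coordinates here are invented:

```python
# Sketch of the append-mode geometry: two canvas clicks define a rectangle,
# normalized so the order of the clicks does not matter.

def box_from_clicks(p1, p2):
    """Return (left, top, width, height) from two corner clicks."""
    left, right = sorted((p1[0], p2[0]))
    top, bottom = sorted((p1[1], p2[1]))
    return left, top, right - left, bottom - top

# Clicking bottom-right first, then top-left, yields the same box:
box = box_from_clicks((320, 240), (120, 80))
```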
5. AnnotationProducer
“AnnotationProducer” is a simple Matlab script that converts the bounding-box text files into a .mat file.
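Although AnnotationProducer itself is Matlab, the first half of the job (parsing the bounding-box text) can be sketched in Python. The line format assumed here (frame, person id, left, top, width, height) is a guess at BoundingBoxer's output, and a library such as scipy.io (savemat) could then write the resulting dict to a .mat file:

```python
# Rough Python analogue of AnnotationProducer's parsing step. The line
# format 'frame person_id left top width height' is an assumption; the
# resulting dict could be handed to scipy.io.savemat to produce a .mat.

def parse_bboxes(lines):
    """Group bounding boxes by frame number."""
    boxes = {}
    for line in lines:
        frame, pid, left, top, w, h = (int(v) for v in line.split())
        boxes.setdefault(frame, []).append((pid, left, top, w, h))
    return boxes

sample = [
    "1 3 120 80 200 160",
    "1 5 300 90 150 210",
    "2 3 125 82 198 161",
]
bboxes = parse_bboxes(sample)
```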