Refactor mdr_find_people
As discussed in #145, we need further discussion on how to handle `mdr_find_people`.
Yeah, but `mdr_detect_person` can be extended to have 3D processing, instead of creating a new package. OpenPose and other methods can also be added there later.

`mdr_find_people`, I think, will potentially use `mdr_detect_person` for detection, but will most likely also involve navigation and moving the head, which may need to be made into a more complex skill. It may also be redundant with the `find_object` skill, depending on how we implement that. With that in mind, I think it's a good idea to merge the detection and 3D calculation into `mdr_find_people`. There's a portion which handles inserting the person's info into the knowledge base; however, that may remain here in `mdr_find_people`.

_Originally posted by @minhnh in https://github.com/render_node/MDIzOlB1bGxSZXF1ZXN0UmV2aWV3VGhyZWFkMTc0NjAzMzE5OnYy/pull_request_review_threads/more_comments_
From #168: The function to find out whether people are inside the arena should be refactored to get this from the topological map. This part can probably be assigned to one of the new members.
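The arena check could boil down to a point-in-polygon test against a boundary taken from the topological map. A minimal sketch, assuming the map's boundary can be flattened into a list of `(x, y)` vertices (the names `is_inside_arena` and `arena_polygon` are illustrative, not an existing API):

```python
# Hypothetical sketch: decide whether a detected person is inside the
# arena using a polygon from the topological map, instead of
# hard-coded bounds.

def is_inside_arena(x, y, arena_polygon):
    """Ray-casting point-in-polygon test.

    arena_polygon: list of (x, y) vertices, e.g. extracted from the
    topological map's boundary nodes.
    """
    inside = False
    n = len(arena_polygon)
    for i in range(n):
        x1, y1 = arena_polygon[i]
        x2, y2 = arena_polygon[(i + 1) % n]
        # does a horizontal ray from (x, y) cross edge (x1,y1)-(x2,y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

For a square arena `[(0, 0), (4, 0), (4, 4), (0, 4)]`, the point `(1, 1)` is reported inside and `(5, 1)` outside.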
After a quick discussion with @henrikschnor today, I think we can consolidate the different person recognition functionalities into a people recognition action server.
- would combine `mdr_detect_person`, `mdr_recognize_emotion_action`, and `mdr_gender_recognition` under `mdr_planning/mdr_actions/mdr_perception_actions/` into one action
- detection from RGB images:
    - face detection: either with Haar-cascade detection in OpenCV, or the SSD `widerface` model from `mas_image_detection`
    - full-body detection: the COCO model seems to work fine for this; it's useful for extracting full-body 3D points for detected people
    - pose and key point detection: OpenPose. Note: apparently you can load the Caffe model directly using OpenCV (documentation), so we may not need to compile OpenPose at all.
- extract 3D info of people: this may require the action server to use point clouds and to extract RGB info from these point clouds for detection and recognition tasks. Alternatively, `message_filters` can also be used for synchronizing topics (a Python API example exists now, woohoo!).
- ensemble of recognition models: age, emotion, and gender recognition can all be done on the detected RGB face/body images
- direct subscription to the image and point cloud topics to avoid sending additional data over the network
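For the synchronization point above, the pairing that `message_filters.ApproximateTimeSynchronizer` performs can be illustrated in plain Python (on the robot you would use the real `message_filters` API; the stream format below is hypothetical):

```python
# Plain-Python illustration of approximate-time matching: pair an image
# and a point cloud whose timestamps differ by at most `slop` seconds.

def pair_by_timestamp(images, clouds, slop=0.1):
    """images, clouds: lists of (timestamp, payload), sorted by timestamp.
    Returns (image_payload, cloud_payload) pairs within `slop` seconds."""
    if not clouds:
        return []
    pairs = []
    j = 0
    for t_img, img in images:
        # advance to the cloud whose timestamp is closest to the image's
        while (j + 1 < len(clouds)
               and abs(clouds[j + 1][0] - t_img) <= abs(clouds[j][0] - t_img)):
            j += 1
        t_cloud, cloud = clouds[j]
        if abs(t_cloud - t_img) <= slop:
            pairs.append((img, cloud))
    return pairs
```

E.g. an image at t=0.0 pairs with a cloud at t=0.05 under the default 0.1 s slop, while an image at t=1.0 with the nearest cloud at t=1.3 produces no pair.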
- proposed YAML configuration for initializing the action server:

  ```yaml
  # assuming all the recognition models are Keras
  recognition_models:
    gender: /path/to/gender/model.h5
    emotion: /path/to/emotion/model.h5
    ...
  identity_model: /path/to/identity/model.h5
  age_model: /path/to/age/model.h5  # doesn't exist yet
  detection_models:
    full_body_detection:
      module: mas_image_detection.ssd_tensorflow
      class: SSDTfModelsImageDetector
      config_package: mas_image_detection
      kwargs_file: configs/ssd_tensorflow_coco_kwargs.yml
      class_file: configs/tf_models_coco_classes.yml
    key_point_detection:
      # to be implemented still
      model_path: /path/to/some/model
  ```
- proposed action specs:

  ```
  ---
  mas_perception_msgs/PeopleScene people_scene
  ---
  ```
- proposed `mas_perception_msgs/PeopleScene.msg` (new):

  ```
  sensor_msgs/Image scene_image  # store the full image of the scene
  mas_perception_msgs/Person[] people
  sensor_msgs/RegionOfInterest[] face_rois
  sensor_msgs/RegionOfInterest[] body_rois
  ```
- proposed `mas_perception_msgs/Person.msg`:

  ```
  string name
  float32 age  # does not exist yet
  sensor_msgs/Image face_image
  sensor_msgs/Image body_image
  sensor_msgs/PointCloud2 body_points
  # a field for key points, possibly a float32[], but also does not exist yet
  string[] attribute_names  # gender, emotion, ...
  string[] attributes  # e.g. male, angry, ...
  float32[] attribute_confidences  # e.g. 0.71, 0.80, ...
  ```
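To sanity-check the parallel-array design in the proposed `Person` message, here is a sketch of how a client might consume `attribute_names` / `attributes` / `attribute_confidences`. The dataclass only stands in for the generated ROS message class; `get_attribute` is a hypothetical helper, not part of `mas_perception_msgs`:

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    """Stand-in for the proposed mas_perception_msgs/Person message."""
    name: str = ""
    attribute_names: list = field(default_factory=list)        # e.g. ["gender", "emotion"]
    attributes: list = field(default_factory=list)             # e.g. ["male", "angry"]
    attribute_confidences: list = field(default_factory=list)  # e.g. [0.71, 0.80]

def get_attribute(person, attr_name, min_confidence=0.5):
    """Return the value of `attr_name` if recognized confidently, else None."""
    for name, value, conf in zip(person.attribute_names,
                                 person.attributes,
                                 person.attribute_confidences):
        if name == attr_name and conf >= min_confidence:
            return value
    return None
```

One consequence of the parallel arrays is that clients must index all three lists together, so keeping them the same length is an implicit invariant of the message.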
@argenos @alex-mitrevski @PatrickNa feel free to share thoughts on the topic
At a quick glance, this seems like a solid improvement, particularly because the gender and emotion recognition actions were not really actions in the first place.
I'd thus say go for it.
P.S. You might want to think about actual person recognition as well, namely matching the detected people with known names. I see the name field is already there in the proposed Person message, but then a recognition model will also be needed.
Good point on the name model. Edited! @robertocaiwu has also expressed interest in working on this issue.
A complete rewrite using things from `mdr_find_people` and the other packages is probably reasonable. However, I won't be available until the end of next week, so @robertocaiwu feel free to start already and let me know how it goes.
This sounds good! Can we find a way to split this issue so both @robertocaiwu and @henrikschnor can work in parallel?
After going through the rules again, there are three functionalities related to this: 1) the perception-related part of detecting the person, 2) the perception-related part of recognizing the different attributes, and 3) the speech part of providing the description. I'd say that during the implementation we should be careful to keep this modular; in some cases we may not want to perform all of the above actions.
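One way to keep those three parts selectable is to let the goal name only the stages it needs, so a task that only needs detection skips recognition and description. A toy sketch of that idea (the stage names and payloads are illustrative, not an existing API):

```python
# Hypothetical modular pipeline: each stage is a function that enriches
# a shared "scene" dict, and a goal requests only the stages it needs.

STAGES = {
    "detect":    lambda scene: {**scene, "people": ["person_1"]},
    "recognize": lambda scene: {**scene, "attributes": {"person_1": "angry"}},
    "describe":  lambda scene: {**scene, "description": "I see one angry person"},
}

def run_pipeline(scene, requested_stages):
    """Run only the requested stages, in a fixed detect->recognize->describe order."""
    for stage in ("detect", "recognize", "describe"):
        if stage in requested_stages:
            scene = STAGES[stage](scene)
    return scene
```

A goal requesting only `["detect"]` would then leave the recognition and speech parts untouched.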
@minhnh with this being mostly refactoring, can you take the lead? I think we need an initial version of a branch with the right structure, maybe then we can split the rest of the task between @robertocaiwu and @henrikschnor (depending on what comes out of the scenario tests)
Sure, I'll work on this later this week.