python-tf-bodypix

How hard would it be to port the multi-person functions?

Open Shamino0 opened this issue 4 years ago • 5 comments

The original TensorFlowJS code includes net.segmentMultiPerson and net.segmentMultiPersonParts, which allow code to more easily distinguish people when they appear in an image together.

I started looking at the code to see if this is something I could port, but I just don't understand TensorFlow or the existing implementations well enough to figure out how.

Please consider this for a future enhancement.

Thanks much.

Shamino0 avatar Feb 16 '21 20:02 Shamino0

Hi @Shamino0, I think this is a duplicate of #50 (which may not be clear from that issue's title).

I don't think it would be very hard. It should just be a matter of working with the model's segmentation outputs.

What would you use it for?

de-code avatar Feb 16 '21 20:02 de-code

I don't think it's the same as #50; that one seems to be asking about pose data (keypoints). I don't need that information right now (although it may be helpful in the future).

Without getting into too much detail, the goal here is to identify heads (front, back, side, etc.) in live video. I've got it working right now by calling get_part_mask with the left_face and right_face parts. I then use OpenCV's findContours function (cv2) to identify all of the segmented regions and generate a bounding box from each one.

The problem is that when two heads are close to each other in the image, CV2 can't distinguish between them, because they are part of a single region, so I get one bounding box around them both.
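The merging problem can be reproduced without a model. The following is a minimal sketch, not the code from this thread: it uses a plain-Python flood fill as a stand-in for cv2.findContours plus a bounding rectangle, and the function name bounding_boxes and the tiny example masks are made up for illustration.

```python
from collections import deque


def bounding_boxes(mask):
    """Return one (x, y, w, h) box per 4-connected region of truthy pixels.

    `mask` is a 2D list of 0/1 values (rows of pixels). This is a stand-in
    for contour detection, used only to illustrate region merging.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one connected region.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


# Two face regions with a gap between them: two boxes.
separate = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
# The same two regions touching along the top row: one merged box.
touching = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
]

print(bounding_boxes(separate))
print(bounding_boxes(touching))
```

Once the two face regions share even one row of pixels, any region-based approach on a single combined mask sees them as one shape, so the fix has to come from the segmentation side rather than from the contour step.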

I'm thinking that segmentMultiPersonParts will fix this, by generating an array of masks (or other related structures) from which I can compute one bounding box each.
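To make the idea concrete, here is a rough sketch of the bounding-box step if per-person part maps were available. It assumes a data layout modelled on what the JS segmentMultiPersonParts returns (one flat, row-major array of part ids per person, with -1 for background, and 0 and 1 for left_face and right_face). No such function exists in the Python port at this point, so the helper names and the input format below are hypothetical.

```python
# Part ids for left_face and right_face in BodyPix's part numbering.
FACE_PART_IDS = frozenset({0, 1})


def face_box(part_ids, width, face_ids=FACE_PART_IDS):
    """Return the (x, y, w, h) box around one person's face pixels, or None.

    `part_ids` is a flat, row-major list of part ids for a single person,
    with -1 marking background (assumed layout, see the note above).
    """
    xs, ys = [], []
    for index, part_id in enumerate(part_ids):
        if part_id in face_ids:
            ys.append(index // width)
            xs.append(index % width)
    if not xs:
        return None  # this person has no visible face pixels
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def face_boxes_per_person(people, width):
    """Compute one face box per person from a list of per-person part maps."""
    return [face_box(part_ids, width) for part_ids in people]


# Two people whose faces are directly adjacent in a 6x2 image.
width = 6
person_a = [0, 0, 1, -1, -1, -1,
            0, 0, 1, -1, -1, -1]
person_b = [-1, -1, -1, 0, 1, 1,
            -1, -1, -1, 0, 1, 1]

print(face_boxes_per_person([person_a, person_b], width))
```

Because each person has their own map, adjacent heads no longer merge: the box is computed per person instead of per connected region.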

Shamino0 avatar Feb 16 '21 20:02 Shamino0

Okay, fair point.

I believe the corresponding JavaScript code is in body-pix/src/multi_person/decode_multiple_masks_cpu.ts (in particular decodePersonInstancePartMasks).

de-code avatar Feb 16 '21 21:02 de-code

On further inspection, it seems that doing this the same way as the upstream JS version would require the multi-person pose detection as an input (#50).
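A heavily simplified sketch of why the poses are needed: each foreground pixel has to be attributed to one detected person, and the keypoints are what identify the people. The version below assigns each pixel to the pose with the nearest keypoint. That is an assumption made for illustration, not the upstream algorithm; the upstream decoder appears to also use the model's offset outputs when matching pixels to poses, which is skipped here. The function name and input format are hypothetical.

```python
def assign_pixels_to_people(mask, poses):
    """Label each foreground pixel with the index of the nearest pose.

    `mask` is a 2D list of 0/1 values; `poses` is a list of people, each a
    non-empty list of (x, y) keypoints. Returns a 2D list of person indices,
    with -1 for background pixels (or for everything if no poses are given).
    """
    labels = []
    for y, row in enumerate(mask):
        label_row = []
        for x, value in enumerate(row):
            if not value or not poses:
                label_row.append(-1)
                continue
            # Pick the person whose closest keypoint is nearest to this pixel
            # (squared Euclidean distance; ties go to the lower index).
            best = min(
                range(len(poses)),
                key=lambda i: min((x - kx) ** 2 + (y - ky) ** 2
                                  for kx, ky in poses[i]),
            )
            label_row.append(best)
        labels.append(label_row)
    return labels


# Two touching face regions, with one keypoint per person for simplicity.
mask = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
poses = [
    [(0.5, 0.5)],  # person 0, e.g. a nose keypoint
    [(4.5, 0.5)],  # person 1
]

print(assign_pixels_to_people(mask, poses))
```

With the pixels split by person, the single connected region from the combined mask becomes two separate label sets, and each can get its own bounding box.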

de-code avatar Feb 19 '21 21:02 de-code

I started porting the person segmentation functions based on the pose-detection branch. Here's the diff so far: https://github.com/gmontamat/python-tf-bodypix/compare/pose-detection...gmontamat:multiperson-segmentation Feel free to take a look and contribute.

gmontamat avatar Jul 25 '21 23:07 gmontamat