ghost
simswap mask
Hi guys, great job with the sber-swap implementation. The results are the best of any framework I've seen so far. The masking has been the only problem. Would it be possible for you to implement anything similar to what SimSwap has done for masking?
SimSwap uses face parsing. From the SimSwap preparation guide: "We use the face parsing from face-parsing.PyTorch for image postprocessing. Please download the relative file and place it in ./parsing_model/checkpoint from this link." If the sber-swap developers supported this method, the final quality would be much better, and it might also solve the face jittering problem. I really hope the developers can add this feature soon.
Yes this is definitely needed!
`import numpy as np
import cv2
import os
from parsing_model.model import BiSeNet
import torchvision.transforms as transforms
import torch


def encode_segmentation_rgb(segmentation, no_neck=True):
    parse = segmentation

    face_part_ids = [1, 2, 3, 4, 5, 6, 10, 12, 13] if no_neck else [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 13, 14]
    mouth_id = 11
    # hair_id = 17
    face_map = np.zeros([parse.shape[0], parse.shape[1]])
    mouth_map = np.zeros([parse.shape[0], parse.shape[1]])
    # hair_map = np.zeros([parse.shape[0], parse.shape[1]])

    for valid_id in face_part_ids:
        valid_index = np.where(parse==valid_id)
        face_map[valid_index] = 255
    valid_index = np.where(parse==mouth_id)
    mouth_map[valid_index] = 255
    # valid_index = np.where(parse==hair_id)
    # hair_map[valid_index] = 255
    #return np.stack([face_map, mouth_map,hair_map], axis=2)
    return np.stack([face_map, mouth_map], axis=2)


def expand_eyebrows(lmrks, eyebrows_expand_mod=1.0):
    lmrks = np.array( lmrks.copy(), dtype=np.int32 )

    # Top of the eye arrays
    bot_l = lmrks[[35, 41, 40, 42, 39]]
    bot_r = lmrks[[89, 95, 94, 96, 93]]

    # Eyebrow arrays
    top_l = lmrks[[43, 48, 49, 51, 50]]
    top_r = lmrks[[102, 103, 104, 105, 101]]

    # Adjust eyebrow arrays
    lmrks[[43, 48, 49, 51, 50]] = top_l + eyebrows_expand_mod * 0.5 * (top_l - bot_l)
    lmrks[[102, 103, 104, 105, 101]] = top_r + eyebrows_expand_mod * 0.5 * (top_r - bot_r)
    return lmrks


def get_mask(image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    """
    Get face mask of image size using given landmarks of person
    """
    # Original sber-swap mask: convex hull of the landmarks
    img_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    mask = np.zeros_like(img_gray)

    points = np.array(landmarks, np.int32)
    convexhull = cv2.convexHull(points)
    cv2.fillConvexPoly(mask, convexhull, 255)

    # SimSwap-style mask: face parsing with BiSeNet
    n_classes = 19
    net = BiSeNet(n_classes=n_classes)
    net.cuda()
    save_pth = os.path.join('./parsing_model/checkpoint', '79999_iter.pth')
    net.load_state_dict(torch.load(save_pth))
    net.eval()

    to_tensor = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    ])

    with torch.no_grad():
        img = to_tensor(image)
        img = torch.unsqueeze(img, 0)
        img = img.cuda()
        out = net(img)[0]
        parsing = out.squeeze(0).cpu().numpy().argmax(0)

    print(np.unique(parsing))
    vis_parsing_anno = parsing.copy().astype(np.uint8)
    tgt_mask = encode_segmentation_rgb(vis_parsing_anno)
    print("mask", mask)
    print("tgt_mask", tgt_mask)
    return tgt_mask`
I was able to run the use_mask code from SimSwap and get a result back. I then tried to replace sber-swap's get_mask function with it, but I'm getting the following error: ValueError: operands could not be broadcast together with shapes (1024,682,1,2) (1024,682,3). Any idea how to fix this? Below, mask is the result from the original code and tgt_mask is the result from the SimSwap code.
mask [[0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] ... [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0]]
tgt_mask [[[0. 0.] [0. 0.] [0. 0.] ... [0. 0.] [0. 0.] [0. 0.]]
[[0. 0.] [0. 0.] [0. 0.] ... [0. 0.] [0. 0.] [0. 0.]]
[[0. 0.] [0. 0.] [0. 0.] ... [0. 0.] [0. 0.] [0. 0.]]
...
[[0. 0.] [0. 0.] [0. 0.] ... [0. 0.] [0. 0.] [0. 0.]]
[[0. 0.] [0. 0.] [0. 0.] ... [0. 0.] [0. 0.] [0. 0.]]
[[0. 0.] [0. 0.] [0. 0.] ... [0. 0.] [0. 0.] [0. 0.]]]
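For anyone hitting the same ValueError: the shapes suggest that the calling code adds a trailing axis to the mask before multiplying it with a 3-channel image, which works for the original (H, W) mask but not for the (H, W, 2) face/mouth stack returned by encode_segmentation_rgb. A minimal sketch of one possible fix is below. The helper name collapse_parsing_mask is hypothetical, the blending line is only an assumption about what the caller does, and merging the face and mouth channels with a maximum is a choice, not necessarily what SimSwap does.

```python
import numpy as np


def collapse_parsing_mask(tgt_mask):
    """Merge an (H, W, 2) face/mouth mask into one (H, W) uint8 mask.

    Hypothetical helper: takes the per-pixel maximum over the channel
    axis, so a pixel is 255 if it belongs to the face or the mouth.
    """
    if tgt_mask.ndim == 2:
        # Already single-channel, only normalise the dtype.
        return tgt_mask.astype(np.uint8)
    return tgt_mask.max(axis=2).astype(np.uint8)


# Synthetic data with the same shapes as in the error message.
h, w = 1024, 682
tgt_mask = np.zeros((h, w, 2))
tgt_mask[100:200, 100:200, 0] = 255  # pretend face region
tgt_mask[150:160, 140:160, 1] = 255  # pretend mouth region
image = np.ones((h, w, 3), dtype=np.float32)

mask = collapse_parsing_mask(tgt_mask)

# Assumed caller behaviour: add a trailing axis, then multiply.
# (H, W, 1) broadcasts against (H, W, 3); (H, W, 1, 2) does not.
blended = (mask[:, :, np.newaxis] / 255.0) * image
print(mask.shape, blended.shape)
```

If only the face region is wanted, `tgt_mask[:, :, 0]` would also give an (H, W) array.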
Here is the colab notebook for the full code.
https://colab.research.google.com/gist/quantarb/375cbd523b459b761c3556a372c76fac/sber-swap.ipynb
I figured out how to get the SimSwap use_mask to work with sber-swap. You need to copy the parsing_model folder from the SimSwap git repo, along with the model file, so that face-parsing.PyTorch can be used. After that, you will need to replace masks.py with the code here.
https://gist.githubusercontent.com/quantarb/47df63a41532affbeca08d31ba13bc18/raw/d483a32d3391f95b0da4b681dcc8e9e89dad563d/masks.py
I updated the colab notebook to download all the files needed to run face-parsing.PyTorch and to modify masks.py to use it.
https://colab.research.google.com/gist/quantarb/9b7775d27a9b6bac46f19147bb452f62/sber-swap.ipynb
Thanks! I will try this tomorrow. How were the results?
I must warn you that my implementation is really basic. Due to my limited computer vision knowledge, I did not implement all of SimSwap's masking features, such as smoothing.
Also, I am constructing a face-parsing model for each face swap, which is computationally inefficient. I didn't see a good way to pass the parsing model through the model inference function. I think the authors or someone else could do a much better job than my rough implementation.
The first image is the original sberswap.
The second image is with my implementation.
I cleaned up the code and improved the documentation. This notebook has examples of how to do a single image swap, a single video swap, and a folder swap. I tried to use examples from the GitHub repo so people can run the code without recreating my personal folder structure.
https://colab.research.google.com/gist/quantarb/15904518bacd7ebb5b046f95982955fb/sber-swap.ipynb
Nice job! Image swap works well, but the video swap result skips a lot of facial alignments and certainly needs improvement. Thank you for your work!
Please see the new colab notebook. I realized the notebook was using an old version of the masks.py file. I updated the notebook to download the correct file and included new functionality to extract faces for the target folder. The new face detection works a lot better, but still needs some work.
https://colab.research.google.com/gist/quantarb/6327c2ef8f72ebeb0e41541f8476f4ce/sber-swap.ipynb
Great job, thanks a lot! I'd like to take this opportunity to ask you to try porting SimSwap's ability to change the mask height and width. In reverse2original.py it is `kernel = np.ones((40,40),np.uint8)`. The first 40 is the height of the face (leave it unchanged or replace it with your own value in px), and the second 40 is the width. In sber-swap this works in a similar way, but it still gives a very unpredictable result. For example, if I know that the height of the face in SimSwap is 400 px, then working from this value I can change 40 to 350, or to whatever I need. In sber-swap it does not work like that, and you have to pick completely different values by hand. Maybe it behaves a little differently in sber-swap because the mask still extends above the eyebrows and the blurring around the mask edges is slightly different? In general, if you could port this feature over as it is in SimSwap, it would be really great!
I'm attempting to port the face detection, mask smoothing, and all the other functionality. In my notebook, sber-swap is still performing the face detection and SimSwap is only doing the masking. This is quite difficult for me since I don't really understand the code.
I've been working on this. I added GPEN for face restoration and the SimSwap mask, optimized the code, and cut down the existing code base by about 70%. Still testing, but I hope to release a framework soon for all single-shot models.
One thing I have noticed is that with video, the SimSwap mask doesn't work better because it misses a lot of frames. I need to figure that out.
That would be really awesome. Do you have an ETA on your release?
Not yet. I have GPEN and GFPGAN working for upscaling. I am generalizing the framework so that it can also use the SimSwap model.
If this is not outdated by now: wav2lip-HQ uses a modified face-parsing mask in what I think is a much easier way. I've implemented that in some of my projects. You just clone the parsing folder and add 3 lines of code to get the mask. You also need the checkpoints ...79999it..