X-AnyLabeling
strange segmentation result with EfficientVitSAM L0/L1
Hi CVHub Team,

First of all, thank you for the great work you have done. X-AnyLabeling has been very helpful for testing some of the latest CV models, and the ONNX model zoo in particular has been very useful for me personally.

In one of my recent projects we rely heavily on Segment Anything (SAM), and we need a lightweight variant, so I am experimenting with EfficientViT-SAM. It turns out that the EfficientViT-SAM models (both L0 and L1) shipped by your team produce some strange results. I tested coordinate prompts only, and the input coordinate prompts appear to be shifted somehow. Since the ONNX model receives the raw image as input, I presume something is wrong with the preprocessing (resize + crop + padding) that is integrated into the exported ONNX model.

My testing environment:
- Windows 11, version 21H2, OS build 22000.1335
- Python 3.10
- X-AnyLabeling: latest main branch
- onnx 1.13.1
- onnxruntime 1.16.3

The test image can be downloaded from this link: https://ncs.ivitec.com/s/Kkx6Yz2dnpB6ztt
If you give the point coordinate (X: 970, Y: 1247) as a positive point prompt, the perimeter advertising board is segmented, even though the point actually lies on the soccer field.
I appreciate any feedback on this issue.
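For reference, here is a minimal sketch of the coordinate handling I would expect, based on the official EfficientViT-SAM code. The 512 input size, the resize-then-pad scheme, and the `soccer.jpg` filename are my assumptions, not facts read out of the exported ONNX graph. It illustrates how a non-aspect-preserving resize would shift the prompt:

```python
import cv2
import numpy as np

def transform_point(point_xy, orig_hw, target_size=512):
    """Map a point from original image coordinates into the model
    input frame, assuming the image is resized so its longest side
    equals target_size and then padded bottom/right to a square.
    This scheme is an assumption, not verified against the export."""
    h, w = orig_hw
    scale = target_size / max(h, w)
    return np.array(point_xy, dtype=np.float32) * scale

image = cv2.imread("soccer.jpg")  # hypothetical filename for the test image
h, w = image.shape[:2]

# The prompt from my report: (970, 1247) lies on the soccer field.
print(transform_point((970, 1247), (h, w)))

# If the preprocessing instead resized to 512x512 without preserving
# the aspect ratio, x and y would be scaled by different factors and
# the prompt would land on a different object:
scale_x, scale_y = 512 / w, 512 / h
print(970 * scale_x, 1247 * scale_y)
```

If the exported graph applies one scheme while the prompt is transformed with the other, the observed shift would follow.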
Hi, @bitsun:
Thank you for reaching out. Upon further investigation, I have found that the image itself, even though it may carry internal rotation (e.g., EXIF orientation), does not seem to be the root cause of the issue.
Also, the model generally performs satisfactorily across most scenarios. The problem appears to lie in the model's capability rather than in any specific issue with the pre-processing or the input coordinates.
It is likely that the EfficientViT-SAM models, particularly the L0 and possibly even the L1 variant, are simply not powerful enough to handle the requirements of your particular task. We recommend exploring alternative models that may better address the complexity of the segmentation challenge you are encountering.
Please feel free to reach out if you need assistance in identifying or implementing other suitable models for your project.
Best regards, CVHub
Hi CVHub Team,
I tested the same image (with the same point prompt) using the code from the official EfficientViT-SAM repository (https://github.com/mit-han-lab/efficientvit), and it returns a reasonable result.
So I don't think the issue is due to model capability or image complexity. The ONNX model from the official repository is, of course, slightly different from the ONNX files provided by CVHub; in particular, the preprocessing is not included in the official ONNX pipeline, so it has to be done outside the model.
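For completeness, this is roughly how I drove the official export with external preprocessing via onnxruntime. The model file names are hypothetical, the 512 input size and the decoder input names are assumptions from my local export, and the official preprocessing may additionally apply ImageNet mean/std normalization, so treat this as a sketch rather than a reference implementation:

```python
import cv2
import numpy as np
import onnxruntime as ort

TARGET = 512  # assumed input size for the L-series export

image = cv2.cvtColor(cv2.imread("soccer.jpg"), cv2.COLOR_BGR2RGB)
h, w = image.shape[:2]
scale = TARGET / max(h, w)
resized = cv2.resize(image, (int(w * scale), int(h * scale)))

# Pad bottom/right to a square TARGET x TARGET canvas.
padded = np.zeros((TARGET, TARGET, 3), dtype=np.float32)
padded[: resized.shape[0], : resized.shape[1]] = resized
tensor = padded.transpose(2, 0, 1)[None] / 255.0  # NCHW, [0, 1]

encoder = ort.InferenceSession("efficientvit_sam_l0_encoder.onnx")  # hypothetical
decoder = ort.InferenceSession("efficientvit_sam_l0_decoder.onnx")  # hypothetical

embeddings = encoder.run(None, {encoder.get_inputs()[0].name: tensor})[0]

# Crucial point: the prompt is transformed with the SAME scale as the image.
point = np.array([[[970 * scale, 1247 * scale]]], dtype=np.float32)
label = np.array([[1]], dtype=np.float32)  # 1 = positive point

# Assumed input names; the real export may also require extra inputs
# such as mask_input / has_mask_input, depending on how it was traced.
masks = decoder.run(None, {
    "image_embeddings": embeddings,
    "point_coords": point,
    "point_labels": label,
})[0]
print(masks.shape)
```

With the prompt transformed this way, the mask covers the soccer field as expected.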
Thank you for sharing this valuable feedback. I believe there may be a gap between the two exports, and I will take some time to figure it out.
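As a first step, I plan to run both pipelines on the same image and prompt and compare the resulting masks; a low IoU on the same input would point at a preprocessing mismatch rather than model capability. A minimal sketch of that check (the mask variables are placeholders for the two pipelines' outputs, resized back to the original resolution):

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """IoU between two boolean masks of the same shape."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / max(float(union), 1.0)

# mask_bundled: output of the X-AnyLabeling ONNX (raw image input)
# mask_official: output of the official export (external preprocessing)
# print(iou(mask_bundled, mask_official))
```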