Convert FastSAM to ONNX and CoreML format

Hi,

Has anyone converted FastSAM to the ONNX or CoreML format? Since FastSAM is based on the YOLOv8 model and takes an image path as input, how can I get an image trace for it and convert it to CoreML format? Also, how can I convert it to CoreML such that the output can be of variable size?

okdha1234 avatar Jun 28 '23 14:06 okdha1234

You can write a simple script to export the vision model to CoreML using the Ultralytics export API:

from fastsam import FastSAM

fsam = FastSAM("weights/FastSAM-x.pt")
fsam.export(format="coreml")

This already achieves great performance, but note that both performance and on-disk model size can be improved by using the MLProgram format, which uses FP16 storage and inference by default. Here is an example for the S variant:

from fastsam import FastSAM
import coremltools as ct
import torch


def main():
    fsam = FastSAM("weights/FastSAM-s.pt")
    fsam.export(format="torchscript")
    imgsz = fsam.overrides["imgsz"]

    # TorchScript archives must be loaded with torch.jit.load, not torch.load.
    ts = torch.jit.load("weights/FastSAM-s.torchscript")

    model = ct.convert(
        ts,
        inputs=[
            ct.ImageType(
                name="image", scale=1 / 255, bias=(0, 0, 0), shape=(1, 3, imgsz, imgsz)
            )
        ],
        convert_to="mlprogram",
    )
    model.save("weights/FastSAM-s.mlpackage")


if __name__ == "__main__":
    main()

You would still need to export the prompt model or keep it in torch/torchscript.
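
For the ONNX part of the question, the same export API should work. A minimal sketch, assuming the onnx package is installed (check the printed export log for the exact output path):

from fastsam import FastSAM

# Export the vision model to ONNX via the Ultralytics export backend (sketch).
fsam = FastSAM("weights/FastSAM-x.pt")
fsam.export(format="onnx")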

laclouis5 avatar Jun 28 '23 17:06 laclouis5

@laclouis5 Thanks a lot. Is there a way to export FastSAM to a CoreML model such that it accepts variable-size input instead of a fixed shape (1024, 1024)?

okdha1234 avatar Jun 29 '23 05:06 okdha1234

@okdha1234 There is a way, though I don't know if this model will have optimal performance when using a different input shape than the default one. Here is an example of an export which accepts two input shapes (more can be added):

from fastsam import FastSAM
import coremltools as ct
import torch


def main():
    fsam = FastSAM("weights/FastSAM-s.pt")
    fsam.export(format="torchscript")
    imgsz = fsam.overrides["imgsz"]

    # TorchScript archives must be loaded with torch.jit.load, not torch.load.
    ts = torch.jit.load("weights/FastSAM-s.torchscript")

    model = ct.convert(
        ts,
        inputs=[
            ct.ImageType(
                name="image",
                scale=1 / 255,
                bias=(0, 0, 0),
                shape=ct.EnumeratedShapes(
                    shapes=[
                        (1, 3, imgsz // 2, imgsz // 2),
                        (1, 3, imgsz, imgsz),
                    ],
                    default=(1, 3, imgsz, imgsz),
                ),
            )
        ],
        convert_to="mlprogram",
    )
    model.save("weights/FastSAM-s.mlpackage")


if __name__ == "__main__":
    main()
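
At prediction time, the input image must then match one of the enumerated sizes. Here is a minimal sketch using coremltools on macOS; the image file name is hypothetical and the output feature names depend on the export, so they are just printed:

import coremltools as ct
from PIL import Image

model = ct.models.MLModel("weights/FastSAM-s.mlpackage")

# The size must be one of the enumerated shapes from the export above
# (imgsz or imgsz // 2); adjust to match your export.
img = Image.open("example.jpg").resize((512, 512))

outputs = model.predict({"image": img})
print({name: getattr(value, "shape", None) for name, value in outputs.items()})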

laclouis5 avatar Jun 29 '23 20:06 laclouis5

We will close this issue for now to keep the issue tracker organized. However, if the problem persists or you have any further questions, please feel free to comment here or open a new issue. We value your input and are happy to assist further.

an-yongqi avatar Jul 06 '23 01:07 an-yongqi

@okdha1234 I think fsam.export(dynamic=True) can do this. Source: /Ultralytics/yolo/cfg/default.yaml:75
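
For example, a sketch (note that the dynamic flag is documented for the ONNX/TensorRT-style exporters, so it may not affect a CoreML export):

from fastsam import FastSAM

# Export with dynamic input axes so the ONNX model accepts variable sizes.
fsam = FastSAM("weights/FastSAM-s.pt")
fsam.export(format="onnx", dynamic=True)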

zimond avatar Jul 12 '23 09:07 zimond

@laclouis5 Thanks so much for your help. I used your answer to generate an mlpackage and got a VNCoreMLFeatureValueObservation successfully, but when I try to visualize the multiArrayValue I only get nonsense. I'd like to know the exact structure of the CoreML output and how to visualize the segments.

foxstudiohua avatar Mar 31 '24 13:03 foxstudiohua

Sorry to trouble you, I figured out the reason: I had simply reused the solution for DeepLabV3, whose output is just 513x513, not FastSAM's 1x32x256x256.

foxstudiohua avatar Apr 01 '24 00:04 foxstudiohua

@foxstudiohua The CoreML model shows there are two outputs, 1×37×8400 and 1×32×160×160. Do you know what the first one is, and how did you get a 1x32x256x256 output?

SearchDream avatar Apr 05 '24 11:04 SearchDream

I got the information from https://github.com/orgs/ultralytics/discussions/3417. I think my 256x256 output just comes from a different configuration when converting the model to CoreML.
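
For reference, here is a minimal sketch of how the two outputs fit together, assuming the export keeps the standard YOLOv8-seg layout (4 box coordinates + 1 class score + 32 mask coefficients per anchor in the 1×37×8400 tensor, and 32 prototype masks in the second output); NMS and cropping each mask to its box are still needed on top of this:

import numpy as np

def decode_masks(preds, protos, conf_thres=0.4):
    # preds:  (1, 37, 8400) -> rows 0-3 box (cx, cy, w, h), row 4 score, rows 5-36 mask coefficients
    # protos: (1, 32, H, W) -> prototype masks shared by all detections (e.g. 160x160)
    preds, protos = preds[0], protos[0]
    scores = preds[4]
    keep = scores > conf_thres
    boxes = preds[:4, keep].T                  # (N, 4) in (cx, cy, w, h)
    coeffs = preds[5:, keep].T                 # (N, 32)
    c, h, w = protos.shape
    masks = coeffs @ protos.reshape(c, -1)     # linear combination of prototype masks
    masks = 1.0 / (1.0 + np.exp(-masks))       # sigmoid
    masks = masks.reshape(-1, h, w) > 0.5      # binary masks at prototype resolution
    return boxes, scores[keep], masks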

foxstudiohua avatar Apr 06 '24 12:04 foxstudiohua

Thanks :)

SearchDream avatar Apr 06 '24 23:04 SearchDream