Is it possible to support safetensors files?

Open SantaMcCloud opened this issue 6 months ago • 3 comments

Hello,

Currently SemiBin uses pickle files for model input and output. I wanted to ask whether it might be possible for SemiBin to either support safetensors files or save the model as a safetensors file instead.

The reason I ask is that SemiBin can be run via Galaxy (an open-source project that lets users all over the world run bioinformatics tools), and allowing users to upload pickle files there is risky, since they can contain harmful code.

SantaMcCloud avatar Oct 16 '25 22:10 SantaMcCloud

In the newer versions, we use PyTorch files (with the extension .pt, which IIRC is the recommended one) and load them in safe mode. We do try to reload in unsafe mode for backwards compatibility, but maybe that should be removed in the next version (which would also simplify the code).
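
As a rough illustration of the "safe mode first, unsafe mode as fallback" pattern described above, here is a standard-library sketch. It is not SemiBin's or PyTorch's actual code: `SafeUnpickler`, `load_model`, the `allow_unsafe` flag, and the allow-list contents are all hypothetical names chosen for this example, and a plain allow-list unpickler stands in for PyTorch's safe loading.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allow-list: only these globals may be resolved while loading.
ALLOWED_GLOBALS = {("collections", "OrderedDict"): OrderedDict}


class SafeUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not on the allow-list."""

    def find_class(self, module, name):
        try:
            return ALLOWED_GLOBALS[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError(
                f"global {module}.{name} is not allowed"
            ) from None


def load_model(data: bytes, allow_unsafe: bool = False):
    """Try the restricted loader first; fall back only if explicitly allowed."""
    try:
        return SafeUnpickler(io.BytesIO(data)).load()
    except pickle.UnpicklingError:
        if not allow_unsafe:
            raise
        # Backwards-compatibility path: this can run arbitrary code.
        return pickle.loads(data)
```

In this sketch, dropping the fallback would reduce `load_model` to the single `SafeUnpickler(...).load()` call, which is the kind of simplification mentioned above.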

luispedro avatar Oct 17 '25 04:10 luispedro

Thanks for the feedback @luispedro. @SantaMcCloud will try to come up with Python converter scripts (safetensors <-> pt), so that the Galaxy tools only need to handle safetensors. Maybe @SantaMcCloud can cross-link the scripts here, and you can later include parts of them in SemiBin to support safetensors directly.

The problem with pt is that these files are still pickle-based, and we cannot allow pickle files on public infrastructure (where we can trust users and applications only to a limited extent), since it seems too easy to execute arbitrary code.
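
To make the concern concrete, here is a harmless standard-library demonstration of how a pickle file can make the loader call a function of the author's choosing. The `Payload` class is hypothetical and only evaluates an arithmetic expression; it is a sketch of the general mechanism, not of any real SemiBin model file.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a malicious object hidden in a model file."""

    def __reduce__(self):
        # pickle stores "call eval with this argument";
        # the call happens at load time, on the loading machine.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # runs eval("6 * 7") while loading
print(result)  # 42, not a Payload instance
```

Nothing in `blob` has to look like a model for this to work, so on a shared server the loaded file itself decides what gets executed.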

bernt-matthias avatar Oct 17 '25 08:10 bernt-matthias

Yes, I can link the conversion scripts if @luispedro wants them!

SantaMcCloud avatar Oct 17 '25 21:10 SantaMcCloud