api-inference-community

Add Img2Mesh, Text2Mesh option for inference

Opened by lalalune · 3 comments

Is your feature request related to a problem? Please describe. Many models are coming online from several directions that enable users to generate meshes unconditionally, from text guidance, or from an image prior. These projects are harder to coordinate on because they are not well represented in Hugging Face's model hub or Inference API, and that gap affects downstream work such as Microsoft's MII inference pipeline, which is tightly integrated with Hugging Face.

Looking ahead, the goal of this feature request is to consider adding 3D mesh tasks as a standard task type.

Example of Img2Mesh https://github.com/monniert/unicorn

Example of Text2Mesh https://github.com/ashawkey/stable-dreamfusion

Example of Unconditional Mesh Generation https://nv-tlabs.github.io/GET3D

Example of text-guided animation with motion diffusion https://github.com/GuyTevet/motion-diffusion-model

Describe the solution you'd like Add support for 3D mesh responses. These are similar to image responses, but in some formats the mesh and texture are delivered as separate files, which will need to be considered in the response design. A mesh may also have multiple parts or textures, although in practice no model has produced these yet.
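
For concreteness, below is a minimal sketch of what a text-to-mesh call against the Inference API could look like. The task, the model id, and the idea that raw mesh bytes come back in the response body are all hypothetical assumptions for illustration; no such task exists today:

```python
# Hypothetical sketch: a text-to-mesh request returning a single GLB file.
# The model id below is made up, and the response layout is an assumption,
# not an existing Inference API task.
import requests

API_URL = "https://api-inference.huggingface.co/models/some-org/text2mesh-model"  # hypothetical model id
headers = {"Authorization": "Bearer hf_..."}  # standard Inference API auth header

response = requests.post(API_URL, headers=headers, json={"inputs": "a low-poly wooden chair"})
response.raise_for_status()

# A single-file format such as GLB could be returned as raw bytes,
# mirroring how image tasks return PNG bytes today.
with open("chair.glb", "wb") as f:
    f.write(response.content)
```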

The popular formats are the following (a small unpacking sketch for the multi-file cases follows the list):

  1. .OBJ mesh with a .PNG texture and .MTL material description
  2. .FBX model with the texture embedded
  3. .GLB (binary glTF) model with the texture embedded
  4. Raw NumPy array (.npz or .npy file)
  5. ZIP file containing custom data or another format
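
For the multi-file cases (formats 1 and 5 above), one option is for the server to bundle the files into a single ZIP that the client unpacks. A sketch under that assumption:

```python
# Sketch of unpacking a multi-file mesh response (e.g. .OBJ + .MTL + .PNG),
# assuming the server bundles the files into a single ZIP payload.
import io
import zipfile


def unpack_mesh_bundle(payload: bytes, out_dir: str = "mesh_out") -> list:
    """Extract a zipped mesh bundle and return the extracted file paths."""
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        zf.extractall(out_dir)
        return [f"{out_dir}/{name}" for name in zf.namelist()]


# Usage, continuing from the request sketch above:
# files = unpack_mesh_bundle(response.content)
# -> e.g. ["mesh_out/model.obj", "mesh_out/model.mtl", "mesh_out/texture.png"]
```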

lalalune · Oct 24 '22