Looking for qfd360_sl_model.pt for face_det_lite model.py
The example model.py at https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/face_det_lite/model.py (also referenced from https://huggingface.co/qualcomm/Lightweight-Face-Detection-Quantized) references a parameter checkpoint file named qfd360_sl_model.pt: DEFAULT_WEIGHTS = "qfd360_sl_model.pt"
However, this checkpoint file is not provided in the adjacent directory https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/face_det_lite
At another location, https://huggingface.co/qualcomm/Lightweight-Face-Detection-Quantized/tree/main, there are "quantized" model weights associated with Qualcomm's Lightweight-Face-Detection-Quantized model.
So there is a file mismatch between model.py (which looks for qfd360_sl_model.pt) and the pretrained model parameters available elsewhere. Therefore: 1) please explain how to convert model.py to load parameters from the files available at https://huggingface.co/qualcomm/Lightweight-Face-Detection-Quantized/tree/main, and 2) please provide the referenced qfd360_sl_model.pt at https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/face_det_lite/
When you run export.py for this model, the weights used for the model are downloaded to your local machine. The model is then loaded with the downloaded weights and a traced TorchScript model is created. This model is uploaded to AI Hub to be compiled so it can run on device. Similarly, for quantized models, you can run the export script to get the model files.
The Hugging Face repo hosts the three target formats (QNN, ONNX, LiteRT), not the TorchScript model / weights.
Please let us know if you hit any issues when running the export scripts.
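For reference, the export flow described above is typically invoked as a Python module. The commands below are a sketch: the exact flag names and whether face_det_lite needs a pip extra may differ by qai-hub-models version, so check `--help` for your installed version.

```shell
# Install the package (some models need an extra, e.g. pip install "qai_hub_models[face_det_lite]";
# whether this model needs one is an assumption - plain qai_hub_models may suffice).
pip install qai_hub_models

# Downloads the PyTorch weights (qfd360_sl_model.pt), traces the model,
# and submits compile/profile/inference jobs to AI Hub.
# Flag names are illustrative; confirm with --help.
python -m qai_hub_models.models.face_det_lite.export \
    --device "Samsung Galaxy S24" \
    --output-dir build/face_det_lite
```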
Thank you for writing.
Google Gemini tells me that export.py will "compile" the specified AI model for inference on the selected Snapdragon hardware (see below). But it is not clear that it will be possible to send arbitrary inputs to the compiled model on the remote Snapdragon hardware and receive outputs: export.py "submits an inference job to run the compiled model on sample inputs and collect output data", i.e., inference only on "sample inputs". Nor is it clear that I will be able to combine the operations of two or more AI Hub models and use arbitrary inputs.
So, for development, I want to run the AI Hub models/weights on a local Windows PC (not Snapdragon). It seems there is a Python script for each AI Hub model, e.g., a model.py for ShuffleNetV2, that can in some manner be run on my local PC. I will probably be able to use Gemini to build a standalone ShuffleNetV2 model.py and inter-operate two or more AI Hub models on a local PC for development.
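On the idea of chaining two or more models locally: once each model is instantiated on the PC (e.g., via from_pretrained()), combining them is ordinary Python function composition. The sketch below uses dummy stand-in callables instead of real models so it runs without torch or qai_hub_models installed; the names detector and classifier are illustrative, not AI Hub APIs.

```python
# Sketch of chaining two locally-run models: model A's outputs feed model B.
# 'detector' and 'classifier' are dummy stand-ins for two AI Hub models
# loaded locally (e.g., FaceDetLite and ShuffleNetV2); any callables work.

def run_pipeline(image, detector, classifier):
    """Run arbitrary input through model A, then model B on each result."""
    regions = detector(image)                   # model A: find regions
    return [classifier(r) for r in regions]     # model B: score each region

# Dummy stand-ins so the sketch is self-contained:
detector = lambda img: [img[0:2], img[2:4]]     # pretend: two crops
classifier = lambda crop: sum(crop)             # pretend: one score per crop

print(run_pipeline([1, 2, 3, 4], detector, classifier))  # [3, 7]
```

With real models, detector and classifier would be the instantiated PyTorch modules, with whatever tensor reshaping the second model needs inserted between the two calls.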
Google Gemini tells me:
The code you provided defines a script for exporting a ShuffleNetV2 model, optimizing it for on-device inference, and optionally profiling and inferencing it on a target device using the Qualcomm AI Hub. Let's break down what it exports:
Key Exports:

1. Compiled Model:
   - The primary output of this script is a compiled version of the ShuffleNetV2 model.
   - This compiled model is optimized for a specific target device and runtime (e.g., TFLite, QNN).
   - It's the essential artifact needed to deploy the model for on-device inference.
   - The compiled model is saved to the specified output_dir (or build/shufflenet_v2 by default).

2. Profiling Data (Optional):
   - If skip_profiling is False, the script submits a profiling job to Qualcomm AI Hub.
   - The job runs the compiled model on a real device and collects performance metrics (latency, memory usage, etc.).
   - This data can be downloaded and analyzed to understand the model's performance characteristics on the target hardware.
   - The print_profile_metrics_from_job function displays a summary of this data.

3. Inference Results (Optional):
   - If skip_inferencing is False, the script submits an inference job to Qualcomm AI Hub.
   - The job runs the compiled model on the target device using sample input data.
   - The output of the inference is downloaded.
   - The print_inference_metrics function compares the on-device inference results with the original PyTorch model's output to assess accuracy.

4. Metadata and Reports:
   - The script generates metadata about the compile, profile, and inference jobs, which is stored in hub.client.CompileJob, hub.client.ProfileJob, and hub.client.InferenceJob objects, respectively. These are part of the ExportResult returned by export_model.
   - It also prints a command-line example (using print_on_target_demo_cmd) that demonstrates how to run the compiled model on the target device using the qai_hub_model_tools package, a locally available set of tools. This helps users get started with deploying and testing their exported models.
In essence, the script exports the following:

Tangible Artifacts:
- The compiled model file, ready for on-device deployment.
- Profiling data (if requested).
- Inference output data (if requested).

Intangible Outputs:
- Job metadata for tracking and management.
- Printed summaries of profiling and inference results.
- A command-line example for on-device execution.
How it Works (Simplified):

1. Model Preparation: Loads the ShuffleNetV2 model from qai_hub_models, traces it using torch.jit.trace, and prepares it for compilation.
2. Compilation: Submits a compile job to Qualcomm AI Hub to convert the traced model into a device-optimized format.
3. Profiling (Optional): Submits a profile job to run the compiled model on a real device and collect performance metrics.
4. Inference (Optional): Submits an inference job to run the compiled model on sample inputs and collect output data.
5. Download & Summary: Downloads the compiled model and optionally the profiling/inference results, then prints summaries and instructions.
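The five steps above correspond roughly to the following qai_hub client calls. This is a sketch, not the actual export.py source; it assumes qai_hub and qai_hub_models are installed and an AI Hub API token is configured, and the input name image_tensor and the device name are assumptions for illustration.

```python
import torch
import qai_hub as hub
from qai_hub_models.models.shufflenet_v2 import Model

# 1. Model preparation: load pretrained weights and trace to TorchScript.
torch_model = Model.from_pretrained().eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(torch_model, example)

# 2. Compilation: AI Hub converts the traced model to a device-optimized format.
device = hub.Device("Samsung Galaxy S24")  # device name is illustrative
compile_job = hub.submit_compile_job(
    model=traced,
    device=device,
    input_specs=dict(image_tensor=(1, 3, 224, 224)),  # input name is an assumption
)
compiled = compile_job.get_target_model()

# 3. Profiling: run the compiled model on a real hosted device, collect metrics.
profile_job = hub.submit_profile_job(model=compiled, device=device)

# 4. Inference: run the compiled model on sample inputs on-device.
inference_job = hub.submit_inference_job(
    model=compiled,
    device=device,
    inputs=dict(image_tensor=[example.numpy()]),
)

# 5. Download results for local inspection.
on_device_output = inference_job.download_output_data()
compiled.download("build/shufflenet_v2/model.tflite")
```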
In Summary:
The script's main purpose is to export a ready-to-deploy, optimized version of the ShuffleNetV2 model for a specified target device using the Qualcomm AI Hub platform. It also provides tools and information to help users profile, test, and deploy their models effectively.
On Tue, Jan 7, 2025 at 3:25 PM Shreya Jain wrote:
Hello,
You can call from_pretrained() on the model to download that weight file and instantiate the PyTorch model automatically.
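For example, something like the following (a sketch assuming qai_hub_models is installed; the class name FaceDetLite and the 1x1x480x640 grayscale input shape are assumptions worth verifying against model.py and its get_input_spec()):

```python
import torch
from qai_hub_models.models.face_det_lite.model import FaceDetLite

# Downloads qfd360_sl_model.pt from the AI Hub asset store on first call
# and loads it into the PyTorch model; no manual checkpoint handling needed.
model = FaceDetLite.from_pretrained().eval()

# Run locally on CPU with a dummy input (shape is an assumption:
# 480x640 grayscale per the model card).
with torch.no_grad():
    out = model(torch.rand(1, 1, 480, 640))
```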
Closing this out due to inactivity. Please reopen the issue or create a new issue if you still have questions on this topic.