Looking for qfd360_sl_model.pt for face_det_lite model.py
The example model.py at https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/face_det_lite/model.py (also referenced from https://huggingface.co/qualcomm/Lightweight-Face-Detection-Quantized) references a parameter checkpoint file named qfd360_sl_model.pt: DEFAULT_WEIGHTS = "qfd360_sl_model.pt"
However, this checkpoint file is not provided in the adjacent directory https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/face_det_lite
At another location, https://huggingface.co/qualcomm/Lightweight-Face-Detection-Quantized/tree/main, there are "quantized" model weights associated with Qualcomm's Lightweight-Face-Detection-Quantized model.
So there is a file mismatch between model.py (which looks for qfd360_sl_model.pt) and the pretrained model parameters available elsewhere. Therefore: 1) please explain how to convert model.py to load parameters from the files available at https://huggingface.co/qualcomm/Lightweight-Face-Detection-Quantized/tree/main, and 2) please provide the referenced qfd360_sl_model.pt at https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/face_det_lite/
When you run export.py for this model, the weights used for the model are downloaded to your local machine. The model is then loaded with the downloaded weights and a traced TorchScript model is created. This model is uploaded to AI Hub to be compiled so it can run on device. Similarly, for quantized models, you can run the export script to get the model files.
The Hugging Face repo hosts the three target formats (QNN, ONNX, LiteRT), not the TorchScript model / weights.
Please let us know if you hit any issues when running the export scripts.
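For reference, the export flow described above is typically invoked as a Python module. The commands below are a sketch: the exact flag names and whether face_det_lite needs a pip extra may differ by qai-hub-models version, so check `--help` for your installed version.

```shell
# Install the package (some models need an extra, e.g. pip install "qai_hub_models[face_det_lite]";
# whether this model needs one is an assumption - plain qai_hub_models may suffice).
pip install qai_hub_models

# Downloads the PyTorch weights (qfd360_sl_model.pt), traces the model,
# and submits compile/profile/inference jobs to AI Hub.
# Flag names are illustrative; confirm with --help.
python -m qai_hub_models.models.face_det_lite.export \
    --device "Samsung Galaxy S24" \
    --output-dir build/face_det_lite
```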
Thank you for writing.
Google Gemini tells me that export.py will "compile" the specified AI model for inference on the selected Snapdragon hardware (see below). But it is not clear that it will be possible to send arbitrary inputs to the compiled model on the remote Snapdragon hardware and receive outputs: export.py "submits an inference job to run the compiled model on sample inputs and collect output data", i.e., inference only on "sample inputs". Nor is it clear that I will be able to combine the operations of two or more AI Hub models and use arbitrary inputs.
So, for development, I want to run the AI Hub models/weights on a local Windows PC (not Snapdragon). It seems there is a Python script for each AI Hub model, e.g., a model.py for ShuffleNetV2, that can in some manner be run on my local PC. I will probably be able to use Gemini to build a standalone ShuffleNetV2 model.py and inter-operate two or more AI Hub models on a local PC for development.
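On the idea of chaining two or more models locally: once each model is instantiated on the PC (e.g., via from_pretrained()), combining them is ordinary Python function composition. The sketch below uses dummy stand-in callables instead of real models so it runs without torch or qai_hub_models installed; the names detector and classifier are illustrative, not AI Hub APIs.

```python
# Sketch of chaining two locally-run models: model A's outputs feed model B.
# 'detector' and 'classifier' are dummy stand-ins for two AI Hub models
# loaded locally (e.g., FaceDetLite and ShuffleNetV2); any callables work.

def run_pipeline(image, detector, classifier):
    """Run arbitrary input through model A, then model B on each result."""
    regions = detector(image)                   # model A: find regions
    return [classifier(r) for r in regions]     # model B: score each region

# Dummy stand-ins so the sketch is self-contained:
detector = lambda img: [img[0:2], img[2:4]]     # pretend: two crops
classifier = lambda crop: sum(crop)             # pretend: one score per crop

print(run_pipeline([1, 2, 3, 4], detector, classifier))  # [3, 7]
```

With real models, detector and classifier would be the instantiated PyTorch modules, with whatever tensor reshaping the second model needs inserted between the two calls.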
Google Gemini tells me:
The code you provided defines a script for exporting a ShuffleNetV2 model, optimizing it for on-device inference, and optionally profiling and inferencing it on a target device using the Qualcomm AI Hub. Let's break down what it exports:
Key Exports:

1. Compiled Model:
   - The primary output of this script is a compiled version of the ShuffleNetV2 model.
   - This compiled model is optimized for a specific target device and runtime (e.g., TFLite, QNN).
   - It's the essential artifact needed to deploy the model for on-device inference.
   - The compiled model is saved to the specified output_dir (or build/shufflenet_v2 by default).

2. Profiling Data (Optional):
   - If skip_profiling is False, the script submits a profiling job to Qualcomm AI Hub.
   - The job runs the compiled model on a real device and collects performance metrics (latency, memory usage, etc.).
   - This data can be downloaded and analyzed to understand the model's performance characteristics on the target hardware.
   - The print_profile_metrics_from_job function displays a summary of this data.

3. Inference Results (Optional):
   - If skip_inferencing is False, the script submits an inference job to Qualcomm AI Hub.
   - The job runs the compiled model on the target device using sample input data.
   - The output of the inference is downloaded.
   - The print_inference_metrics function compares the on-device inference results with the original PyTorch model's output to assess accuracy.

4. Metadata and Reports:
   - The script generates metadata about the compile, profile, and inference jobs, which is stored in hub.client.CompileJob, hub.client.ProfileJob, and hub.client.InferenceJob objects, respectively. These are part of the ExportResult returned by export_model.
   - It also prints a command-line example (using print_on_target_demo_cmd) that demonstrates how to run the compiled model on the target device using the qai_hub_model_tools package, a locally available set of tools. This helps users get started with deploying and testing their exported models.
In essence, the script exports the following:

Tangible Artifacts:
- The compiled model file, ready for on-device deployment.
- Profiling data (if requested).
- Inference output data (if requested).

Intangible Outputs:
- Job metadata for tracking and management.
- Printed summaries of profiling and inference results.
- A command-line example for on-device execution.
How it Works (Simplified):

1. Model Preparation: Loads the ShuffleNetV2 model from qai_hub_models, traces it using torch.jit.trace, and prepares it for compilation.
2. Compilation: Submits a compile job to Qualcomm AI Hub to convert the traced model into a device-optimized format.
3. Profiling (Optional): Submits a profile job to run the compiled model on a real device and collect performance metrics.
4. Inference (Optional): Submits an inference job to run the compiled model on sample inputs and collect output data.
5. Download & Summary: Downloads the compiled model and optionally the profiling/inference results, then prints summaries and instructions.
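The five steps above correspond roughly to the following qai_hub client calls. This is a sketch, not the actual export.py source; it assumes qai_hub and qai_hub_models are installed and an AI Hub API token is configured, and the input name image_tensor and the device name are assumptions for illustration.

```python
import torch
import qai_hub as hub
from qai_hub_models.models.shufflenet_v2 import Model

# 1. Model preparation: load pretrained weights and trace to TorchScript.
torch_model = Model.from_pretrained().eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(torch_model, example)

# 2. Compilation: AI Hub converts the traced model to a device-optimized format.
device = hub.Device("Samsung Galaxy S24")  # device name is illustrative
compile_job = hub.submit_compile_job(
    model=traced,
    device=device,
    input_specs=dict(image_tensor=(1, 3, 224, 224)),  # input name is an assumption
)
compiled = compile_job.get_target_model()

# 3. Profiling: run the compiled model on a real hosted device, collect metrics.
profile_job = hub.submit_profile_job(model=compiled, device=device)

# 4. Inference: run the compiled model on sample inputs on-device.
inference_job = hub.submit_inference_job(
    model=compiled,
    device=device,
    inputs=dict(image_tensor=[example.numpy()]),
)

# 5. Download results for local inspection.
on_device_output = inference_job.download_output_data()
compiled.download("build/shufflenet_v2/model.tflite")
```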
In Summary:
The script's main purpose is to export a ready-to-deploy, optimized version of the ShuffleNetV2 model for a specified target device using the Qualcomm AI Hub platform. It also provides tools and information to help users profile, test, and deploy their models effectively.
On Tue, Jan 7, 2025 at 3:25 PM Shreya Jain wrote:
Hello,
You can call from_pretrained() on the model to download that weight file and instantiate the PyTorch model automatically.
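For example, something like the following (a sketch assuming qai_hub_models is installed; the class name FaceDetLite and the 1x1x480x640 grayscale input shape are assumptions worth verifying against model.py and its get_input_spec()):

```python
import torch
from qai_hub_models.models.face_det_lite.model import FaceDetLite

# Downloads qfd360_sl_model.pt from the AI Hub asset store on first call
# and loads it into the PyTorch model; no manual checkpoint handling needed.
model = FaceDetLite.from_pretrained().eval()

# Run locally on CPU with a dummy input (shape is an assumption:
# 480x640 grayscale per the model card).
with torch.no_grad():
    out = model(torch.rand(1, 1, 480, 640))
```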
Closing this out due to inactivity. Please reopen the issue or create a new issue if you still have questions on this topic.