TorchScript execution via ONNX. Was: TorchScript execution for skeleton bone category guessing
Describe the bug
Hi fire! I worked on some of the TorchScript stuff, so I can help out here a bit.
The next step is probably checking that the exported model works as expected. You can verify this by making a Python dict of inputs, where each key is a CSV column name used for model training and each value is the corresponding batch of inputs.
I took a look at your repository link – for text features, the model will expect a list of strings of length batch_size. For numerical features, it will expect a tensor.
Once you have this, pass it to the model and inspect the outputs. Let me know how this goes – super happy to answer any questions you have along the way.
@geoffreyangus
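For illustration, such an input dict might look like the following sketch (column names are taken from the sample data further down; the vector width and values are assumptions, not from the actual dataset):

```python
import torch

# Hypothetical sketch: one entry per CSV column used in training.
# Text/category features are lists of strings of length batch_size;
# numerical/vector features are tensors.
batch_size = 2
inputs = {
    "sink_bone": ["Hips", "Spine"],        # category feature
    "source_bone": ["Hips", "Chest"],      # category feature
    "vector": torch.rand(batch_size, 40),  # vector feature (width assumed)
    # ... remaining feature columns ...
}
# `model` would be the TorchScript module loaded via torch.jit.load(...):
# outputs = model(inputs)
```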
To Reproduce
There is no tutorial or quick start for using TorchScript.
I don't know how to get the feature order needed for inputs in C++.
Probably the first step is to do this in Python.
Expected behavior
Able to input test data and get the label (which is binary) and its probability, so I can compute the resulting correspondence from one skeleton (source) to another skeleton (sink).
Based on advice from @JosephCatrambone, I was able to get some results using only category and "number/boolean" features.
I wish to integrate this into my single-binary Godot Engine fork. See https://pytorch.org/tutorials/advanced/cpp_export.html.
Environment (please complete the following information):
- OS: Windows 11
- Ludwig version
Additional context
Attached model in the source folder.
data.zip (attached). This is a second run, so minor changes may result from the seed changing.
I'll write down my notes.
Here's sample data.
- label is binary.
- "text" columns
- vector of floats
label sink_bone sink_bone_category sink_bone_hierarchy sink_title sink_version sink_exporter_version sink_spec_version source_bone source_bone_category source_bone_hierarchy source_title source_version source_exporter_version source_spec_version vector
true Hips VRM_BONE_CATEGORY_TORSO hips V-Sekai Adult Male 33 saturday06_blender_vrm_exporter_experimental_1.11.0 0.0 Hips VRM_BONE_CATEGORY_TORSO hips V-Sekai Adult Male 33 saturday06_blender_vrm_exporter_experimental_1.11.0 0.0 -0.00184498226736 1.01452004909515 -0.0192606896162 1 0.00000114936358 0.00000211599786 -0.00000114936381 -0.54435086250305 0.83885759115219 0.00000211599786 -0.83885759115219 -0.54435086250305 0.59049874544144 0.24427227675915 0.44663000106812 0 0.04477611940299 0 0 0 -0.00184498226736 1.01452004909515 -0.0192606896162 1 0.00000114936358 0.00000211599786 -0.00000114936381 -0.54435086250305 0.83885759115219 0.00000211599786 -0.83885759115219 -0.54435086250305 0.59049874544144 0.24427227675915 0.44663000106812 0 0.04477611940299 0 0 0
Hi @fire,
I appreciate the detailed report – it's super helpful.
Documentation for TorchScript is currently in progress. That said, I'm taking a look at what you have right now and will post a response here shortly. Thanks!
Hi @fire,
Thanks for your patience – I've put together a sample notebook that demonstrates inference with the provided TorchScript model. Hopefully this is along the lines of what you're looking for – let me know what you think and if you have any follow-up questions.
Thanks for the amazing answer, I'll spend the next few moments playing with it.
Do you think recreating to_inference_module_input_from_dataframe in C++ is a large effort?
The idea is to remove Python from the inference path so it can be a single Windows/Linux/macOS binary merged with the rest of the Godot Engine code.
Hi fire,
Good question! For this particular use case, if you inspect the outputs of to_inference_module_input_from_dataframe, the only step taken by the function is creating a List[str] object of length batch_size for each feature. Because of this, you should be able to implement a similar function in C++ in a relatively straightforward manner. Let me know how that goes, happy to answer any more questions you may have 🙂
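As a rough illustration of how little work that function does for string features, here is a sketch of a Python equivalent (not Ludwig's actual implementation; the real function consults the model config to decide per-feature handling):

```python
from typing import Dict, List

import pandas as pd

def dataframe_to_string_inputs(df: pd.DataFrame) -> Dict[str, List[str]]:
    # For each feature column, build a List[str] of length batch_size.
    # A C++ port would build an equivalent map from column name to
    # std::vector<std::string> (or a tensor for numerical features).
    return {column: [str(value) for value in df[column]] for column in df.columns}
```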
I decided to go back to a simpler problem.
pip install ludwig
ludwig datasets download mnist
set NUMEXPR_MAX_THREADS=30
ludwig.exe train --dataset mnist_dataset.csv --config ./config.yaml
ludwig export_torchscript --model_path results/experiment_run_18/model/ --output_path mnist_dataset
pip install onnx
# TODO convert torchscript to onnx
# TODO convert onnx to ncnn
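For the first TODO, a typical TorchScript-to-ONNX attempt would look roughly like the sketch below. The input shape is an assumption, and (as discussed later in this thread) Ludwig's dict-based input signatures end up blocking this export:

```python
import torch

# Load the exported TorchScript predictor from the export step above.
model = torch.jit.load("mnist_dataset/inference_predictor-cuda.pt")
model.eval()

# torch.onnx.export runs the module against example inputs. A plain tensor
# is shown here for illustration; the Ludwig predictor actually takes a
# dict of tensors, which the ONNX exporter does not handle well.
dummy_input = torch.rand(1, 1, 28, 28)  # assumed MNIST image shape
torch.onnx.export(model, (dummy_input,), "mnist_predictor.onnx")
```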
import pandas as pd
import torch
from ludwig.utils.data_utils import load_json
from ludwig.utils.inference_utils import to_inference_module_input_from_dataframe
# Load the dataset (tab-separated), the exported TorchScript predictor,
# and the saved training configuration.
df = pd.read_csv('mnist_dataset.csv', sep='\t')
inference_module = torch.jit.load('mnist_dataset/inference_predictor-cuda.pt')
config = load_json('results/experiment_run_18/model/model_hyperparameters.json')
# Convert a small sample of rows into the input format the module expects.
sample_df = df.sample(5)
sample_input = to_inference_module_input_from_dataframe(sample_df, config)
ludwig.exe preprocess --dataset mnist_dataset.csv --preprocessing_config config.yaml
How would I load mnist_dataset.meta.json into the above script?
The previous sample code was unable to load the digit image.
>>> sample_input = to_inference_module_input_from_dataframe(sample_df, config)
Traceback (most recent call last):
File "C:\Users\elee\scoop\apps\python\current\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'image_path'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\elee\scoop\apps\python\current\lib\site-packages\ludwig\utils\inference_utils.py", line 53, in to_inference_module_input_from_dataframe
dataset[if_config[COLUMN]],
File "C:\Users\elee\scoop\apps\python\current\lib\site-packages\pandas\core\frame.py", line 3458, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\elee\scoop\apps\python\current\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'image_path'
One workaround might be to somehow preprocess into the JSON + HDF5 package and then process that in Ludwig.
Example problem:
- training https://ludwig.ai/latest/examples/mnist/
- then saving a TorchScript model,
- then converting to ONNX,
- executing via the ONNX Runtime for DirectCompute (Microsoft) and CUDA (Windows, Linux, and NVIDIA) (see the sketch after this list)
- executing via the ncnn runtime for Vulkan compute acceleration on 5 platforms
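If the conversion succeeded, the ONNX Runtime step might look like this minimal sketch (file name, providers, and input shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

# Placeholder model file from a hypothetical TorchScript -> ONNX conversion.
session = ort.InferenceSession(
    "mnist_predictor.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
image = np.random.rand(1, 1, 28, 28).astype(np.float32)  # assumed input shape
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: image})
```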
I got stuck converting the image data to a suitable form.
Here's the example schema.
# config.yaml
input_features:
  - name: image_path
    type: image
    encoder:
      type: stacked_cnn
      conv_layers:
        - num_filters: 32
          filter_size: 3
          pool_size: 2
          pool_stride: 2
        - num_filters: 64
          filter_size: 3
          pool_size: 2
          pool_stride: 2
          dropout: 0.4
      fc_layers:
        - output_size: 128
          dropout: 0.4
output_features:
  - name: label
    type: category
My goal is to be able to run the various ML diffusion ecosystem tools.
- image prompt -> text (https://huggingface.co/spaces/pharma/CLIP-Interrogator/tree/main)
- image prompt, text prompt -> image
Hi @fire, we'll take a look at this.
Hi @fire, good to hear from you again 🙂
What version of Ludwig are you running?
I'm using Ludwig 0.6, but I assume https://github.com/ludwig-ai/ludwig/releases/tag/v0.6.4 will work.
I did some thinking about the problem, and at this point I'm going to start with the hello world, which is MNIST digit image to label.
There's also the inverse problem, which image diffusion models do really well: digit (number) prompt to image.
Hi @fire! Sorry for the delay. I'm out of town this week, but am working on reproducing your issue and will get back to you shortly.
Hi @fire! Getting started on this again. Where does mnist_dataset.csv come from / what are some sample rows in there? I ran ludwig datasets download mnist -o ~/Downloads/issue2292/mnist in the v0.6 tag of the repository and got the following directory structure:
Downloads/
  issue2292/
    mnist/
      mnist.parquet
      testing/
        ...  # image directories
      training/
        ...  # image directories
The output of the command gives me a Parquet file, so just wanted to double-check that we have the same values in there.
Hi @fire, just wanted to follow up with a sample notebook that goes through my own process of downloading MNIST, then training and exporting a TorchScript model.
The main things I discovered worth mentioning:
- The MNIST dataset has relative paths to the images. I had to update the `image_path` values from relative paths to absolute paths in the Pandas DataFrame to get them to load successfully.
- I used the `load_paths` keyword argument in `to_inference_module_input_from_dataframe` to ensure that the image path columns are actually loaded from file as image tensors.
- I used the `ludwig.models.inference.InferenceModule` class to load the preprocessor, predictor, and postprocessor artifacts as a single `torch.nn.Module` for easy TorchScript export.
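For reference, a rough sketch of those three steps (paths are placeholders, and the `InferenceModule` constructor name is an assumption that may differ between Ludwig versions; the notebook is authoritative):

```python
import os

import pandas as pd
import torch

from ludwig.models.inference import InferenceModule
from ludwig.utils.data_utils import load_json
from ludwig.utils.inference_utils import to_inference_module_input_from_dataframe

model_dir = "results/experiment_run_0/model"  # placeholder path

# 1. Make the image paths absolute so they can be loaded from disk.
df = pd.read_parquet("mnist/mnist.parquet")
df["image_path"] = df["image_path"].apply(os.path.abspath)

# 2. Load the image path columns as image tensors via load_paths=True.
config = load_json(os.path.join(model_dir, "model_hyperparameters.json"))
sample_input = to_inference_module_input_from_dataframe(
    df.sample(5), config, load_paths=True
)

# 3. Bundle preprocessor + predictor + postprocessor into one torch.nn.Module
#    and export it to TorchScript. (Constructor name assumed here.)
inference_module = InferenceModule.from_directory(model_dir)
scripted = torch.jit.script(inference_module)
scripted.save("inference_module.pt")
```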
Unfortunately, I wasn't able to get the TorchScript > ONNX export to work – Ludwig relies on Dictionary objects for pre- and post-processing, and the ONNX exporter's support for them is limited. There are several open PyTorch issues about expanding that support:
- https://github.com/pytorch/pytorch/issues/81482
- https://github.com/pytorch/pytorch/issues/87785
Would it be possible to use another serving framework like NVIDIA Triton for your use case?
I'm a bit under the weather but let me describe the use case.
I have a game and one of the targets is a mobile device like the Meta Quest or the iPhone.
To match this toy example, I want to give the model images of digits and run inference on the GPU for acceleration, because in the final use case the CPU is too slow and battery life is important.
I vaguely remember that NVIDIA Triton is a Python-based server, so it has three minuses:
1. it needs a Python runtime
2. it needs a CUDA runtime
3. the server needs to open ports
I'll do some searching too!
My proposal was onnx -> ncnn for IoT runtimes. Let me know if you have other suggestions.
Hi @fire, sorry for the late response here. To your points:
- NVIDIA Triton does support C++ backends: https://github.com/triton-inference-server/pytorch_backend
- I do believe that it needs a CUDA runtime, but I think you would need that for most GPU inference cases
- I'm not sure what you mean here
Let me know what you think!
I am looking at StableHLO:
- convert Ludwig outputs to StableHLO
- execute on CUDA, CPU, and Vulkan compute with IREE
I'll probably open a new issue.