TorchScript execution via ONNX. Was: TorchScript execution for skeleton bone category guessing
Describe the bug
Hi fire! I worked on some of the TorchScript stuff, so I can help out here a bit.
The next step is probably checking that the exported model works as expected. You can verify this by making a Python dict of inputs, where each key is a CSV column name used for model training and each value is the corresponding batch of inputs.
I took a look at your repository link – for text features, the model will expect a list of strings of length batch_size. For numerical features, it will expect a tensor.
Once you have this, pass it to the model and inspect the outputs. Let me know how this goes – super happy to answer any questions you have along the way.
@geoffreyangus
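For illustration, such an input dict might look like the following sketch (column names are taken from the sample data further down; the vector width and values are assumptions, not from the actual dataset):

```python
import torch

# Hypothetical sketch: one entry per CSV column used in training.
# Text/category features are lists of strings of length batch_size;
# numerical/vector features are tensors.
batch_size = 2
inputs = {
    "sink_bone": ["Hips", "Spine"],        # category feature
    "source_bone": ["Hips", "Chest"],      # category feature
    "vector": torch.rand(batch_size, 40),  # vector feature (width assumed)
    # ... remaining feature columns ...
}
# `model` would be the TorchScript module loaded via torch.jit.load(...):
# outputs = model(inputs)
```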
To Reproduce
There is no tutorial or quick start for using TorchScript.
I don't know how to get the feature order needed for inputs in C++.
Probably the first step is to do this in Python.
Expected behavior
Able to input test data and get the label (which is binary) and its probability, so I can compute the resulting correspondence from one skeleton (source) to another skeleton (sink).
Based on advice from @JosephCatrambone, I was able to get some results using only category and "number/boolean" features.
I wish to integrate this into my single-binary Godot Engine fork. See https://pytorch.org/tutorials/advanced/cpp_export.html.
Environment (please complete the following information):
- OS: Windows 11
- Ludwig version
Additional context
Attached model in the source folder.
data.zip (attached). This is a second run, so minor changes may result from the seed changing.
I'll write down my notes.
Here's sample data.
- label is binary.
- "text" columns
- vector of floats
label sink_bone sink_bone_category sink_bone_hierarchy sink_title sink_version sink_exporter_version sink_spec_version source_bone source_bone_category source_bone_hierarchy source_title source_version source_exporter_version source_spec_version vector
true Hips VRM_BONE_CATEGORY_TORSO hips V-Sekai Adult Male 33 saturday06_blender_vrm_exporter_experimental_1.11.0 0.0 Hips VRM_BONE_CATEGORY_TORSO hips V-Sekai Adult Male 33 saturday06_blender_vrm_exporter_experimental_1.11.0 0.0 -0.00184498226736 1.01452004909515 -0.0192606896162 1 0.00000114936358 0.00000211599786 -0.00000114936381 -0.54435086250305 0.83885759115219 0.00000211599786 -0.83885759115219 -0.54435086250305 0.59049874544144 0.24427227675915 0.44663000106812 0 0.04477611940299 0 0 0 -0.00184498226736 1.01452004909515 -0.0192606896162 1 0.00000114936358 0.00000211599786 -0.00000114936381 -0.54435086250305 0.83885759115219 0.00000211599786 -0.83885759115219 -0.54435086250305 0.59049874544144 0.24427227675915 0.44663000106812 0 0.04477611940299 0 0 0
Hi @fire,
I appreciate the detailed report – it's super helpful.
Documentation for TorchScript is currently in progress. That said, I'm taking a look at what you have right now and will post a response here shortly. Thanks!
Hi @fire,
Thanks for your patience – I've put together a sample notebook that demonstrates inference with the provided TorchScript model. Hopefully this is along the lines of what you're looking for – let me know what you think and if you have any follow-up questions.
Thanks for the amazing answer, I'll spend the next few moments playing with it.
Do you think recreating to_inference_module_input_from_dataframe in C++ is a large effort?
The idea is to remove Python from the inference path so it can be a single Windows/Linux/macOS binary merged with the rest of the Godot Engine code.
Hi fire,
Good question! For this particular use case, if you inspect the outputs of to_inference_module_input_from_dataframe, the only step taken by the function is creating a List[str] object of length batch_size for each feature. Because of this, you should be able to implement a similar function in C++ in a relatively straightforward manner. Let me know how that goes, happy to answer any more questions you may have 🙂
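As a rough illustration of how little work that function does for string features, here is a sketch of a Python equivalent (not Ludwig's actual implementation; the real function consults the model config to decide per-feature handling):

```python
from typing import Dict, List

import pandas as pd

def dataframe_to_string_inputs(df: pd.DataFrame) -> Dict[str, List[str]]:
    # For each feature column, build a List[str] of length batch_size.
    # A C++ port would build an equivalent map from column name to
    # std::vector<std::string> (or a tensor for numerical features).
    return {column: [str(value) for value in df[column]] for column in df.columns}
```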
I decided to go back to a simpler problem.
pip install ludwig
ludwig datasets download mnist
set NUMEXPR_MAX_THREADS=30
ludwig.exe train --dataset mnist_dataset.csv --config ./config.yaml
ludwig export_torchscript --model_path results/experiment_run_18/model/ --output_path mnist_dataset
pip install onnx
# TODO convert torchscript to onnx
# TODO convert onnx to ncnn
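For the first TODO, a typical TorchScript-to-ONNX attempt would look roughly like the sketch below. The input shape is an assumption, and (as discussed later in this thread) Ludwig's dict-based input signatures end up blocking this export:

```python
import torch

# Load the exported TorchScript predictor from the export step above.
model = torch.jit.load("mnist_dataset/inference_predictor-cuda.pt")
model.eval()

# torch.onnx.export runs the module against example inputs. A plain tensor
# is shown here for illustration; the Ludwig predictor actually takes a
# dict of tensors, which the ONNX exporter does not handle well.
dummy_input = torch.rand(1, 1, 28, 28)  # assumed MNIST image shape
torch.onnx.export(model, (dummy_input,), "mnist_predictor.onnx")
```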
import pandas as pd
import torch
from ludwig.utils.data_utils import load_json
from ludwig.utils.inference_utils import to_inference_module_input_from_dataframe
# Load the dataset (tab-separated), the exported TorchScript predictor,
# and the saved training configuration.
df = pd.read_csv('mnist_dataset.csv', sep='\t')
inference_module = torch.jit.load('mnist_dataset/inference_predictor-cuda.pt')
config = load_json('results/experiment_run_18/model/model_hyperparameters.json')
# Convert a small sample of rows into the input format the module expects.
sample_df = df.sample(5)
sample_input = to_inference_module_input_from_dataframe(sample_df, config)
ludwig.exe preprocess --dataset mnist_dataset.csv --preprocessing_config config.yaml
How would I load mnist_dataset.meta.json into the above script?
The previous sample code was unable to load the digit image.
>>> sample_input = to_inference_module_input_from_dataframe(sample_df, config)
Traceback (most recent call last):
File "C:\Users\elee\scoop\apps\python\current\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'image_path'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\elee\scoop\apps\python\current\lib\site-packages\ludwig\utils\inference_utils.py", line 53, in to_inference_module_input_from_dataframe
dataset[if_config[COLUMN]],
File "C:\Users\elee\scoop\apps\python\current\lib\site-packages\pandas\core\frame.py", line 3458, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\elee\scoop\apps\python\current\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'image_path'
One workaround might be to somehow preprocess into the JSON + HDF5 package and then process that in Ludwig.
Example problem:
- training https://ludwig.ai/latest/examples/mnist/
- then saving a TorchScript model,
- then converting to ONNX,
- executing via the ONNX Runtime for DirectCompute (Microsoft) and CUDA (Windows, Linux, and NVIDIA) (see the sketch after this list)
- executing via the ncnn runtime for Vulkan compute acceleration on 5 platforms
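If the conversion succeeded, the ONNX Runtime step might look like this minimal sketch (file name, providers, and input shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

# Placeholder model file from a hypothetical TorchScript -> ONNX conversion.
session = ort.InferenceSession(
    "mnist_predictor.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
image = np.random.rand(1, 1, 28, 28).astype(np.float32)  # assumed input shape
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: image})
```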
I got stuck converting the image data to a suitable form.
Here's the example schema.
# config.yaml
input_features:
  - name: image_path
    type: image
    encoder:
      type: stacked_cnn
      conv_layers:
        - num_filters: 32
          filter_size: 3
          pool_size: 2
          pool_stride: 2
        - num_filters: 64
          filter_size: 3
          pool_size: 2
          pool_stride: 2
          dropout: 0.4
      fc_layers:
        - output_size: 128
          dropout: 0.4
output_features:
  - name: label
    type: category
My goal is to be able to run the various ML diffusion ecosystem tools.
- image prompt -> text (https://huggingface.co/spaces/pharma/CLIP-Interrogator/tree/main)
- image prompt, text prompt -> image
Hi @fire, we'll take a look at this.
Hi @fire, good to hear from you again 🙂
What version of Ludwig are you running?
I'm using Ludwig 0.6, but I assume https://github.com/ludwig-ai/ludwig/releases/tag/v0.6.4 will work.
I did some thinking about the problem, and at this point I'm going to start with the hello world, which is MNIST digit image to label.
There's also the inverse problem, which image diffusion models do really well: digit (number) prompt to image.
Hi @fire! Sorry for the delay. I'm out of town this week, but am working on reproducing your issue and will get back to you shortly.
Hi @fire! Getting started on this again. Where does mnist_dataset.csv come from / what are some sample rows in there? I ran ludwig datasets download mnist -o ~/Downloads/issue2292/mnist in the v0.6 tag of the repository and got the following directory structure:
Downloads/
  issue2292/
    mnist/
      mnist.parquet
      testing/
        ...  # image directories
      training/
        ...  # image directories
The output of the command gives me a Parquet file, so just wanted to double-check that we have the same values in there.
Hi @fire, just wanted to follow up with a sample notebook that goes through my own process of downloading MNIST, then training and exporting a TorchScript model.
The main things I discovered worth mentioning:
- The MNIST dataset has relative paths to the images. I had to update the `image_path` values from relative paths to absolute paths in the Pandas DataFrame to get them to load successfully.
- I used the `load_paths` keyword argument in `to_inference_module_input_from_dataframe` to ensure that the image path columns are actually loaded from file as image tensors.
- I used the `ludwig.models.inference.InferenceModule` class to load the preprocessor, predictor, and postprocessor artifacts as a single `torch.nn.Module` for easy TorchScript export.
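For reference, a rough sketch of those three steps (paths are placeholders, and the `InferenceModule` constructor name is an assumption that may differ between Ludwig versions; the notebook is authoritative):

```python
import os

import pandas as pd
import torch

from ludwig.models.inference import InferenceModule
from ludwig.utils.data_utils import load_json
from ludwig.utils.inference_utils import to_inference_module_input_from_dataframe

model_dir = "results/experiment_run_0/model"  # placeholder path

# 1. Make the image paths absolute so they can be loaded from disk.
df = pd.read_parquet("mnist/mnist.parquet")
df["image_path"] = df["image_path"].apply(os.path.abspath)

# 2. Load the image path columns as image tensors via load_paths=True.
config = load_json(os.path.join(model_dir, "model_hyperparameters.json"))
sample_input = to_inference_module_input_from_dataframe(
    df.sample(5), config, load_paths=True
)

# 3. Bundle preprocessor + predictor + postprocessor into one torch.nn.Module
#    and export it to TorchScript. (Constructor name assumed here.)
inference_module = InferenceModule.from_directory(model_dir)
scripted = torch.jit.script(inference_module)
scripted.save("inference_module.pt")
```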
Unfortunately, I wasn't able to get the TorchScript > ONNX export to work – Ludwig relies on Dictionary objects for pre- and post-processing, and the ONNX exporter's support for them is limited. There are several open PyTorch issues about expanding that support:
- https://github.com/pytorch/pytorch/issues/81482
- https://github.com/pytorch/pytorch/issues/87785
Would it be possible to use another serving framework like NVIDIA Triton for your use case?
I'm a bit under the weather but let me describe the use case.
I have a game and one of the targets is a mobile device like the Meta Quest or the iPhone.
To match this toy example, I want to give the model images of digits and run inference on the GPU for acceleration, because in the final use case the CPU is too slow and battery life is important.
I vaguely remember that NVIDIA Triton is a Python-based server, so it has three minuses:
1. it needs a Python runtime
2. it needs a CUDA runtime
3. the server needs to open ports
I'll do some searching too!
My proposal was onnx -> ncnn for IoT runtimes. Let me know if you have other suggestions.
Hi @fire, sorry for the late response here. To your points:
- NVIDIA Triton does support C++ backends: https://github.com/triton-inference-server/pytorch_backend
- I do believe that it needs a CUDA runtime, but I think you would need that for most GPU inference cases
- I'm not sure what you mean here
Let me know what you think!
I am looking at StableHLO:
- convert Ludwig outputs to StableHLO
- execute on CUDA, CPU, and Vulkan compute with IREE
I'll probably open a new issue.