
Preprocessing and Model Expectations

Open videetparekh opened this issue 5 years ago • 0 comments

Hi Tianqi,

I'm using this tutorial as a guide to host one of my own MobileNetV2 models, and I have a couple of questions about the preprocessing you do. I'm not used to JS, so I'm lost in this bit of code:

function preprocImage(imageData) {
    const width = imageData.width;
    const height = imageData.height;
    const npixels = width * height;

    const rgbaU8 = imageData.data;

    // Drop alpha channel. Resnet does not need it.
    const rgbU8 = new Uint8Array(npixels * 3);
    console.log(rgbU8.length)
    for (let i = 0; i < npixels; ++i) {
        rgbU8[i * 3] = rgbaU8[i * 4];
        rgbU8[i * 3 + 1] = rgbaU8[i * 4 + 1];
        rgbU8[i * 3 + 2] = rgbaU8[i * 4 + 2];
    }

    // Cast to float and normalize.
    const rgbF32 = new Float32Array(npixels * 3);
    for (let i = 0; i < npixels; ++i) {
        rgbF32[i * 3] = (rgbU8[i * 3] - 123.0) / 58.395;
        rgbF32[i * 3 + 1] = (rgbU8[i * 3 + 1] - 117.0) / 57.12;
        rgbF32[i * 3 + 2] = (rgbU8[i * 3 + 2] - 104.0) / 57.375;
    }

    // Transpose. Resnet expects 3 greyscale images.
    const data = new Float32Array(npixels * 3);
    for (let i = 0; i < npixels; ++i) {
        data[i] = rgbF32[i * 3];
        data[npixels + i] = rgbF32[i * 3 + 1];
        data[npixels * 2 + i] = rgbF32[i * 3 + 2];
    }
    return data;
}

My input to this function is

ImageData {
  data: Uint8ClampedArray(200704) [
    14, 12, 26, 255, 13, 11, 25, 255, 11,  9, 22, 255,
    11,  9, 20, 255, 11, 11, 21, 255, 11, 14, 21, 255,
    11, 15, 18, 255, 10, 16, 16, 255, 12, 21, 18, 255,
    18, 29, 23, 255, 23, 36, 26, 255, 24, 38, 25, 255,
    17, 33, 20, 255, 15, 32, 16, 255, 17, 36, 17, 255,
    21, 40, 21, 255, 17, 27, 18, 255, 23, 33, 25, 255,
    25, 31, 27, 255, 22, 28, 26, 255, 24, 28, 29, 255,
    29, 30, 34, 255, 43, 42, 47, 255, 64, 63, 69, 255,
    70, 69, 74, 255,
    ... 200604 more items
  ]
}

I require an array of shape (1,3,224,224), which I believe is the traditional MNet input.
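To make my current understanding concrete, here is a minimal sketch of what I think the function does, collapsed into a single pass. The helper names `rgbaToPlanarCHW` and `at` are my own (hypothetical, not from the tutorial); the mean and scale constants are copied from `preprocImage`. If I'm reading it correctly, the flat output holds the red plane, then the green plane, then the blue plane, which would be the row-major flattening of a (1, 3, height, width) array:

```javascript
// Hypothetical helper names; the constants are copied from preprocImage.
const MEAN = [123.0, 117.0, 104.0];
const STD = [58.395, 57.12, 57.375];

// Single-pass sketch of the three loops: drop alpha, normalize, and write
// each channel into its own contiguous plane (channel-first, CHW order).
function rgbaToPlanarCHW(rgba, width, height) {
    const npixels = width * height;
    const out = new Float32Array(3 * npixels);
    for (let i = 0; i < npixels; ++i) {
        for (let c = 0; c < 3; ++c) {
            // rgba[i * 4 + 3] is the alpha byte and is skipped.
            out[c * npixels + i] = (rgba[i * 4 + c] - MEAN[c]) / STD[c];
        }
    }
    return out;
}

// Read element (0, c, y, x) of the logical (1, 3, height, width) array
// from the flat buffer, assuming row-major layout.
function at(data, c, y, x, height, width) {
    return data[c * height * width + y * width + x];
}

// Toy 2x1 image built from the first two pixels of my input.
const toy = new Uint8ClampedArray([14, 12, 26, 255, 13, 11, 25, 255]);
const planar = rgbaToPlanarCHW(toy, 2, 1);
console.log(planar.length);              // 3 channels * 2 pixels
console.log(at(planar, 2, 0, 1, 1, 2));  // normalized blue value of pixel 1
```

The sizes seem consistent with this reading: my input has 224 * 224 * 4 = 200704 bytes, and the output has 3 * 224 * 224 = 150528 floats.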

Questions I had:

  1. I don't understand what the output of this should be. When I run it myself, I get a Float32Array of shape (1, 150528). Is this how the MXNet ResNet/MobileNetV1 expects its input? Could you share a quick overview of what exactly you do here so I can adapt it for my model?
  2. Is there a way to generate this array-like data without drawing to a Canvas (e.g., by parsing the URI directly)?
  3. Can the TVM web dist be minified to make it more storage-efficient?
  4. Is there a better way to reach you? I'm doing a lot of work to understand this side of TVM, and it would help to be able to reach out to you to learn more.

videetparekh • May 13 '20 21:05