pycuda icon indicating copy to clipboard operation
pycuda copied to clipboard

3D Textures not working with uint8 type

Open ichlubna opened this issue 3 years ago • 5 comments

I have edited the test example and changed the type from float to uint8, expecting to use it as unsigned char in the kernel. The output values are, however not the same, they seem to be bit-shifted or misaligned when being read from the memory. Below is the code, the expected result is that the printed values at the end are the same (simplified to copying of just the first value). Now I am getting the expected 20 and the wrong 161. I apologize if this is my lack of understanding but I haven't found anything regarding this topic in the docs and thus I consider this a bug.

import numpy as np
import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule

shape = (2, 4, 8)

a = np.asarray(np.full(shape,20), dtype=np.uint8, order="F")
descr = drv.ArrayDescriptor3D()
descr.width = shape[0]
descr.height = shape[1]
descr.depth = shape[2]
descr.format = drv.dtype_to_array_format(a.dtype)
descr.num_channels = 1
descr.flags = 0
ary = drv.Array(descr)
copy = drv.Memcpy3D()
copy.set_src_host(a)
copy.set_dst_array(ary)
copy.width_in_bytes = copy.src_pitch = a.strides[1]
copy.src_height = copy.height = shape[1]
copy.depth = shape[2]
copy()

mod = SourceModule(
    """
texture<unsigned char, 3, cudaReadModeElementType> mtx_tex;

__global__ void copy_texture(unsigned char *dest)
{
  if(threadIdx.x + threadIdx.y + threadIdx.z == 0)
      dest[0] = tex3D(mtx_tex, 0, 0, 0);
}
"""
)
copy_texture = mod.get_function("copy_texture")
mtx_tex = mod.get_texref("mtx_tex")
mtx_tex.set_array(ary)
dest = np.zeros(shape, dtype=np.uint8, order="F")
copy_texture(drv.Out(dest), block=shape, texrefs=[mtx_tex])
print(a[0][0][0])
print(dest[0][0][0])

This seems to be related to the memory alignment rules. When the descriptor is set to uint32 (descr.format = drv.dtype_to_array_format(np.uint32)) instead of the input array type, the right values can be obtained when accessed as unsigned char in the kernel. This is a workaround, or a correct solution? In case of 4 8bit channels, it would be necessary to pass the values reinterpreted as an array of integers?

ichlubna avatar Oct 24 '22 11:10 ichlubna

Here is the working example of both one and four channels. Is this really the right approach?

import numpy as np
import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule

width = 100
height = 200
count = 5
blockSize = (256,1,1)

def copyAndGetArr(a):
    descr = drv.ArrayDescriptor3D()
    descr.width = width
    descr.height = height
    descr.depth = count
    descr.format = drv.dtype_to_array_format(np.uint32)
    descr.num_channels = 1
    descr.flags = 0
    ary = drv.Array(descr)
    copy = drv.Memcpy3D()
    copy.set_src_host(a)
    copy.set_dst_array(ary)
    copy.width_in_bytes = copy.src_pitch = a.strides[1]
    copy.src_height = copy.height = height
    copy.depth = count
    copy()
    return ary

def one():
    a = np.full((count, height, width), 20, np.uint8)
    ary = copyAndGetArr(a)

    mod = SourceModule(
        """
    texture<unsigned char, 3, cudaReadModeElementType> mtx_tex;

    __global__ void copy_texture()
    {
      if(threadIdx.x == 0)
          printf("\\n %d \\n", tex3D(mtx_tex, 0, 2, 1));
    }
    """
    )
    copy_texture = mod.get_function("copy_texture")
    mtx_tex = mod.get_texref("mtx_tex")
    mtx_tex.set_array(ary)
    shape = (256,1,1)
    copy_texture(block=blockSize, texrefs=[mtx_tex])

def four():
    channels = 4
    a = np.full((count, height, width, channels), 20, np.uint8)
    a[0][0][0][0] = 1;
    a[0][0][0][1] = 2;
    a[0][0][0][2] = 3;
    a[0][0][0][3] = 4;
    ary = copyAndGetArr(a.view(">u4"))

    mod = SourceModule(
        """
    texture<int, 3, cudaReadModeElementType> mtx_tex;

    __global__ void copy_texture()
    {
      if(threadIdx.x == 0)
      {
        auto px{tex3D(mtx_tex, 0, 0, 0)};
        auto p{*reinterpret_cast<uchar4*>(&px)};
        printf("\\n %d %d %d %d \\n", p.x, p.y, p.z, p.w);
      }
    }
    """
    )
    copy_texture = mod.get_function("copy_texture")
    mtx_tex = mod.get_texref("mtx_tex")
    mtx_tex.set_array(ary)
    copy_texture(block=blockSize, texrefs=[mtx_tex])

one()
four()

ichlubna avatar Oct 24 '22 15:10 ichlubna

What format comes out of descr.format = drv.dtype_to_array_format(a.dtype)? Is that the correct one in your view?

inducer avatar Oct 24 '22 19:10 inducer

@inducer Yes, the line is taken from the test example, right? I would expect it to simply contain the type of one element since we can also provide number of channels etc. It is this in the first code I posted:

print(a.dtype)
descr.format = drv.dtype_to_array_format(a.dtype)
print(descr.format)

Output:

uint8
UNSIGNED_INT8

I expected the descriptor to deal with the alignment since we need to set all the attributes.

Where is the problem? Is the second example I posted the way it should be used? Why can't I access the elements in kernel as uchar4 directly? Or how to do so?

ichlubna avatar Oct 25 '22 06:10 ichlubna

That looks OK. Could you reference the Nvidia docs on how this is intended to work? It's been years since I've directly worked with textures, I don't remember.

inducer avatar Oct 25 '22 14:10 inducer

Hmm it seems like the plain Cuda way is slightly different, looking here and here. What I would expect in pycuda is this:

a = np.full((count, height, width), 20, np.uint8)
descr.format = a.dtype
descr.num_channels = 1
...
texture<unsigned char, 3, cudaReadModeElementType> mtx_tex

We have a texture of unsigned char values. The type is set to the right type and one channel. When reading the values with tex3D, I would expect to see the one channel value as the result. Now I am getting 161 instead of 20 which I don't understand, since all values are bytes - how would alignment break such values?

Here:

channels = 4
a = np.full((count, height, width, channels), 20, np.uint8)
descr.format = a.dtype
descr.num_channels = channels
...
texture<uchar4, 3, cudaReadModeElementType> mtx_tex

I would expect the classic RGBA texture. Right now, it seems like the num_channels doesn't do much to the result. The only way I made it work are the examples above.

My concern or misunderstanding is about the combination of descr.format and descr.num_channels in regards to the texture definition and reading in the kernel.

ichlubna avatar Oct 26 '22 09:10 ichlubna