Dtype mod

Open v923z opened this issue 4 years ago • 11 comments

This PR adds the option to extend ulab in a transparent way. This means that the user is able to add their own data container in the C implementation, and if they supply a readout function, then various numpy methods should be able to access the data in the container. Such a facility could be exploited to process data that do not reside in RAM, either because they are not available, or because the amount would be prohibitive.

Two possible use cases are

  1. implementing complicated generator expressions
  2. processing image data that contain megapixels of information (openmv, https://github.com/openmv/openmv/issues/881; pixels in the image can be accessed via https://github.com/openmv/openmv/blob/master/src/omv/modules/py_image.c#L402)

The type definition of ndarray is extended with a blocks_block_obj_t structure: https://github.com/v923z/micropython-ulab/blob/3227831a0adfc70c090664d8c8f9ae212a9a220a/code/ndarray.h#L70-L94

In blocks_block_obj_t, a pointer to the readout function, *arrfunc, can be attached, together with a pointer to a temporary container, *subarray. The subarray has to be able to hold a single line of data, i.e., it must be at least as long as the longest axis of the tensor. This single line can then be passed to the innermost loop of all numerical functions, binary operators, etc. An example is the summation macro https://github.com/v923z/micropython-ulab/blob/3227831a0adfc70c090664d8c8f9ae212a9a220a/code/numpy/numerical/numerical.h#L61-L78
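
As a rough illustration of the mechanism, here is a pure-Python model, not the C code linked above. The names BlockArray, squares and sum_axis1 are hypothetical and exist only for this sketch; readout plays the role of *arrfunc, and the signature of the readout function is an assumption.

```python
class BlockArray:
    """Toy model of an array header that owns no data of its own."""

    def __init__(self, shape, readout):
        self.shape = shape
        self.readout = readout             # plays the role of *arrfunc
        self.subarray = [0] * max(shape)   # one line, as long as the longest axis


def squares(row, subarray, length):
    """Hypothetical readout: fill one line with squares of the flat index."""
    for col in range(length):
        subarray[col] = (row * length + col) ** 2


def sum_axis1(block):
    """Sum each row, fetching one line at a time through the readout."""
    rows, cols = block.shape
    totals = []
    for row in range(rows):
        block.readout(row, block.subarray, cols)   # refill the single line
        totals.append(sum(block.subarray[:cols]))  # innermost loop sees only that line
    return totals


f = BlockArray((2, 3), squares)
row_sums = sum_axis1(f)
print(row_sums)
```

The point of the sketch is that only one line of data is ever held in memory, regardless of the size of the tensor.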

The user can then simply attach their readout function by defining a type https://github.com/v923z/micropython-ulab/blob/3227831a0adfc70c090664d8c8f9ae212a9a220a/code/user/user.c#L86-L127

The example above calculates the sum of squares.

In Python, the mock-up looks like this:

from ulab import blocks
from ulab import user
from ulab import numpy as np

f = blocks.ndarray(shape=(5,5), transformer=user.imreader(), dtype=np.uint8)

print(f)
print(np.sum(f, axis=1))
for i in f:
    print(i, np.sum(i, axis=0))

Slicing, indexing and the like happen in the usual way, since even in the standard case, such operations only update the array header and move the position pointer.
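
A minimal sketch of why slicing needs no data access, assuming a simplified header made of shape, strides (counted in elements) and an offset. Header and slice_axis0 are hypothetical names, and the sketch handles only in-bounds, non-negative start and stop with a positive step.

```python
from collections import namedtuple

# Hypothetical header: no data, only the description of how to reach it.
Header = namedtuple("Header", "shape strides offset")


def slice_axis0(header, start, stop, step=1):
    """Return a new header describing header[start:stop:step] along axis 0."""
    length = len(range(start, stop, step))
    shape = (length,) + header.shape[1:]
    strides = (header.strides[0] * step,) + header.strides[1:]
    offset = header.offset + start * header.strides[0]   # move the position pointer
    return Header(shape, strides, offset)


full = Header(shape=(5, 5), strides=(5, 1), offset=0)
view = slice_axis0(full, 1, 5, 2)   # rows 1 and 3
print(view)
```

Nothing in slice_axis0 touches array contents, so the same bookkeeping applies whether the data sit in RAM or are produced by a readout function.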

In a numerical operation, a tensor is always traversed along an axis. Given the position of the data pointer, the coordinates of the pointer position can be calculated with the help of the size_t *blocks_coords_from_pointer(void *p1, ndarray_obj_t *ndarray) function: https://github.com/v923z/micropython-ulab/blob/3227831a0adfc70c090664d8c8f9ae212a9a220a/code/blocks/blocks.c#L30-L49, and hence the imreader can easily fill up the subarray.
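
The coordinate calculation can be sketched as follows. This is a simplified model, not a transcription of blocks_coords_from_pointer: it works with an element offset instead of a raw pointer, and it assumes a dense, C-ordered array with positive strides. coords_from_offset is a hypothetical name.

```python
def coords_from_offset(offset, strides):
    """Map an element offset to coordinates.

    Assumes a dense, C-ordered array whose strides are positive and
    given in elements, from the outermost to the innermost axis.
    """
    coords = []
    for stride in strides:
        coords.append(offset // stride)   # index along this axis
        offset %= stride                  # remainder belongs to the inner axes
    return tuple(coords)


# A (5, 5) array has strides (5, 1) when counted in elements.
position = coords_from_offset(13, (5, 1))
print(position)
```

With the coordinates in hand, a readout function knows which row or column it is being asked to deliver.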

Such a construct should not support the buffer protocol, since there might not be an easy way of resolving what should happen: https://github.com/v923z/micropython-ulab/pull/327#issuecomment-782923838, https://github.com/v923z/micropython-ulab/pull/327#issuecomment-782945688

Passing arguments to the type (imreader above) should be possible as, e.g., with ndarray.

Outstanding issues:

  1. sort out those functions that do not make sense for such a structure (e.g., rolling might not be relevant)
  2. what should happen with overflows? These could be handled by declaring the image to be of float type, but I am not sure whether this would lead to problems later on.
  3. memory footprint, reducing the size of the extra payload in dtype. This last question is probably sorted out by attaching the extra structure to the ndarray only when it is needed. We carry only a pointer, and RAM is reserved for it only in https://github.com/v923z/micropython-ulab/blob/3227831a0adfc70c090664d8c8f9ae212a9a220a/code/blocks/blocks.c#L115
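
The overflow question in item 2 can be illustrated with a small sketch. It makes no claim about how ulab accumulates sums internally; it only shows the wraparound that occurs if a running total is kept in an 8-bit container, compared with a float accumulator. Both function names are hypothetical.

```python
def sum_wrapping_uint8(values):
    """Accumulate in an 8-bit container: the total wraps modulo 256."""
    total = 0
    for v in values:
        total = (total + v) & 0xFF
    return total


def sum_as_float(values):
    """Accumulate in a float, as if the image had been declared float."""
    total = 0.0
    for v in values:
        total += float(v)
    return total


pixels = [200, 100, 50]
wrapped = sum_wrapping_uint8(pixels)
exact = sum_as_float(pixels)
print(wrapped, exact)
```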

Functions and features

  • [ ] binary operators, broadcasting
  • [x] sum
  • [x] mean
  • [x] std
  • [ ] sorting
  • [ ] diff
  • [ ] flip
  • [ ] fft
  • [ ] convolve
  • [ ] interp, scipy.signal?
  • [ ] numerical functions
  • [ ] polyfit, polyval
  • [ ] comparison functions

v923z avatar Feb 18 '21 21:02 v923z

This is more or less what we need to support images. The data type we'd use would be a float. Since users have often asked to do weird things with images that you can't do with ints.

What questions do you need answered from me? Given a row/color_channel it's very easy for me to fill a float array with pixel values. I can also easily fill a column.

kwagyeman avatar Feb 19 '21 04:02 kwagyeman

@kwagyeman

> This is more or less what we need to support images. The data type we'd use would be a float. Since users have often asked to do weird things with images that you can't do with ints.

I don't have to know what data type you want to hold, int, or float. This will be resolved automatically, when the function pointer is called.

> What questions do you need answered from me? Given a row/color_channel it's very easy for me to fill a float array with pixel values. I can also easily fill a column.

I think, we are pretty much on the same page. Let me iron out the implementation, and we can then pick it up from there.

v923z avatar Feb 19 '21 06:02 v923z

@kwagyeman @iabdalkader What should happen with this construct if one wants to use the buffer protocol? Some context can be found here https://github.com/v923z/micropython-ulab/issues/335, and here https://github.com/v923z/micropython-ulab/issues/328.

The problem is that in https://github.com/v923z/micropython-ulab/blob/42212622ff3c49b5f03907aac8788a759e550ba0/code/ndarray.c#L2031-L2040 we have to set a pointer to the underlying data, which I won't have, because self->array will not point to actual data.

I think the cleanest solution is to simply bail out, if the special flag is set for an ndarray. Do you agree? But you still have to point this out in your documentation.
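
A minimal Python model of the bail-out, under the assumption that the array carries a boolean flag marking it as a block array. ToyArray, is_block and get_buffer are hypothetical names; the real check would live in the C buffer handler linked above.

```python
class ToyArray:
    def __init__(self, data=None, is_block=False):
        self.data = data
        self.is_block = is_block   # hypothetical stand-in for the special flag


def get_buffer(arr):
    """Expose the underlying bytes, or refuse if there are none to expose."""
    if arr.is_block:
        raise TypeError("buffer protocol is not supported for block arrays")
    return memoryview(arr.data)


dense = ToyArray(bytearray(b"\x01\x02\x03"))
lazy = ToyArray(is_block=True)

dense_bytes = bytes(get_buffer(dense))
try:
    get_buffer(lazy)
    lazy_error = None
except TypeError as exc:
    lazy_error = str(exc)

print(dense_bytes, lazy_error)
```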

v923z avatar Feb 21 '21 20:02 v923z

We have a buffer protocol for the image object already. There is no need to duplicate it. So bailing makes sense.

kwagyeman avatar Feb 21 '21 23:02 kwagyeman

> We have a buffer protocol for the image object already. There is no need to duplicate it. So bailing makes sense.

OK, thanks!

v923z avatar Feb 22 '21 05:02 v923z

@v923z Would you like to get on our slack? Email [email protected]

kwagyeman avatar Feb 22 '21 06:02 kwagyeman

> @v923z Would you like to get on our slack? Email [email protected]

@kwagyeman Thanks for the invitation! Sure!

v923z avatar Feb 23 '21 21:02 v923z

You need to email me since you hide all contact info on your public profile.

kwagyeman avatar Feb 24 '21 02:02 kwagyeman

@kwagyeman, @iabdalkader I have updated my original comment, and uploaded a working prototype. At the moment, only sum, std, and mean are supported, and you can iterate over the tensor elements. Adding the rest is not hard, but we should first converge on an interface function. The example

from ulab import blocks
from ulab import user
from ulab import numpy as np

f = blocks.ndarray(shape=(5,5), transformer=user.imreader(), dtype=np.uint8)

print(f)
print(np.sum(f, axis=1))
for i in f:
    print(i, np.sum(i, axis=0))

creates an ndarray header with shape=(5,5), attaches the function pointer via user.imreader, and sets the dtype to uint8. From this point, f behaves like an ndarray, except that it fetches the data by means of the function pointer whenever data are needed. I simply return square numbers in https://github.com/v923z/micropython-ulab/blob/3227831a0adfc70c090664d8c8f9ae212a9a220a/code/user/user.c#L95-L98. This is where one would implement a function that reads actual pixels.

I think this solution is quite flexible (by which I mean that the ulab core doesn't have to know anything about the implementation of imreader, which can be completely detached), but I might very well have overlooked something. Let me know what you think.

What would, perhaps, be great is if we could call the .print function of imreader, so that in https://github.com/v923z/micropython-ulab/blob/3227831a0adfc70c090664d8c8f9ae212a9a220a/code/blocks/blocks.c#L51-L65 we could also indicate which transformer is used in a particular block. I haven't yet found a way of doing this, however.

v923z avatar Mar 04 '21 20:03 v923z

@v923z This is very interesting. I am unsure: is this only for reading out-of-RAM data? Do you need a corresponding write function?

dhalbert avatar Apr 02 '21 19:04 dhalbert

@dhalbert

> is this only for reading out-of-RAM data? Do you need a corresponding write function?

At the moment, this would only read; we could think about adding a write function.

v923z avatar Apr 02 '21 20:04 v923z