
Create `xt::xvector`

Open tdegeus opened this issue 4 years ago • 19 comments

In some cases it would be great to have features like `push_back` and `insert` for a 1-d container (like `std::vector`).

tdegeus avatar Apr 12 '20 09:04 tdegeus

Let's create xt::xvector since it will be an expression with broadcasting abilities. We first need to update svector so it provides push_front / pop_front methods.

JohanMabille avatar Apr 12 '20 16:04 JohanMabille

Actually, push_front and pop_front are not part of the std::vector API. That does not prevent us from adding them to svector; however, the implementation will not be optimal. Do you need these methods, or can we omit them for now?

JohanMabille avatar Apr 12 '20 20:04 JohanMabille

Let's omit them for now. We can always add them if they are requested.

tdegeus avatar Apr 13 '20 07:04 tdegeus

I wanted to start with implementing xt::xvector, but in fact I need some guidance.

  • I should probably follow https://xtensor.readthedocs.io/en/latest/external-structures.html#embedding-a-full-tensor-structure ?
  • In fact I realised that I cannot find where xtensor and xarray are defined as classes.
  • I thought it should be completely independent of xt::svector; how does your earlier remark fit in?

tdegeus avatar Apr 13 '20 08:04 tdegeus

Ha I started yesterday! Let me know if you want to do it anyway, that can be a good way to dive into the core of xtensor.

> I should probably follow https://xtensor.readthedocs.io/en/latest/external-structures.html#embedding-a-full-tensor-structure ?

So basically, the xvector class will be very similar to the xtensor one, with some specific features:

  • it will not inherit from xstrided_container but directly from xcontainer, and will implement specific reshape and resize methods.
  • it can embed the strides and shape as std::array<size_t, 1>, with the stride constantly equal to 1; alternatively, it can build these std::array objects on demand, from the underlying data size for the shape and 1 for the stride.
  • push_back, insert and remove will forward to the underlying data container and update the shape as well (if the shape is not computed on demand).

> In fact I realised that I cannot find where xtensor and xarray are defined as classes.

xtensor and xarray are specializations of xtensor_container and xarray_container with a default data container, which is uvector. You can find their definitions in xtensor_forward.hpp.

> I thought it should be completely independent of xt::svector; how does your earlier remark fit in?

Like xarray is a specialization of xarray_container, xvector should be a specialization of xvector_container for a default data container. This default data container should be xt::svector instead of std::vector, because the latter always initializes its content, even when it is about to be assigned the result of a computation.

JohanMabille avatar Apr 13 '20 19:04 JohanMabille

That was part of the reason that I started on it, to learn the internals. At the same time, if you already have it, or allocated time for it, I don't want to stop you.

tdegeus avatar Apr 14 '20 09:04 tdegeus

Well, I don't have that much time, so I'll let you do it ;). Happy to give some guidance if you need it; do not hesitate to open a PR early so we can both comment on it.

JohanMabille avatar Apr 14 '20 20:04 JohanMabille

I could really use this functionality. It feels weird jumping back and forth between xtensor and std when trying to do essentially numpy.append(). Love xtensor!

spectre-ns avatar Mar 17 '22 17:03 spectre-ns

@spectre-ns would you be willing to help with this contribution?

tdegeus avatar Mar 18 '22 13:03 tdegeus

@tdegeus I saw that there was a pull request in draft. How much work remains?

spectre-ns avatar Mar 18 '22 14:03 spectre-ns

Honestly, I don't remember. Normally, making this class should not be very hard, as it subclasses classes that provide almost everything.

I can look in the next two weeks, but please do not feel slowed down by this, we welcome a new PR!

tdegeus avatar Mar 18 '22 14:03 tdegeus

@tdegeus I'm thinking that, for the functionality I'm looking for, I would be better served by implementing this part of the numpy API: https://numpy.org/doc/stable/reference/generated/numpy.append.html

Is there something about the implementation of xarray<T> that prevents it from being appended to with another xexpression<T>?

spectre-ns avatar Mar 18 '22 21:03 spectre-ns

I think it should be perfectly implementable. Note, however, that according to the documentation you link, there is copying involved each time, so I guess it is quite a costly function.

tdegeus avatar Mar 21 '22 13:03 tdegeus

Yeah I agree. There is an upper bound on the problem I'm trying to solve. I think the best approach would be to allocate the upper bound then trim the remainder away at the end... Always appreciate your insight!

spectre-ns avatar Mar 21 '22 14:03 spectre-ns

A problem is that I think there is no way to trim away memory in xtensor:

https://xtensor.readthedocs.io/en/latest/quickref/basic.html?highlight=resize#resize

I actually forgot why that is. Do you remember @JohanMabille ?

tdegeus avatar Mar 21 '22 14:03 tdegeus

@tdegeus because we didn't take the time to implement it (preserving the elements is not trivial).

@spectre-ns I think you could reuse the concatenate and flatten functions to provide a first implementation of append. It would not be optimal, nor lazy, but at least it would do the job.

The optimized solution would be to create a new kind of expression that would internally maintain the different arrays and/or appended values and make them appear as if they were in a single array. That requires more work (but it is still easier than doing so for the concatenate function).

JohanMabille avatar Mar 22 '22 09:03 JohanMabille

@JohanMabille So that is what I don't get. If storage is contiguous, and data is stored as a std::vector<T>, then shrinking should preserve data (and even pointers to it I think).

tdegeus avatar Mar 22 '22 09:03 tdegeus

The problem is not in 1-d, but in higher dimensions. Let's consider the following tensor:

{{ 0, 1, 2 }, { 3, 4, 5 }}

Its internal buffer is [ 0, 1, 2, 3, 4, 5 ]. Now let's say you want to resize it to (2, 2) so that it becomes:

{{ 0, 1 }, { 3, 4 }}

Now its internal buffer is [ 0, 1, 3, 4 ]. So you have to move the second row in the internal buffer so that it becomes contiguous to the first (new) one.

JohanMabille avatar Mar 23 '22 08:03 JohanMabille

That is indeed perfectly OK, but my question arose from the docs: I got the impression that even in 1-d, items are not guaranteed to be preserved, which I do think should be the case. But now it seems that they might be. It would be good to add a note in the docs.

tdegeus avatar Mar 23 '22 08:03 tdegeus