thrust Add output_writer

trafficstars

This iterator serves as a generalized version of output iterator. Typical usage is output_writer_iterator(index_iterator, [result_begin](auto i, auto v) { result_begin[i]=v;});

Sometimes output iterator has to be complex. This output writer iterator can help achieve writing to complex data structures using binary function object or lambda which accepts an index and a value to write. Example given in documentation is achieving functionality similar to std::bitset using a thrust::device_vector.

Dec 03 '21 19:12 karthikeyann

Updating with some internal conversations:

@harrism suggested indexed_iterator. This seems like the best candidate so far IMO, since the key feature of this iterator is that it provides the current index to the functor(s).

What would the input and input/output cases look like?

Input:

struct adj_diff_functor
{
  const int *input;

  __device__
  int operator()(std::size_t i) const
  {
    return input[i + 1] - input[i];
  }
};

auto indices = thrust::make_counting_iterator(0);
thrust::device_vector<int> data{0, 1, 4, 7, 9, 12};
auto iter = thrust::make_decorated_input_iterator(indices, adj_diff_functor{data.data()});
auto iter_end = iter + data.size() - 1;

// [iter, iter_end) -> {1, 3, 3, 2, 3}

@benbarsdell correctly points out that this could be implemented using existing counting + transform iterators.

Input/output is a more interesting case that cannot be implemented using other existing fancy iterators:

struct input_functor
{
  // Produces ones[i] + 10*tens[i]
  const int *ones; // Always 0-9
  const int *tens;
  
  __device__
  int operator()(std::size_t i) const
  {
    return ones[i] + 10 * tens[i];
  }
};

struct output_functor
{
  // Decomposes input values to perform the inverse of input_functor
  int *ones; // Always 0-9
  int *tens;
  
  __device__
  void operator()(std::size_t i, int val) const
  {
    ones[i] = val % 10;
    tens[i] = val / 10;
  }
};

thrust::device_vector<int> ones = {1, 5, 3, 2, 6};
thrust::device_vector<int> tens = {3, 2, 6, 7, 3};

auto indices = thrust::make_counting_iterator(0);
auto iter = thrust::make_decorated_iterator(indicies,
                                            input_functor{ones.data(), tens.data()},
                                            output_functor{ones.data(), tens.data()});
auto iter_end = iter + ones.size();

// [iter, iter_end) -> {31, 25, 63, 72, 36}

iter[0] = 19;
iter[1] = 42;
iter[2] = 98;
iter[3] = 132;
iter[4] = 15;

// ones -> {9, 2, 8, 2, 5}
// tens -> {1, 4, 9, 13, 1}

Jan 13 '22 14:01 alliepiper

@harrism suggested indexed_iterator. This seems like the best candidate so far IMO, since the key feature of this iterator is that it provides the current index to the functor(s).

Looking to thrust::tabulate for inspiration, how about tabulate_iterator?

Jan 13 '22 14:01 jrhemstad

@harrism suggested indexed_iterator. This seems like the best candidate so far IMO, since the key feature of this iterator is that it provides the current index to the functor(s).

Looking to thrust::tabulate for inspiration, how about tabulate_iterator?

I like tabulate_iterator for the case where you cannot provide an iterator that generates the indices. But if, as in @karthikeyann 's original example, we enable passing an iterator that generates the indices, these may not just be sequential/counting. In that case indexed_iterator may be a better name. For example you may want to pass a counting transform iterator to generate indices with a stride, or a reverse counting iterator, etc.

Jan 14 '22 00:01 harrism

I like tabulate_iterator for the case where you cannot provide an iterator that generates the indices. But if, as in @karthikeyann 's original example, we enable passing an iterator that generates the indices, these may not just be sequential/counting. In that case indexed_iterator may be a better name. For example you may want to pass a counting transform iterator to generate indices with a stride, or a reverse counting iterator, etc.

I've had to start at this iterator for quite a while to understand it's story.

There's nothing here that requires the adapted iterator to be an "index_iterator".

Generally, it looks like a cross between a zip and transform iterator. It binds iterator It0 (value_type is T) and a binary callable void F(T, U). Assigning any U u to this iterator at i results in invoking F(*(It0 + i), u).

Thinking about this in terms of std::bind or std::bind_front this iterator is effectively doing a partial binding of a binary callable, F(T,U), with it's first argument (It0). Assigning to the iterator provides the second argument, u, and invokes the callable.

When viewed through that lens, I'm thinking of a name more like bind_iterator or bound_iterator. Though with that naming it feels like this should be made more general to support a callable with N arguments where [0,N) of the arguments can be bound at iterator construction and the remaining arguments provided at assignment. It may take a bit more finesse, but I think that should be doable.

Thinking about @allisonvacanti's case of having this be both an input and output iterator (I only thought about the output case above), the input case wouldn't be a partial binding but a full binding of a callable with all of its arguments. Dereferencing the iterator will invoke the callable with the associated arguments provided by the bound iterators. (This is basically just a normal transform_iterator).

Jan 14 '22 16:01 jrhemstad

My omission of the "index" iterator from my later examples definitely clouded things.

With this new perspective, I'm back to liking decorated_iterator. From this summary of the design pattern:

Decorator pattern allows a user to add new functionality to an existing object without altering its structure.

This pattern creates a decorator class which wraps the original class and provides additional functionality keeping class methods signature intact.

This is an iterator that wraps all accesses to an existing iterator with decorator functions.

Jan 14 '22 17:01 alliepiper

Decorator pattern allows a user to add new functionality to an existing object without altering its structure. This pattern creates a decorator class which wraps the original class and provides additional functionality keeping class methods signature intact.

This is an iterator that wraps all accesses to an existing iterator with decorator functions.

What gives me pause about this is that this pattern/definition applies just as well to transform_iterator.

In fact, the input iterator case for the decorated_iterator is exactly equivalent to a transform_iterator. The only difference is in assignment (or using it as an output iterator).

I'm not sure the name decorated_iterator vs. transform_iterator conveys that difference.

Now I'm thinking that maybe this new iterator can just be a new overload of transform_iterator as opposed to a wholly new iterator.

Jan 14 '22 18:01 jrhemstad

What gives me pause about this is that this pattern/definition applies just as well to transform_iterator.

It is a pretty generic name, and also very similar to the existing iterator_adaptor. I'd love to find something more specific.

Now I'm thinking that maybe this new iterator can just be a new overload of transform_iterator as opposed to a wholly new iterator.

Hmm, perhaps. Let's compare their user interfaces directly:

{ // Transform:
  T = value_t<Iter>;
  U InFunc(T); // Used for reads
  T OutFunc(U); // Used for writes
  auto thrust::make_transform_iterator(Iter, InFunc, OutFunc);
}

{ // New Iterator
  T = value_t<Iter>;
  U InFunc(T); // Used for reads
  void OutFunc(T, U); // Used for writes
  auto thrust::make_XXX_iterator(Iter, InFunc, OutFunc);
}

The key difference is in OutFunc. While transform will derive a new value from the assignment arg and write it to Iter, the new iterator will never modify Iter's data. Iter's data is only used as an input to modify state captured in the functors.

Jan 14 '22 18:01 alliepiper

Iter's data is only used as an input to modify state captured in the functors.

Which implies that this iterator is only useful with some external state, which could potentially be abstracted out:

{ // New Iterator with abstracted state:
  using T = value_t<Iter>;
  U InFunc(State, T); // Used for reads
  void OutFunc(State, T, U); // Used for writes
  auto thrust::make_XXX_iterator(Iter, State, InFunc, OutFunc);
}

This could be used to simplify the functors so they don't have to capture the same state in their members:

{ // current
  struct input_functor
  {
    const State *state;  
    U operator()(T) const;
  };

  struct output_functor
  {
    State *state;  
    void operator()(T, U);
  };

  State *state = ...;
  auto iter = thrust::make_XXX_iterator(
    indicies,
    input_functor{state},
    output_functor{state});
}

{ // with state abstraction
  struct input_functor
  {
    U operator()(const State&, T) const;
  };

  struct output_functor
  {
    void operator()(State&, T, U);
  };

  State *state = ...;
  auto iter = thrust::make_XXX_iterator(
    indicies,
    state,
    input_functor{},
    output_functor{});
}

I'm unsure whether this abstraction would be worthwhile, but I think a key point here is that this iterator does require extra state to do anything interesting.

Jan 14 '22 18:01 alliepiper

I think a key point here is that this iterator does require extra state to do anything interesting.

Yeah, I noticed that too. I'm not a fan of the fact that the binary callable doesn't return anything, which means it has to have side-effects to do anything useful.

Jan 14 '22 19:01 jrhemstad

I'm not a fan of the fact that the binary callable doesn't return anything, which means it has to have side-effects to do anything useful.

I don't think that'll be a problem. For instance, std::for_each uses a functor with no return value and relies on side effects, so it's not uncommon.

Do you see a different way to implement this functionality?

Jan 14 '22 21:01 alliepiper

The main reason for creating this iterator is to provide a generalized output iterator. For input iterator, we can create counting_iterator and use a make_transform_iterator to translate it to access any datastructure. But for creating output writer, AFAIK there is no way right now, except for the data structure to provide itself.

BinaryFunction is present only because first argument is supposed to the index. Instead of providing an pre-defined iterator index, I made it as an argument (which again gives more flexibility). Or else it could be UnaryFunction. If it is unary, tabulate_iterator or indexed_iterator name would make sense. but again it's missing output in the name.

Also, mixing input and output together would make it more than what it's designed for. My vote would be to limit this iterator's functionality to behave like output iterator only, and add new combined input and output iterator seperately.

May 11 '22 18:05 karthikeyann

The main reason for creating this iterator is to provide a generalized output iterator. For input iterator, we can create counting_iterator and use a make_transform_iterator to translate it to access any datastructure. But for creating output writer, AFAIK there is no way right now, except for the data structure to provide itself.

@karthikeyann's description reminded me that I explored an alternative for this approach awhile back in https://github.com/NVIDIA/cccl/issues/792.

However, coming back to this after some time, it occurs to me that we are effectively just trying to indirectly recreate projections via a complicated iterator.

That tells me there is probably a far more natural way to express this.

Curious for @ericniebler 's thoughts.

May 11 '22 22:05 jrhemstad

@jrhemstad In https://github.com/NVIDIA/cccl/issues/792 godbolt example, it still uses f.begin() as starting point for the output iterator. This output iterator is transformed using projections. Instead of f.begin(), can we use a counting_iterator(0), and still get an output iterator?

May 13 '22 18:05 karthikeyann

thrust
thrust copied to clipboard

Add output_writer_iterator

thrust thrust copied to clipboard

Add output_writer_iterator

thrust
thrust copied to clipboard