How to process multiple elements?

Open egvimo opened this issue 4 years ago • 8 comments

Is there a way to process multiple elements (e.g. an HTML list or JSON Array) and be notified for each item?

egvimo avatar Feb 02 '22 18:02 egvimo

Yes, if you want one long notification report with all the changes. Or, if the list is short and fixed, you can have multiple similar checks. In general, the pipeline is linear, and fan-out capabilities are not supported.

peterdemin avatar Feb 02 '22 18:02 peterdemin

I've already managed to create a notification with the whole list.

Do you think it would be possible to create such a feature? Maybe as a batch transformer:

checks:
  - name: Some Batch Check
    url: https://some-url.net
    transform:
      - css-all: .some-class
      - batch # Maybe with ordering (if the oldest items are on the bottom) -> batch: reverse
      - changes: batch # Remembers every item and allows only new items to pass
    notify:
      - python: print(content) # Would print every (new) item

egvimo avatar Feb 03 '22 07:02 egvimo

Would "changes: new" do the trick? Can batch be implemented as a Python transformer?

peterdemin avatar Feb 03 '22 12:02 peterdemin

After looking through the code, here is my idea to realize this.

In addition to the existing css and xpath transformers, there would be a -list (or -batch or something similar) transformer. After all, css and xpath are the only ones that fetch more than one element.

There are different ways to handle this, but I think this part has to be changed:

https://github.com/kibitzr/kibitzr/blob/1d11033f884175afcfd21dc97548c61e6c8db1dc/kibitzr/checker.py#L39-L43

The content and the report would become a list. This way they could be iterated inside the transform and notify parts. Current transformers would simply return a list with only a single element, but the new transformers can return more than one.

This way it would be possible to add other transformers that produce multiple elements. For example, a list transformer, which would do the same as the text transformer but return each line as a separate item.
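A rough sketch of the list-based idea, written as standalone Python rather than against the actual kibitzr code. The names `Transformer`, `per_item`, `split_lines`, and `run_pipeline` are hypothetical and only illustrate how single-item transformers could be wrapped while new ones fan out:

```python
from typing import Callable, List

# Hypothetical signature: every transformer takes a list of items
# and returns a list of items (not kibitzr's real interface).
Transformer = Callable[[List[str]], List[str]]


def per_item(func: Callable[[str], str]) -> Transformer:
    """Wrap an existing single-string transformer so it maps over every item."""
    def apply(items: List[str]) -> List[str]:
        return [func(item) for item in items]
    return apply


def split_lines(items: List[str]) -> List[str]:
    """Fan out: every non-blank line of every item becomes its own item."""
    return [line for item in items for line in item.splitlines() if line.strip()]


def run_pipeline(content: str, transformers: List[Transformer]) -> List[str]:
    """Run all transformers in order, starting from a single-element list."""
    items = [content]
    for transform in transformers:
        items = transform(items)
    return items


report = run_pipeline("  first \n second\n\nthird ", [split_lines, per_item(str.strip)])
for item in report:
    print(item)  # a notifier would be called once per item here
```

Existing transformers would only need the `per_item` wrapper, while a fan-out transformer such as `split_lines` is free to return more items than it received.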

What do you think?

egvimo avatar Feb 05 '22 19:02 egvimo

I don't think changing the return type of all parts to a list is a viable approach. Instead, I think we should use a queue, which can be built on top of the stash. One check would use a Python notifier to add all the items to a queue. Another job would handle the queue items one by one. We might either add a new source type (queue) or hack it using the Python fetcher. If implemented as a new fetcher type (queue: ), this would be pretty extensible, flexible, and efficient. Queues need to be persisted, and the only existing persistence mechanism is the stash, so each queue could be a separate key in the stash (queue-) with a list or deque value type. Does that sound good? Would it cover your use case?
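To make the queue idea concrete, here is a minimal sketch that stores a FIFO queue under one key of a dict-like store. It does not use kibitzr's stash API; `StashQueue`, its methods, and the `"queue-" + name` key scheme are assumptions for illustration, and a plain dict stands in for the persistent store:

```python
from typing import List, MutableMapping, Optional


class StashQueue:
    """FIFO queue kept under a single key of a dict-like store (hypothetical)."""

    def __init__(self, store: MutableMapping[str, List[str]], name: str) -> None:
        self.store = store
        self.key = "queue-" + name  # assumed key naming scheme

    def push(self, content: str) -> None:
        """Append one item to the end of the queue."""
        items = list(self.store.get(self.key, []))
        items.append(content)
        # Reassign the key so that shelve-like stores actually persist the change.
        self.store[self.key] = items

    def pop(self) -> Optional[str]:
        """Remove and return the oldest item, or None if the queue is empty."""
        items = list(self.store.get(self.key, []))
        if not items:
            return None
        head = items.pop(0)
        self.store[self.key] = items
        return head


store: dict = {}  # stand-in for a persistent key-value store

# Producer check: push every item found in the fetched content.
producer = StashQueue(store, "process")
for line in ["item 1", "item 2"]:
    producer.push(line)

# Consumer check: handle items one by one.
consumer = StashQueue(store, "process")
first = consumer.pop()
second = consumer.pop()
third = consumer.pop()  # queue is empty by now
```

Because both sides only share the store and the queue name, the producing check and the consuming check can run at different times, which is the point of persisting the queue.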

peterdemin avatar Feb 07 '22 00:02 peterdemin

It's not necessary to change all parts, only the following three.

The main connection between fetcher, transformer and notifier:

https://github.com/kibitzr/kibitzr/blob/1d11033f884175afcfd21dc97548c61e6c8db1dc/kibitzr/checker.py#L39-L43

Then this transformer factory loop would get another loop to iterate over all items:

https://github.com/kibitzr/kibitzr/blob/1d11033f884175afcfd21dc97548c61e6c8db1dc/kibitzr/transformer/factory.py#L49-L59

And the same for the notifier loop here:

https://github.com/kibitzr/kibitzr/blob/1d11033f884175afcfd21dc97548c61e6c8db1dc/kibitzr/notifier/factory.py#L64-L73

The advantage of this approach would be the ability to use every transformation on the item list.

In my opinion, the main challenge would be the changes transformer, because it has to know about the lists.
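As a sketch of what a list-aware changes step might look like, the function below remembers every item it has seen and lets only new ones through. `new_items` is a hypothetical helper, not existing kibitzr code, and the `seen` set would have to be persisted between runs in a real implementation:

```python
from typing import Iterable, List, Set


def new_items(items: Iterable[str], seen: Set[str]) -> List[str]:
    """Return items not seen before, in original order, and record them in `seen`."""
    fresh = []
    for item in items:
        if item not in seen:
            seen.add(item)
            fresh.append(item)
    return fresh


seen: Set[str] = set()  # in-memory only; persistence is left out of this sketch
first_run = new_items(["a", "b"], seen)        # everything is new on the first run
second_run = new_items(["a", "b", "c"], seen)  # only "c" has not been seen yet
```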


If I understood you correctly, your approach would be to implement a queue as a transformer and a dequeue for the notifications? Something like this:

checks:
  - name: Some Batch Check
    url: https://some-url.net
    transform:
      - css-all: .some-class
      - queue # How does the queue know which elements are there?
      - changes: queue
    notify:
      - dequeue
      - python: print(content) # Would print every (new) item

egvimo avatar Feb 09 '22 11:02 egvimo

Here's my idea:

checks:
  - name: Some Batch Check
    url: https://some-url.net
    transform:
      - css-all: .some-class
      - changes: new
    notify:
      - python: |
           for line in lines:
               queue.push(name='process', content=line)   # queue is passed as a local variable into the Python script.

  - name: Process Single Item
    queue: process  # New type of fetcher, that reads items from the queue one by one.
    notify:
      - python: print(content)

peterdemin avatar Feb 10 '22 21:02 peterdemin

I think this would cover only the basic cases, because css-all would return the HTML as a single result and changes: new cannot split it again, so you are forced to split by lines.
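A small illustration of the problem, assuming the matched elements arrive as one string in which each element spans several lines (the exact serialization is an assumption, and the HTML is made up):

```python
# Two matched elements, serialized into a single string (assumed format).
html = """<li class="some-class">
  <a href="/1">First</a>
</li>
<li class="some-class">
  <a href="/2">Second</a>
</li>"""

element_count = html.count("<li ")  # how many elements were matched
lines = html.splitlines()            # what a line-based split produces

print(element_count, len(lines))
```

Splitting by lines yields fragments of elements rather than whole items, so each queued entry would no longer correspond to one matched element.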

egvimo avatar Feb 18 '22 07:02 egvimo