cdo-bindings Apply new syntax for calls with mutiple inputs

With the new syntax cdo.op1().op2().op3().run() the question comes up: How to handle operators with multiple inputs. Here is one idea I'd like to discuss @Chilipp

cdo -add -fldmean -topo,global_0.2 -fldmax -topo,'global_1' /tmp/CdoTemp234.grb

generated from

cdo = Cdo()
cdo.add(input=[cdo.fldmean.topo('global_0.2'),cdo.fldmax.topo('global_1')]).run()

Any other idea how to write this down?

Oct 21 '19 15:10 Try2Code

Hey @Try2Code!

Good idea and indeed, this is pretty simple to be implemented.

But let me first note that the call

cdo.add(input=[cdo.fldmean.topo('global_0.2'),cdo.fldmax.topo('global_1')])

raises an error. You specify the input parameter here which, for backwards compatibility, runs the cdo operator. As such, if any kwargs are given to the method, the computation is performed, as controlled by the following lines

see https://github.com/Try2Code/cdo-bindings/blob/d4dcd6a1cf200f269e97bbbca21c0a09e4f5ec22/python/cdo.py#L445-L448

The returned object from your cdo.add(...) call (a string, or xarray Dataset, or whatsoever) will therefore not have a run method. Within the new framework, you need

cdo.add.infile(cdo.fldmean.topo('global_0.2'),cdo.fldmax.topo('global_1')).run()

And this is also, where I would suggest to implement it, in the infile method

https://github.com/Try2Code/cdo-bindings/blob/d4dcd6a1cf200f269e97bbbca21c0a09e4f5ec22/python/cdo.py#L355-L366

My suggestion is to simply combine the _cmd attribute in case of a Cdo object by adding these two lines to the method:

      elif isinstance(infile, Cdo):
        self._cmd.extend(infile._cmd)

In the end, the Cdo.infile method then looks like

  def infile(self, *infiles):
    for infile in infiles:
      if isinstance(infile, six.string_types):
        self._cmd.append(infile)
      elif isinstance(infile, Cdo):
        self._cmd.extend(infile._cmd)
      elif self.hasXarray:
        import xarray #<<-- python2 workaround
        if (type(infile) == xarray.core.dataset.Dataset):
          # create a temp nc file from input data
          tmpfile = self.tempStore.newFile()
          infile.to_netcdf(tmpfile)
          self._cmd.append(tmpfile)
    return self

Oct 21 '19 16:10 Chilipp

yes, I was just suggesting a syntax - never meant to be able to run it right away ;-)

thx for your thoughts! I was thinking in a similar way since I want to implement this in Ruby,too.

Oct 21 '19 16:10 Try2Code

awesome 👍 Let me know when you have more of these nice ideas 😃

Oct 21 '19 17:10 Chilipp

Hey @Try2Code! What came to my mind: the infile method might be a bit counter-intuitive from a Pythonic perspective. It makes sense in terms of CDOs, so I think we should implement this, but I would suggest to also combine Cdo operators through the normal + operator, i.e. something like

command = cdo.add + cdo.fldmean.topo('global_0.2') + cdo.fldmax.topo('global_1')
command.run()

This should be very easy to implement by adding an __add__ and __iadd__ method to the Cdo class. Or do you think it would be to confusing because CDOs also have an add operator?

Oct 22 '19 06:10 Chilipp

this looks compelling, but I it's essentially an additional operator + on top of add :thinking:

Do you have this in mind for operators that have multiple input stream? or in general? What I like about it, is that it allow to write things parenthesis-free. but I fear a little the freedom to do so :rofl:

The + can also be regarded as a replacement for . The correct command line could also be generated by

cdo.add.fldmean.topo('global_0.2').fldmax.topo('global_1')

because the CDO binary is smart enough to parse it. It just looks way too confusing in python.

I see problems when using the + operator in situation where the CDO operators using it is not commutative (like ```sub``). Brackets are less beautiful, but easier to understand - a certain order can be enforced with them,too.

But that's what I like about this re-design: we are free to experiment with the syntax :+1:

Oct 22 '19 08:10 Try2Code

Because you mentioned backward compatibility before wrt using input or infile(): IMO we can drop it. I am willing to maintain the last python release for a while in parallel. I don't get many bug reports, so I do not expect a big amount of work from that. What we plan here is more a disruptive 2.0 version to me - I want to rethink this as a whole.

Oct 22 '19 08:10 Try2Code

Do you have this in mind for operators that have multiple input stream? or in general?

I would have implemented it in general since there are (in Python) no checks implemented so far, how many inputs streams the various CDO operators have.

The + can also be regarded as a replacement for .

You're absolutely right. The only advantage of the + would be the reusability of operators. For instance something like this would be possible:

cdo1 = cdo.topo('global_0.2')
cdo2 = cdo.fldmean + cdo1
final = cdo.sub + cdo2 + cdo1
final.run()

I see problems when using the + operator in situation where the CDO operators using it is not commutative (like ```sub``)

Yes, this is indeed true it should be mentioned in the documentation that the + operation is not commutative by design. For python, this is however not a problem because the + operation does not need to be commutative. a + b will always be evaluated as a.__add__(b).

Brackets can be used, but with the proposed design, they would not have an effect because of its associative property (i.e. (cdo1 + cdo2) + cdo3 = cdo1 + (cdo2 + cdo3) =...). But of course, one can put things in brackets for the purpose of a cleaner code.

Oct 22 '19 08:10 Chilipp

What we plan here is more a disruptive 2.0 version to me - I want to rethink this as a whole.

Sounds good to me and I am happy to contribute 😃

Oct 22 '19 09:10 Chilipp

Because you mentioned backward compatibility before wrt using input or infile(): IMO we can drop it.

Yes, I agree, then we should drop the infile method entirely, and just use the input parameter. Actually, I think it would be cleaner to change this to an infile parameter, because this is how it is framed in the CDO docs. I propose something like

cdo.add(infile=[cdo.fldmean.topo('global_0.2'), cdo.fldmax.topo('global_1')]).run()

This also avoids confusion with pythons built-in input function. But it's up to you, keeping input as a parameter for the operator call works well, too.

Oct 22 '19 09:10 Chilipp

just because we talk about brackets so much: I wrote a short article about the introduction of brackets on the command line for special cases where the parenthesis-free version limits what you can do: in the case of arbitrary number of input streams (https://code.mpimet.mpg.de/boards/53/topics/6702)

I don't use this in cdo.py or cdo.rb because it's still rather new (ok, it's already a year old). But in fact it allows functionality that was not possible before. Maybe it's worth taking this into account,too

Oct 22 '19 09:10 Try2Code

I wrote a short article about the introduction of brackets on the command line for special cases where the parenthesis-free version limits what you can do: in the case of arbitrary number of input streams

Awesome! I did not know about this new feature :heart_eyes:. It is indeed a good question how to handle this, because the brackets in cdo1 + (cdo2 + cdo3) are not communicated to the python calls. I have to think about it...

But using the input/infile parameter would allow this, i.e. it would be possible to translate

cdo.add(infile=[cdo.fldmean.topo('global_0.2'), cdo.fldmax.topo('global_1')]).run()

into

cdo -add [ -fldmean -topo.global_0.2 -fldmax -topo,global_1 ]

Oct 22 '19 09:10 Chilipp

yes, it would mimic the brackets on the command line to a certain extent

Oct 22 '19 10:10 Try2Code

yes, but as far as I understand the new [ ] notation, it also forces to specify all input streams in this call. So something like

cdo.add(infile=cdo.fldmean.topo('global_0.2')) + cdo.fldmax.topo('global_1')

won't work because it would be translated to

cdo -add [ -fldmean -topo.global_0.2 ] -fldmax -topo,global_1

Oct 22 '19 10:10 Chilipp

correct - the above call does not work, it has to be cdo -add [ -fldmean -topo,global_0.2 -fldmax -topo,global_1 ]

infile and + are not interchangeable here. We should not have both I guess. IMO infile is better, because [] is designed to collecting/marking inputs for a certain operator and the infile key is meant for exactly that

Oct 22 '19 10:10 Try2Code

Considering the brackets in the '+' operation. I think the best possibility would be to allow strings in this operation and then let the user insert the brackets where he wants. For example via

cdo.add + '[' + cdo.fldmean.topo('global_0.2') + cdo.fldmax.topo('global_1') + ']'

Oct 22 '19 10:10 Chilipp

We should not have both I guess.

So you mean we should not implement the + operation at all, right?

Oct 22 '19 10:10 Chilipp

Considering the brackets in the '+' operation. I think the best possibility would be to allow strings in this operation and then let the user insert the brackets where he wants. For example via
cdo.add + '[' + cdo.fldmean.topo('global_0.2') + cdo.fldmax.topo('global_1') + ']'

hm, this looks more and more like string processing. In fact that's what we do internally :rofl: But originally I wanted to hide this behind methods calls and parameters

Oct 22 '19 10:10 Try2Code

hm, this looks more and more like string processing

Well, we could also use pythons __getitem__ method and specify input parameters with []:

cdo.add[cdo.fldmean.topo('global_0.2') + cdo.fldmax.topo('global_1')]

or

cdo.add[cdo.fldmean.topo('global_0.2'), cdo.fldmax.topo('global_1')]

Oct 22 '19 10:10 Chilipp

but personally I do not like this approach to much, because one normally uses this so subset an array which is different than specifying input parameters to an operator call

Oct 22 '19 10:10 Chilipp

yes, it doesn't really tell what it's doing. I like to use methods or keywords for separating things and be expressive at the same time. Btw: we concentrated on the input, but what about output? We could have an outfile key or method call? it could also be a parameter of the run() method ...

Oct 22 '19 10:10 Try2Code

yes, it doesn't really tell what it's doing

👍 So I propose the following strategy:

The recommended usage for input streams is to specify them with the infile parameter in the operator call and then the CDO library puts brackets around them
If we go for the + operation (and I think the reusability mentioned in https://github.com/Try2Code/cdo-bindings/issues/23#issuecomment-544868538 is a very good reason for it), we should also allow strings in the + operation. It is straight-forward to implement and allows something like

base_op = cdo.add + 'base.nc'
op1 = base_op + 'infile1.nc'
op2 = base_op + 'infile2.nc'

which would not be possible with the cdo.add(infile=...) syntax.

Oct 22 '19 11:10 Chilipp

we concentrated on the input, but what about output?

So at the moment this is implemented in the run method and I would leave it there. I think this makes the most sense, right? Especially if we remove the backward compatibility issue mentioned above (i.e. cdo.add(infile=...) does not automatically run the cdo command in the shell).

Oct 22 '19 11:10 Chilipp

I agree - the output is only important in moment of execution. We can avoid this before

Oct 22 '19 11:10 Try2Code

hey @Try2Code! I just want to write down some points of what we talked about quickly before my talk.

My proposal is to transform each cdo operator into a string surrounded by [ ]. Let's define

cdo1 = cdo.fldmean.topo('global_0.2')
cdo2 = cdo.fldmean.topo('global_1')

cdo.add(input=[cdo1, cdo2]) would become equivalent to something like

cdo -add [ -fldmean -topo,global_0.2 -fldmean -topo,global_1 ]

Concerning the + operation, the question is whether we would not want to keep this for the next release :wink: Otherwise, if you want to continue the discussion about this,

here are my thoughts concerning `+`:

Something like cdo.add + cdo1 + cdo2 could become equivalent to

cdo -add [ -fldmean -topo,global_0.2 ] [ -fldmean -topo,global_1 ]

but this is probably not working, right? So we should rather not use brackets for the + and restrict the bracket functionality to the input parameter of the operators. i.e. cdo.add(input=...) gives the result I mentioned above, and cdo.add + cdo1 + cdo2 gives the current behaviour, i.e. something like cdo -add -fldmean -topo,global_0.2 -fldmean -topo,global_1 without any brackets.

Nov 18 '20 00:11 Chilipp

cdo-bindings cdo-bindings copied to clipboard

Apply new syntax for calls with mutiple inputs

cdo-bindings
cdo-bindings copied to clipboard