cdo-bindings
cdo-bindings copied to clipboard
Apply new syntax for calls with mutiple inputs
With the new syntax cdo.op1().op2().op3().run() the question comes up: How to handle operators with multiple inputs.
Here is one idea I'd like to discuss @Chilipp
cdo -add -fldmean -topo,global_0.2 -fldmax -topo,'global_1' /tmp/CdoTemp234.grb
generated from
cdo = Cdo()
cdo.add(input=[cdo.fldmean.topo('global_0.2'),cdo.fldmax.topo('global_1')]).run()
Any other idea how to write this down?
Hey @Try2Code!
Good idea and indeed, this is pretty simple to be implemented.
But let me first note that the call
cdo.add(input=[cdo.fldmean.topo('global_0.2'),cdo.fldmax.topo('global_1')])
raises an error. You specify the input parameter here which, for backwards compatibility, runs the cdo operator. As such, if any kwargs are given to the method, the computation is performed, as controlled by the following lines
see https://github.com/Try2Code/cdo-bindings/blob/d4dcd6a1cf200f269e97bbbca21c0a09e4f5ec22/python/cdo.py#L445-L448
The returned object from your cdo.add(...) call (a string, or xarray Dataset, or whatsoever) will therefore not have a run method. Within the new framework, you need
cdo.add.infile(cdo.fldmean.topo('global_0.2'),cdo.fldmax.topo('global_1')).run()
And this is also, where I would suggest to implement it, in the infile method
https://github.com/Try2Code/cdo-bindings/blob/d4dcd6a1cf200f269e97bbbca21c0a09e4f5ec22/python/cdo.py#L355-L366
My suggestion is to simply combine the _cmd attribute in case of a Cdo object by adding these two lines to the method:
elif isinstance(infile, Cdo):
self._cmd.extend(infile._cmd)
In the end, the Cdo.infile method then looks like
def infile(self, *infiles):
for infile in infiles:
if isinstance(infile, six.string_types):
self._cmd.append(infile)
elif isinstance(infile, Cdo):
self._cmd.extend(infile._cmd)
elif self.hasXarray:
import xarray #<<-- python2 workaround
if (type(infile) == xarray.core.dataset.Dataset):
# create a temp nc file from input data
tmpfile = self.tempStore.newFile()
infile.to_netcdf(tmpfile)
self._cmd.append(tmpfile)
return self
yes, I was just suggesting a syntax - never meant to be able to run it right away ;-)
thx for your thoughts! I was thinking in a similar way since I want to implement this in Ruby,too.
awesome 👍 Let me know when you have more of these nice ideas 😃
Hey @Try2Code! What came to my mind: the infile method might be a bit counter-intuitive from a Pythonic perspective. It makes sense in terms of CDOs, so I think we should implement this, but I would suggest to also combine Cdo operators through the normal + operator, i.e. something like
command = cdo.add + cdo.fldmean.topo('global_0.2') + cdo.fldmax.topo('global_1')
command.run()
This should be very easy to implement by adding an __add__ and __iadd__ method to the Cdo class. Or do you think it would be to confusing because CDOs also have an add operator?
this looks compelling, but I it's essentially an additional operator + on top of add :thinking:
Do you have this in mind for operators that have multiple input stream? or in general? What I like about it, is that it allow to write things parenthesis-free. but I fear a little the freedom to do so :rofl:
The + can also be regarded as a replacement for . The correct command line could also be generated by
cdo.add.fldmean.topo('global_0.2').fldmax.topo('global_1')
because the CDO binary is smart enough to parse it. It just looks way too confusing in python.
I see problems when using the + operator in situation where the CDO operators using it is not commutative (like ```sub``). Brackets are less beautiful, but easier to understand - a certain order can be enforced with them,too.
But that's what I like about this re-design: we are free to experiment with the syntax :+1:
Because you mentioned backward compatibility before wrt using input or infile(): IMO we can drop it. I am willing to maintain the last python release for a while in parallel. I don't get many bug reports, so I do not expect a big amount of work from that. What we plan here is more a disruptive 2.0 version to me - I want to rethink this as a whole.
Do you have this in mind for operators that have multiple input stream? or in general?
I would have implemented it in general since there are (in Python) no checks implemented so far, how many inputs streams the various CDO operators have.
The
+can also be regarded as a replacement for.
You're absolutely right. The only advantage of the + would be the reusability of operators. For instance something like this would be possible:
cdo1 = cdo.topo('global_0.2')
cdo2 = cdo.fldmean + cdo1
final = cdo.sub + cdo2 + cdo1
final.run()
I see problems when using the + operator in situation where the CDO operators using it is not commutative (like ```sub``)
Yes, this is indeed true it should be mentioned in the documentation that the + operation is not commutative by design. For python, this is however not a problem because the + operation does not need to be commutative. a + b will always be evaluated as a.__add__(b).
Brackets can be used, but with the proposed design, they would not have an effect because of its associative property (i.e. (cdo1 + cdo2) + cdo3 = cdo1 + (cdo2 + cdo3) =...). But of course, one can put things in brackets for the purpose of a cleaner code.
What we plan here is more a disruptive 2.0 version to me - I want to rethink this as a whole.
Sounds good to me and I am happy to contribute 😃
Because you mentioned backward compatibility before wrt using input or infile(): IMO we can drop it.
Yes, I agree, then we should drop the infile method entirely, and just use the input parameter. Actually, I think it would be cleaner to change this to an infile parameter, because this is how it is framed in the CDO docs. I propose something like
cdo.add(infile=[cdo.fldmean.topo('global_0.2'), cdo.fldmax.topo('global_1')]).run()
This also avoids confusion with pythons built-in input function. But it's up to you, keeping input as a parameter for the operator call works well, too.
just because we talk about brackets so much: I wrote a short article about the introduction of brackets on the command line for special cases where the parenthesis-free version limits what you can do: in the case of arbitrary number of input streams (https://code.mpimet.mpg.de/boards/53/topics/6702)
I don't use this in cdo.py or cdo.rb because it's still rather new (ok, it's already a year old). But in fact it allows functionality that was not possible before. Maybe it's worth taking this into account,too
I wrote a short article about the introduction of brackets on the command line for special cases where the parenthesis-free version limits what you can do: in the case of arbitrary number of input streams
Awesome! I did not know about this new feature :heart_eyes:. It is indeed a good question how to handle this, because the brackets in cdo1 + (cdo2 + cdo3) are not communicated to the python calls. I have to think about it...
But using the input/infile parameter would allow this, i.e. it would be possible to translate
cdo.add(infile=[cdo.fldmean.topo('global_0.2'), cdo.fldmax.topo('global_1')]).run()
into
cdo -add [ -fldmean -topo.global_0.2 -fldmax -topo,global_1 ]
yes, it would mimic the brackets on the command line to a certain extent
yes, but as far as I understand the new [ ] notation, it also forces to specify all input streams in this call. So something like
cdo.add(infile=cdo.fldmean.topo('global_0.2')) + cdo.fldmax.topo('global_1')
won't work because it would be translated to
cdo -add [ -fldmean -topo.global_0.2 ] -fldmax -topo,global_1
correct - the above call does not work, it has to be cdo -add [ -fldmean -topo,global_0.2 -fldmax -topo,global_1 ]
infile and + are not interchangeable here. We should not have both I guess. IMO infile is better, because [] is designed to collecting/marking inputs for a certain operator and the infile key is meant for exactly that
Considering the brackets in the '+' operation. I think the best possibility would be to allow strings in this operation and then let the user insert the brackets where he wants. For example via
cdo.add + '[' + cdo.fldmean.topo('global_0.2') + cdo.fldmax.topo('global_1') + ']'
We should not have both I guess.
So you mean we should not implement the + operation at all, right?
Considering the brackets in the
'+'operation. I think the best possibility would be to allow strings in this operation and then let the user insert the brackets where he wants. For example viacdo.add + '[' + cdo.fldmean.topo('global_0.2') + cdo.fldmax.topo('global_1') + ']'
hm, this looks more and more like string processing. In fact that's what we do internally :rofl: But originally I wanted to hide this behind methods calls and parameters
hm, this looks more and more like string processing
Well, we could also use pythons __getitem__ method and specify input parameters with []:
cdo.add[cdo.fldmean.topo('global_0.2') + cdo.fldmax.topo('global_1')]
or
cdo.add[cdo.fldmean.topo('global_0.2'), cdo.fldmax.topo('global_1')]
but personally I do not like this approach to much, because one normally uses this so subset an array which is different than specifying input parameters to an operator call
yes, it doesn't really tell what it's doing. I like to use methods or keywords for separating things and be expressive at the same time.
Btw: we concentrated on the input, but what about output? We could have an outfile key or method call? it could also be a parameter of the run() method ...
yes, it doesn't really tell what it's doing
👍 So I propose the following strategy:
- The recommended usage for input streams is to specify them with the
infileparameter in the operator call and then the CDO library puts brackets around them - If we go for the
+operation (and I think the reusability mentioned in https://github.com/Try2Code/cdo-bindings/issues/23#issuecomment-544868538 is a very good reason for it), we should also allow strings in the+operation. It is straight-forward to implement and allows something like
base_op = cdo.add + 'base.nc'
op1 = base_op + 'infile1.nc'
op2 = base_op + 'infile2.nc'
which would not be possible with the cdo.add(infile=...) syntax.
we concentrated on the input, but what about output?
So at the moment this is implemented in the run method and I would leave it there. I think this makes the most sense, right? Especially if we remove the backward compatibility issue mentioned above (i.e. cdo.add(infile=...) does not automatically run the cdo command in the shell).
I agree - the output is only important in moment of execution. We can avoid this before
hey @Try2Code! I just want to write down some points of what we talked about quickly before my talk.
My proposal is to transform each cdo operator into a string surrounded by [ ]. Let's define
cdo1 = cdo.fldmean.topo('global_0.2')
cdo2 = cdo.fldmean.topo('global_1')
cdo.add(input=[cdo1, cdo2])
would become equivalent to something like
cdo -add [ -fldmean -topo,global_0.2 -fldmean -topo,global_1 ]
Concerning the + operation, the question is whether we would not want to keep this for the next release :wink: Otherwise, if you want to continue the discussion about this,
here are my thoughts concerning `+`:
Something like cdo.add + cdo1 + cdo2 could become equivalent to
cdo -add [ -fldmean -topo,global_0.2 ] [ -fldmean -topo,global_1 ]
but this is probably not working, right? So we should rather not use brackets for the + and restrict the bracket functionality to the input parameter of the operators. i.e. cdo.add(input=...) gives the result I mentioned above, and cdo.add + cdo1 + cdo2 gives the current behaviour, i.e. something like cdo -add -fldmean -topo,global_0.2 -fldmean -topo,global_1 without any brackets.