unpin numpy legacy printing

Open bjlittle opened this issue 7 years ago • 8 comments

Did we want to unpin the legacy=1.13 printing of numpy arrays for the 2.1 release of iris since we've worked hard to have numpy>=1.14 ?

Or do this in the following point release?

I just didn't want us to squirrel this one away and forget about it before it hurts us (again)...

bjlittle avatar Jun 02 '18 07:06 bjlittle

No need to do this for v2.1, but I'm definitely keen to do it asap.

pelson avatar Jun 02 '18 08:06 pelson

Without meaning to be blasé, but...

~~No need to do this for v2.1, but I'm definitely keen to do it asap.~~

No need to do this for v2.2, but I'm definitely keen to do it asap.

DPeterK avatar Oct 03 '18 09:10 DPeterK

Without meaning to be blasé, but...

~~No need to do this for v2.1, but I'm definitely keen to do it asap.~~

~~No need to do this for v2.2, but I'm definitely keen to do it asap.~~

~~No need to do this for v2.3, but I'm definitely keen to do it asap.~~

~~No need to do this for v2.4, but I'm definitely keen to do it asap.~~

~~No need to do this for v3.0, but I'm definitely keen to do it asap.~~

~~No need to do this for v3.1, but I'm definitely keen to do it asap.~~

~~No need to do this for v3.2, but I'm definitely keen to do it asap.~~

~~No need to do this for v3.3, but I'm definitely keen to do it asap.~~

~~No need to do this for v3.4, but I'm definitely keen to do it asap.~~

No need to do this for v3.5, but I'm definitely keen to do it asap.

:rofl:

bjlittle avatar Jan 10 '22 14:01 bjlittle

I just didn't want us to squirrel this one away and forget about it before it hurts us (again)...

👀

rcomer avatar Jan 10 '22 17:01 rcomer

@rcomer I know :laughing:

Given the level of angst on #4486, it would be ideal to take the hit square on the chin and deprecate/rewrite iris.util.format_array in iris 3.2, using something along the lines of numpy.array2string(..., legacy="1.13") instead from then on.
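
For illustration only, here is a minimal sketch of what the public routine gives with and without the legacy pin. It assumes nothing about how iris.util.format_array is actually implemented; the sample values and variable names are made up for the example.

```python
import numpy as np

# Made-up sample data, purely for illustration.
values = np.array([1.0, 2.5, 10.0])

# Public API, pinned to the pre-1.14 layout via the documented `legacy` option.
legacy_text = np.array2string(values, legacy="1.13")

# Public API with the current default layout (no legacy pin).
modern_text = np.array2string(values)

# The two strings differ only in whitespace, which is exactly what
# breaks byte-for-byte comparisons against stored reference files.
print(repr(legacy_text))
print(repr(modern_text))
```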

I suspect adopting such a change will cause wide sweeping changes in iris, which I'm totally willing to wade through rather than rinse and repeat the current experience that we have at the moment.

However, the silver lining here is that this is a really lovely example of why it's not clever to use private functionality; one day it will burn you.

Shame we keep putting our hand into the fire :hand: :fire: :cry:

Anyways, I'm off to the print shop to get my Always stick to the public API T-shirt... want one?

They might do bulk discounts. Win :+1:

bjlittle avatar Jan 11 '22 09:01 bjlittle

Discussed just now by: @pp-mo, @bjlittle, @trexfeathers

We think we would like to change this soon, if not to something necessarily more stable, then at least to a public routine, most likely numpy.array2string. Practically that seems quite doable, but it will:

  1. break all the xml/cdl tests (I count 409)
  2. change all their reference-result files, tests/results/.../*.{cdl|xml} (about 120).

It would also be possible to adopt a more statistics-based or array "fingerprinting" approach, so as to shrink the XML. This would need a controlled sensitivity to numerical changes, which is not a trivial problem.

It clearly needs some thought, so let's just not rush it.

pp-mo avatar Jan 11 '22 10:01 pp-mo

It would also be possible to adopt a more statistics-based or array "fingerprinting" approach, so as to shrink the XML. This would need a controlled sensitivity to numerical changes, which is not a trivial problem.

Re: "fingerprinting": by which I mean (ideally) some kind of summary of array values, much smaller than the whole data, sensitive to any individual value changing, but toleranced for floating-point. N.B. not really a "hash" concept, as that usually focusses on detecting even the smallest changes (though despite that, an idea very much like 'imagehash'!)
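
To make the idea concrete, a rough sketch follows. It is not an existing Iris or NumPy facility: `array_fingerprint` and `fingerprints_match` are hypothetical names, and the choice of statistics plus fixed random projections is just one assumption about what such a summary could contain. The tolerance lives in the comparison rather than in rounding, which sidesteps rounding-boundary flips but means the stored summary is a set of floats, not a string.

```python
import numpy as np


def array_fingerprint(values, n_projections=4, seed=0):
    """Summarise a non-empty array as a short vector of floats (hypothetical helper)."""
    data = np.asarray(values, dtype=float).ravel()
    # Fixed seed so the same weights are used on every run. Note that
    # Generator streams are not guaranteed identical across NumPy
    # versions, so real use would need stored or hand-rolled weights.
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((n_projections, data.size))
    # Global statistics catch broad changes...
    stats = [data.min(), data.max(), data.mean(), data.std()]
    # ...while weighted sums respond to a change in any single element.
    projections = weights @ data / data.size
    return np.concatenate([stats, projections])


def fingerprints_match(first, second, rtol=1e-6, atol=1e-9):
    """Compare two fingerprints with a floating-point tolerance."""
    return first.shape == second.shape and bool(
        np.allclose(first, second, rtol=rtol, atol=atol)
    )


reference = np.arange(12.0).reshape(3, 4)
noisy = reference * (1 + 1e-9)   # floating-point-sized wobble
changed = reference.copy()
changed[1, 2] += 1.0             # one genuinely different value

print(fingerprints_match(array_fingerprint(reference), array_fingerprint(noisy)))
print(fingerprints_match(array_fingerprint(reference), array_fingerprint(changed)))
```

The summary here is 8 numbers regardless of array size, but choosing `rtol`/`atol` so that real regressions are caught and platform noise is not is the non-trivial part.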

Effectively, what we are currently doing is to use the numpy array2string representation to choose a suitable common format (i.e. output precision), and outputting all the numbers to that precision: that string is our data summary.
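
As a small sketch of that behaviour (sample values and the `precision=3` setting are arbitrary choices for the example, not what Iris uses):

```python
import numpy as np

# Made-up values with very different numbers of significant digits.
values = np.array([0.1, 0.123456789, 2.0])

# array2string settles on one shared format for the whole array,
# capped here at 3 fractional digits, and pads every element to match.
summary_text = np.array2string(values, precision=3)

print(summary_text)
```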

I haven't managed to find any widely accepted existing approach for this, though, except that "fingerprint" seems to be a recognised term for the general concept: see https://en.wikipedia.org/wiki/Fingerprint_(computing)

pp-mo avatar Jan 12 '22 09:01 pp-mo

What does the XML approach offer that we couldn't get from saving to NetCDF? Is reliance on the netCDF4 package the main problem there?

Because it seems from these problems that even the XML solution encounters dependency issues - possibly more difficult ones.

trexfeathers avatar Jan 12 '22 11:01 trexfeathers