Notifying implementation inconsistency in user properties
Hi,
I am getting an unexpected error following my calculation of the SFR_histogram for a single halo at the latest timestep: 'tangos write SFR_histogram SFR_10Myr --for Halo600 --latest --hmax=0'.
The error reads:
Out[10]: ---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
691 type_pprinters=self.type_printers,
692 deferred_pprinters=self.deferred_printers)
--> 693 printer.pretty(obj)
694 printer.flush()
695 return stream.getvalue()
~/anaconda3/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
378 if callable(meth):
379 return meth(obj, self, cycle)
--> 380 return _default_pprint(obj, self, cycle)
381 finally:
382 self.end_group()
~/anaconda3/lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
493 if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
494 # A user-provided repr. Find newlines and replace them with p.break_()
--> 495 _repr_pprint(obj, p, cycle)
496 return
497 p.begin_group(1, '<')
~/anaconda3/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
691 """A pprint that just redirects to the normal repr function."""
692 # Find newlines and replace them with p.break_()
--> 693 output = repr(obj)
694 for idx,output_line in enumerate(output.splitlines()):
695 if idx:
~/Documents/tangos/tangos/core/halo_data/property.py in __repr__(self)
47 x = "<HaloProperty "
48 if self.data_float is not None:
---> 49 return (x + self.name.text + "=%.2e" % self.data) + " of " + self.halo.short() + ">"
50 elif self.data_array is not None:
51 return x + self.name.text + " (array) of " + self.halo.short() + ">"
~/Documents/tangos/tangos/core/halo_data/property.py in data(self)
65 @property
66 def data(self):
---> 67 return self.get_data_with_reassembly_options()
68
69 def get_data_with_reassembly_options(self, *options):
~/Documents/tangos/tangos/core/halo_data/property.py in get_data_with_reassembly_options(self, *options)
68
69 def get_data_with_reassembly_options(self, *options):
---> 70 return extraction_patterns.HaloPropertyValueWithReassemblyOptionsGetter(*options).postprocess_data_objects([self])[0]
71
72
~/Documents/tangos/tangos/core/extraction_patterns.py in postprocess_data_objects(self, outputs)
112
113 def postprocess_data_objects(self, outputs):
--> 114 return [self._postprocess_one_result(o) for o in outputs]
115
116 def _setup_data_mapper(self, property_object):
~/Documents/tangos/tangos/core/extraction_patterns.py in <listcomp>(.0)
112
113 def postprocess_data_objects(self, outputs):
--> 114 return [self._postprocess_one_result(o) for o in outputs]
115
116 def _setup_data_mapper(self, property_object):
~/Documents/tangos/tangos/core/extraction_patterns.py in _postprocess_one_result(self, property_object)
132 if hasattr(self._providing_class, 'reassemble'):
133 instance = self._providing_class(property_object.halo.timestep.simulation)
--> 134 return instance.reassemble(property_object, *self._options)
135 else:
136 self._setup_data_mapper(property_object)
~/Documents/tangos/tangos/properties/pynbody/SF.py in reassemble(self, *options)
35
36 def reassemble(self, *options):
---> 37 reassembled = super(StarFormHistogram, self).reassemble(*options)
38 return reassembled/1e9 # Msol per Gyr -> Msol per yr
39
~/Documents/tangos/tangos/properties/__init__.py in reassemble(self, property, reassembly_type)
248
249 if reassembly_type=='major':
--> 250 return self._reassemble_using_finding_strategy(property, strategy = rfs.MultiHopMajorProgenitorsStrategy)
251 elif reassembly_type=='major_across_simulations':
252 return self._reassemble_using_finding_strategy(property, strategy = rfs.MultiHopMajorProgenitorsStrategy,
~/Documents/tangos/tangos/properties/__init__.py in _reassemble_using_finding_strategy(self, property, strategy, strategy_kwargs)
276 for t_i, hist_i in zip(t, stack):
277 end = self.bin_index(t_i)
--> 278 start = end - len(hist_i)
279 valid = hist_i == hist_i
280 if t_i != previous_time:
TypeError: object of type 'numpy.float64' has no len()
I think the problem is generated because there is only one SFR_histogram property available across the merger tree of this halo, and hence a number (which has no length) is retruned rather than an array.
Am I misusing this property (in the sense that it should always be generated across the full merger tree) or is this a bug?
The error also generates a catastrophic failure of the web server preventing from accessing the page of this halo (but not the timestep page or other halo pages). I think this is because the webserver tries by default to reassemble histogram properties which fail with the same error as above.
Thanks Martin
It has to be generated at least for the whole major progenitor branch, otherwise it’s not defined. But it would be nice if a neater/more helpful error resulted...
Ok thanks for the clarification. I will leave it up to you to decide if a better error message should be added and if a better try/catch mechanism should be added to the web server to avoid it completely breaking if histogram properties fail.
Otherwise, I'll just close this issue and it will serve as documentation on SFR_histogram.
I finally got to the bottom of this. It was due to my declaration of the property name.
I declared it as names = ["SFR_histogram"] even though only one calculation was returned.
When performing the calculation loop, the code here https://github.com/pynbody/tangos/blob/7f36946b6b0aeee4d0031d2d9fa9248788409bc1/tangos/tools/property_writer.py#L407-L413 would evaluate the need to listize to False because the property name is a list type.
Once the actual calculation was done by my property, it would return an SFR array as result which would go through this snippet of code: https://github.com/pynbody/tangos/blob/7f36946b6b0aeee4d0031d2d9fa9248788409bc1/tangos/tools/property_writer.py#L423-L428 No list rewrapping would be done as the if test evaluates to False.
When queued for later committing, the result array would be broken down by a zip call here: https://github.com/pynbody/tangos/blob/7f36946b6b0aeee4d0031d2d9fa9248788409bc1/tangos/tools/property_writer.py#L259-L261 and the database would end up storing only the first number in the array, rather than the full array. This one number would then be retrieved during the reassembly, leading the above failure.
This error clearly follows my own coding error when declaring the property name, but the bug was incredibly difficult to catch as TANGOS swallowed the problem very deep without complaining. One way to avoid this in the future would be to implement additional consistency checks for example verifying that the length of the calculated result matches the length of the stored result, as well as throwing a more sensible error in the reassembly method if its arguments are not what is expected.
Note that in the end, this has nothing to do with the SFR_histogram being generated for only one output.
Ah, I see, yes it would be useful to implement a simple test to catch this type of inconsistency