Return keys of dictionary in crystfel.py
It would be helpful to optionally return the keys of the dictionary that _parse_stream in io/crystfel.py produces when calling rs.read_crystfel. My use case is working with stream files, produced by a new crystfel data processing pipeline at LCLS, that contain data from multiple fixed-target crystals. The crystal ID and per-crystal frame numbers are not encoded in the BATCH record produced by rs.read_crystfel but are recoverable from the keys. There's no need to parse the keys any further; it would be enough to have the option to return list(d.keys()).
See https://github.com/rs-station/reciprocalspaceship/blob/16867cf8c735d5cb169bb7ebd29194d60fae950e/reciprocalspaceship/io/crystfel.py#L237
Is this going to be a new, general feature of the crystfel stream file output? If it is a very customized output for the LCLS processing pipeline (and will not make it more broadly into crystfel), I don't know if it makes sense to support.
For very personal/custom use cases, I think it may be best to put together your own implementation rather than modifying the interface here.
My use case is, of course, specific, but the point is general: _parse_stream() already returns a dictionary with keys and values. I am simply asking for the option for read_crystfel() to also return that dictionary, or at least a list of its keys.
Got it. I'll think over whether that makes sense. In the meantime, the desired functionality can be obtained this way (this is not officially supported and is subject to change):
from reciprocalspaceship.io.crystfel import _parse_stream
d, cell = _parse_stream(streamfile)
That can work. In that case, could we turn lines 238-268 in crystfel.py into another function that can be called, e.g. dict_to_dataset() or so? That would allow me to write clean code that can both get the DataSet and the dictionary.
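To make the proposed split concrete, here is a minimal sketch of the pattern, not the actual crystfel.py code. Every name in it (parse_stream_stub, dict_to_rows, read_stub, the return_keys flag) is hypothetical, and the toy line format and dictionary layout are assumptions made for illustration; the real _parse_stream output may be structured differently.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: key -> list of (h, k, l, intensity).
# The real _parse_stream dictionary may look different.
Records = Dict[str, List[Tuple[int, int, int, float]]]


def parse_stream_stub(text: str) -> Records:
    """Toy parser: each line is 'key h k l intensity' (NOT the CrystFEL format)."""
    records: Records = {}
    for line in text.strip().splitlines():
        key, h, k, l, intensity = line.split()
        records.setdefault(key, []).append((int(h), int(k), int(l), float(intensity)))
    return records


def dict_to_rows(records: Records) -> List[Tuple[int, int, int, int, float]]:
    """Flatten the dictionary into rows, numbering each key as a BATCH-like column."""
    rows = []
    for batch, reflections in enumerate(records.values()):
        for h, k, l, intensity in reflections:
            rows.append((batch, h, k, l, intensity))
    return rows


def read_stub(text: str, return_keys: bool = False):
    """Parse, then convert; optionally also return the dictionary keys."""
    records = parse_stream_stub(text)
    rows = dict_to_rows(records)
    if return_keys:
        return rows, list(records.keys())
    return rows
```

With the two steps separated like this, a caller can either use the combined reader with return_keys=True or call the parse and convert steps directly and keep the dictionary. Because Python dictionaries preserve insertion order, the position of each key in the returned list matches the batch number assigned in the rows.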
The StreamLoader class provided by #216 to speed up the parsing of stream files may provide the info you want in a cleaner way. I'm not exactly sure what data you want, but you will be able to instantiate a StreamLoader object using something like this:
from reciprocalspaceship.io.crystfel import StreamLoader
loader = StreamLoader(streamfile) # use to find metadata you seek
ds = loader.to_dataset(spacegroup=spacegroup) # also get a DataSet
ds.set_index(["H", "K", "L"], inplace=True)
This isn't yet in the main codebase, but you can always test it out using the faster_stream branch of the repo.
I will look into extending the StreamLoader class to provide the info @DHekstra wants. In our offline conversation, we discussed that the image filename was required for this.
Is this still desirable, or has the update to rs.read_crystfel() addressed this request?
Let's keep this open until I get a chance to try that. I expect that https://github.com/rs-station/reciprocalspaceship/pull/260 does everything I need, but I have not tested it yet.