Return keys of dictionary in crystfel.py

Open DHekstra opened this issue 2 years ago • 8 comments

It would be helpful to optionally return the keys of the dictionary that _parse_stream in io/crystfel.py produces when calling rs.read_crystfel. My use case is working with stream files, produced by a new CrystFEL data processing pipeline at LCLS, that contain data from multiple fixed-target crystals. The crystal ID and per-crystal frame numbers are not encoded in the BATCH record produced by rs.read_crystfel but are recoverable from the keys. There's no need to parse the keys any further; the option to return list(d.keys()) would be enough.

See https://github.com/rs-station/reciprocalspaceship/blob/16867cf8c735d5cb169bb7ebd29194d60fae950e/reciprocalspaceship/io/crystfel.py#L237

DHekstra avatar May 15 '23 16:05 DHekstra

Is this going to be a new, general feature of the CrystFEL stream file output? If it is a very customized output for the LCLS processing pipeline (and will not make it more broadly into CrystFEL), I don't know if it makes sense to support.

For very personal/custom use cases, I think it may be best to put together your own implementation rather than modifying the interface here.

JBGreisman avatar Jun 30 '23 15:06 JBGreisman

My use case is, of course, specific, but the point is general: _parse_stream() already returns a dictionary with keys and values. I am simply asking for the option for read_crystfel() to also return that dictionary, or at least a list of its keys.
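A minimal sketch of the kind of opt-in return value being asked for. Everything here is a hypothetical stand-in: `parse_records` and `read_records` are not part of reciprocalspaceship, and the key format and record layout are invented for illustration. The actual structure of the dictionary built by `_parse_stream` may differ.

```python
from typing import Dict, List, Tuple, Union

Record = Tuple[int, int, int, float]  # (H, K, L, I): illustrative layout only


def parse_records(lines: List[str]) -> Dict[str, List[Record]]:
    """Hypothetical stand-in for a parser that groups reflections by a key."""
    parsed: Dict[str, List[Record]] = {}
    for line in lines:
        key, h, k, l, intensity = line.split()
        parsed.setdefault(key, []).append((int(h), int(k), int(l), float(intensity)))
    return parsed


def read_records(
    lines: List[str], return_keys: bool = False
) -> Union[List[Record], Tuple[List[Record], List[str]]]:
    """Flatten the parsed dictionary; optionally also return its keys."""
    parsed = parse_records(lines)
    flat = [rec for recs in parsed.values() for rec in recs]
    if return_keys:
        # Keys are passed through untouched; no further parsing is attempted.
        return flat, list(parsed.keys())
    return flat
```

Because `return_keys` defaults to `False` in this sketch, existing callers would see no change in what the reader returns.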

DHekstra avatar Jun 30 '23 18:06 DHekstra

Got it. I'll think over whether that makes sense. In the meantime, the desired functionality can be obtained this way (this is not officially supported, and is subject to change):

from reciprocalspaceship.io.crystfel import _parse_stream  # private helper, may change
streamfile = "example.stream"  # placeholder path to a stream file
d, cell = _parse_stream(streamfile)

JBGreisman avatar Jun 30 '23 18:06 JBGreisman

That can work. In that case, could we turn lines 238-268 in crystfel.py into a separate function that can be called on its own, e.g. dict_to_dataset() or similar? That would allow me to write clean code that gets both the DataSet and the dictionary.
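As a rough illustration of that split, assuming a conversion step that takes the parsed dictionary as its only input: the function name `dict_to_rows`, the key names, and the tuple layout below are all hypothetical, and plain tuples stand in for a DataSet so the sketch has no dependencies. It is not the code at the lines referenced above.

```python
from typing import Dict, List, Tuple

Reflection = Tuple[int, int, int, float]  # (H, K, L, I): illustrative layout only
Row = Tuple[str, int, int, int, float]    # (key, H, K, L, I)


def dict_to_rows(parsed: Dict[str, List[Reflection]]) -> List[Row]:
    """Hypothetical conversion step: flatten a parsed dictionary into rows,
    keeping the originating key alongside each reflection."""
    return [(key, *refl) for key, refls in parsed.items() for refl in refls]


# A caller that already holds the dictionary keeps it and converts separately.
parsed = {
    "crystal_a": [(1, 0, 0, 10.0)],
    "crystal_b": [(0, 1, 0, 5.0), (0, 0, 1, 2.5)],
}
rows = dict_to_rows(parsed)
```

With the conversion factored out this way, the caller holds both `parsed` and `rows` without parsing the input twice.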

DHekstra avatar Jun 30 '23 18:06 DHekstra

The StreamLoader class provided by #216 to speed up the parsing of stream files may provide the info you want in a cleaner way. I'm not exactly sure what data you want, but you will be able to instantiate a StreamLoader object using something like this:

from reciprocalspaceship.io.crystfel import StreamLoader
loader = StreamLoader(streamfile)              # use to find metadata you seek
ds = loader.to_dataset(spacegroup=spacegroup)  # also get a DataSet
ds.set_index(["H", "K", "L"], inplace=True)

This isn't yet in the main codebase, but you can always test it out using the faster_stream branch of the repo.

JBGreisman avatar Jun 30 '23 18:06 JBGreisman

I will look into extending the StreamLoader class to provide the info @DHekstra wants. In our offline conversation, we discussed that the image filename was required for this.

kmdalton avatar Jun 30 '23 22:06 kmdalton

Is this still desirable, or has the update to rs.read_crystfel() addressed this request?

JBGreisman avatar Aug 24 '24 11:08 JBGreisman

Let's keep this open until I get a chance to try that. I expect that https://github.com/rs-station/reciprocalspaceship/pull/260 does everything I need, but I have not tested it yet.

DHekstra avatar Aug 24 '24 17:08 DHekstra