FlowKit
Handling multiple datasets in FCS file
Hi @whitews, of all the cytometry libs available, I appreciate how straightforward and friendly yours is.
Do you support handling multiple datasets in FlowKit? I know that other projects (fcs/CytoFlow) allow this, but their script truncates the last dataset when reading.
Thanks!
@nwzy Thanks for the kind words. FlowKit is a relatively new project and one that I have wanted to develop for several years. Regarding multiple data sets, yes I do plan on supporting this. In fact, the recent Session class is intended to be the programmatic equivalent of a FlowJo workspace, though it isn't yet finished... I am currently swamped with other projects. However, I do plan on getting back to FlowKit in the next week or so. I'm curious to know what use cases and functionality you would like to have in the library?
Also, I'm not sure what you mean that other libraries truncate the last sample, can you elaborate on that problem?
Kind regards, Scott
You're welcome! I'd love to contribute where I can
I'm curious to know what use cases and functionality you would like to have in the library?
We routinely use Merck's easyCyte high-throughput flow cytometer, which takes a 96-well plate. Some of the manual things that I see others do (that could be streamlined) are:
- Scrolling through samples to manually compare histograms
- Screenshotting and cropping the samples for presentations
- Comparing FCS files of the same samples from different days
The principal scientist and I have managed to at least get Jupyter notebooks into use, but we have to write the code out in pre-made cells and narrow their inputs to the files/dirs that they're interested in.
Also, I'm not sure what you mean that other libraries truncate the last sample, can you elaborate on that problem?
Sure; when grepping an FCS file with multiple samples, all the samples will be there. However, when using the CytoFlow lib (which is dependent on another lib, fcsparser), for some reason it doesn't recognize the last sample. For example:
Grepping the raw FCS as a binary:
(cyto) nwzy@server:~/flow_data$ grep -ao '$SAMPLEID[^$SMNO]*' /my/sample/file.PRO.FCS
$SAMPLEID/Sample0/
$SAMPLEID/Sample1/
$SAMPLEID/Sample2/
But when using the CytoFlow lib:
input:
import cytoflow as flow
tube = flow.Tube(file='/my/sample/file.PRO.FCS')
import_op = flow.ImportOp(data_set=2, tubes=[tube])
ex = import_op.apply(metadata_only=True)
md = ex.metadata['fcs_metadata']['/my/sample/file.PRO.FCS']['GTI$SAMPLEID']
md
output:
CytoFlow: data_set=2 does not exist
The fastest way around this was to just capture an extra dummy sample at the end
Hope that helps out, Nic
Ahh, I misread the issue title. I thought you were asking about handling multiple FCS files under the same workflow, but you are referring to multi-sample FCS files. I have known about these for quite some time, as they are referenced in the FCS specification, but have never run across one. I've wondered if the cytometers we have in our lab can produce these, but they don't allow me to touch them ;) Would be very interested in getting one of these files to provide support for them. Could you send me one?
Here's a good example FCS file from the Flow Repository; it has all the standard stuff you'd expect in a multi-sample FCS3.0 file.
It's not a perfect example since it seems that each software writes out the metadata just a tiny bit differently.
My understanding is that ID (at the beginning of the line containing the metadata) and $NEXTDATA are used by the bundled programs to cycle through data, and GTI$SAMPLEID is the user-typed name of the sample.
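To make the $NEXTDATA idea concrete, here is a minimal sketch of walking the data sets in a multi-sample file using only the Python standard library. It assumes the FCS 3.0 layout (a 58-byte HEADER with the TEXT start and end offsets at bytes 10-17 and 18-25) and that $NEXTDATA is a byte offset measured from the start of the current data set, with 0 meaning "no more data sets". The function names and the synthetic test data are made up for illustration; this is not how CytoFlow, fcsparser, or FlowKit read files, and real instrument files may need more care (escaped delimiters, vendor quirks).

```python
import io


def parse_text_segment(raw: bytes) -> dict:
    """Split a TEXT segment into keyword/value pairs.

    Simplified: the first byte is taken as the delimiter, and escaped
    (doubled) delimiters inside values are not handled.
    """
    text = raw.decode("latin-1")
    delimiter = text[0]
    fields = text[1:].split(delimiter)
    if fields and fields[-1] == "":
        fields.pop()  # the trailing delimiter leaves one empty field
    return dict(zip(fields[0::2], fields[1::2]))


def iter_data_sets(handle):
    """Yield (byte_offset, metadata) for each data set in the chain."""
    offset = 0
    while True:
        handle.seek(offset)
        header = handle.read(58)
        if len(header) < 58 or not header.startswith(b"FCS"):
            return  # no valid header here, so stop instead of guessing
        text_start = int(header[10:18].decode("ascii"))
        text_end = int(header[18:26].decode("ascii"))
        handle.seek(offset + text_start)
        # text_end points at the last byte of TEXT, hence the +1
        metadata = parse_text_segment(handle.read(text_end - text_start + 1))
        yield offset, metadata
        step = int(metadata.get("$NEXTDATA", "0").strip() or 0)
        if step <= 0:
            return  # 0 marks the last data set in the file
        offset += step


def fake_data_set(sample_id: str, next_data: int, size: int = 256) -> bytes:
    """Build a synthetic, event-free data set for demonstration only."""
    text = f"/$NEXTDATA/{next_data}/GTI$SAMPLEID/{sample_id}/"
    header = "FCS3.0" + " " * 4 + f"{58:>8}{58 + len(text) - 1:>8}" + f"{0:>8}" * 4
    return (header + text).encode("latin-1").ljust(size, b" ")


# Three fake data sets chained together, each padded to 256 bytes.
blob = (
    fake_data_set("Sample0", 256)
    + fake_data_set("Sample1", 256)
    + fake_data_set("Sample2", 0)
)
sample_ids = [meta["GTI$SAMPLEID"] for _, meta in iter_data_sets(io.BytesIO(blob))]
print(sample_ids)
```

For a real file, the same generator could be pointed at open(path, "rb") instead of the in-memory blob. If the number of sample IDs it yields differs from what grep finds, that would suggest the last $NEXTDATA link in the file is missing or wrong.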
I've wondered if the cytometers we have in our lab can produce these, but they don't allow me to touch them ;)
That's just cruel
Thanks for the link, will add this issue to the next milestone.
I've wondered if the cytometers we have in our lab can produce these, but they don't allow me to touch them ;)
That's just cruel
Maybe, but I also don't accept pull requests from the biologists ;)
@nwzy I've had some time to look at the file you linked to, and it seems like the file might not be a valid FCS file. There are odd XML fragments in the text section, which might be okay, but they are oddly out of order as if the file's text section has been rewritten by a program that jumbled it. Do you have another example of a multi-data FCS file?
Hey @whitews hmm, that's strange...
Let me see if we have some data from an open whitepaper that we can share; I will tag you when I find something.
Reviving this issue as I now have an example file. This will be supported in FlowIO 1.1.
The way this will work is that the FlowIO FlowData class will throw an error upon reading a multi-data file. That error will indicate to use a new utility function in FlowIO that will return multiple FlowData instances for every data set in the file. The FlowKit Sample class will check for this error and indicate a similar workflow. There will be an analogous pass-through utility function in FlowKit to return a list of Sample instances when given a multi-data file.
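The workflow described above (the single-data-set reader raises an error, and a separate utility returns one object per data set) could look roughly like the toy sketch below. Every name here (MultipleDataSetsError, ToyFlowData, read_all_data_sets, load_samples) is a hypothetical stand-in and not the FlowIO or FlowKit API. The "file" is also simplified to a list of already-parsed metadata dicts so that the example runs with the standard library alone.

```python
class MultipleDataSetsError(Exception):
    """Hypothetical error raised when a file holds more than one data set."""


class ToyFlowData:
    """Stand-in for a single-data-set reader (NOT the FlowIO FlowData class)."""

    def __init__(self, data_sets, index=0, _from_utility=False):
        metadata = data_sets[index]
        has_next = int(metadata.get("$NEXTDATA", 0)) != 0
        if has_next and not _from_utility:
            # Refuse to silently return only the first data set.
            raise MultipleDataSetsError(
                "file contains multiple data sets; use read_all_data_sets()"
            )
        self.text = metadata


def read_all_data_sets(data_sets):
    """Hypothetical utility: one ToyFlowData per data set in the file."""
    return [
        ToyFlowData(data_sets, index=i, _from_utility=True)
        for i in range(len(data_sets))
    ]


def load_samples(data_sets):
    """Hypothetical pass-through: always return a list, whatever the file holds."""
    try:
        return [ToyFlowData(data_sets)]
    except MultipleDataSetsError:
        return read_all_data_sets(data_sets)


# Toy inputs: metadata dicts standing in for parsed TEXT segments.
single_file = [{"$NEXTDATA": "0", "GTI$SAMPLEID": "Only"}]
multi_file = [
    {"$NEXTDATA": "256", "GTI$SAMPLEID": "Sample0"},
    {"$NEXTDATA": "256", "GTI$SAMPLEID": "Sample1"},
    {"$NEXTDATA": "0", "GTI$SAMPLEID": "Sample2"},
]

single_ids = [s.text["GTI$SAMPLEID"] for s in load_samples(single_file)]
multi_ids = [s.text["GTI$SAMPLEID"] for s in load_samples(multi_file)]
print(single_ids, multi_ids)
```

Raising by default keeps existing single-file code from quietly dropping data sets, while the list-returning helper gives callers one code path for both kinds of file.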