Store titer records in TiterRecord class and export in structured JSON format

Open huddlej opened this issue 8 years ago • 2 comments

The goals of this proposal are:

  • remove the need for an eval call when loading titers from JSON in augur’s process step
  • add support for multiple user-defined titer attributes, such as the passaging details available from the CDC titers
  • improve documentation of the JSON format for titers by explicitly naming fields instead of relying on a slightly ambiguous dictionary format
  • encapsulate logic about individual titer records, along the lines of a SeqRecord from BioPython, such that each record knows how to export itself to JSON and report other details about itself

The current JSON format looks like this:

"titers": {
    "('A/Acores/11/2013', ('A/Alabama/5/2010', 'F27/10'))": [
      80.0
    ],
    "('A/Acores/11/2013', ('A/Athens/112/2012', 'F16/12'))": [
      640.0
    ]
}

In this format, each record is a key/value pair. The key is a nested tuple of the form (test strain, (reference strain, serum id)) that has been converted to a string for JSON compatibility, which is why loading it currently requires an eval call. The value of each pair is a list of floating point values corresponding to raw titer measurements.
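
To make the parsing cost of the current format concrete, here is a minimal sketch of turning the stringified keys back into tuples. It uses ast.literal_eval from the standard library as a safer stand-in for eval; this is an illustration, not the code augur runs. The function name parse_legacy_titers is hypothetical.

```python
import ast
import json

# Sample data copied from the current format shown above.
legacy_json = """
{
  "titers": {
    "('A/Acores/11/2013', ('A/Alabama/5/2010', 'F27/10'))": [80.0],
    "('A/Acores/11/2013', ('A/Athens/112/2012', 'F16/12'))": [640.0]
  }
}
"""


def parse_legacy_titers(titers):
    """Convert stringified tuple keys back into nested tuples (hypothetical helper)."""
    parsed = {}
    for key, values in titers.items():
        # literal_eval only accepts Python literals, so it cannot run arbitrary code.
        test_strain, (reference_strain, serum) = ast.literal_eval(key)
        parsed[(test_strain, (reference_strain, serum))] = values
    return parsed


legacy_titers = parse_legacy_titers(json.loads(legacy_json)["titers"])
```

Even with the safer parser, the meaning of each tuple position is only implied by its order, which is the ambiguity this proposal aims to remove.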

The new format should be a list of dictionaries where each dictionary corresponds to a TiterRecord instance in JSON format. Each entry in the TiterRecord should be explicitly named to remove ambiguity about the data and enable additional fields to be added in the future. For example, the following format can support inclusion of the optional “source” and “assay” fields that were originally omitted from each record.

"titers": [
    {
        "assay": "hi",
        "test_strain": "A/Acores/11/2013",
        "reference_strain": "A/Alabama/5/2010",
        "serum": "F27/10",
        "source": "NIMR_Sep2013_7-11.csv",
        "values": [
            80.0
        ]
    },
    {
        "assay": "hi",
        "test_strain": "A/Acores/11/2013",
        "reference_strain": "A/Athens/112/2012",
        "serum": "F16/12",
        "source": "NIMR_Sep2013_7-11.csv",
        "values": [
            640.0
        ]
    }
]
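
As a rough sketch of what loading this format could look like, the following reads the list with the standard json module and checks that the four required fields are present. No eval is needed. The names REQUIRED_FIELDS and load_titer_dicts are assumptions made for this example, not existing augur code.

```python
import json

# Sample data copied from the proposed format shown above.
new_json = """
{
  "titers": [
    {"assay": "hi", "test_strain": "A/Acores/11/2013",
     "reference_strain": "A/Alabama/5/2010", "serum": "F27/10",
     "source": "NIMR_Sep2013_7-11.csv", "values": [80.0]},
    {"assay": "hi", "test_strain": "A/Acores/11/2013",
     "reference_strain": "A/Athens/112/2012", "serum": "F16/12",
     "source": "NIMR_Sep2013_7-11.csv", "values": [640.0]}
  ]
}
"""

# Hypothetical constant: the fields every record must carry.
REQUIRED_FIELDS = ("test_strain", "reference_strain", "serum", "values")


def load_titer_dicts(text):
    """Load titer records from JSON text and reject records missing required fields."""
    records = json.loads(text)["titers"]
    for index, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError("record %d is missing fields: %s" % (index, ", ".join(missing)))
    return records


titer_dicts = load_titer_dicts(new_json)
```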

The records from this JSON format map directly to attributes of the TiterRecord Python class. In addition to these attributes, the TiterRecord class would expose the following methods.

class TiterRecord(object):
    def __init__(self, test_strain, reference_strain, serum, values, **kwargs):
        """Builds a new TiterRecord instance.

        Args:
            test_strain (str): name of the test strain
            reference_strain (str): name of the reference strain
            serum (str): name of the serum
            values (list): a list of raw floating point titer measurements
            kwargs (dict): additional attributes of the TiterRecord instance

        Returns:
            TiterRecord: an instance of the record class populated with the given strains, serum, and values

        >>> record = TiterRecord(test_strain="strain_a", reference_strain="strain_b", serum="serum_a", values=[80.0], assay="hi")
        >>> record.test_strain
        'strain_a'
        >>> record.values[0]
        80.0
        >>> hasattr(record, "assay")
        True
        >>> record.assay
        'hi'
        >>> hasattr(record, "source")
        False
        >>> record_dict = {"test_strain": "strain_a", "reference_strain": "strain_b", "serum": "serum_a", "values": [80.0], "assay": "hi"}
        >>> record = TiterRecord(**record_dict)
        >>> record.test_strain
        'strain_a'
        """
        pass

    def to_dict(self):
        """Returns the current instance as a dictionary.

        Returns:
            dict: attributes of the current instance as key/value pairs

        >>> record = TiterRecord(test_strain="strain_a", reference_strain="strain_b", serum="serum_a", values=[80.0], assay="hi")
        >>> sorted(record.to_dict().items())
        [('assay', 'hi'), ('reference_strain', 'strain_b'), ('serum', 'serum_a'), ('test_strain', 'strain_a'), ('values', [80.0])]
        """
        pass

The primary distinction between a TiterRecord and a dict is that a TiterRecord has required attributes: test_strain, reference_strain, serum, and values. TiterRecord instances will not know how to export themselves into the dictionary format used by augur with tuple keys and list values; the TiterModel class should know how to convert a list of TiterRecord instances into that format. The TiterModel class should also know how to build a list of TiterRecord instances from a tab-delimited file of measurements.
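
One possible way to fill in the stubs above, offered as a sketch rather than the intended implementation: required attributes are plain positional arguments, extra keyword arguments become attributes, and to_dict returns a copy of the instance attributes. The standalone function records_to_titer_dict is hypothetical; the proposal places that conversion on TiterModel, and merging values for repeated keys is an assumption of this example.

```python
class TiterRecord(object):
    def __init__(self, test_strain, reference_strain, serum, values, **kwargs):
        # Required attributes: omitting any of them raises a TypeError.
        self.test_strain = test_strain
        self.reference_strain = reference_strain
        self.serum = serum
        self.values = values

        # Optional user-defined attributes such as "assay" or "source".
        for name, value in kwargs.items():
            setattr(self, name, value)

    def to_dict(self):
        # Copy the attribute dictionary so callers cannot mutate the record through it.
        return dict(vars(self))


def records_to_titer_dict(records):
    """Build the tuple-keyed dictionary from a list of records (hypothetical helper)."""
    titers = {}
    for record in records:
        key = (record.test_strain, (record.reference_strain, record.serum))
        # Assumption: measurements for a repeated key are appended to one list.
        titers.setdefault(key, []).extend(record.values)
    return titers


record = TiterRecord(test_strain="strain_a", reference_strain="strain_b",
                     serum="serum_a", values=[80.0], assay="hi")
titers = records_to_titer_dict([record])
```

With this sketch, the doctests in the class definition above would pass, and JSON export reduces to calling json.dumps on a list of to_dict() results.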

huddlej, Jul 20 '17 16:07

@huddlej I'm reviewing open Augur issues to try to identify some Augur-related work I could do this year. Is this issue still relevant and desirable to do?

genehack, Jan 10 '25 18:01

@genehack Thanks for checking! This issue is still relevant and desirable, although only for a small group of users (maybe only me), so it is still relatively low priority.

If I were to address this issue today, I would probably opt to load titers TSVs into a pandas data frame which provides the basic record-oriented structure I was hoping for originally.
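
A minimal sketch of that pandas approach, assuming a TSV with a header row and one measurement per line. The column names below are assumptions for illustration and may not match the layout of real titer files.

```python
import io

import pandas as pd

# Hypothetical sample data; real titer TSVs may use different columns.
tsv_text = (
    "test_strain\treference_strain\tserum\tsource\tassay\tvalue\n"
    "A/Acores/11/2013\tA/Alabama/5/2010\tF27/10\tNIMR_Sep2013_7-11.csv\thi\t80.0\n"
    "A/Acores/11/2013\tA/Alabama/5/2010\tF27/10\tNIMR_Sep2013_7-11.csv\thi\t160.0\n"
    "A/Acores/11/2013\tA/Athens/112/2012\tF16/12\tNIMR_Sep2013_7-11.csv\thi\t640.0\n"
)

# Each row is one named, record-oriented measurement.
titer_frame = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Collect the raw measurements for each (test strain, reference strain, serum) combination.
titer_lists = (
    titer_frame
    .groupby(["test_strain", "reference_strain", "serum"])["value"]
    .apply(list)
    .to_dict()
)
```

Here the data frame columns play the role of the explicitly named fields, and the groupby step recovers the per-combination lists of values that the tuple-keyed dictionary holds today.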

I would also put off this change until I had added better test coverage of the titer models code, to give me more confidence that refactoring the data representation didn't break anything.

huddlej, Jan 16 '25 18:01