Refactor loading data

Open • Jhsmit opened this issue 3 years ago • 1 comment

Large refactoring of how data is loaded into the HDXMeasurement object.

  • Removed the PeptideMasterTable object in favor of a functional approach with functions filter_peptides, apply_control and correct_d_uptake
  • Updated .yaml HDX spec files

Updated some of the column names in HDX Measurement data. The mapping from old to new column names is:

  • "start" -> "_start"
  • "end" -> "_stop"
  • "_start" -> "start"
  • "_end" -> "stop"

Additionally, spaces in column names are now replaced with underscores.
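
For anyone migrating their own tables, the renaming can be sketched in plain Python as below. This is only an illustration, not code from PyHDX: the helper `migrate_column_names` and the example column "uptake corrected" are hypothetical, and only the four-entry mapping is taken from the description above. Because "start" and "_start" swap places, each name has to be looked up exactly once rather than renamed in sequence.

```python
# Old -> new names, copied from the mapping in this PR description.
COLUMN_RENAMES = {
    "start": "_start",
    "end": "_stop",
    "_start": "start",
    "_end": "stop",
}


def migrate_column_names(columns):
    """Return new-style names for a list of old-style column names.

    Hypothetical helper for illustration; not part of PyHDX.
    """
    # Each name is looked up once in the dict, so the start/_start swap
    # cannot chain into itself. Names without an entry are kept unchanged.
    renamed = [COLUMN_RENAMES.get(name, name) for name in columns]
    # Spaces in column names become underscores.
    return [name.replace(" ", "_") for name in renamed]


# "uptake corrected" is a made-up column name used to show the space handling.
old_columns = ["start", "end", "_start", "_end", "uptake corrected"]
new_columns = migrate_column_names(old_columns)
print(new_columns)
```

Applying the four renames one after another (for example with chained string replacements) would map "start" to "_start" and then straight back again, which is why the sketch uses a single dictionary lookup per name.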

Jhsmit, Oct 12 '22 19:10

Hi @ococrook, if you have some time I was wondering if you could give some feedback on the new API and .yaml files as I've reworked them in this PR.

The new .yaml format is as shown here: https://github.com/Jhsmit/PyHDX/blob/refactor_data_loading/tests/test_data/input/data_states.yaml

The other main change is the removal of the PeptideMasterTable object. In retrospect, implementing this part in an object-oriented way was a poor design decision. I think the new functional approach is much better and makes it easier for users to interface directly with the HDXMeasurement object. An example of how to use the new functions is here: https://github.com/Jhsmit/PyHDX/blob/a3838a3532d658dd16d50d738999706a694bbc61/templates/01_load_secb_data.py

A few steps in the new procedure related to back-exchange correction are still not very clear in their current implementation; I plan to tackle those in the future.

I'll update the docs shortly.

Jhsmit, Oct 13 '22 11:10