airr-standards
Specify types for custom columns in Rearrangements TSV?
Ideally, I would be able to do this with a simple dictionary like { 'columnX':int, 'columnY':boolean }, but I could settle for passing in a full auxiliary schema if that's considered the proper way to go about it.
Do you mean as arguments to read/write/validate in the Python library?
Yeah, so I can do something like if row['columnX'] > 0. Obviously I can just cast it explicitly, but it seems like it might be a useful feature generally.
Is the field type sufficient, or would we see a circumstance where other attributes might be desirable? nullable is one example. If we don't really know but want to allow for the possibility in the future, we could just alter that simple dictionary slightly like this. Then we could support additional properties easily.
{ 'columnX': { type: int }, 'columnY': { type: boolean }}
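To make the two proposed forms concrete, here is a minimal sketch of accepting either the simple dictionary or the nested one and normalizing both to the nested layout. The helper name `normalize_aux_types` and the `nullable` key are hypothetical illustrations, not part of the airr Python library, and Python's `bool` stands in for `boolean`.

```python
def normalize_aux_types(spec):
    """Return {column: {'type': <type>, ...}} from either spec form.

    Hypothetical helper: accepts {'columnX': int} as well as
    {'columnX': {'type': int, 'nullable': True}}.
    """
    normalized = {}
    for column, value in spec.items():
        if isinstance(value, dict):
            # Nested form: require a 'type' entry, keep any extra attributes.
            if 'type' not in value:
                raise ValueError("missing 'type' for column %r" % column)
            normalized[column] = dict(value)
        else:
            # Simple form: the value is the type itself.
            normalized[column] = {'type': value}
    return normalized


simple = {'columnX': int, 'columnY': bool}
nested = {'columnX': {'type': int, 'nullable': True},
          'columnY': {'type': bool}}

simple_norm = normalize_aux_types(simple)
nested_norm = normalize_aux_types(nested)
```

With this shape, callers who only care about types can keep passing the short form, while extra attributes can be added later without changing the argument.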
Yeah, I could see how that could be desirable. Flexible is good, anyway, and it doesn't seem to cost anything in this case.
This would be a nice extension - to make sure I understand, this is for the validate libraries, correct? So you can say to the python validate functions, here is an AIRR TSV file with custom columns, and here is the schema for those custom columns, please validate the file against the AIRR schema and the custom schema... Correct?
Yes, but also the read/write functions so it does automatic type conversion.
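As a rough illustration of what automatic type conversion on read could look like, the sketch below wraps the standard library's `csv.DictReader` and casts the custom columns named in a type dictionary. It does not use the airr library's reader; `read_rows` and `to_bool` are hypothetical names. The accepted boolean spellings and the treatment of empty cells as null are assumptions made for this example.

```python
import csv
import io

# Assumed spellings for boolean cells in the TSV (illustrative only).
TRUE_VALUES = {'T', 'TRUE', '1'}
FALSE_VALUES = {'F', 'FALSE', '0'}


def to_bool(text):
    """Convert a TSV cell to bool, rejecting unrecognized spellings."""
    value = text.strip().upper()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError("not a boolean: %r" % text)


def read_rows(handle, aux_types):
    """Yield rows as dicts, casting the columns listed in aux_types."""
    converters = {column: (to_bool if kind is bool else kind)
                  for column, kind in aux_types.items()}
    for row in csv.DictReader(handle, delimiter='\t'):
        for column, convert in converters.items():
            if column in row:
                raw = row[column]
                # Empty or missing cells become None instead of being cast.
                row[column] = None if raw in (None, '') else convert(raw)
        yield row


data = io.StringIO(
    "sequence_id\tcolumnX\tcolumnY\n"
    "seq1\t3\tT\n"
    "seq2\t\tF\n"
)
rows = list(read_rows(data, {'columnX': int, 'columnY': bool}))
```

After this, a check such as `rows[0]['columnX'] > 0` works without an explicit cast, which is the use case described earlier in the thread.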
Could someone please share an example file for which there are custom columns or columns where the type cannot be guessed? I didn't find any in the test data. The R package performs as expected for good_data.tsv and bad_data.tsv. What is the role of extra_data.tsv? There is a parsing error, but I think it does not reflect the column type problem.
Leaving this open, as the functionality has only been implemented in R, not Python.
@scharch still v2.0 issue?
I mean, it would be nice for the python and R functionalities to match...
This was implemented in the R package via the aux_types argument to read_tabulated.
@javh I believe this should be reopened, as there is not yet corresponding functionality in the Python RearrangementReader.
Okiee. Reopened. I missed that this was a python library issue and not an R library issue (for no good reason, just not paying attention).