
Specify types for custom columns in Rearrangements TSV?

Open scharch opened this issue 5 years ago • 13 comments

Ideally, I would be able to do this with a simple dictionary like { 'columnX': int, 'columnY': bool }, but I could settle for passing in a full auxiliary schema if that's considered the proper way to go about it.

scharch avatar Jul 03 '20 00:07 scharch

Do you mean as arguments to read/write/validate in the Python library?

javh avatar Jul 03 '20 19:07 javh

Yeah, so I can do something like if row['columnX'] > 0. Obviously I can just cast it explicitly, but it seems like it might be a useful feature generally.

scharch avatar Jul 03 '20 21:07 scharch
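
For reference, the explicit-cast workaround mentioned above might look like the following minimal sketch. It uses only the standard library `csv` module rather than the AIRR Python library, and the helper `read_rows`, its `custom_types` argument, and the sample data are hypothetical names invented for illustration.

```python
import csv
import io

# Hypothetical Rearrangement-style TSV with one custom column, columnX.
tsv_text = "sequence_id\tcolumnX\nseq1\t3\nseq2\t0\n"


def read_rows(handle, custom_types):
    """Yield each TSV row as a dict, casting the columns named in custom_types.

    custom_types is a hypothetical {column_name: callable} mapping; it is
    not an argument of any existing AIRR library function.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        for column, cast in custom_types.items():
            if column in row:
                row[column] = cast(row[column])
        yield row


rows = list(read_rows(io.StringIO(tsv_text), {"columnX": int}))

# With the cast applied, numeric comparisons work without a per-use int().
positive = [row["sequence_id"] for row in rows if row["columnX"] > 0]
```

Without the cast, `row['columnX']` is a string and `row['columnX'] > 0` raises a `TypeError` in Python 3, which is why a reader-level type option would be convenient.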

Is the field type sufficient, or would we see a circumstance where other attributes might be desirable? nullable is one example. If we don't really know but want to allow for the possibility in the future, we could just alter that simple dictionary slightly like this. Then we could support additional properties easily.

{ 'columnX': { 'type': int }, 'columnY': { 'type': bool } }

schristley avatar Sep 16 '20 15:09 schristley
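
A rough sketch of how a nested per-column spec with an extra `nullable` attribute could drive conversion is shown here. It is written against the standard library `csv` module only; `convert`, `read_typed`, `to_bool`, and the `"T"`/`"F"` boolean encoding are assumptions made for illustration, not part of the AIRR Python library.

```python
import csv
import io

# Assumed string encoding of booleans in the TSV; adjust if your files differ.
BOOL_STRINGS = {"T": True, "F": False}


def to_bool(text):
    """Convert an assumed 'T'/'F' string to a Python bool."""
    return BOOL_STRINGS[text]


def convert(value, spec):
    """Convert one raw string according to a per-column spec dict."""
    if value == "":
        # Empty cells become None unless the column is declared non-nullable.
        if spec.get("nullable", True):
            return None
        raise ValueError("empty value in non-nullable column")
    return spec["type"](value)


def read_typed(handle, column_specs):
    """Yield rows as dicts, converting the columns listed in column_specs."""
    for row in csv.DictReader(handle, delimiter="\t"):
        for column, spec in column_specs.items():
            if column in row:
                row[column] = convert(row[column], spec)
        yield row


# Hypothetical spec: each column maps to a dict so attributes can be added later.
specs = {
    "columnX": {"type": int, "nullable": True},
    "columnY": {"type": to_bool, "nullable": False},
}
tsv_text = "sequence_id\tcolumnX\tcolumnY\nseq1\t3\tT\nseq2\t\tF\n"
typed_rows = list(read_typed(io.StringIO(tsv_text), specs))
```

Keeping each column's entry as a dict means new attributes can be introduced later without changing the shape of the argument, which is the flexibility the nested form is meant to buy.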

Yeah, I could see how that could be desirable. Flexible is good, anyway, and it doesn't seem to cost anything in this case.

scharch avatar Sep 16 '20 15:09 scharch

This would be a nice extension - to make sure I understand, this is for the validate libraries, correct? So you can say to the python validate functions, here is an AIRR TSV file with custom columns, and here is the schema for those custom columns, please validate the file against the AIRR schema and the custom schema... Correct?

bcorrie avatar Sep 16 '20 16:09 bcorrie

Yes, but also the read/write functions, so that they do automatic type conversion.

schristley avatar Sep 16 '20 17:09 schristley

Could someone please share an example file for which there are custom columns or columns where the type cannot be guessed? I didn't find any in the test data. The R package performs as expected for good_data.tsv and bad_data.tsv. What is the role of extra_data.tsv? There is a parsing error, but I think it does not reflect the column type problem.

imkeller avatar Feb 17 '21 19:02 imkeller

Leaving this open as the functionality has only been implemented in R, not Python

scharch avatar Jul 10 '23 03:07 scharch

@scharch still v2.0 issue?

bcorrie avatar Feb 23 '24 19:02 bcorrie

I mean, it would be nice for the python and R functionalities to match...

scharch avatar Feb 23 '24 19:02 scharch

This was implemented via the aux_types argument to read_tabular.

javh avatar Feb 25 '24 04:02 javh

@javh I believe this should be reopened, as there is not yet any corresponding functionality in the Python RearrangementReader.

scharch avatar Feb 26 '24 21:02 scharch

Okiee. Reopened. I missed that this was a python library issue and not an R library issue (for no good reason, just not paying attention).

javh avatar Feb 27 '24 19:02 javh