cbioportal
cbioportal copied to clipboard
Fix errors when importing data with empty Start_Position or End_Position
Describe changes proposed in this pull request:
The current file format description for mutation data lists Start_Position
and End_Position
as optional attributes.
If those columns are not present at all the import of data without start/end position works just fine. However, if these columns are present in the header and are left empty you will get an error message stating that ''
is not an integer value.
By looking at the source code of the validator I identified NA
as an alternative option. The usage of NA
however leads to an error message in the checkAlleleMAFFormat
function where it tries to cast NA
to a float.
Therefore I propose to gor for an empty string for Start_Position
and End_Position
as this successfully passes the validator and is much more intuitive.
Thank you for having a look at this so quickly! First: I'm well aware of the consequences of letting these fields empty.
Second: I tried to import the data when setting Start_Position
and End_Position
to NA
- it failed. This is because L1699 & L1700 try to convert NA
String to a float type. So if you'd like to enable both ""
and "NA"
we'll have to adjust those two lines too.
@ao508 @inodb
I updated this to the most recent master branch and modified it so that now NA
and empty values are allowed.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Sorry for replying to this so late. It is a very good catch and I think we need to solve these reported issues.
I just have one question: the error is still there if the value is empty but contains spaces, I guess a strip is missing somewhere?
@oplantalech
Well, the error still occurs. I'm running this slightly modified script for quite some time now without any issues. If you like, I could also include a whitespace as an accepted value. But since NA
is more or less stated in the manual and just leaving it empty is sort of intuitive, I would at least vote for those two.
@nr23730 I agree with allowing NA
or leaving empty spaces, but what I mean is that we should either accept whitespaces as a valid value or catch the error so that the validator can finish the process. Right now, with whitespaces the cryptic error "'' is not an integer value
" is thrown and the script does not finalize with INFO: -: Validation complete
(and then either Validation of study succeeded.
or Validation of study failed.
depending on what we decide to do).
Closing this PR since it is included in #9799.