cbioportal icon indicating copy to clipboard operation
cbioportal copied to clipboard

Fix errors when importing data with empty Start_Position or End_Position

Open nr23730 opened this issue 3 years ago • 8 comments

Describe changes proposed in this pull request:

The current file format description for mutation data lists Start_Position and End_Position as optional attributes.

If those columns are not present at all the import of data without start/end position works just fine. However, if these columns are present in the header and are left empty you will get an error message stating that '' is not an integer value. By looking at the source code of the validator I identified NA as an alternative option. The usage of NA however leads to an error message in the checkAlleleMAFFormat function where it tries to cast NA to a float.

Therefore I propose to gor for an empty string for Start_Position and End_Position as this successfully passes the validator and is much more intuitive.

nr23730 avatar Nov 15 '21 18:11 nr23730

Thank you for having a look at this so quickly! First: I'm well aware of the consequences of letting these fields empty.

Second: I tried to import the data when setting Start_Position and End_Position to NA - it failed. This is because L1699 & L1700 try to convert NA String to a float type. So if you'd like to enable both "" and "NA" we'll have to adjust those two lines too.

nr23730 avatar Nov 16 '21 16:11 nr23730

@ao508 @inodb I updated this to the most recent master branch and modified it so that now NA and empty values are allowed.

nr23730 avatar Dec 14 '21 10:12 nr23730

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

sonarqubecloud[bot] avatar Dec 14 '21 10:12 sonarqubecloud[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 25 '22 03:04 stale[bot]

Sorry for replying to this so late. It is a very good catch and I think we need to solve these reported issues.

I just have one question: the error is still there if the value is empty but contains spaces, I guess a strip is missing somewhere?

oplantalech avatar Jul 11 '22 08:07 oplantalech

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

sonarqubecloud[bot] avatar Jul 11 '22 08:07 sonarqubecloud[bot]

@oplantalech
Well, the error still occurs. I'm running this slightly modified script for quite some time now without any issues. If you like, I could also include a whitespace as an accepted value. But since NA is more or less stated in the manual and just leaving it empty is sort of intuitive, I would at least vote for those two.

nr23730 avatar Jul 14 '22 22:07 nr23730

@nr23730 I agree with allowing NA or leaving empty spaces, but what I mean is that we should either accept whitespaces as a valid value or catch the error so that the validator can finish the process. Right now, with whitespaces the cryptic error "'' is not an integer value" is thrown and the script does not finalize with INFO: -: Validation complete (and then either Validation of study succeeded. or Validation of study failed. depending on what we decide to do).

oplantalech avatar Jul 15 '22 07:07 oplantalech

Closing this PR since it is included in #9799.

oplantalech avatar Sep 16 '22 13:09 oplantalech