TCGAbiolinks
GDCprepare parse error on TCGA-BLCA clinical file
Hi, when the clinical files for TCGA-BLCA are downloaded and prepared, the XML files are parsed incorrectly and some of the data ends up in the wrong column. There are two common errors:
- the data is shifted to the right
- data from multiple cells is put into one cell, particularly around the 'history_non_muscle_invasive_blca' column
Here are the commands that were used to download the data (when run, everything appears to work properly):

    query <- GDCquery("TCGA-BLCA", data.category = "Clinical", file.type = "xml")
    GDCdownload(query)
    clinical <- GDCprepare_clinic(query, "patient")
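
To help pin down which rows are affected, here is a minimal base R sketch that flags rows where a column holds a value outside the set it is expected to contain. The helper name `flag_unexpected_values`, the toy data frame, and the allowed values for `gender` are all assumptions made for illustration; they are not part of TCGAbiolinks, and the real column contents should be checked before relying on this.

```r
# Hypothetical helper (not part of TCGAbiolinks): return the indices of rows
# where `column` holds a non-missing value that is not in `allowed`.
flag_unexpected_values <- function(df, column, allowed) {
  values <- as.character(df[[column]])
  which(!is.na(values) & !(values %in% allowed))
}

# Made-up toy data standing in for the data frame returned by
# GDCprepare_clinic(); row 2 imitates values shifted one column to the right.
toy <- data.frame(
  bcr_patient_barcode = c("PATIENT-1", NA, "PATIENT-3"),
  gender = c("MALE", "PATIENT-2", "FEMALE"),
  vital_status = c("Alive", "FEMALE", "Dead"),
  stringsAsFactors = FALSE
)

# Assumed allowed values for the gender column.
shifted_rows <- flag_unexpected_values(toy, "gender", c("MALE", "FEMALE"))
print(shifted_rows)
```

Running the same check on the real `clinical` data frame, with a column whose valid values are known, should give the indices of the rows to compare against the original XML files.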
I've tried things discussed in the 'Error parsing clinical data XML files' thread. Also, I've tried querying patients individually, but the error persists. Any advice would be greatly appreciated!
Any updates about this?