TCGAbiolinks

GDCprepare parse error on TCGA-BLCA clinical file

Open: AG-Sangster opened this issue 6 years ago • 1 comment

Hi, when the clinical file for TCGA-BLCA is downloaded and prepared, the XML files are not parsed correctly and some of the data ends up in the wrong column. There are 2 common errors:

  1. the data is shifted to the right
  2. data from multiple cells is put into one cell, particularly around the 'history_non_muscle_invasive_blca' column

Here are the commands that were used to download the data (when run, everything appears to be working properly):

    query <- GDCquery("TCGA-BLCA", data.category = "Clinical", file.type = "xml")
    GDCdownload(query)
    clinical <- GDCprepare_clinic(query, "patient")

I've tried the suggestions discussed in the 'Error parsing clinical data XML files' thread. I've also tried querying patients individually, but the error persists. Any advice would be greatly appreciated!

AG-Sangster, Nov 12 '18 17:11

Any updates about this?

adefelicibus, Aug 21 '19 13:08