datahub
lusc_cptac_2021 data issue
Cohort location on repo: public/lusc_cptac_2021
Latest commit hash: 0dd135a54412b969be818ac5a412dc1cc9d00ea5
cBioPortal version: 3.7.22
Import attempted on: 01/20/2022
Validation status: Failed
Import status: Failed
Error: java.lang.RuntimeException: org.mskcc.cbio.portal.dao.DaoException: DB Error: only 22470 of the 22471 records were inserted in `genetic_alteration`. More error/warning details: Cannot add or update a child row: a foreign key constraint fails (`cbioportal`.`genetic_alteration`, CONSTRAINT `genetic_alteration_ibfk_2` FOREIGN KEY (`GENETIC_ENTITY_ID`) REFERENCES `genetic_entity` (`ID`) ON DELETE CASCADE)
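For readers unfamiliar with this kind of error, here is a tiny in-memory SQLite reproduction of the same foreign key failure. This is an illustration only, under stated assumptions: cBioPortal actually runs on MySQL with a much larger schema, so only the table and column names quoted in the error message are kept here, the exact error text differs between MySQL and SQLite, and `try_insert_alteration` is a hypothetical helper.

```python
import sqlite3

def try_insert_alteration(entity_ids, alteration_entity_id):
    """Return True if a genetic_alteration row referencing alteration_entity_id
    can be inserted, False if the foreign key check rejects it.

    Simplified, hypothetical schema: only the columns named in the error message.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # SQLite enforces foreign keys only when this pragma is enabled.
        conn.execute("PRAGMA foreign_keys = ON")
        conn.execute("CREATE TABLE genetic_entity (ID INTEGER PRIMARY KEY)")
        conn.execute(
            "CREATE TABLE genetic_alteration ("
            " GENETIC_PROFILE_ID INTEGER,"
            " GENETIC_ENTITY_ID INTEGER,"
            " FOREIGN KEY (GENETIC_ENTITY_ID) REFERENCES genetic_entity (ID)"
            " ON DELETE CASCADE)"
        )
        conn.executemany(
            "INSERT INTO genetic_entity (ID) VALUES (?)",
            [(i,) for i in entity_ids],
        )
        try:
            conn.execute(
                "INSERT INTO genetic_alteration (GENETIC_PROFILE_ID, GENETIC_ENTITY_ID)"
                " VALUES (?, ?)",
                (1, alteration_entity_id),
            )
            return True
        except sqlite3.IntegrityError:
            # The child row points at a genetic_entity ID that does not exist.
            return False
    finally:
        conn.close()
```

In this sketch, `try_insert_alteration([1, 2], 3)` returns `False`: one row whose entity is missing from the parent table is enough to reject it, which matches "22470 of the 22471 records" being inserted.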
We are using the cBioPortal schema for version 3.7.22 and are validating against cbioportal.org. We are not sure whether this is a data issue specific to this cohort or a schema issue that could have a broader impact.
Attaching the log lusc_cptac_2021_import_01_22_2022.log
@sheridancbio could you give me your interpretation of this one? I'm not sure what causes this error in the public importer; it wasn't an issue in our pipeline importer.
@n1zea144 Running the Docker importer locally, I get this import error instead (validation passed):
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Reading data from: /study/lusc_cptac_2021/data_mirna.txt
Recaching...
Finished recaching...
--> profile id: 15
--> profile name: miRNA expression (TPM)
--> genetic alteration type: MRNA_EXPRESSION
--> total number of samples: 107
--> total number of data lines: 2439
--> records inserted into `sample_profile` table: 107
--> total number of data entries skipped (see table below): 2439
org.mskcc.cbio.portal.dao.DaoException: Something has gone wrong! I did not save any records to the database!
at org.mskcc.cbio.portal.scripts.ImportTabDelimData.importData(ImportTabDelimData.java:307)
at org.mskcc.cbio.portal.scripts.ImportProfileData.run(ImportProfileData.java:125)
at org.mskcc.cbio.portal.scripts.ConsoleRunnable.runInConsole(ConsoleRunnable.java:145)
at org.mskcc.cbio.portal.scripts.ImportProfileData.main(ImportProfileData.java:150)
Warnings / Errors:
-------------------
0. Entrez_Id null not found. Record will be skipped for this gene.; 2439x
ABORTED!
java.lang.RuntimeException: org.mskcc.cbio.portal.dao.DaoException: Something has gone wrong! I did not save any records to the database!
at org.mskcc.cbio.portal.scripts.ImportProfileData.run(ImportProfileData.java:130)
at org.mskcc.cbio.portal.scripts.ConsoleRunnable.runInConsole(ConsoleRunnable.java:145)
at org.mskcc.cbio.portal.scripts.ImportProfileData.main(ImportProfileData.java:150)
Caused by: org.mskcc.cbio.portal.dao.DaoException: Something has gone wrong! I did not save any records to the database!
at org.mskcc.cbio.portal.scripts.ImportTabDelimData.importData(ImportTabDelimData.java:307)
at org.mskcc.cbio.portal.scripts.ImportProfileData.run(ImportProfileData.java:125)
... 2 more
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error occurred during data loading step. Please fix the problem and run this again to make sure study is completely loaded.
Traceback (most recent call last):
File "/usr/local/bin/metaImport.py", line 202, in <module>
cbioportalImporter.main(args)
File "/cbioportal/core/src/main/scripts/importer/cbioportalImporter.py", line 533, in main
process_directory(jvm_args, study_directory, args.update_generic_assay_entity)
File "/cbioportal/core/src/main/scripts/importer/cbioportalImporter.py", line 368, in process_directory
import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
File "/cbioportal/core/src/main/scripts/importer/cbioportalImporter.py", line 162, in import_study_data
run_java(*args)
File "/cbioportal/core/src/main/scripts/importer/cbioportal_common.py", line 990, in run_java
raise RuntimeError('Aborting due to error while executing step.')
RuntimeError: Aborting due to error while executing step.
ERROR: 1
** Note: the study imported to triage successfully at the same time.
Hi @yichaoS @alexsigaras, that foreign key constraint error leads me to believe that the data contains a gene that is not in the destination database. If you are validating against cbioportal.org but importing into a different database, that would explain why validation passes but import fails (not sure if this is the scenario).
@alexsigaras We updated our seed database with new gene information; perhaps you need to update the gene information in your local database?
For what it's worth, I ran metaImport.py on a fresh checkout of the master branch and datahub and didn't run into the foreign key constraint error.
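Following that reasoning, one way to find the offending rows before importing is to compare the file's Entrez IDs against the IDs the destination database knows about. Here is a minimal Python sketch. It assumes the data file is a cBioPortal-style tab-delimited matrix with an `Entrez_Gene_Id` header column. It also assumes `known_ids` is a set of Entrez ID strings that you export from the destination database yourself, since how to export them is not covered in this thread. The helper name `find_unknown_entrez_ids` is hypothetical and not part of cBioPortal's tooling.

```python
import csv

def find_unknown_entrez_ids(lines, known_ids):
    """Hypothetical pre-import check for a tab-delimited expression matrix.

    Returns (unknown_ids, missing_count):
      unknown_ids   - sorted Entrez IDs present in the file but not in known_ids
      missing_count - rows with no Entrez ID at all (empty cell, or no
                      Entrez_Gene_Id column in the header)
    """
    reader = csv.DictReader(lines, delimiter="\t")
    unknown = set()
    missing = 0
    for row in reader:
        # row.get() yields None when the header has no Entrez_Gene_Id column.
        entrez = (row.get("Entrez_Gene_Id") or "").strip()
        if not entrez:
            missing += 1
        elif entrez not in known_ids:
            unknown.add(entrez)
    return sorted(unknown), missing
```

To run it on a real file, pass an open file handle, e.g. `open(path, newline="")`. Rows counted in `missing_count` correspond to the importer's "Entrez_Id null not found" warning as far as we can tell. Rows in `unknown_ids` are the ones that a mismatched gene table would reject.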
Comparing my output with the log you provided, I wonder whether you have the latest version of the data. The profile name in your log file is:
--> profile name: mRNA expression (FPKM)
while in my run it's:
--> profile name: mRNA expression (FPKM, log2 transformed)
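A quick way to confirm which version of the data you have checked out is to read the `profile_name` field from the study's meta file and compare it with the name above. This is a minimal Python sketch that assumes the meta file uses simple `key: value` lines. The helper name `parse_meta` is hypothetical, and so is the file name in the usage note below, because the thread only names the data file `data_mrna_seq_fpkm.txt`.

```python
def parse_meta(lines):
    """Parse 'key: value' lines (cBioPortal meta-file style) into a dict.

    Lines without a colon are ignored; only the first colon splits key from
    value, so values that contain ':' are kept intact.
    """
    meta = {}
    for line in lines:
        line = line.strip()
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        meta[key.strip()] = value.strip()
    return meta
```

For example, open the corresponding meta file (say, a hypothetical `meta_mrna_seq_fpkm.txt`) and check whether `parse_meta(handle)["profile_name"]` equals `"mRNA expression (FPKM, log2 transformed)"`. If it still reads `"mRNA expression (FPKM)"`, the local copy predates the current datahub version.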
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Reading data from: /Users/grossb1/prgs/cbio/cbio-portal-data/datahub/public/lusc_cptac_2021/data_mrna_seq_fpkm.txt
Recaching...
Finished recaching...
--> profile id: 77614
--> profile name: mRNA expression (FPKM, log2 transformed)
--> genetic alteration type: MRNA_EXPRESSION
--> total number of samples: 108
--> total number of data lines: 21791
--> records inserted into `sample_profile` table: 108
--> records inserted into `genetic_alteration` table: 22491
--> total number of data entries skipped (see table below): 70
Warnings / Errors:
-------------------
0. Entrez_Id 100126355 not found. Record will be skipped for this gene.; 1x
1. Entrez_Id 100302159 not found. Record will be skipped for this gene.; 1x
2. Entrez_Id 100302287 not found. Record will be skipped for this gene.; 1x
3. Entrez_Id 100313771 not found. Record will be skipped for this gene.; 1x
4. Entrez_Id 100313835 not found. Record will be skipped for this gene.; 1x
5. Entrez_Id 100500895 not found. Record will be skipped for this gene.; 1x
6. Entrez_Id 100616123 not found. Record will be skipped for this gene.; 1x
7. Entrez_Id 100616190 not found. Record will be skipped for this gene.; 1x
8. Entrez_Id 100616354 not found. Record will be skipped for this gene.; 1x
9. Entrez_Id 100616440 not found. Record will be skipped for this gene.; 1x
10. Entrez_Id 100847017 not found. Record will be skipped for this gene.; 1x
11. Entrez_Id 100847057 not found. Record will be skipped for this gene.; 1x
12. Entrez_Id 100847059 not found. Record will be skipped for this gene.; 1x
13. Entrez_Id 100847062 not found. Record will be skipped for this gene.; 1x
14. Entrez_Id 100847063 not found. Record will be skipped for this gene.; 1x
15. Entrez_Id 100847071 not found. Record will be skipped for this gene.; 1x
16. Entrez_Id 100847072 not found. Record will be skipped for this gene.; 1x
17. Entrez_Id 100847074 not found. Record will be skipped for this gene.; 1x
18. Entrez_Id 100847078 not found. Record will be skipped for this gene.; 1x
19. Entrez_Id 100847080 not found. Record will be skipped for this gene.; 1x
20. Entrez_Id 100847083 not found. Record will be skipped for this gene.; 1x
21. Entrez_Id 100847085 not found. Record will be skipped for this gene.; 1x
22. Entrez_Id 100847086 not found. Record will be skipped for this gene.; 1x
23. Entrez_Id 100847087 not found. Record will be skipped for this gene.; 1x
24. Entrez_Id 100847088 not found. Record will be skipped for this gene.; 1x
25. Entrez_Id 100847092 not found. Record will be skipped for this gene.; 1x
26. Entrez_Id 116804918 not found. Record will be skipped for this gene.; 1x
27. Entrez_Id 406881 not found. Record will be skipped for this gene.; 1x
28. Entrez_Id 406882 not found. Record will be skipped for this gene.; 1x
29. Entrez_Id 406888 not found. Record will be skipped for this gene.; 1x
30. Entrez_Id 406889 not found. Record will be skipped for this gene.; 1x
31. Entrez_Id 406895 not found. Record will be skipped for this gene.; 1x
32. Entrez_Id 406896 not found. Record will be skipped for this gene.; 1x
33. Entrez_Id 406911 not found. Record will be skipped for this gene.; 1x
34. Entrez_Id 406912 not found. Record will be skipped for this gene.; 1x
35. Entrez_Id 406926 not found. Record will be skipped for this gene.; 1x
36. Entrez_Id 406954 not found. Record will be skipped for this gene.; 1x
37. Entrez_Id 406955 not found. Record will be skipped for this gene.; 1x
38. Entrez_Id 406956 not found. Record will be skipped for this gene.; 1x
39. Entrez_Id 406972 not found. Record will be skipped for this gene.; 1x
40. Entrez_Id 406976 not found. Record will be skipped for this gene.; 1x
41. Entrez_Id 406977 not found. Record will be skipped for this gene.; 1x
42. Entrez_Id 406995 not found. Record will be skipped for this gene.; 1x
43. Entrez_Id 407002 not found. Record will be skipped for this gene.; 1x
44. Entrez_Id 407015 not found. Record will be skipped for this gene.; 1x
45. Entrez_Id 407016 not found. Record will be skipped for this gene.; 1x
46. Entrez_Id 407031 not found. Record will be skipped for this gene.; 1x
47. Entrez_Id 407032 not found. Record will be skipped for this gene.; 1x
48. Entrez_Id 693127 not found. Record will be skipped for this gene.; 1x
49. Entrez_Id 693133 not found. Record will be skipped for this gene.; 1x
50. Entrez_Id 693134 not found. Record will be skipped for this gene.; 1x
51. Entrez_Id null not found. Record will be skipped for this gene.; 7x
52. Gene ADAMTSL4-AS1 (574406) found to be duplicated in your file. Duplicated row will be ignored!; 1x
53. Gene BUB1B-PAK6 (106821730) found to be duplicated in your file. Duplicated row will be ignored!; 1x
54. Gene CHURC1-FNTB (100529261) found to be duplicated in your file. Duplicated row will be ignored!; 1x
55. Gene DDTL (100037417) found to be duplicated in your file. Duplicated row will be ignored!; 1x
56. Gene DHRS4L1 (728635) found to be duplicated in your file. Duplicated row will be ignored!; 1x
57. Gene GOLGA8K (653125) found to be duplicated in your file. Duplicated row will be ignored!; 1x
58. Gene GOLGA8O (728047) found to be duplicated in your file. Duplicated row will be ignored!; 1x
59. Gene GPR75-ASB3 (100302652) found to be duplicated in your file. Duplicated row will be ignored!; 1x
60. Gene GTF2H2C (728340) found to be duplicated in your file. Duplicated row will be ignored!; 1x
61. Gene GUSBP1 (728411) found to be duplicated in your file. Duplicated row will be ignored!; 1x
62. Gene OPN3 (23596) found to be duplicated in your file. Duplicated row will be ignored!; 1x
63. Gene TBC1D3E (102723859) found to be duplicated in your file. Duplicated row will be ignored!; 1x
Done.
Total time: 493367 ms
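To check that the "Warnings / Errors" table accounts for every skipped entry, you can tally the trailing `; Nx` counts by category. This is a minimal Python sketch that assumes each table row looks like the ones above (`<index>. <message>; <count>x`). The helper name `tally_skipped` is hypothetical.

```python
import re

# Matches rows like "51. Entrez_Id null not found. Record will be skipped ...; 7x"
WARNING_LINE = re.compile(r"^\d+\.\s+(?P<message>.*);\s+(?P<count>\d+)x$")

def tally_skipped(log_lines):
    """Sum the per-row counts of an importer warnings table by category."""
    totals = {"not_found": 0, "duplicated": 0, "other": 0}
    for line in log_lines:
        match = WARNING_LINE.match(line.strip())
        if not match:
            continue  # not a warnings-table row (stack trace, summary, etc.)
        count = int(match.group("count"))
        message = match.group("message")
        if "not found" in message:
            totals["not_found"] += count
        elif "duplicated" in message:
            totals["duplicated"] += count
        else:
            totals["other"] += count
    return totals
```

Applied by hand to the successful run above, the table contains 51 unknown Entrez IDs (1x each), 7 rows with a null Entrez ID, and 12 duplicated genes. That gives 51 + 7 + 12 = 70, which matches "total number of data entries skipped: 70". For the failed miRNA run earlier in the thread, the single row (2439x null) equals all 2439 data lines. This is consistent with nothing being saved.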