Loading 10x Genomics Data: Error in step_subset

Open mdozmorov opened this issue 2 years ago • 8 comments

Hello. I started with the "Loading 10x Genomics Data" tutorial, downloaded the CSV files from the 10x Genomics website, and ran immdata_10x <- repLoad(file_path). This results in an error, which I can also reproduce with the data I actually want to analyze:

== Step 1/3: loading repertoire files... ==

Processing "/Users/mdozmorov/Documents/Data/VCU_work/Sawalha/2021-06.scRNA_scATAC/test_immunarch/data" ...
  -- [1/5] Parsing "/Users/mdozmorov/Documents/Data/VCU_work/Sawalha/2021-06.scRNA_scATAC/test_immunarch/data/vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv" -- 10x (filt.contigs)
  [!] Removed 2917 clonotypes with no nucleotide and amino acid CDR3 sequence.                                                             
Error in step_subset(parent, vars = vars, groups = groups, arrange = arrange,  : 
  is.null(j) || is_expression(j) is not TRUE
In addition: Warning message:
The following named parsers don't match the column names: barcode,is_cell,contig_id,high_confidence,length,chain,v_gene,d_gene,j_gene,c_gene,full_length,productive,cdr3,cdr3_nt,reads,umis,raw_clonotype_id,raw_consensus_id 

The files I downloaded and put in a separate file_path folder are:

vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv
vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv
vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv
vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv
vdj_v1_mm_c57bl6_pbmc_t_metrics_summary.csv

I'm using Immunarch v.0.6.7 on a Mac. What may be wrong?

mdozmorov avatar Feb 13 '22 20:02 mdozmorov

Hi, @mdozmorov! My name is Maria Volobueva, I am a developer of the Immunarch package.

We have managed to reproduce your issue. Now we are working on fixing it.

I will get back to you with any updates.

Thank you so much for drawing our attention to this.

Good luck, Maria Volobueva

MVolobueva avatar Mar 28 '22 12:03 MVolobueva

Hello, @mdozmorov

I've figured out what the bug was. We have already fixed it in the dev branch of Immunarch.

To install this branch, you can use the following commands:

install.packages(c("devtools", "pkgload"))
devtools::install_github("immunomind/immunarch", ref="dev")
devtools::reload(pkgload::inst("immunarch"))

If you are working in RStudio and the bug bothers you again, go to Tools -> Project Options -> Restore .RData into workspace at startup -> No, and then start a new project.

Do not hesitate to contact us with any further questions.

Good luck, Maria Volobueva

MVolobueva avatar Mar 31 '22 12:03 MVolobueva

Thanks, Maria. I followed your instructions verbatim, but the problem persists. I reinstalled immunarch from the dev branch, updated all packages, and made sure that workspace restoring is disabled both globally and for the project. I'm copy-pasting the code below; the error is identical.

# 1.1) Load the package into R:
# devtools::install_github("immunomind/immunarch", ref="dev")
library(immunarch)

# 1.2) Replace with the path to your processed 10x data or to the clonotypes file
file_path = "/Users/mdozmorov/Documents/Data/VCU_work/test_immunarch/data"

# 1.3) Load 10x data with repLoad
immdata_10x <- repLoad(file_path)

mdozmorov avatar Apr 01 '22 00:04 mdozmorov

Hi, @mdozmorov!

My name is Aleksandr Popov, I am a developer of the Immunarch package.

When I tried to reproduce this bug, I noticed that it appears only when remnants of an old version of Immunarch are present, or when there are function name conflicts in the R environment. Please try running R from the terminal with the R --vanilla command (to start it with an empty environment) and run these commands:

install.packages(c("devtools", "pkgload"))
devtools::install_github("immunomind/immunarch", ref="dev")
devtools::reload(pkgload::inst("immunarch"))
file_path = "/Users/mdozmorov/Documents/Data/VCU_work/test_immunarch/data"
immdata_10x <- repLoad(file_path)

I hope this will help to load the data correctly.

Best regards, Aleksandr

Alexander230 avatar Apr 01 '22 10:04 Alexander230
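
One way to check for the leftover installations and name conflicts mentioned above is sketched here. It is only an illustration that relies on base R utilities (installed.packages, .libPaths, find); the variable names pkgs and copies are made up for this example, and nothing in it is specific to Immunarch beyond the package and function names taken from this thread.

```r
# List every installed copy of immunarch across all library paths of this session.
# Only functions from the utils/base packages are used, so immunarch does not need to load.
pkgs <- utils::installed.packages(lib.loc = .libPaths())
copies <- pkgs[rownames(pkgs) == "immunarch", c("LibPath", "Version"), drop = FALSE]
print(copies)

# More than one row means several copies are installed in different libraries.
if (nrow(copies) > 1) {
  message("Multiple immunarch installations found; consider removing the older ones.")
}

# Show which attached packages or environments currently define a repLoad object.
# An empty result means nothing named repLoad is on the search path yet.
print(utils::find("repLoad"))
```

If copies has more than one row, or find() reports repLoad somewhere other than package:immunarch, that would be consistent with the conflict described in the comment above.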

It didn't help. The R --vanilla session still detects the existing installation and reports:

Skipping install of 'immunarch' from a github remote, the SHA1 (37d06bef) has not changed since last install.
  Use `force = TRUE` to force installation

Manually removing it

rm -r /Users/mdozmorov/Library/R/x86_64/4.1/library/immunarch

and reinstalling still results in the same error.

mdozmorov avatar Apr 01 '22 11:04 mdozmorov
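
The "Skipping install" message above itself points at force = TRUE. A minimal sketch of a forced reinstall follows, assuming devtools is installed; the do_reinstall flag is a hypothetical guard added here so the snippet does nothing unless it is switched on deliberately.

```r
# Hypothetical guard flag for illustration: set to TRUE to actually reinstall.
do_reinstall <- FALSE

if (do_reinstall) {
  # force = TRUE asks install_github to reinstall even when the remote SHA1
  # has not changed since the last install.
  devtools::install_github("immunomind/immunarch", ref = "dev", force = TRUE)
}
```

As the comment above notes, removing and reinstalling the package did not resolve the error in this case, so a forced reinstall mainly rules out a stale installation.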

Hello, @mdozmorov!

I suppose the error persists because you are trying to load all files from your folder into Immunarch, but Immunarch can only load files in a supported format. Files whose names end with contig_annotations.csv should load correctly.

Please try replacing the file_path variable in your script:

file_path = "/Users/mdozmorov/Documents/Data/VCU_work/test_immunarch/data/vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv"

Do not hesitate to contact us with any further questions.

Good luck, Maria

MVolobueva avatar Apr 05 '22 09:04 MVolobueva
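
The file selection suggested above could be sketched as follows. The helper find_contig_files and the temporary demo folder are hypothetical and exist only for this illustration; the file names are the five listed in the original report, created here as empty stand-ins so the snippet runs without real data.

```r
# Hypothetical helper: keep only files whose names end with contig_annotations.csv.
find_contig_files <- function(dir) {
  list.files(dir, pattern = "contig_annotations\\.csv$", full.names = TRUE)
}

# Demo folder with empty stand-ins for the five files from the report.
demo_dir <- file.path(tempdir(), "immunarch_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c(
  "vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_metrics_summary.csv"
)))

# Only the all_contig and filtered_contig files match the pattern.
contig_files <- find_contig_files(demo_dir)
print(basename(contig_files))

# With real data, one of the matching paths would then be passed on, e.g.:
# immdata_10x <- immunarch::repLoad(contig_files[1])
```

The clonotypes, consensus_annotations, and metrics_summary files are left out, which matches the advice to point repLoad at a contig_annotations file rather than at the whole folder.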

Hello, I get the same error when loading my 10x Genomics results. I installed the dev version of the package and restarted RStudio, but I still get the same error.

immdata <- repLoad(.path = './BM01/tcr/run_count/outs/all_contig_annotations.csv')

My error looks like this:

== Step 1/3: loading repertoire files... ==

Processing "" ...
  -- [1/1] Parsing "/BM01/tcr/run_count/outs/all_contig_annotations.csv" -- 10x (filt.contigs)
  [!] Removed 1415 clonotypes with no nucleotide and amino acid CDR3 sequence.
Error in step_subset(parent, vars = vars, groups = groups, arrange = arrange, :
  is.null(j) || is_expression(j) is not TRUE
In addition: Warning message:
The following named parsers don't match the column names: barcode,is_cell,contig_id,high_confidence,length,chain,v_gene,d_gene,j_gene,c_gene,full_length,productive,fwr1,fwr1_nt,cdr1,cdr1_nt,fwr2,fwr2_nt,cdr2,cdr2_nt,fwr3,fwr3_nt,cdr3,cdr3_nt,fwr4,fwr4_nt,reads,umis,raw_clonotype_id,raw_consensus_id,exact_subclonotype_id

Version (as you can see, I am using the latest version):

packageVersion('immunarch')
[1] ‘0.6.8’

Session info:

R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] immunarch_0.6.8 patchwork_1.1.1 data.table_1.14.0 dtplyr_1.1.0
[5] dplyr_1.0.8 ggplot2_3.3.3

loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 prabclus_2.3-2
[3] R.methodsS3_1.8.1 tidyr_1.1.3
[5] bit64_4.0.5 knitr_1.33
[7] DelayedArray_0.16.3 R.utils_2.10.1
[9] RCurl_1.98-1.5 doParallel_1.0.16
[11] generics_0.1.0 BiocGenerics_0.36.1
[13] callr_3.7.0 usethis_2.0.1
[15] RSQLite_2.2.7 shadowtext_0.0.8
[17] rlist_0.4.6.2 tzdb_0.2.0
[19] bit_4.0.4 enrichplot_1.15.3
[21] xml2_1.3.2 httpuv_1.6.1
[23] SummarizedExperiment_1.20.0 assertthat_0.2.1
[25] viridis_0.6.2 xfun_0.23
[27] hms_1.0.0 celldex_1.0.0
[29] babelgene_21.4 evaluate_0.14
[31] promises_1.2.0.1 DEoptimR_1.0-8
[33] fansi_0.4.2 dbplyr_2.1.1
[35] readxl_1.3.1 igraph_1.2.11
[37] DBI_1.1.1 geneplotter_1.68.0
[39] htmlwidgets_1.5.3 stringdist_0.9.6.3
[41] stats4_4.0.5 purrr_0.3.4
[43] ellipsis_0.3.2 ggpubr_0.4.0
[45] backports_1.2.1 annotate_1.68.0
[47] sparseMatrixStats_1.2.1 MatrixGenerics_1.2.1
[49] ggalluvial_0.12.3 vctrs_0.3.8
[51] Biobase_2.50.0 remotes_2.3.0
[53] Cairo_1.5-12.2 abind_1.4-5
[55] cachem_1.0.5 withr_2.4.2
[57] ggforce_0.3.3 robustbase_0.93-7
[59] vroom_1.5.6 treeio_1.14.4
[61] prettyunits_1.1.1 mclust_5.4.9
[63] cluster_2.1.2 DOSE_3.16.0
[65] ExperimentHub_1.16.1 ape_5.5
[67] lazyeval_0.2.2 crayon_1.4.1
[69] genefilter_1.72.1 pkgconfig_2.0.3
[71] tweenr_1.0.2 GenomeInfoDb_1.26.7
[73] nlme_3.1-152 pkgload_1.2.4
[75] nnet_7.3-16 devtools_2.4.3
[77] diptest_0.76-0 rlang_1.0.1
[79] lifecycle_1.0.1 downloader_0.4
[81] BiocFileCache_1.14.0 AnnotationHub_2.22.1
[83] cellranger_1.1.0 rprojroot_2.0.2
[85] polyclip_1.10-0 matrixStats_0.61.0
[87] flextable_0.6.5 phangorn_2.7.1
[89] ggseqlogo_0.1 Matrix_1.3-3
[91] aplot_0.0.6 carData_3.0-4
[93] base64enc_0.1-3 GlobalOptions_0.1.2
[95] processx_3.5.2 pheatmap_1.0.12
[97] png_0.1-7 viridisLite_0.4.0
[99] rjson_0.2.20 bitops_1.0-7
[101] R.oo_1.24.0 blob_1.2.1
[103] DelayedMatrixStats_1.12.3 shape_1.4.6
[105] stringr_1.4.0 qvalue_2.22.0
[107] readr_2.1.2 rstatix_0.7.0
[109] gridGraphics_0.5-1 ggsignif_0.6.1
[111] S4Vectors_0.28.1 scales_1.1.1
[113] memoise_2.0.0 magrittr_2.0.1
[115] plyr_1.8.6 zlibbioc_1.36.0
[117] compiler_4.0.5 scatterpie_0.1.6
[119] factoextra_1.0.7 RColorBrewer_1.1-2
[121] clue_0.3-60 DESeq2_1.30.1
[123] cli_3.2.0 XVector_0.30.0
[125] ps_1.6.0 MASS_7.3-54
[127] tidyselect_1.1.1 forcats_0.5.1
[129] stringi_1.7.6 yaml_2.2.1
[131] GOSemSim_2.16.1 locfit_1.5-9.4
[133] ggrepel_0.9.1 grid_4.0.5
[135] fastmatch_1.1-0 tools_4.0.5
[137] rio_0.5.26 parallel_4.0.5
[139] rvg_0.2.5 circlize_0.4.13
[141] rstudioapi_0.13 uuid_0.1-4
[143] foreign_0.8-81 foreach_1.5.1
[145] gridExtra_2.3 devEMF_4.0-2
[147] farver_2.1.0 ggraph_2.0.5
[149] digest_0.6.27 rvcheck_0.1.8
[151] BiocManager_1.30.16 shiny_1.6.0
[153] quadprog_1.5-8 fpc_2.2-9
[155] Rcpp_1.0.6 car_3.0-10
[157] GenomicRanges_1.42.0 broom_0.7.6
[159] BiocVersion_3.12.0 R.devices_2.17.0
[161] later_1.2.0 httr_1.4.2
[163] gdtools_0.2.3 AnnotationDbi_1.52.0
[165] ComplexHeatmap_2.6.2 kernlab_0.9-29
[167] colorspace_2.0-1 job_0.3.0
[169] XML_3.99-0.9 fs_1.5.0
[171] IRanges_2.24.1 splines_4.0.5
[173] yulab.utils_0.0.4 tidytree_0.3.4
[175] graphlayouts_0.7.1 shinythemes_1.2.0
[177] flexmix_2.3-17 ggplotify_0.0.7
[179] plotly_4.9.3 sessioninfo_1.1.1
[181] systemfonts_1.0.2 xtable_1.8-4
[183] jsonlite_1.7.2 ggtree_2.4.2
[185] tidygraph_1.2.0 UpSetR_1.4.0
[187] modeltools_0.2-23 testthat_3.0.2
[189] R6_2.5.0 pillar_1.6.1
[191] htmltools_0.5.2 mime_0.10
[193] glue_1.6.0 fastmap_1.1.0
[195] clusterProfiler_3.18.1 BiocParallel_1.24.1
[197] class_7.3-19 interactiveDisplayBase_1.28.0
[199] codetools_0.2-18 fgsea_1.16.0
[201] pkgbuild_1.2.0 utf8_1.2.1
[203] lattice_0.20-44 tibble_3.1.2
[205] curl_4.3.1 officer_0.3.18
[207] magick_2.7.2 openxlsx_4.2.3
[209] zip_2.1.1 GO.db_3.12.1
[211] survival_3.2-11 rmarkdown_2.8
[213] desc_1.3.0 munsell_0.5.0
[215] DO.db_2.9 GetoptLong_1.0.5
[217] GenomeInfoDbData_1.2.4 iterators_1.0.13
[219] haven_2.4.1 reshape2_1.4.4
[221] gtable_0.3.0 msigdbr_7.4.1
[223] eoffice_0.2.1

I then tested several versions of the package, and only versions before 0.6.5 load 10x data correctly. That is, 0.6.4 can load 10x Genomics data, but 0.6.5 and 0.6.7 cannot.

Hope you can give me some suggestions. Thank you!

shanshenbing avatar May 05 '22 07:05 shanshenbing

Hi, @shanshenbing!

Thank you for contacting us. I suppose the error persists because of conflicting package versions in RStudio. To test this, run the following on the command line:

R --vanilla

Then install the proper version of immunarch again and repeat your command (also on the command line). If everything works, just update your RStudio projects; otherwise, let us know.

Do not hesitate to contact us with any further questions.

Good luck, Maria Samokhina

MVolobueva avatar May 31 '22 10:05 MVolobueva