importing from structure file - inconsistent "negative subscripts" error

Open gottschoa opened this issue 8 years ago • 15 comments

Hi,

I am trying to run DAPC on my new Sceloporus dataset. I successfully got this to work before with some other datasets. I am using adegenet 2.0.0.

When I run the following line for my "Uma dataset", I am able to successfully import:

library(adegenet)

data <- read.structure("output_western_uma_121214_n60_h5_p75_editnames_2.str", n.ind=64, n.loc=597, onerowperind=FALSE, col.lab=1, col.pop=0, col.others=NULL, row.marknames=NULL, NA.char="-9", pop=NULL, ask=FALSE, quiet=FALSE)

When I run the same code for the "Sceloporus dataset":

data <- read.structure("output_sceloporus_032415_n43_h5_p75.str", n.ind=80, n.loc=1024, onerowperind=FALSE, col.lab=1, col.pop=0, col.others=NULL, row.marknames=NULL, NA.char="-9", pop=NULL, ask=FALSE, quiet=FALSE)

I get the following error:

Error in mat[, (ncol(mat) - p + 1):ncol(mat)] : only 0's may be mixed with negative subscripts

I also tried this with adegenet v1.4.2 and have the exact same issue.

I attached both input (structure) files to this email. They were both formatted the same way, from pyRAD v2.1.2. If anyone can figure out why one file is giving me the error, and the other isn't, I would greatly appreciate it.

I should point out that I searched the archives; a similar question was posted about a year ago, but I didn't see it resolved:

http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2014-December/001049.html

Thanks for your help! (I added .txt extension to the .str files to upload to github)

Best, Andy

output_sceloporus_032415_n43_h5_p75.str.txt output_western_uma_121214_n60_h5_p75_editnames_2.str.txt

gottschoa avatar Apr 20 '16 20:04 gottschoa

Hi there, before looking into this, have you tried with the latest version of adegenet (2.0.1)?

thibautjombart avatar Apr 21 '16 10:04 thibautjombart

Is this issue still pending?

thibautjombart avatar Aug 05 '16 09:08 thibautjombart

Hi Dr. Jombart,

Sorry for the delayed response. I tried with 2.0.1 and still encounter the same issue.

Best, Andy

Andrew Gottscho, Ph.D.

gottschoa avatar Aug 16 '16 23:08 gottschoa

Hi,

I'm having the same issue. I tried with 2.0.1 and continue to get the same error. I'm using .str data: no_pop_map_snp_data.txt

MagB avatar Nov 23 '16 14:11 MagB

Hi there, I am heading to a conference all of next week, so I will not be able to look into this for at least a week. If this is a persistent error, it may be a bug. If you have time, you can try to see what is wrong using:

debug(read.structure)

before entering the command line creating the error.

thibautjombart avatar Nov 25 '16 17:11 thibautjombart

Hi @gottschoa, the reason this fails is that adegenet can only detect 1019 loci, not 1024. If you read the structure file in as a table, there are only 1019 columns that register as loci.

library("adegenet")
#> Loading required package: ade4
#> 
#>    /// adegenet 2.1.0 is loaded ////////////
#> 
#>    > overview: '?adegenet'
#>    > tutorials/doc/questions: 'adegenetWeb()' 
#>    > bug reports/feature requests: adegenetIssues()
tmp <- tempfile(fileext = ".str")
download.file("https://github.com/thibautjombart/adegenet/files/228778/output_sceloporus_032415_n43_h5_p75.str.txt", 
  destfile = tmp)
read.structure(tmp, n.ind = 80, n.loc = 1024, onerowperind = FALSE, col.lab = 1, 
  col.pop = 0, col.others = NULL, row.marknames = NULL, NA.char = "-9", pop = NULL, 
  ask = FALSE, quiet = FALSE)
#> 
#>  Converting data from a STRUCTURE .stru file to a genind object...
#> Error in mat[, (ncol(mat) - p + 1):ncol(mat)]: only 0's may be mixed with negative subscripts
read.structure(tmp, n.ind = 80, n.loc = 1019, onerowperind = FALSE, col.lab = 1, 
  col.pop = 0, col.others = NULL, row.marknames = NULL, NA.char = "-9", pop = NULL, 
  ask = FALSE, quiet = FALSE)
#> 
#>  Converting data from a STRUCTURE .stru file to a genind object...
#> Warning in df2genind(X = X, pop = pop, ploidy = 2, sep = sep, ncode =
#> ncode): entirely non-type marker(s) deleted
#> /// GENIND OBJECT /////////
#> 
#>  // 80 individuals; 1,017 loci; 2,047 alleles; size: 1.1 Mb
#> 
#>  // Basic content
#>    @tab:  80 x 2047 matrix of allele counts
#>    @loc.n.all: number of alleles per locus (range: 1-4)
#>    @loc.fac: locus factor for the 2047 columns of @tab
#>    @all.names: list of allele names for each locus
#>    @ploidy: ploidy of each individual  (range: 2-2)
#>    @type:  codom
#>    @call: read.structure(file = tmp, n.ind = 80, n.loc = 1019, onerowperind = FALSE, 
#>     col.lab = 1, col.pop = 0, col.others = NULL, row.marknames = NULL, 
#>     NA.char = "-9", pop = NULL, ask = FALSE, quiet = FALSE)
#> 
#>  // Optional content
#>    - empty -
sum(!sapply(read.table(tmp, sep = "\t"), is.logical))
#> [1] 1020
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.2 (2017-09-28)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/Chicago             
#>  date     2017-10-10
#> Packages -----------------------------------------------------------------
#>  package    * version    date       source                        
#>  ade4       * 1.7-8      2017-08-09 cran (@1.7-8)                 
#>  adegenet   * 2.1.0      2017-10-10 local                         
#>  ape          4.1        2017-02-14 CRAN (R 3.4.0)                
#>  assertthat   0.2.0      2017-04-11 CRAN (R 3.4.0)                
#>  backports    1.1.1      2017-09-25 CRAN (R 3.4.2)                
#>  base       * 3.4.2      2017-10-04 local                         
#>  bindr        0.1        2016-11-13 CRAN (R 3.4.0)                
#>  bindrcpp     0.2        2017-06-17 CRAN (R 3.4.0)                
#>  boot         1.3-20     2017-07-30 CRAN (R 3.4.1)                
#>  cluster      2.0.6      2017-03-16 CRAN (R 3.4.0)                
#>  coda         0.19-1     2016-12-08 CRAN (R 3.4.0)                
#>  colorspace   1.3-3      2017-08-16 R-Forge (R 3.4.1)             
#>  compiler     3.4.2      2017-10-04 local                         
#>  datasets   * 3.4.2      2017-10-04 local                         
#>  deldir       0.1-14     2017-04-22 CRAN (R 3.4.0)                
#>  devtools     1.13.3     2017-08-02 CRAN (R 3.4.1)                
#>  digest       0.6.12     2017-01-27 CRAN (R 3.4.0)                
#>  dplyr        0.7.4      2017-09-28 CRAN (R 3.4.1)                
#>  evaluate     0.10.1     2017-06-24 CRAN (R 3.4.1)                
#>  expm         0.999-2    2017-03-29 CRAN (R 3.4.0)                
#>  formatR      1.5        2017-04-25 CRAN (R 3.4.0)                
#>  gdata        2.18.0     2017-06-06 CRAN (R 3.4.0)                
#>  ggplot2      2.2.1      2016-12-30 CRAN (R 3.4.0)                
#>  glue         1.1.1      2017-06-21 CRAN (R 3.4.0)                
#>  gmodels      2.16.2     2015-07-22 CRAN (R 3.4.0)                
#>  graphics   * 3.4.2      2017-10-04 local                         
#>  grDevices  * 3.4.2      2017-10-04 local                         
#>  grid         3.4.2      2017-10-04 local                         
#>  gtable       0.2.0      2016-02-26 CRAN (R 3.4.0)                
#>  gtools       3.5.0      2015-05-29 CRAN (R 3.4.0)                
#>  htmltools    0.3.6      2017-04-28 CRAN (R 3.4.0)                
#>  httpuv       1.3.5      2017-07-04 CRAN (R 3.4.1)                
#>  igraph       1.1.2      2017-07-21 cran (@1.1.2)                 
#>  knitr        1.17       2017-08-10 cran (@1.17)                  
#>  lattice      0.20-35    2017-03-25 CRAN (R 3.4.0)                
#>  lazyeval     0.2.0      2016-06-12 CRAN (R 3.4.0)                
#>  LearnBayes   2.15       2014-05-29 CRAN (R 3.4.0)                
#>  magrittr     1.5        2014-11-22 CRAN (R 3.4.0)                
#>  MASS         7.3-47     2017-04-21 CRAN (R 3.4.0)                
#>  Matrix       1.2-11     2017-08-16 CRAN (R 3.4.1)                
#>  memoise      1.1.0      2017-04-21 CRAN (R 3.4.0)                
#>  methods    * 3.4.2      2017-10-04 local                         
#>  mgcv         1.8-22     2017-09-19 CRAN (R 3.4.2)                
#>  mime         0.5        2016-07-07 CRAN (R 3.4.0)                
#>  munsell      0.4.3      2016-02-13 CRAN (R 3.4.0)                
#>  nlme         3.1-131    2017-02-06 CRAN (R 3.4.0)                
#>  parallel     3.4.2      2017-10-04 local                         
#>  permute      0.9-4      2016-09-09 CRAN (R 3.4.0)                
#>  pkgconfig    2.0.1      2017-03-21 CRAN (R 3.4.0)                
#>  plyr         1.8.4      2016-06-08 CRAN (R 3.4.0)                
#>  R6           2.2.2      2017-06-17 cran (@2.2.2)                 
#>  Rcpp         0.12.13.1  2017-10-10 Github (RcppCore/Rcpp@136d50f)
#>  reshape2     1.4.2      2016-10-22 CRAN (R 3.4.0)                
#>  rlang        0.1.2      2017-08-09 cran (@0.1.2)                 
#>  rmarkdown    1.6        2017-06-15 cran (@1.6)                   
#>  rprojroot    1.2        2017-01-16 CRAN (R 3.4.0)                
#>  scales       0.5.0.9000 2017-08-28 Github (hadley/scales@d767915)
#>  seqinr       3.4-5      2017-08-01 CRAN (R 3.4.1)                
#>  shiny        1.0.5      2017-08-23 cran (@1.0.5)                 
#>  sp           1.2-5      2017-06-29 CRAN (R 3.4.1)                
#>  spdep        0.6-15     2017-09-01 CRAN (R 3.4.1)                
#>  splines      3.4.2      2017-10-04 local                         
#>  stats      * 3.4.2      2017-10-04 local                         
#>  stringi      1.1.5      2017-04-07 CRAN (R 3.4.0)                
#>  stringr      1.2.0      2017-02-18 CRAN (R 3.4.0)                
#>  tibble       1.3.4      2017-08-22 cran (@1.3.4)                 
#>  tools        3.4.2      2017-10-04 local                         
#>  utils      * 3.4.2      2017-10-04 local                         
#>  vegan        2.4-4      2017-08-24 cran (@2.4-4)                 
#>  withr        2.0.0      2017-07-28 CRAN (R 3.4.1)                
#>  xtable       1.8-2      2016-02-05 CRAN (R 3.4.0)                
#>  yaml         2.1.14     2016-11-12 CRAN (R 3.4.0)
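
To see why an n.loc larger than the number of detected columns produces this particular message, the failing subscript can be reproduced in isolation. This is only a sketch: mat and p are toy stand-ins named after the expression in the error message, with made-up sizes, and are not the actual objects inside read.structure().

```r
# Toy stand-in for the genotype matrix: 2 rows, 4 columns (sizes are made up)
mat <- matrix(1:8, nrow = 2)
# Pretend the caller asked for more loci than there are columns
p <- 6

# Same subscript expression as in the error message
idx <- (ncol(mat) - p + 1):ncol(mat)
print(idx)  # the sequence starts below zero and runs up to ncol(mat)

# Subsetting with a mix of negative and positive indices is an error in R
res <- tryCatch(mat[, idx], error = function(e) e)
print(inherits(res, "error"))
```

When p does not exceed the number of columns, the sequence stays positive and the subset succeeds, which would be consistent with n.loc = 1019 working above.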

zkamvar avatar Oct 10 '17 15:10 zkamvar

Hi, I still have this problem when I run dapc. I am using the latest version, 2.1.2. It works fine if I work with imputed data, but leaving missing marker data as NA gives me this error: "Error in dm[, 1L:dimen, drop = FALSE] : only 0's may be mixed with negative subscripts"

saidwali avatar Feb 21 '20 13:02 saidwali

Are you leaving missing data in the file as NA or as -9?

zkamvar avatar Feb 21 '20 16:02 zkamvar

Hi, I found out it was not working because of some stupid mistakes. Somehow it now also works with missing data. I am using "NA" for missing marker information, "1" for major, "2" for hetero and "3" for minor.

Some functions give me an error, e.g. find.clusters: "Warning in find.clusters.data.frame(as.data.frame(x), ...) : NAs introduced by coercion". dudi.pca is also not working with missing data. But I guess this is normal and I can live with that. DAPC, scatter etc. are working fine.

saidwali avatar Feb 21 '20 18:02 saidwali

Just to confirm: you are referring to an error with read.structure()?

The system you describe is not supported by adegenet and will give you incorrect results. Adegenet assumes that you represent each allele individually so that it can then represent those as counts in a sparse matrix.
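
As a rough illustration of what "each allele individually" could mean for data coded 1/2/3, the sketch below recodes such a table into explicit allele pairs. It assumes biallelic SNPs; the allele labels A and B, the column names and the individual names are all hypothetical, and this is not something adegenet does for you.

```r
# Hypothetical genotype codes: 1 = major homozygote, 2 = heterozygote,
# 3 = minor homozygote, NA = missing
codes <- data.frame(snp1 = c(1, 2, 3, NA), snp2 = c(3, 3, 2, 1))

# Lookup table from code to an explicit pair of alleles (labels are made up)
lookup <- c("A/A", "A/B", "B/B")

# Indexing with NA yields NA, so missing genotypes stay missing
alleles <- as.data.frame(lapply(codes, function(x) lookup[x]),
                         stringsAsFactors = FALSE)
rownames(alleles) <- paste0("ind", seq_len(nrow(alleles)))
print(alleles)
```

A table in this shape is the kind of input df2genind() is meant for, for example with sep = "/" and ploidy = 2, though whether that suits a given data set is an assumption to check.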

zkamvar avatar Feb 21 '20 21:02 zkamvar

Hello, I am having a very similar problem with the dapc command, where I get the same error as saidwali when I run the following code:

mmOfour <- dapc(genlit.vcf, pop.list$pop, n.pca = 20, n.da = 4)

Error in dm[, 1L:dimen, drop = FALSE] : only 0's may be mixed with negative subscripts

I am currently running adegenet 2.1.2. I am generating the genlight file with vcfR:

string.vcf <- read.vcfR("file.vcf")
genlit.vcf <- vcfR2genlight(string.vcf)

The Adegenet find.clusters program works with the genlight file. Additionally, previously generated genlight files work when running dapc.

I have been spinning my wheels with this error code for the past week, as I am re-analyzing some data after some changes to upstream filtering processes. I have relaxed some filters so that the new vcf/genlit files have more SNPs, and more missing data (however no more than ~25%).

Any help would be appreciated!

kkolis avatar Apr 20 '20 19:04 kkolis

Dear @zkamvar, I am facing the same problem as mentioned. How can I find out how many loci are detected in the structure file? I am going round and round but cannot figure it out. Please help me find out the number of loci being read by adegenet. Thank you so much in advance. genotypic.data.structure.AFG landrace.stru.txt

massub avatar Aug 05 '20 16:08 massub

I had the same problem with a data set of A. obstetricans:

AO_gen_F <- read.structure("File", sep = ";", n.ind = 474, n.loc = 13, onerowperind = TRUE, NA.char = "-9", col.lab = 1, col.pop = 2, row.marknames = 1, col.others = 0)

Comparing with my other .stru files, I saw that the working ones use a space separator while the failing ones use a semicolon. I therefore replaced the semicolons with spaces in the failing file and obtained the expected results. So the bug is in the parameter management.
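
For anyone wanting to try the same workaround without a text editor, a minimal base R sketch of the separator swap follows. The file contents and names are made up, and it assumes the file has no empty fields, since runs of spaces would otherwise collapse.

```r
# Hypothetical semicolon-separated structure file written to a temp location
src <- tempfile(fileext = ".stru")
writeLines(c("ind1;1;101;103",
             "ind2;1;101;-9",
             "ind3;2;105;103"), src)

# Replace every semicolon with a single space and write a new file
fixed <- gsub(";", " ", readLines(src), fixed = TRUE)
dst <- tempfile(fileext = ".stru")
writeLines(fixed, dst)

# Each line should now split into the same number of whitespace-separated fields
print(count.fields(dst))
```

The converted file can then be read with read.structure() using its default separator, as in the working files.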

Cheers

SMoulherat avatar Nov 14 '20 10:11 SMoulherat

Has this been solved? I am experiencing the same thing... I also read the VCF file with vcfR and converted it with vcfR2genlight.

thesnakeguy avatar Jun 02 '21 11:06 thesnakeguy

Please forgive the lateness of my reply. It's.... been a hell of a year for everyone.

Regarding errors in structure files

It's likely that whitespace characters are giving you problems. The difference between a tab and a space doesn't show up in text editors by default, and it will cause problems down the line. For example, in my answer to the initial inquiry back in 2017, I showed that only 1019 loci were being detected. What I didn't explain was that there were six columns after the ID column that were completely blank because there was a series of six tabs after the ID. The truth is, there are many reasons why this could be happening. Unfortunately, the structure format is quite varied and it can be really hard to debug without knowing what you were expecting (number of loci and number of individuals).
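
One way to check how many columns a tab-separated structure file really contains, and how many of them are blank, is sketched below with base R only. The miniature file is invented for illustration (an ID, three empty fields, four loci); with a real file, only the path would change.

```r
# Hypothetical tab-separated structure file: ID, three blank fields, four loci
tmp <- tempfile(fileext = ".str")
writeLines(c("ind1\t\t\t\t1\t2\t3\t4",
             "ind1\t\t\t\t1\t2\t-9\t4",
             "ind2\t\t\t\t3\t2\t3\t-9",
             "ind2\t\t\t\t3\t4\t3\t4"), tmp)

# Number of tab-separated fields on each line (blank fields are counted)
fields <- count.fields(tmp, sep = "\t")
print(table(fields))

# Columns that are blank on every row are read in as all-NA columns
tab <- read.table(tmp, sep = "\t")
empty <- vapply(tab, function(col) all(is.na(col)), logical(1))

# What is left after removing blank columns and the ID column
n.loc <- sum(!empty) - 1
print(n.loc)
```

If the count differs from the n.loc passed to read.structure(), the blank columns are a likely place to look first.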

Regarding vcfR errors

These errors don't have anything to do with the initial issue. You are getting a similar error because it's a common error message in R. The problem is that I don't have any way to reproduce the error you are getting because I don't know what the state of the data is. What I do know is that the code dm[, 1L:dimen, drop=FALSE] does not come from {adegenet}, rather it comes from MASS::predict.lda(). This comes from the Discriminant Analysis portion of the DAPC:

https://github.com/thibautjombart/adegenet/blob/78be588d418f8e5b0a05ebc2880917b1c6581054/R/dapc.R#L78-L92

Unfortunately, this is as far as I can go without knowing what your data looks like. What might help in debugging is to not set n.da and see how many discriminant axes are available because that is the source of the error.
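
As a sanity check before calling dapc, the requested n.da can be compared with the number of axes a discriminant analysis can provide, which is at most the number of observed groups minus one and no more than the number of retained principal components. The sketch below uses a made-up grouping factor and is not taken from the adegenet source.

```r
# Hypothetical grouping factor: five declared levels, only three observed
grp <- factor(c("A", "A", "B", "B", "C", "C"),
              levels = c("A", "B", "C", "D", "E"))
n.pca <- 20  # retained principal components, as in the call quoted above
n.da <- 4    # requested discriminant axes

# Count only groups that actually occur in the data
observed <- nlevels(droplevels(grp))
max.da <- min(observed - 1, n.pca)

if (anyNA(grp)) message("grouping factor contains NA")
if (n.da > max.da) {
  message("requested n.da (", n.da, ") exceeds the ", max.da, " axes available")
}
```

Empty factor levels and NA values in the grouping vector are two things this makes visible; whether either applies to the data discussed here is unknown.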

zkamvar avatar Jun 06 '21 03:06 zkamvar