dada2
Demultiplex error
Hi, I'm still relatively new to R and I'm struggling to get my files to demultiplex.

path <-("C:/Users/andring/Documents/spring_data_research")
setwd(path)
file_paths <- as.character(list.files(path, full.names = TRUE))
trunc_len <- c(200,200)
trunc_q <- 2
max_n <- 0
max_ee <- c(2,2)
filtered_data <- filterAndTrim(file_paths, truncLen = trunc_len , truncQ = trunc_q, maxN = max_n, maxEE = max_ee, filt = c(TRUE, TRUE))

Error in filterAndTrim(file_path, truncLen = trunc_len, truncQ = trunc_q, :
File paths must be provided as character vectors.
I keep getting this error and can't figure out how to fix it. Anything helps!
Hi @Hellomissgabby, Can you give the result of the command:
file_paths
@adrientaudiere it returns the name of the path.
@Hellomissgabby First a note, what you are doing here is filtering and trimming the sequencing data, not demultiplexing (which is the separation of data from the sequencer into per-sample fastq files).
filterAndTrim has two required arguments: the file paths of the files you want to filter, and the file paths of the new location to which the filtered fastq files should be written. You haven't provided that second set of file paths. See the dada2 tutorial for an example, and you can always look at the documentation for any R function via ?filterAndTrim.
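To make the point about the two sets of file paths concrete, here is a minimal sketch of a paired-end filterAndTrim call. It assumes the files follow a naming pattern like SAMPLE_R1_001.fastq.gz and SAMPLE_R2_001.fastq.gz, which may not match your data, and filtered_paths is a hypothetical helper written for this example, not part of dada2. In the original call, filt was given c(TRUE, TRUE), which is a logical vector rather than a character vector of output paths, and that is likely what triggers the error message.

```r
# Hypothetical helper (not part of dada2): for each input fastq file, build an
# output path inside a "filtered" subfolder of the same directory.
filtered_paths <- function(files, tag) {
  # Assumption: the sample name is everything before the first underscore.
  samples <- sub("_.*$", "", basename(files))
  file.path(dirname(files), "filtered", paste0(samples, "_", tag, "_filt.fastq.gz"))
}

path <- "C:/Users/andring/Documents/spring_data_research"

# Input files. The _R1_001 / _R2_001 patterns are an assumption about naming.
# Sorting keeps forward and reverse files in the same sample order.
fnFs <- sort(list.files(path, pattern = "_R1_001.fastq", full.names = TRUE))
fnRs <- sort(list.files(path, pattern = "_R2_001.fastq", full.names = TRUE))

# Output files: character vectors, one path per input file.
filtFs <- filtered_paths(fnFs, "F")
filtRs <- filtered_paths(fnRs, "R")

# Only run the filtering step if matching files were found and dada2 is installed.
if (length(fnFs) > 0 && requireNamespace("dada2", quietly = TRUE)) {
  out <- dada2::filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                              truncLen = c(200, 200), truncQ = 2,
                              maxN = 0, maxEE = c(2, 2),
                              compress = TRUE, multithread = FALSE)
  print(head(out))
}
```

The parameter values are the ones from the original post. Whether truncLen = c(200, 200) is appropriate depends on the amplicon and read lengths, so check the quality profiles before settling on it.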
@benjjneb What approach would be appropriate to demultiplex in R? I'm struggling to follow the tutorial for the demultiplexing aspect.
I don't know of any functionality in R for demultiplexing. Are you sure you need it? Typically sequencing data these days is already demultiplexed into per-sample fastq files. Do you just have one big fastq file with multiple samples in it?
See the first question on the dada2 FAQ for some demultiplexing suggestions.
@benjjneb I have multiple files, but some of them are really large, and when I run the dada2 procedure it tells me that some of my forward and reverse files don't match in length. So my assumption was that they still needed to be demultiplexed in order to make them shorter.
I think the best place to start would be to get a clear description of the data you are working with, probably from whoever generated it or gave it to you. What sequencing technology was used to generate the data? Is it amplicon sequencing data, and if so, what amplicon was sequenced and using what primers? Is the data already in per-sample fastq files (i.e. already demultiplexed)? Was it pre-processed in any way? If so, is it possible to get the raw sequencing data from before pre-processing?
Without understanding what the data is, it's going to be hard to fix the errors you are running into.