
specify list of scaffolds for wgsCoveragePlotter

Open PAMorin opened this issue 2 years ago • 2 comments

Verify

  • if you cannot compile/install, read https://github.com/lindenb/jvarkit/wiki/Compilation before submitting a new issue
  • the version of java

Subject of the issue

specify list of scaffolds for wgsCoveragePlotter

Your environment

  • version of jvarkit (unclear; installed 6/16/22 from "https://github.com/lindenb/jvarkit.git")
  • which OS: Linux 8.2.2004

Steps to reproduce

I want to plot coverage of only a specific subset of scaffolds, but all scaffolds are named sequentially (e.g., scaffold_1, scaffold_2, etc.). Is there a way to substitute a list variable or array for the --include-contig-regex regular expression:

e.g.,

chr=("scaffold_1" "scaffold_2" "scaffold_3" "scaffold_4" "scaffold_5" "scaffold_10")

java -jar ${covplot}/dist/wgscoverageplotter.jar --dimension 1500x500 -C -1 --clip -R ${REFDIR}/${REF} ${BAMDIR}/${BAMFILE} --include-contig-regex ${chr} --percentile median > ${OUTDIR}/${BAMFILE}_covplot_allChrom.svg

Expected behaviour

WGS coverage plot with only the specified scaffolds

Actual behaviour

WGS coverage plot of only the first scaffold in the list $chr
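
This result is consistent with how bash expands arrays: an unindexed reference such as ${chr} yields only element 0, so the tool would receive just scaffold_1 as its regular expression. A minimal sketch, assuming bash is the shell in use (the array is a shortened copy of the one above; first_only and all_elems are illustrative names):

```shell
#!/usr/bin/env bash
chr=("scaffold_1" "scaffold_2" "scaffold_10")

# Unindexed reference: bash substitutes only the first element.
first_only="${chr}"
echo "${first_only}"

# All elements joined by spaces: complete, but still not an alternation regex.
all_elems="${chr[*]}"
echo "${all_elems}"
```

Neither form gives the option a pattern that covers every scaffold, which is why a single alternation string is needed.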

PAMorin avatar Jun 17 '22 21:06 PAMorin

@PAMorin I'm sorry, I don't think I really understand the problem.

How about using a regular expression like

(scaffold_1|scaffold_2|scaffold_3|scaffold_4|scaffold_5|scaffold_10)

?
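
If the scaffold names already live in a bash array, an alternation of this form can be assembled from it rather than typed by hand. A minimal sketch, assuming bash; the variable name regex is illustrative and not part of jvarkit:

```shell
#!/usr/bin/env bash
chr=("scaffold_1" "scaffold_2" "scaffold_3" "scaffold_4" "scaffold_5" "scaffold_10")

# "${chr[*]}" joins the elements with the first character of IFS;
# running it in a command substitution keeps the IFS change local.
regex="($(IFS='|'; echo "${chr[*]}"))"
echo "${regex}"
```

The result can then be passed as --include-contig-regex "${regex}"; quoting the variable keeps the shell from word-splitting or glob-expanding the pattern.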

lindenb avatar Jun 17 '22 22:06 lindenb

Thanks for the quick response!

It looks like you understood my problem (not knowing how to construct an expression that matches multiple scaffold names) perfectly. This works:

chr="scaffold_4|scaffold_5|scaffold_10"

java -jar ${covplot}/dist/wgscoverageplotter.jar --dimension 1500x500 -C -1 --clip -R ${REFDIR}/${REF} ${BAMDIR}/${BAMFILE} --include-contig-regex ${chr} --percentile median  > ${OUTDIR}/${BAMFILE}_covplot_allChrom.svg

The output is attached. My genome assembly has a few hundred scaffolds, but the first 22 are chromosome-length, and I only want to plot coverage of those scaffolds. This approach allows me to specify which ones to plot, and I can further refine it to omit or include a few specific scaffolds of interest in a single plot.
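
For the 22 chromosome-length scaffolds, the alternation can also be written compactly with numeric ranges instead of listing every name. The sketch below is only a check of the pattern itself using grep -E with whole-line matching. It assumes the sequential scaffold_N naming described in this thread, and that the tool's regular-expression syntax treats these constructs the same way grep -E does. Whether wgscoverageplotter matches the whole contig name or only part of it is not confirmed here.

```shell
#!/usr/bin/env bash
# Compact pattern intended to cover scaffold_1 .. scaffold_22.
regex='scaffold_([1-9]|1[0-9]|2[0-2])'

# Test it against scaffold_1 .. scaffold_30; -x requires the whole name to match.
matched=$(for i in $(seq 1 30); do echo "scaffold_${i}"; done | grep -E -x -c "${regex}")
echo "${matched}"   # 22
```

Individual scaffolds of interest can be appended with another | branch, for example scaffold_([1-9]|1[0-9]|2[0-2])|scaffold_57, where scaffold_57 is a made-up name.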

Thanks,

Phil



PAMorin avatar Jun 17 '22 22:06 PAMorin