AfterQC
Specify output folder name
Is it possible to add an option for specifying the name of the output folder that holds the report files, rather than using the input file names?
Yours faithfully, Katerina
AfterQC is designed to run in batch. So, normally AfterQC will create a QC folder, and within the QC folder there will be folders for the different input fastq files. You can change the name QC to report by specifying -r report in the command line.
Then the dir tree will be like:
report/
└── R1.fq
    ├── report.html
    └── report.json
So, your requirement is to not include the 'R1.fq' folder inside the report folder, and to make the dir tree like:
report/
├── report.html
└── report.json
Am I right?
Yeah, so the user can specify "report" folder for each pair manually if running in -1 -2 mode, for example.
I think it would be perfect if the user could specify some prefix for reports, e.g. --report-prefix=/path/to/dir/filename
and then:
/path/to/dir/
├── filename.html
└── filename.json
@alezanalp do you agree with @serge2016 ?
I have submitted a commit to implement @serge2016 's idea. You can pull or download the latest master to have a try.
Now, you will get
QC
├── filename1.fq.html
├── filename1.fq.json
├── filename2.fq.html
└── filename2.fq.json
...
And you can change the folder name from QC to report by specifying -r report. You can also specify an absolute path with -r /path/to/dir/
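The new layout can be illustrated with a short Python sketch. The helper name report_paths is hypothetical and is not part of AfterQC; it only assumes, as the tree above shows, that each report is named after its input file and placed directly inside the report folder.

```python
import os


def report_paths(read1_file, report_folder="QC"):
    """Return the (html, json) report paths for one input file.

    Hypothetical helper, not AfterQC code: it mirrors the layout shown
    in the thread, where reports sit directly in the report folder and
    are named after the input file.
    """
    name = os.path.basename(read1_file)
    html_path = os.path.join(report_folder, name + ".html")
    json_path = os.path.join(report_folder, name + ".json")
    return html_path, json_path


if __name__ == "__main__":
    # Print the expected report locations for two inputs with -r report.
    for fq in ["data/filename1.fq", "data/filename2.fq"]:
        print(report_paths(fq, report_folder="report"))
```

Because the file name carries the input name, several inputs can share one report folder without colliding.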
@sfchen Yes, I agree with @serge2016 . Thank you for the prompt reply. Will try it
There is one more "issue" or bug with this in v0.9.0:
If I run after.py --read1_file=SRR3184279_1.fastq.gz --read2_file=SRR3184279_2.fastq.gz --read1_flag=_1 --read2_flag=_2 --qc_only
then I get everything ok:
$(pwd)/QC/
└── SRR3184279_1.fastq.gz
    ├── report.html
    └── report.json
But if I run after.py --read1_file=SRR3184279_1.fastq.gz --read2_file=SRR3184279_2.fastq.gz --read1_flag=_1 --read2_flag=_2 --qc_only --report_output_folder=$(pwd)
then I get:
SRR3184279_1.fastq.gz options:
{'qc_only': True, 'version': '0.9.0', 'seq_len_req': 35, 'index1_file': None, 'trim_tail': 0, 'report_output_folder': '/home/bg/kate/AfterQC/PE_reads/', 'trim_pair_same': True, 'no_correction': False, 'debubble_dir': 'debubble', 'barcode_flag': 'barcode', 'read2_file': 'SRR3184279_2.fastq.gz', 'barcode_length': 12, 'trim_tail2': 0, 'unqualified_base_limit': 60, 'allow_mismatch_in_poly': 2, 'read2_flag': '_2', 'store_overlap': False, 'debubble': False, 'read1_flag': '_1', 'index2_flag': 'I2', 'draw': True, 'index1_flag': 'I1', 'mask_mismatch': False, 'barcode': False, 'overlap_output_folder': None, 'barcode_verify': 'CAGTA', 'index2_file': None, 'qualified_quality_phred': 15, 'trim_front': 9, 'good_output_folder': 'good', 'poly_size_limit': 35, 'n_base_limit': 5, 'qc_sample': 200000, 'trim_front2': 9, 'no_overlap': False, 'input_dir': None, 'read1_file': 'SRR3184279_1.fastq.gz', 'qc_kmer': 8, 'bad_output_folder': None}
Traceback (most recent call last):
  File "/home/bg/soft/AfterQC-0.9.0/after.py", line 221, in <module>
    main()
  File "/home/bg/soft/AfterQC-0.9.0/after.py", line 215, in main
    processOptions(options)
  File "/home/bg/soft/AfterQC-0.9.0/after.py", line 171, in processOptions
    filter.run()
  File "/home/bg/soft/AfterQC-0.9.0/preprocesser.py", line 709, in run
    stat_file = open(os.path.join(qc_dir, "report.json"), "w")
IOError: [Errno 20] Not a directory: '/home/bg/kate/AfterQC/PE_reads/SRR3184279_1.fastq.gz/report.json'
This error occurs if I set the -r directory to the directory that I run AfterQC from.
@serge2016 this issue occurs because v0.9.0 needs to create a folder with the same name as the R1 fastq file, so it conflicts with the fastq file itself if $(pwd) is specified as report_output_folder.
I believe this issue is gone with the last commit.
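The conflict can be reproduced outside AfterQC with a minimal Python 3 sketch. The function name write_report_v090_style is hypothetical, and the "create the folder only if nothing with that name exists" check is an assumption about how the v0.9.0 code behaves, not a copy of it. The errno check is expected to match on Linux and macOS; other platforms may report a different error.

```python
import errno
import os
import tempfile


def write_report_v090_style(report_folder, read1_name):
    """Mimic the v0.9.0 layout: <report_folder>/<read1_name>/report.json.

    Returns None on success, or the errno of the failure.
    Assumption: the per-input folder is only created when nothing with
    that name exists yet, so a same-named fastq file slips through.
    """
    qc_dir = os.path.join(report_folder, read1_name)
    try:
        if not os.path.exists(qc_dir):
            os.makedirs(qc_dir)
        with open(os.path.join(qc_dir, "report.json"), "w") as stat_file:
            stat_file.write("{}")
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        return exc.errno
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # The input fastq lives in the directory used as the report folder.
        fastq = "SRR3184279_1.fastq.gz"
        open(os.path.join(workdir, fastq), "w").close()
        err = write_report_v090_style(workdir, fastq)
        print("ENOTDIR raised:", err == errno.ENOTDIR)
```

Naming the reports after the input file, instead of creating a folder with the input file's name, avoids this clash.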
I just released v0.9.1. You can have a try with the new feature described above.
Now the previous behavior has changed to something more predictable :) Thank you!
But I am still thinking about the case where we specify the -1 and -2 options: in this mode we have only one sample, so we could specify the full output name for the report.
I simply want to use your tool inside a CWL environment, and that is easier to do if output filenames can be specified independently of input filenames.
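Until such an option exists, one possible workaround is a small wrapper step that moves the reports to fixed names after AfterQC finishes. The following is a minimal sketch, not part of AfterQC: the function rename_reports and its parameters are hypothetical, and it assumes the v0.9.1 layout described earlier in this thread (report folder containing <input name>.html and <input name>.json).

```python
import os
import shutil


def rename_reports(report_folder, read1_file, out_prefix):
    """Move AfterQC reports to <out_prefix>.html and <out_prefix>.json.

    Hypothetical wrapper helper. Assumes the reports were written as
    <report_folder>/<basename of read1_file>.html and .json.
    Returns the list of new paths.
    """
    name = os.path.basename(read1_file)
    moved = []
    for ext in (".html", ".json"):
        src = os.path.join(report_folder, name + ext)
        dst = out_prefix + ext
        dst_dir = os.path.dirname(dst)
        if dst_dir:
            # Create the target directory if the prefix points into one.
            os.makedirs(dst_dir, exist_ok=True)
        shutil.move(src, dst)
        moved.append(dst)
    return moved
```

For example, rename_reports("QC", "SRR3184279_1.fastq.gz", "/path/to/dir/filename") would leave /path/to/dir/filename.html and /path/to/dir/filename.json, which a workflow step can then reference by fixed names.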