
Specify output folder name

Open alezanalp opened this issue 7 years ago • 10 comments

Is it possible to add an option for specifying the name of the output folder that holds the report files, rather than deriving it from the input file names?

Yours faithfully, Katerina

alezanalp avatar Mar 17 '17 08:03 alezanalp

AfterQC is designed to run in batch mode. So, normally AfterQC creates a QC folder, and within the QC folder there is a separate folder for each input fastq file.

You can change the name QC to report by specifying -r report in the command line.

Then the dir tree will be like:

report/
└── R1.fq
    ├── report.html
    └── report.json

So, your requirement is to drop the 'R1.fq' folder inside the report folder, so that the dir tree looks like:

report/
├── report.html
└── report.json

Am I right?

sfchen avatar Mar 17 '17 09:03 sfchen

Yeah, so the user can specify "report" folder for each pair manually if running in -1 -2 mode, for example.

alezanalp avatar Mar 17 '17 09:03 alezanalp

I think it would be perfect if the user could specify some prefix for the reports, e.g. --report-prefix=/path/to/dir/filename and then:

/path/to/dir/
├── filename.html
└── filename.json

serge2016 avatar Mar 17 '17 09:03 serge2016

@alezanalp do you agree with @serge2016 ?

sfchen avatar Mar 17 '17 10:03 sfchen

I have submitted a commit to implement @serge2016 's idea. You can pull or download the latest master to have a try.

Now, you will get

QC
├── filename1.fq.html
├── filename1.fq.json
├── filename2.fq.html
└── filename2.fq.json
...

You can change the folder name from QC to report by specifying -r report. You can also specify an absolute path with -r /path/to/dir/
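To make the new layout concrete, here is a minimal Python sketch of how the report paths could be derived from the R1 file name and the -r folder. The helper `report_paths` is hypothetical and is not taken from the AfterQC source; it only mirrors the naming shown in the tree above.

```python
import os


def report_paths(read1_file, report_dir="QC"):
    """Hypothetical helper (not AfterQC code): flat layout where the
    reports are named after the R1 input file, as in the tree above."""
    base = os.path.basename(read1_file)
    html_path = os.path.join(report_dir, base + ".html")
    json_path = os.path.join(report_dir, base + ".json")
    return html_path, json_path


# Default folder name, then a custom absolute folder (like -r /path/to/dir/)
print(report_paths("filename1.fq"))
print(report_paths("filename1.fq", "/path/to/dir/"))
```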

sfchen avatar Mar 17 '17 10:03 sfchen

@sfchen Yes, I agree with @serge2016 . Thank you for the prompt reply. Will try it

alezanalp avatar Mar 17 '17 10:03 alezanalp

There is one more "issue" or bug with this in v0.9.0: If I run after.py --read1_file=SRR3184279_1.fastq.gz --read2_file=SRR3184279_2.fastq.gz --read1_flag=_1 --read2_flag=_2 --qc_only then everything is OK:

$(pwd)/QC/
└── SRR3184279_1.fastq.gz
    ├── report.html
    └── report.json

But if I run after.py --read1_file=SRR3184279_1.fastq.gz --read2_file=SRR3184279_2.fastq.gz --read1_flag=_1 --read2_flag=_2 --qc_only --report_output_folder=$(pwd) then I get:

SRR3184279_1.fastq.gz options:
{'qc_only': True, 'version': '0.9.0', 'seq_len_req': 35, 'index1_file': None, 'trim_tail': 0, 'report_output_folder': '/home/bg/kate/AfterQC/PE_reads/', 'trim_pair_same': True, 'no_correction': False, 'debubble_dir': 'debubble', 'barcode_flag': 'barcode', 'read2_file': 'SRR3184279_2.fastq.gz', 'barcode_length': 12, 'trim_tail2': 0, 'unqualified_base_limit': 60, 'allow_mismatch_in_poly': 2, 'read2_flag': '_2', 'store_overlap': False, 'debubble': False, 'read1_flag': '_1', 'index2_flag': 'I2', 'draw': True, 'index1_flag': 'I1', 'mask_mismatch': False, 'barcode': False, 'overlap_output_folder': None, 'barcode_verify': 'CAGTA', 'index2_file': None, 'qualified_quality_phred': 15, 'trim_front': 9, 'good_output_folder': 'good', 'poly_size_limit': 35, 'n_base_limit': 5, 'qc_sample': 200000, 'trim_front2': 9, 'no_overlap': False, 'input_dir': None, 'read1_file': 'SRR3184279_1.fastq.gz', 'qc_kmer': 8, 'bad_output_folder': None}

Traceback (most recent call last):
  File "/home/bg/soft/AfterQC-0.9.0/after.py", line 221, in <module>
    main()
  File "/home/bg/soft/AfterQC-0.9.0/after.py", line 215, in main
    processOptions(options)
  File "/home/bg/soft/AfterQC-0.9.0/after.py", line 171, in processOptions
    filter.run()
  File "/home/bg/soft/AfterQC-0.9.0/preprocesser.py", line 709, in run
    stat_file = open(os.path.join(qc_dir, "report.json"), "w")
IOError: [Errno 20] Not a directory: '/home/bg/kate/AfterQC/PE_reads/SRR3184279_1.fastq.gz/report.json'

This error occurs if I set the -r dir equal to the dir, where I run AfterQC from.

serge2016 avatar Mar 17 '17 13:03 serge2016

@serge2016 this happens because v0.9.0 needs to create a folder with the same name as the R1 fastq file, so the folder conflicts with the fastq file itself if $(pwd) is specified as report_output_folder.

I believe this issue is gone with the latest commit.
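The conflict can be reproduced outside AfterQC with a small Python sketch. It assumes the v0.9.0-style layout `<report folder>/<R1 file name>/report.json` seen in the traceback; the function `write_report` is hypothetical and only imitates that layout.

```python
import errno
import os
import tempfile


def write_report(report_dir, read1_name):
    """Hypothetical imitation of the v0.9.0 layout:
    <report_dir>/<read1_name>/report.json.
    Returns None on success, or the errno of the failure."""
    qc_dir = os.path.join(report_dir, read1_name)
    try:
        if not os.path.exists(qc_dir):
            os.makedirs(qc_dir)
        with open(os.path.join(qc_dir, "report.json"), "w") as handle:
            handle.write("{}")
        return None
    except OSError as err:  # IOError is an alias of OSError in Python 3
        return err.errno


with tempfile.TemporaryDirectory() as work:
    fastq = "SRR3184279_1.fastq.gz"
    # The input file sits in the working directory
    open(os.path.join(work, fastq), "w").close()
    # Separate report folder: the per-sample folder can be created
    print(write_report(os.path.join(work, "QC"), fastq))
    # Report folder == input folder: the folder name collides with the file
    print(write_report(work, fastq) == errno.ENOTDIR)
```

In the second call the path that should be a folder already exists as a regular file, so opening report.json beneath it fails with "Not a directory".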

sfchen avatar Mar 17 '17 14:03 sfchen

I just released v0.9.1. You can have a try with the new feature described above.

sfchen avatar Mar 17 '17 14:03 sfchen

Now the previous behavior has changed to something more predictable :) Thank you! But I am still thinking about the case where we specify the -1 and -2 options: in this mode we have only one sample, so we could specify the full output name for the report.

I simply want to use your tool inside a CWL environment, and that is easier to do if output filenames can be specified independently of input filenames.
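Until such an option exists, one possible workaround is a small wrapper step that renames the reports after AfterQC finishes. The sketch below assumes the flat v0.9.1 layout described earlier in this thread (`<report folder>/<R1 file name>.html` and `.json`); the helper `rename_reports` and its parameters are hypothetical, not part of AfterQC.

```python
import os
import shutil


def rename_reports(report_dir, read1_file, out_prefix):
    """Hypothetical post-processing step: move
    <report_dir>/<R1 name>.{html,json} to <out_prefix>.{html,json}."""
    base = os.path.basename(read1_file)
    moved = []
    for ext in (".html", ".json"):
        src = os.path.join(report_dir, base + ext)
        dst = out_prefix + ext
        dst_dir = os.path.dirname(dst)
        # Create the target folder if the prefix points into a new one
        if dst_dir and not os.path.isdir(dst_dir):
            os.makedirs(dst_dir)
        shutil.move(src, dst)
        moved.append(dst)
    return moved
```

For example, `rename_reports("QC", "SRR3184279_1.fastq.gz", "/path/to/dir/filename")` would leave `/path/to/dir/filename.html` and `/path/to/dir/filename.json`, which a workflow step can then pick up by fixed name.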

serge2016 avatar Mar 20 '17 12:03 serge2016