Mapping rate in round 2 0.0

Open kmkocot opened this issue 3 years ago • 9 comments

I'm having an issue with the pipeline that I can't quite narrow down. Here's what's printed to the screen:

wirenia@wirenia:~/Desktop/2021-04-21_Hanleya_finder$ ./run_finder.sh
cat: output/alignments/Hanleya_hanleyi_mantle_round1_SJ.out.tab: No such file or directory
cat: output/alignments/Hanleya_hanleyi_mantle_round2_SJ.out.tab: No such file or directory
mv: cannot stat 'output/alignments/Hanleya_hanleyi_mantle_final_Log.final.out': No such file or directory
cat: output/alignments/Hanleya_hanleyi_mantle_round3_SJ.out.tab: No such file or directory
cat: output/alignments/Hanleya_hanleyi_mantle_round4_SJ.out.tab: No such file or directory
samtools index: "output/alignments/Hanleya_hanleyi_mantle_final.sortedByCoord.out.bam" is in a format that cannot be usefully indexed
samtools index: "output/alignments/Hanleya_hanleyi_mantle_final.sortedByCoord.out.bam" is in a format that cannot be usefully indexed
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Can not open output/alignments/Hanleya_hanleyi_mantle_final.sortedByCoord.out.bam.
mv: cannot stat 'output/assemblies_psiclass_modified/combined/psiclass_output_sample_0.gtf': No such file or directory
mv: cannot stat 'output/assemblies_psiclass_modified/combined/psiclass_output_vote.gtf': No such file or directory
Traceback (most recent call last):
  File "/home/wirenia/finder/finder", line 641, in <module>
    main()
  File "/home/wirenia/finder/finder", line 602, in main
    orchestrateGeneModelPrediction(options,logger_proxy,logging_mutex)
  File "/home/wirenia/finder/finder", line 415, in orchestrateGeneModelPrediction
    findTranscriptsInEachSampleNotReportedInCombinedAnnotations(options,logger_proxy,logging_mutex)
  File "/home/wirenia/finder/scripts/findTranscriptsInEachSampleNotReportedInCombinedAnnotations.py", line 17, in findTranscriptsInEachSampleNotReportedInCombinedAnnotations
    combined_transcript_info=readAllTranscriptsFromGTFFileInParallel([combined_gtf_filename,"combined","combined"])[0]
  File "/home/wirenia/finder/scripts/fileReadWriteOperations.py", line 232, in readAllTranscriptsFromGTFFileInParallel
    fhr=open(gtf_filename,"r")
FileNotFoundError: [Errno 2] No such file or directory: 'output/assemblies_psiclass_modified/combined/combined.gtf'

progress.log says: 2021-04-23 00:46:46,664 - finder - INFO - Software paths have been set 2021-04-23 00:47:07,129 - finder - INFO - Generating STAR index 2021-04-23 01:09:17,755 - finder - INFO - STAR index generation complete 2021-04-23 01:09:17,786 - finder - INFO - Generating OLego index 2021-04-23 02:16:37,614 - finder - INFO - OLego index built 2021-04-23 02:16:37,648 - finder - INFO - validateCommandLineArguments execution successful 2021-04-23 02:16:37,676 - finder - INFO - Metadata information created 2021-04-23 02:16:37,676 - finder - INFO - readMetaDataFile execution successful 2021-04-23 02:16:37,689 - finder - INFO - expandGzippedFiles execution successful 2021-04-23 02:16:37,717 - finder - INFO - Starting FINDER from None checkpoint 2021-04-23 02:16:37,718 - finder - INFO - Program params - Namespace(addUTR=True, checkpoint=None, compressed_data_files=None, cpu='35', error_corrected_raw_data='output/raw_data_error_corrected', exonerate_gff3='protein_evidence.gff3', files_for_ncrna={'mature_ATGC': '/home/wirenia/finder/dep/mature_ATGC.fa'}, final_GTF_files='output/final_GTF_files', genome='final.purged.fa.PolcaCorrected.fa.masked', genome_dir_olego='output/indices/olego_index', genome_dir_star='output/indices/star_index_without_transcriptome', indices='output/indices', md=None, metadatafile='metadata.csv', mrna_md={'mantle': {'Hanleya_hanleyi_mantle': {'bioproject': 'DUMMY', 'condition': 'mantle', 'Date': '1/12/17', 'Ended': 'PE', 'desc': 'cDNA;Illumina HiSeq 2500', 'read_length': '101', 'error_corrected': 0, 'location_directory': '/home/wirenia/Desktop/2021-04-21_Hanleya_finder', 'downloaded_from_NCBI': 0}}}, no_cleanup=False, output_assemblies_psiclass_terminal_exon_length_modified='output/assemblies_psiclass_modified', output_braker='output/braker', output_directory='output', output_fasta_N_removed='output/raw_fasta_N_removed', output_rcorrector=None, output_sample_fastq=None, output_star='output/alignments', paired_end_adapterfile=None, 
perform_post_completion_data_cleanup=False, preserve_raw_input_data=False, protein='protein_evidence.fas', raw_data_downloaded_from_NCBI='output/raw_data_downloaded_from_NCBI', record_time={}, run_tests=False, single_end_adapterfile=None, skip_cpd=False, smrna_md={}, softwares={'psiclass': '/home/wirenia/finder/dep/psiclass_terminal_exon_length_modified/psiclass', 'junc': '/home/wirenia/finder/dep/psiclass_terminal_exon_length_modified//junc', 'subexon-info': '/home/wirenia/finder/dep/psiclass_terminal_exon_length_modified//subexon-info', 'addXS': '/home/wirenia/finder/dep/psiclass_terminal_exon_length_modified//addXS', 'fastq-sample': '/home/wirenia/finder/dep/fastq-tools-0.8/scripts/fastq-sample', 'download_and_dump_fastq_from_SRA': '/home/wirenia/finder/dep/../utils/downloadAndDumpFastqFromSRA.py', 'transferGenomicNucleotideCountsToTranscriptome': '/home/wirenia/finder/dep/../scripts/transferGenomicNucleotideCountsToTranscriptome.py', 'find_exonic_troughs': '/home/wirenia/finder/dep/../scripts/find_exonic_troughs.R', 'olego': '/home/wirenia/finder/dep/olego/olego', 'olegoindex': '/home/wirenia/finder/dep/olego/olegoindex', 'mergePEsam.pl': '/home/wirenia/finder/dep/olego/mergePEsam.pl', 'xa2multi': '/home/wirenia/finder/dep/olego/xa2multi.pl', 'gmst': '/home/wirenia/finder/dep/gmst.pl', 'prodigal': '/home/wirenia/finder/dep/Prodigal/prodigal', 'canon-gff3': '/home/wirenia/finder/dep/canon-gff3', 'convert_exonerate_gff_to_gtf': '/home/wirenia/finder/dep/../utils/convert_exonerate_gff_to_gtf.py', 'augustus_main_dir': '/home/wirenia/finder/dep/Augustus', 'braker': '/home/wirenia/finder/dep/BRAKER/scripts/braker.pl', 'GENEMARK_PATH': '/home/wirenia/finder/dep/gmes_linux_64', 'AUGUSTUS_CONFIG_PATH': 'output/braker/Augustus/config', 'AUGUSTUS_BIN_PATH': 'output/braker/Augustus/bin', 'AUGUSTUS_SCRIPTS_PATH': 'output/braker/Augustus/scripts', 'GUSHR_PATH': '/home/wirenia/finder/dep/GUSHR'}, space_saved=None, temp_dir='output/temp', total_space=None, verbose=3) 
2021-04-23 02:16:37,719 - finder - INFO - Started processing data for mantle 2021-04-23 02:16:37,731 - finder - INFO - Downloading missing data from NCBI started 2021-04-23 02:16:37,731 - finder - INFO - Downloading missing data from NCBI finished for mantle 2021-04-23 02:16:38,384 - finder - INFO - STAR Round1 run for Hanleya_hanleyi_mantle completed 2021-04-23 02:16:38,385 - finder - INFO - Mapping of reads for round1 completed for mantle 2021-04-23 02:16:38,629 - finder - INFO - Selecting high confidence junctions after round1 mapping completed for mantle 2021-04-23 02:16:38,665 - finder - INFO - Raw read download from NCBI cleanup completed for mantle 2021-04-23 02:16:38,686 - finder - INFO - STAR Round2 run for Hanleya_hanleyi_mantle completed 2021-04-23 02:16:38,687 - finder - INFO - Mapping of reads for round2 completed for mantle 2021-04-23 02:16:38,888 - finder - INFO - Selecting high confidence junctions after round2 mapping completed for mantle 2021-04-23 02:16:38,889 - finder - INFO - Mapping rate in round2 mantle Hanleya_hanleyi_mantle 0.0 2021-04-23 02:16:38,890 - finder - INFO - Resorting to alignment with relaxed parameters for these runs due to poor mapping Hanleya_hanleyi_mantle 2021-04-23 02:16:38,975 - finder - INFO - STAR relaxed alignment run for Hanleya_hanleyi_mantle completed 2021-04-23 02:16:38,976 - finder - INFO - Mapping of reads for round3 completed for mantle 2021-04-23 02:16:39,005 - finder - INFO - Selecting high confidence junctions after round3 mapping completed for mantle 2021-04-23 02:16:39,006 - finder - INFO - Mapping of reads for round4 completed for mantle 2021-04-23 02:16:39,033 - finder - INFO - Selecting high confidence junctions after round4 mapping completed for mantle 2021-04-23 02:16:39,034 - finder - INFO - Mapping with OLego for micro-exon detection completed for mantle 2021-04-23 02:16:39,453 - finder - INFO - Merging of alignments from all rounds of mapping completed for mantle 2021-04-23 02:16:39,518 - finder - 
INFO - Removing intermediate alignment files completed for mantle 2021-04-23 02:16:39,528 - finder - INFO - Mapping of all runs completed for mantle 2021-04-23 02:16:40,360 - finder - INFO - Information collection about alignments completed 2021-04-23 02:16:43,663 - finder - INFO - Generation of assemblies with PsiCLASS completed

The only files in /output/temp/ are download_these_runs and finding_finder

The *relaxed.error file says: `EXITING because of FATAL ERROR: could not open genome file output/indices/star_index_without_transcriptome//genomeParameters.txt SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions

Apr 23 02:16:38 ...... FATAL ERROR, exiting`
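The fatal error above means STAR could not find genomeParameters.txt in the index directory, which usually suggests the index was never fully written. A minimal sketch of a pre-flight check follows; the list of expected file names is an assumption based on what STAR typically writes into --genomeDir and may differ between STAR versions, and `missing_star_index_files` is a hypothetical helper, not part of finder.

```python
from pathlib import Path

# Files STAR typically writes into --genomeDir (assumed list; may vary by version).
EXPECTED_INDEX_FILES = (
    "genomeParameters.txt",
    "Genome",
    "SA",
    "SAindex",
    "chrName.txt",
    "chrNameLength.txt",
)


def missing_star_index_files(genome_dir):
    """Return the expected index files that are absent or empty in genome_dir."""
    root = Path(genome_dir)
    missing = []
    for name in EXPECTED_INDEX_FILES:
        path = root / name
        # A zero-byte file counts as missing: an interrupted run can leave stubs.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Path taken from the error message in this thread.
    index_dir = "output/indices/star_index_without_transcriptome"
    missing = missing_star_index_files(index_dir)
    if missing:
        print(f"{index_dir} looks incomplete, missing: {', '.join(missing)}")
    else:
        print(f"{index_dir} contains all expected index files")
```

Running this before the mapping rounds would flag an incomplete index early instead of letting every later step fail on empty inputs.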

My genome file name is correctly specified in the input script.

Any ideas?

Thanks! Kevin

kmkocot avatar Apr 23 '21 12:04 kmkocot

Hello @kmkocot,

Thank you for posting this issue. When you reran finder, did you delete the previous output directory? I think the STAR index generation step failed even with the new parameters. Could you remove the output directory and restart the run? Please let me know if you hit a snag.

Thank you.

sagnikbanerjee15 avatar Apr 23 '21 13:04 sagnikbanerjee15

Hi @sagnikbanerjee15! Thanks so much for your quick reply! I did delete the output folder and the _STARtmp folder before re-running. I tried again to be doubly sure but had the same result.

My metadata.csv file looks like this:

BioProject,SRA Accession,Tissues,Description,Date,Read Length (bp),Ended,RNA Seq,process,Location
DUMMY,Hanleya_hanleyi_mantle,mantle,cDNA;Illumina HiSeq 2500,1/12/17,101,PE,1,1,/home/wirenia/Desktop/2021-04-21_Hanleya_finder

The command I'm running looks like this: finder --cpu 35 --metadatafile metadata.csv --genome final.purged.fa.PolcaCorrected.fa.masked --protein protein_evidence.fas --exonerate_gff3 protein_evidence.gff3 --addUTR --verbose 3 --output_directory output

The contents of that folder look like this: wirenia@wirenia:~/Desktop/2021-04-21_Hanleya_finder$ ll total 15760948 drwxrwxr-x 4 wirenia wirenia 4096 Apr 23 11:05 ./ drwxr-xr-x 16 wirenia wirenia 49152 Apr 22 20:07 ../ -rw-r--r-- 1 wirenia wirenia 2829046558 Apr 22 14:49 final.purged.fa.PolcaCorrected.fa.masked -rw-r--r-- 1 wirenia wirenia 2468948 Apr 23 09:13 final.purged.fa.PolcaCorrected.fa.masked.fai -rw-rw-r-- 1 wirenia wirenia 6593405008 Apr 23 00:40 Hanleya_hanleyi_mantle_1.fastq -rw-rw-r-- 1 wirenia wirenia 6589808926 Apr 23 00:41 Hanleya_hanleyi_mantle_2.fastq -rwxrwxr-x 1 wirenia wirenia 228 Apr 23 00:44 metadata.csv* -rwxrwxr-x 1 wirenia wirenia 176 Apr 23 00:41 metadata_SRA.csv* drwxrwxr-x 8 wirenia wirenia 4096 Apr 23 10:45 output/ -rw-rw-r-- 1 wirenia wirenia 73716599 Apr 22 14:49 protein_evidence.fas -rw-rw-r-- 1 wirenia wirenia 50655822 Apr 22 14:49 protein_evidence.gff3 -rwxrwxr-x 1 wirenia wirenia 310 Apr 22 14:58 run_finder.sh* -rwxrwxr-x 1 wirenia wirenia 243 Apr 22 14:58 run_finder.sh~* drwx------ 2 wirenia wirenia 4096 Apr 23 09:13 _STARtmp/
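For local samples, it can help to confirm that every read file named by the metadata actually exists before launching a long run. The sketch below assumes the naming seen in this thread (`<SRA Accession>_1.fastq` and `<SRA Accession>_2.fastq` inside the Location directory for paired-end data); the single-end name `<SRA Accession>.fastq` is a guess, and `missing_read_files` is a hypothetical helper rather than anything finder provides.

```python
import csv
import io
from pathlib import Path


def missing_read_files(metadata_text):
    """Return FASTQ paths implied by the metadata that do not exist on disk."""
    missing = []
    for row in csv.DictReader(io.StringIO(metadata_text)):
        sample = row["SRA Accession"]
        location = Path(row["Location"])
        # Assumed naming: _1/_2 suffixes for paired-end (PE), no suffix otherwise.
        suffixes = ("_1", "_2") if row["Ended"] == "PE" else ("",)
        for suffix in suffixes:
            fastq = location / f"{sample}{suffix}.fastq"
            if not fastq.is_file():
                missing.append(str(fastq))
    return missing


if __name__ == "__main__":
    metadata = Path("metadata.csv")
    if metadata.is_file():
        for path in missing_read_files(metadata.read_text()):
            print(f"missing read file: {path}")
    else:
        print("metadata.csv not found in the current directory")
```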

Here's what the alignments folder looks like:

wirenia@wirenia:~/Desktop/2021-04-21_Hanleya_finder/output/alignments$ ll
total 40
drwxrwxr-x 2 wirenia wirenia 4096 Apr 23 10:45 ./
drwxrwxr-x 8 wirenia wirenia 4096 Apr 23 10:45 ../
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 Hanleya_hanleyi_mantle_exons
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 Hanleya_hanleyi_mantle_exons.bed
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 Hanleya_hanleyi_mantle_final.sortedByCoord.out.bam
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 Hanleya_hanleyi_mantle_final.sortedByCoord.out.sam
-rw-rw-r-- 1 wirenia wirenia 71 Apr 23 10:45 Hanleya_hanleyi_mantle_for_psiclass.bam
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 Hanleya_hanleyi_mantle_for_psiclass.sam
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 Hanleya_hanleyi_mantle_introns
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 Hanleya_hanleyi_mantle_introns.bed
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 Hanleya_hanleyi_mantle_num_exons_in_intron
-rw-rw-r-- 1 wirenia wirenia 317 Apr 23 10:45 Hanleya_hanleyi_mantle_relaxed.error
-rw-rw-r-- 1 wirenia wirenia 76 Apr 23 10:45 Hanleya_hanleyi_mantle_relaxed.output
-rw-rw-r-- 1 wirenia wirenia 317 Apr 23 10:45 Hanleya_hanleyi_mantle_round1.error
-rw-rw-r-- 1 wirenia wirenia 76 Apr 23 10:45 Hanleya_hanleyi_mantle_round1.output
-rw-rw-r-- 1 wirenia wirenia 317 Apr 23 10:45 Hanleya_hanleyi_mantle_round2.error
-rw-rw-r-- 1 wirenia wirenia 76 Apr 23 10:45 Hanleya_hanleyi_mantle_round2.output
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 mantle_round1_and_round2_and_round3_and_round4_SJ.out.tab
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 mantle_round1_and_round2_and_round3_SJ.out.tab
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 mantle_round1_and_round2_SJ.out.tab
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 mantle_round1_SJ.out.tab
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 mantle_round2_SJ.out.tab
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 mantle_round3_SJ.out.tab
-rw-rw-r-- 1 wirenia wirenia 0 Apr 23 10:45 mantle_round4_SJ.out.tab
-rw-rw-r-- 1 wirenia wirenia 255 Apr 23 10:45 mapping_stats.csv

The raw_data_downloaded_from_NCBI folder is empty, but I think that should be the case?

Thanks so much! Kevin

kmkocot avatar Apr 23 '21 16:04 kmkocot

Hi @kmkocot,

Thanks for posting the issue. Could you please attach the metadata.csv file here? Are you using local RNA-Seq samples? Also, could you attach a snapshot of the contents of the output/star_index_without_transcriptome directory? It might also be helpful to check output/star_index_without_transcriptome.error to make sure there aren't any errors.
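Since the pipeline keeps going after a failed step, the first real failure is often buried in one of the *.error files. As a rough sketch (the helper name `nonempty_error_files` is hypothetical, and it assumes the *.error naming seen in this thread), the following walks an output directory and reports the last non-blank line of every non-empty error file:

```python
from pathlib import Path


def nonempty_error_files(directory):
    """Map each non-empty *.error file under directory to its last non-blank line."""
    report = {}
    for path in sorted(Path(directory).rglob("*.error")):
        lines = [
            line.strip()
            for line in path.read_text(errors="replace").splitlines()
            if line.strip()
        ]
        if lines:
            report[str(path)] = lines[-1]
    return report


if __name__ == "__main__":
    # "output" is the --output_directory used in this thread.
    for name, last_line in nonempty_error_files("output").items():
        print(f"{name}: {last_line}")
```

Sorting the results by modification time instead of by name would be a reasonable variation if the goal is to find the earliest failing step.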

Thank you.

sagnikbanerjee15 avatar Apr 23 '21 16:04 sagnikbanerjee15

Hi @sagnikbanerjee15,

Thanks again for the quick reply! Yes, these are local RNA-Seq samples.

The star_index_without_transcriptome.error file's contents are just "Killed". Maybe this is a RAM issue? I have 64 GB but perhaps that's not enough. Is it possible to run STAR remotely and just provide a .bam file? I don't understand the --genome_dir_star flag, but it looks like it relates to the genome index and not to precomputed STAR output?
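A bare "Killed" usually points to the Linux out-of-memory killer. As a rough sanity check, the STAR manual's rule of thumb is about 10 bytes of RAM per genome base; the sketch below applies that to the FASTA size shown in the directory listing earlier in the thread. Treat the result as a lower bound only: index generation, especially for an assembly with many scaffolds, can need noticeably more. The function names are hypothetical.

```python
import os


def estimated_star_ram_bytes(genome_bases, bytes_per_base=10):
    """Rule-of-thumb RAM estimate: roughly 10 bytes per genome base."""
    return genome_bases * bytes_per_base


def physical_ram_bytes():
    """Total physical memory, or None where os.sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    # Size in bytes of the masked FASTA from the listing in this thread;
    # it slightly overestimates the base count (headers and newlines).
    genome_bytes = 2829046558
    need = estimated_star_ram_bytes(genome_bytes)
    have = physical_ram_bytes()
    print(f"estimated minimum for STAR: {need / 1e9:.1f} GB")
    if have is not None:
        print(f"physical RAM on this machine: {have / 1e9:.1f} GB")
```

If the estimate sits well under the installed RAM and the job is still killed, other memory users on the machine or the number of scaffolds in the assembly are worth a look.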

Thanks again! Kevin

metadata.csv

kmkocot avatar Apr 23 '21 17:04 kmkocot

Hi @kmkocot,

Thanks for attaching the metadata file. Yes, if the error file reports "Killed" then it could be a RAM issue. finder can take in the STAR index separately. Please generate the STAR index using the following steps.

mkdir star_index_without_transcriptome
CPU=30 #Enter the number of CPUs that you are permitted to use
STAR --runMode genomeGenerate --runThreadN $CPU --genomeDir star_index_without_transcriptome  --genomeFastaFiles final.purged.fa.PolcaCorrected.fa.masked

These commands will generate the STAR index. Then call finder using the following command:

finder --cpu 35 --metadatafile metadata.csv --genome final.purged.fa.PolcaCorrected.fa.masked --genome_dir_star star_index_without_transcriptome --protein protein_evidence.fas --exonerate_gff3 protein_evidence.gff3 --addUTR --verbose 3 --output_directory output 

Also, I noticed that the metadata file contains a single RNA-Seq sample. You will be able to get better gene models by adding in more if you have those. In fact, expression data from closely related species will also work.

Please let me know if you run into any problems.

Thank you.

sagnikbanerjee15 avatar Apr 23 '21 17:04 sagnikbanerjee15

Thanks for your help. I had some more problems and decided it was time to reformat my machine anyway. I'm now running Ubuntu 20.04, have re-installed finder and the dependencies, and followed the advice above, but I am still having issues.

I ran STAR to prepare the genome with the following command:

/home/wirenia/bin/anaconda3/envs/finder_conda_env/bin/STAR --runThreadN 35 --limitGenomeGenerateRAM 63000000000 --runMode genomeGenerate --genomeDir star_genome --genomeFastaFiles final.purged.fa.PolcaCorrected.fa.masked

That completed successfully.

The finder command I ran is: finder --cpu 20 --metadatafile metadata.csv --genome final.purged.fa.PolcaCorrected.fa.masked --genome_dir_star star_genome --protein protein_evidence.fas --exonerate_gff3 protein_evidence.gff3 --addUTR --verbose 3 --output_directory output

The progress.log file looked like this: 2021-04-28 15:47:48,821 - finder - INFO - Software paths have been set 2021-04-28 15:48:15,731 - finder - INFO - Generating OLego index 2021-04-28 15:48:15,733 - finder - INFO - OLego index built 2021-04-28 15:48:15,733 - finder - INFO - validateCommandLineArguments execution successful 2021-04-28 15:48:15,741 - finder - INFO - Metadata information created 2021-04-28 15:48:15,741 - finder - INFO - readMetaDataFile execution successful 2021-04-28 15:48:15,742 - finder - INFO - expandGzippedFiles execution successful 2021-04-28 15:48:15,742 - finder - INFO - Starting FINDER from None checkpoint 2021-04-28 15:48:15,742 - finder - INFO - Program params - Namespace(addUTR=True, checkpoint=None, compressed_data_files=None, cpu='20', error_corrected_raw_data='output/raw_data_error_corrected', exonerate_gff3='protein_evidence.gff3', files_for_ncrna={'mature_ATGC': '/home/wirenia/bin/finder/dep/mature_ATGC.fa'}, final_GTF_files='output/final_GTF_files', genome='final.purged.fa.PolcaCorrected.fa.masked', genome_dir_olego='output/indices/olego_index', genome_dir_star='star_genome', indices='output/indices', md=None, metadatafile='metadata.csv', mrna_md={'mantle': {'Hanleya_hanleyi_mantle': {'bioproject': 'DUMMY', 'condition': 'mantle', 'Date': '1/12/17', 'Ended': 'PE', 'desc': 'cDNA;Illumina HiSeq 2500', 'read_length': '101', 'error_corrected': 0, 'location_directory': '/home/wirenia/Desktop/2021-04-21_Hanleya_finder', 'downloaded_from_NCBI': 0}}}, no_cleanup=False, output_assemblies_psiclass_terminal_exon_length_modified='output/assemblies_psiclass_modified', output_braker='output/braker', output_directory='output', output_fasta_N_removed='output/raw_fasta_N_removed', output_rcorrector=None, output_sample_fastq=None, output_star='output/alignments', paired_end_adapterfile=None, perform_post_completion_data_cleanup=False, preserve_raw_input_data=False, protein='protein_evidence.fas', 
raw_data_downloaded_from_NCBI='output/raw_data_downloaded_from_NCBI', record_time={}, run_tests=False, single_end_adapterfile=None, skip_cpd=False, smrna_md={}, softwares={'psiclass': '/home/wirenia/bin/finder/dep/psiclass_terminal_exon_length_modified/psiclass', 'junc': '/home/wirenia/bin/finder/dep/psiclass_terminal_exon_length_modified//junc', 'subexon-info': '/home/wirenia/bin/finder/dep/psiclass_terminal_exon_length_modified//subexon-info', 'addXS': '/home/wirenia/bin/finder/dep/psiclass_terminal_exon_length_modified//addXS', 'fastq-sample': '/home/wirenia/bin/finder/dep/fastq-tools-0.8/scripts/fastq-sample', 'download_and_dump_fastq_from_SRA': '/home/wirenia/bin/finder/dep/../utils/downloadAndDumpFastqFromSRA.py', 'transferGenomicNucleotideCountsToTranscriptome': '/home/wirenia/bin/finder/dep/../scripts/transferGenomicNucleotideCountsToTranscriptome.py', 'find_exonic_troughs': '/home/wirenia/bin/finder/dep/../scripts/find_exonic_troughs.R', 'olego': '/home/wirenia/bin/finder/dep/olego/olego', 'olegoindex': '/home/wirenia/bin/finder/dep/olego/olegoindex', 'mergePEsam.pl': '/home/wirenia/bin/finder/dep/olego/mergePEsam.pl', 'xa2multi': '/home/wirenia/bin/finder/dep/olego/xa2multi.pl', 'gmst': '/home/wirenia/bin/finder/dep/gmst.pl', 'prodigal': '/home/wirenia/bin/finder/dep/Prodigal/prodigal', 'canon-gff3': '/home/wirenia/bin/finder/dep/canon-gff3', 'convert_exonerate_gff_to_gtf': '/home/wirenia/bin/finder/dep/../utils/convert_exonerate_gff_to_gtf.py', 'augustus_main_dir': '/home/wirenia/bin/finder/dep/Augustus', 'braker': '/home/wirenia/bin/finder/dep/BRAKER/scripts/braker.pl', 'GENEMARK_PATH': '/home/wirenia/bin/finder/dep/gmes_linux_64', 'AUGUSTUS_CONFIG_PATH': 'output/braker/Augustus/config', 'AUGUSTUS_BIN_PATH': 'output/braker/Augustus/bin', 'AUGUSTUS_SCRIPTS_PATH': 'output/braker/Augustus/scripts', 'GUSHR_PATH': '/home/wirenia/bin/finder/dep/GUSHR'}, space_saved=None, temp_dir='output/temp', total_space=None, verbose=3) 2021-04-28 15:48:15,743 - finder - 
INFO - Started processing data for mantle 2021-04-28 15:48:15,743 - finder - INFO - Downloading missing data from NCBI started 2021-04-28 15:48:15,743 - finder - INFO - Downloading missing data from NCBI finished for mantle 2021-04-28 15:49:56,710 - finder - INFO - STAR Round1 run for Hanleya_hanleyi_mantle completed 2021-04-28 15:49:56,725 - finder - INFO - Mapping of reads for round1 completed for mantle 2021-04-28 15:49:56,947 - finder - INFO - Selecting high confidence junctions after round1 mapping completed for mantle 2021-04-28 15:49:56,961 - finder - INFO - Raw read download from NCBI cleanup completed for mantle 2021-04-28 15:51:34,474 - finder - INFO - STAR Round2 run for Hanleya_hanleyi_mantle completed 2021-04-28 15:51:34,771 - finder - INFO - Mapping of reads for round2 completed for mantle 2021-04-28 15:51:35,375 - finder - INFO - Selecting high confidence junctions after round2 mapping completed for mantle 2021-04-28 15:51:35,443 - finder - INFO - Mapping rate in round2 mantle Hanleya_hanleyi_mantle 0.0 2021-04-28 15:51:35,450 - finder - INFO - Resorting to alignment with relaxed parameters for these runs due to poor mapping Hanleya_hanleyi_mantle 2021-04-28 15:52:56,577 - finder - INFO - STAR relaxed alignment run for Hanleya_hanleyi_mantle completed 2021-04-28 15:52:56,704 - finder - INFO - Mapping of reads for round3 completed for mantle 2021-04-28 15:52:56,816 - finder - INFO - Selecting high confidence junctions after round3 mapping completed for mantle 2021-04-28 15:52:56,816 - finder - INFO - Mapping of reads for round4 completed for mantle 2021-04-28 15:52:56,822 - finder - INFO - Selecting high confidence junctions after round4 mapping completed for mantle 2021-04-28 15:52:56,823 - finder - INFO - Mapping with OLego for micro-exon detection completed for mantle 2021-04-28 15:52:58,647 - finder - INFO - Merging of alignments from all rounds of mapping completed for mantle 2021-04-28 15:52:58,860 - finder - INFO - Removing intermediate 
alignment files completed for mantle 2021-04-28 15:52:58,862 - finder - INFO - Mapping of all runs completed for mantle 2021-04-28 15:53:01,324 - finder - INFO - Information collection about alignments completed 2021-04-28 15:53:01,409 - finder - INFO - Generation of assemblies with PsiCLASS completed

Here's the error I'm getting: `wirenia@wirenia:~/Desktop/2021-04-21_Hanleya_finder$ finder --cpu 20 --metadatafile metadata.csv --genome final.purged.fa.PolcaCorrected.fa.masked --genome_dir_star star_genome --protein protein_evidence.fas --exonerate_gff3 protein_evidence.gff3 --addUTR --verbose 3 --output_directory output Apr 28 15:48:15 ..... started STAR run Apr 28 15:48:15 ..... loading genome

EXITING: Did not find the genome in memory, did not remove any genomes from shared memory

Apr 28 15:48:15 ...... FATAL ERROR, exiting Apr 28 15:49:50 ..... started STAR run Apr 28 15:49:50 ..... loading genome Apr 28 15:49:54 ..... started mapping Apr 28 15:49:56 ..... finished mapping Apr 28 15:49:56 ..... finished successfully cat: output/alignments/Hanleya_hanleyi_mantle_round1_SJ.out.tab: No such file or directory Apr 28 15:51:32 ..... started STAR run Apr 28 15:51:32 ..... loading genome

EXITING: Did not find the genome in memory, did not remove any genomes from shared memory

Apr 28 15:51:32 ...... FATAL ERROR, exiting cat: output/alignments/Hanleya_hanleyi_mantle_round2_SJ.out.tab: No such file or directory mv: cannot stat 'output/alignments/Hanleya_hanleyi_mantle_final_Log.final.out': No such file or directory Apr 28 15:52:56 ..... started STAR run Apr 28 15:52:56 ..... loading genome

EXITING: Did not find the genome in memory, did not remove any genomes from shared memory

Apr 28 15:52:56 ...... FATAL ERROR, exiting cat: output/alignments/Hanleya_hanleyi_mantle_round3_SJ.out.tab: No such file or directory cat: output/alignments/Hanleya_hanleyi_mantle_round4_SJ.out.tab: No such file or directory samtools index: "output/alignments/Hanleya_hanleyi_mantle_final.sortedByCoord.out.bam" is in a format that cannot be usefully indexed samtools index: "output/alignments/Hanleya_hanleyi_mantle_final.sortedByCoord.out.bam" is in a format that cannot be usefully indexed sh: 1: /home/wirenia/bin/finder/dep/psiclass_terminal_exon_length_modified//junc: Permission denied sh: 1: /home/wirenia/bin/finder/dep/psiclass_terminal_exon_length_modified//subexon-info: Permission denied mv: cannot stat 'output/assemblies_psiclass_modified/combined/psiclass_output_sample_0.gtf': No such file or directory mv: cannot stat 'output/assemblies_psiclass_modified/combined/psiclass_output_vote.gtf': No such file or directory Traceback (most recent call last): File "/home/wirenia/bin/finder/finder", line 648, in main() File "/home/wirenia/bin/finder/finder", line 609, in main orchestrateGeneModelPrediction(options,logger_proxy,logging_mutex) File "/home/wirenia/bin/finder/finder", line 422, in orchestrateGeneModelPrediction findTranscriptsInEachSampleNotReportedInCombinedAnnotations(options,logger_proxy,logging_mutex) File "/home/wirenia/bin/finder/scripts/findTranscriptsInEachSampleNotReportedInCombinedAnnotations.py", line 17, in findTranscriptsInEachSampleNotReportedInCombinedAnnotations combined_transcript_info=readAllTranscriptsFromGTFFileInParallel([combined_gtf_filename,"combined","combined"])[0] File "/home/wirenia/bin/finder/scripts/fileReadWriteOperations.py", line 232, in readAllTranscriptsFromGTFFileInParallel fhr=open(gtf_filename,"r") FileNotFoundError: [Errno 2] No such file or directory: 'output/assemblies_psiclass_modified/combined/combined.gtf'`

The /output/alignments .error files all look something like this: `BAMoutput.cpp:27:BAMoutput: exiting because of OUTPUT FILE error: could not create output file output/alignments/Hanleya_hanleyi_mantle_round1__STARtmp//BAMsort/19/26 SOLUTION: check that the path exists and you have write permission for this file. Also check ulimit -n and increase it to allow more open files.

Apr 28 15:49:49 ...... FATAL ERROR, exiting`
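The STAR message above suggests checking `ulimit -n`. The failing path ends in BAMsort/19/26, which looks like one temporary file per thread and sorting bin, so a run with 20 threads and STAR's default of 50 sorting bins (--outBAMsortingBinsN) could need around 1000 open files, close to a common default limit of 1024. The sketch below compares the two numbers; the threads-times-bins estimate is an assumption about STAR's behaviour, not something confirmed in this thread.

```python
import resource


def open_file_limit():
    """Return the (soft, hard) limit on open files for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def bam_sort_files_needed(threads, bins=50):
    """Rough count of temporary files STAR may hold open while sorting a BAM.

    Assumes one file per thread and sorting bin; 50 is STAR's default
    value for --outBAMsortingBinsN.
    """
    return threads * bins


if __name__ == "__main__":
    soft, hard = open_file_limit()
    needed = bam_sort_files_needed(threads=20)  # --cpu 20 from the finder command
    print(f"soft limit: {soft}, hard limit: {hard}, estimated need: {needed}")
    if soft != resource.RLIM_INFINITY and needed >= soft:
        print("Consider raising the limit (ulimit -n) or using fewer threads.")
```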

Any advice would be greatly appreciated.

Best, Kevin

kmkocot avatar Apr 28 '21 22:04 kmkocot

Hi @kmkocot,

Thank you so much for going through the trouble of reformatting your machine. I notice that there appear to be some permission issues in the error output. This should not have happened, since the install.py script runs chmod to give execute permissions to all relevant files. In any case, I will run finder on my end with a single paired-end RNA-Seq sample. Would it be possible for you to share the genome and the RNA-Seq files with me? You can email those to [email protected] or [email protected]. I think if I work with the same files I will be able to troubleshoot faster and get to the bottom of this more quickly.
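The "Permission denied" lines for junc and subexon-info suggest those binaries lack the execute bit. A minimal sketch for checking and fixing that by hand follows; the two paths are the ones printed in the error output, and the helper names are hypothetical (this is not what install.py does internally).

```python
import os
import stat


def non_executable(paths):
    """Return the existing files in paths that the current user cannot execute."""
    return [p for p in paths if os.path.isfile(p) and not os.access(p, os.X_OK)]


def make_executable(path):
    """Add execute permission for user, group and others, like `chmod +x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Paths copied from the "Permission denied" messages in this thread.
    dep_dir = "/home/wirenia/bin/finder/dep/psiclass_terminal_exon_length_modified"
    candidates = [os.path.join(dep_dir, name) for name in ("junc", "subexon-info")]
    for path in non_executable(candidates):
        print(f"fixing permissions on {path}")
        make_executable(path)
```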

Thank you.

sagnikbanerjee15 avatar Apr 28 '21 23:04 sagnikbanerjee15

Sent! Thank you again for your help!

kmkocot avatar Apr 30 '21 18:04 kmkocot

Hello there, I'm having the same initial problem as Kevin. I'm running: (base) david@Dalhte:/media/david/E/finder_v1.1.0$ run_finder --metadatafile $PWD/Rattus_metadata5.csv --output_directory $PWD/FINDER_test_Rattus --genome $PWD/Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa --organism_model VERT --genemark_path $PWD/gmes_linux_64 --genemark_license $PWD/gm_key_64 --cpu 5 --genome_dir_star $PWD/RN7Genome

The STAR genome was generated independently beforehand and works for other pipelines such as cellranger-arc.

and get : 1.1.0: Pulling from sagnikbanerjee15/finder Digest: sha256:9816d258d2421d4625983c929f508b1f577cfe7ab3bc2042e841647a186c7931 Status: Image is up to date for sagnikbanerjee15/finder:1.1.0 docker.io/sagnikbanerjee15/finder:1.1.0 done Error: input 2 3153: Wrong XA format. mv: cannot stat '/media/david/E/finder_v1.1.0/FINDER_test_Rattus/assemblies_psiclass_modified/combined/psiclass_output_vote.gtf': No such file or directory Traceback (most recent call last): File "/softwares/FINDER/Finder/finder", line 688, in main() File "/softwares/FINDER/Finder/finder", line 649, in main orchestrateGeneModelPrediction( options, logger_proxy, logging_mutex ) File "/softwares/FINDER/Finder/finder", line 461, in orchestrateGeneModelPrediction findTranscriptsInEachSampleNotReportedInCombinedAnnotations( options, logger_proxy, logging_mutex ) File "/softwares/FINDER/Finder/scripts/findTranscriptsInEachSampleNotReportedInCombinedAnnotations.py", line 17, in findTranscriptsInEachSampleNotReportedInCombinedAnnotations combined_transcript_info = readAllTranscriptsFromGTFFileInParallel( [combined_gtf_filename, "combined", "combined"] )[0] File "/softwares/FINDER/Finder/scripts/fileReadWriteOperations.py", line 290, in readAllTranscriptsFromGTFFileInParallel fhr = open( gtf_filename, "r" ) FileNotFoundError: [Errno 2] No such file or directory: '/media/david/E/finder_v1.1.0/FINDER_test_Rattus/assemblies_psiclass_modified/combined/combined.gtf'

I attached the progress.log and the metadata file (I have used only one RNA-Seq sample so far, just as a trial, but I do have more). Can you help? Best, David

progress.log Rattus_metadata5.csv

Dalhte avatar Dec 06 '23 10:12 Dalhte