name is not defined in paftools.js gff2bed

Open johnomics opened this issue 5 years ago • 6 comments

Thank you for all your excellent work on minimap2, we use it every day.

I'm trying to convert the NCBI GRCh38 RefSeq annotation to BED format for aligning with minimap2 using paftools.js gff2bed. As per your advice, I'm using the no_alt_analysis GRCh38, and have got the full_analysis_set GFF and GTF from the same folder:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
ftp://ftp.ncbi.nlm.nih.gov//genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gff.gz
ftp://ftp.ncbi.nlm.nih.gov//genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf.gz

I get the following error when running gff2bed, with the GTF or GFF (minimap2 v2.17 release):

$ paftools.js gff2bed -j GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf
/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:1593: ReferenceError: name is not defined
			exons.push([t[0], t[3], t[4], t[6], id, type, name, tname]);
                                                 ^
ReferenceError: name is not defined
    at paf_gff2bed (/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:1593:50)
    at main (/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:2517:29)
    at /mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:2534:1

The name variable used at line 1593 is set in the if statements at lines 1567 and 1574, but it is not initialised; instead, a gname variable is initialised at line 1562 but does not appear to be used.

If I change the name variable to gname, the command works, but I only ever get N/A for gene names; the NCBI annotations have gene_id and gene, but not gene_name. However, changing gene_name to gene_id or gene, or adding additional else if statements to check for gene_id or gene, doesn't work either.
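As an illustration of the fallback described above, here is a minimal sketch and not the actual paftools.js code. The helper names `parseGtfAttributes` and `pickGeneName`, the fallback order (`gene_name`, then `gene`, then `gene_id`), and the sample attribute string are all assumptions made for the example. The point is only that the name is declared with a default before any conditional assigns it, so it can never be undefined.

```javascript
// Hypothetical helpers for illustration; not taken from paftools.js.

// Parse the 9th column of a GTF line: key "value"; key "value";
// Unquoted values (e.g. numeric ones) are ignored in this sketch.
function parseGtfAttributes(attrField) {
  const attrs = Object.create(null); // no inherited keys
  const re = /(\S+)\s+"([^"]*)"\s*;?/g;
  let m;
  while ((m = re.exec(attrField)) !== null) {
    // Keep the first occurrence of each key.
    if (attrs[m[1]] === undefined) attrs[m[1]] = m[2];
  }
  return attrs;
}

// Pick a display name, trying several attribute keys in order.
function pickGeneName(attrs) {
  let name = "N/A"; // declared up front, so it is always defined
  for (const key of ["gene_name", "gene", "gene_id"]) { // assumed order
    if (attrs[key] !== undefined && attrs[key] !== "") {
      name = attrs[key];
      break;
    }
  }
  return name;
}

// Made-up attribute string with gene_id and gene but no gene_name.
const line = 'gene_id "G1"; transcript_id "TX1"; gene "GENE1";';
const attrs = parseGtfAttributes(line);
console.log(attrs.transcript_id + "|" + pickGeneName(attrs));
```

With this ordering, a record that lacks `gene_name` still gets a label from `gene` or `gene_id`, and only a record with none of the three falls back to N/A.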

Please could you look into this? Should I be using a different annotation? Or is there a fix that will include the NCBI gene names? Many thanks.

johnomics, Jun 11 '19 12:06

Please try the latest paftools. It should have resolved the issue.

lh3, Jun 11 '19 13:06

Thanks for the quick response. This works for the GTF, so I can continue with that, but just to let you know, it doesn't work with the GFF (maybe a separate issue?):

$ paftools.js gff2bed -j GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gff
chr1	12227	12612	NR_046018.2|misc_RNA|N/A	1000	+
chr1	12721	13220	NR_046018.2|misc_RNA|N/A	1000	+
chr1	14829	14969	NR_024540.1|misc_RNA|N/A	1000	-
chr1	15038	15795	NR_024540.1|misc_RNA|N/A	1000	-
chr1	15947	16606	NR_024540.1|misc_RNA|N/A	1000	-
chr1	16765	16857	NR_024540.1|misc_RNA|N/A	1000	-
chr1	17055	17232	NR_024540.1|misc_RNA|N/A	1000	-
chr1	17368	17605	NR_024540.1|misc_RNA|N/A	1000	-
chr1	17742	17914	NR_024540.1|misc_RNA|N/A	1000	-
chr1	18061	18267	NR_024540.1|misc_RNA|N/A	1000	-
chr1	18366	24737	NR_024540.1|misc_RNA|N/A	1000	-
chr1	24891	29320	NR_024540.1|misc_RNA|N/A	1000	-
/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:1578: Error: No transcript_id
		if (id == null) throw Error("No transcript_id");
                        ^
Error: No transcript_id
    at Error (<anonymous>)
    at paf_gff2bed (/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:1578:25)
    at main (/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:2518:29)
    at /mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:2535:1

johnomics, Jun 11 '19 13:06

Then use the GTF. I think the NCBI GFF3 is somewhat problematic and is inconsistent with the corresponding GTF. The Gencode/Ensembl GTF and GFF3 contain pretty much the same information.
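For readers comparing the two formats: GFF3 writes attributes as `key=value` pairs separated by semicolons and links features through `ID` and `Parent`, whereas GTF uses `key "value";` pairs with `transcript_id`. The sketch below is a hypothetical illustration, not how paftools.js handles GFF3. The helper names, the fallback to `Parent`, and the sample rows are assumptions, and whether `Parent` is a suitable substitute for NCBI files is not established in this thread.

```javascript
// Hypothetical helpers for illustration; not taken from paftools.js.

// Parse a GFF3 attribute column: key=value;key=value
function parseGff3Attributes(attrField) {
  const attrs = Object.create(null); // no inherited keys
  for (const part of attrField.split(";")) {
    const eq = part.indexOf("=");
    if (eq <= 0) continue; // skip empty or malformed pieces
    attrs[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return attrs;
}

// Return a transcript identifier, or null when the record has none.
function transcriptIdOrNull(attrs) {
  if (attrs.transcript_id !== undefined) return attrs.transcript_id;
  // Assumption: use Parent when transcript_id is absent.
  if (attrs.Parent !== undefined) return attrs.Parent;
  return null;
}

// Made-up attribute columns covering the three cases.
const rows = [
  "ID=exon1;Parent=rna0;transcript_id=TX1",
  "ID=exon2;Parent=rna1",
  "ID=region1",
];
for (const row of rows) {
  const id = transcriptIdOrNull(parseGff3Attributes(row));
  // Report records without an identifier instead of throwing.
  console.log(id === null ? "skipped: " + row : id);
}
```

Reporting and skipping such records, rather than throwing, is a design choice for the sketch; a converter may prefer to fail loudly so that missing identifiers do not go unnoticed.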

lh3, Jun 11 '19 13:06

I am reopening this issue in case I come back to it and make further improvements for NCBI GFF3.

lh3, Jun 11 '19 15:06

> Please try the latest paftools. It should have resolved the issue.

I found that the human and mouse GTFs from ENSEMBL all have gene_id and gene_name, but some genes of other species (GFF from ENSEMBL) have a gene_id attribute and no gene_name attribute. How did you fix this problem? Do you just ignore genes that have a "gene_id" attribute but no "gene_name" attribute in the bam file, or do you use gene_id or something else instead of gene_name?

niehu2018, Jul 30 '19 15:07

I am still getting the original "...ReferenceError: name is not defined..." error shown above with minimap2 2.17-r941 (the latest version of paftools.js, I assume). I'm trying to use the --junc-bed option and only have the GTF.

akshayMpatel, Mar 27 '21 13:03