metaseq icon indicating copy to clipboard operation
metaseq copied to clipboard

genomic_signal

Open kellermac opened this issue 8 years ago • 10 comments

Hello again. I am making an average plot of Bigwig (signal) files and a gff3 reference file (midpoint of binding sites). I have succesfully used this script before, but when I substitute this new gff3 file, I receive the following error message: $ python hts.py Traceback (most recent call last): File "hts.py", line 49, in processes=processes) File "/usr/local/lib/python2.7/dist-packages/metaseq/_genomic_signal.py", line 122, in array chunksize=chunksize, **kwargs) File "/usr/local/lib/python2.7/dist-packages/metaseq/array_helpers.py", line 371, in _array_parallel chunks = list(chunker(genelist, chunksize)) File "/usr/local/lib/python2.7/dist-packages/metaseq/helpers.py", line 27, in chunker x.append(f.next()) File "pybedtools/cbedtools.pyx", line 787, in pybedtools.cbedtools.IntervalIterator.next (pybedtools/cbedtools.cxx:11136) File "pybedtools/cbedtools.pyx", line 697, in pybedtools.cbedtools.create_interval_from_list (pybedtools/cbedtools.cxx:9933) pybedtools.cbedtools.MalformedBedLineError: Start is greater than stop

The final line "MalformedBedLineError: Start is greater than stop". Indicated to me that my Gff3 reference file was malformed. So I verified the start/ stop positions of that file and found them to be correct (ie start is less than stop). Any advice would be greatly appreciated. Thanks for all the help!

kellermac avatar Oct 26 '16 23:10 kellermac

Tough to figure anything out without the input files. Can you provide an example of a gff file that causes this error?

daler avatar Oct 26 '16 23:10 daler

shortcis.txt

kellermac avatar Oct 27 '16 15:10 kellermac

On this file, I'm able to do this without errors:

import pybedtools
for i in pybedtools.BedTool('shortcis.txt'):
    print(i.chrom, i.start, i.stop)

Are you able to do the same on the full file?

daler avatar Oct 27 '16 15:10 daler

Yes, here is the first line of the output: (u'chr2R', 702320, 702325)

not sure what the significance of the 'u' is. I'll attach the full gff3. htscis.txt

kellermac avatar Oct 27 '16 15:10 kellermac

Hmm, seems fine. The u is just Python 2 telling you it's unicode.

If you convert to BED do you get the same error? It could be the lack of proper GFF/GTF formatting (e.g., no actual attributes in the attributes field) that's causing the problem.

To convert:

awk -F "\t" '{OFS="\t"; print $1, $4-1, $5}' htscis.txt > htscis.bed

daler avatar Oct 27 '16 16:10 daler

Can gffutils make a database from BED files? I have used the same formatting before (with blank fields). But thank you for the sanity check.

kellermac avatar Oct 27 '16 16:10 kellermac

Nope, it can't make a db from bed files.

Would you send a minimal complete example of the data and code that reproduces the problem? I'm going to need to look more closely at it for debugging.

daler avatar Oct 27 '16 16:10 daler

python script: htshort.txt

signal file: (I was using Bigwigs but switched to gff3 in order to send it on this message board.

h3k27ac.txt

Window file: I have sent previously as htscis.txt (it is gff3)

kellermac avatar Oct 27 '16 17:10 kellermac

Are you sure you can reproduce the error with these files? The h3k27ac.txt one doesn't have the same chromosome names as htscis.txt, and there's a lot of extra stuff in htshort.txt, including paths to directories and a database that I'm not sure are relevant, that prevents me from running it.

daler avatar Oct 27 '16 18:10 daler

Ah I am sorry. Oddly enough it did reproduce the same error message. The chromosome name will definately cause a subsequent error. As I am sure you know the fly community is inconsistent about whether or not to include a 'chr' prefix. The htscis file is cis interaction regions so they are all on 2R (will the absense of the other chromosomes affect compatability?). I removed the chr prefix: htscis.txt

I left the paths in there just to make it obvious that this is where paths should go. I used the db_dir and data_dir conventions from your example 1. You could replace both of these paths with your download directory. (the database was made from the htscis.gff3 file using gffutils) Here is the script I use to make databases: db.txt

kellermac avatar Oct 27 '16 20:10 kellermac