Troubleshooting ruby script for using --min-identity when plotting

Open lawson-a-m opened this issue 1 year ago • 5 comments

Hi there, I'm hoping you don't mind helping me troubleshoot how to use asgart-plot to only plot the identified duplications filtered for a minimum % identity.

I successfully used asgart-extract to modify the JSON output to include the sequence from the FASTA, but I am running into issues with the Ruby script asgart-align.rb you attached in issue #4.

The output I am getting is:

Aligning combo-x_test.json
0/59
asgart-align.rb:45:in `block (2 levels) in <main>': undefined method `size' for nil (NoMethodError)

    raise "Error: out[0].size != out[1].size" if out[0].size != out[1].size
                                                       ^^^^^
	from asgart-align.rb:26:in `each'
	from asgart-align.rb:26:in `block in <main>'
	from asgart-align.rb:24:in `each'
	from asgart-align.rb:24:in `each_with_index'
	from asgart-align.rb:24:in `<main>'

From what I can tell, the issue is that in the JSON output asgart gives me, the two arms of each duplicon have different lengths: the left and right do not match. I am only interested in the highest-identity duplicates, and I expect that in these the lengths of the two arms should be very close, if not identical. Do you have any suggestions for what may be causing this, or a workaround? Happy to send the JSON file if it would help.

Thanks, I am really excited that it seems to be working well, and does plot the data correctly if I don't restrict based on identity.

lawson-a-m avatar May 07 '24 21:05 lawson-a-m

Can you please attach your FASTA and JSON files so that I can take a deeper look?

delehef avatar May 08 '24 09:05 delehef

combo-x_test.json.gz Here is the JSON file, on which asgart-extract was run to insert the sequence! The FASTA it was generated from is bigger than 25MB even when gzipped, so GitHub won't let me upload it. If you need that (it is only a two-line FASTA of two concatenated X chromosomes), let me know an alternate way to send it to you.

Thanks!

lawson-a-m avatar May 08 '24 18:05 lawson-a-m

The good news is that I ran your file on my machine, and it works fine.

The bad news is that it seems mafft is returning garbage on yours. Would you mind removing the 2> /dev/null on line 32, so that we can see if mafft outputs some errors?
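
For context, a NoMethodError on `size` like the one reported above is what Ruby raises when the parsed alignment holds fewer than two sequences, so `out[0]` or `out[1]` is nil. Below is a minimal sketch of a more defensive parser, assuming the aligner's output is FASTA text. `parse_aligned_fasta` is a hypothetical helper, not part of asgart-align.rb:

```ruby
# Hypothetical helper (not part of asgart-align.rb): split FASTA-formatted
# aligner output into its sequences and fail with a readable message.
def parse_aligned_fasta(text)
  seqs = []
  frag = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?(">")
      # A header starts a new record; store the previous one, if any.
      seqs << frag unless frag.nil?
      frag = +""
    elsif frag
      # Sequence data may span several lines; join them.
      frag << line
    end
  end
  seqs << frag unless frag.nil?

  if seqs.size != 2
    raise ArgumentError, "expected 2 aligned sequences, got #{seqs.size}"
  end
  if seqs[0].size != seqs[1].size
    raise ArgumentError, "aligned sequences differ in length"
  end
  seqs
end
```

With empty or malformed aligner output this raises "expected 2 aligned sequences, got 0" instead of failing on nil.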

delehef avatar May 09 '24 07:05 delehef

Ah, I see! I have MAFFT loaded as a module, and it is version 7.310, if that is potentially contributing. Here is the output when I remove "2> /dev/null" from line 32:

asgart-align.rb: --> asgart-align.rb
expected a closing delimiter for the %x or backtick string

    24  result["families"].each_with_index do |family, i|
    26    family.each do |sd|
    32      mafft_out = %x(#{MAFFT} --auto #{fasta.path}
    33      out, frag = [], ""
    42      out << frag unless frag.empty?
    67    end
    68  end

asgart-align.rb:74: unterminated string meets end of file (SyntaxError)
asgart-align.rb:74: syntax error, unexpected end-of-input, expecting `end' or dummy end

lawson-a-m avatar May 09 '24 14:05 lawson-a-m

You are missing a closing parenthesis at the end of line 32.
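
For reference, a `%x(...)` literal has to be closed on the same statement, i.e. `%x(#{MAFFT} --auto #{fasta.path})`. If the goal is to see what the aligner writes to stderr, one alternative is `Open3.capture3` from the Ruby standard library. This is only a sketch and not what asgart-align.rb does; `run_tool` is a hypothetical name:

```ruby
require "open3"

# Hypothetical wrapper: run an external command, return its stdout,
# and raise with the captured stderr if the command exits non-zero.
def run_tool(*cmd)
  stdout, stderr, status = Open3.capture3(*cmd)
  unless status.success?
    raise "#{cmd.first} failed (exit #{status.exitstatus}): #{stderr.strip}"
  end
  stdout
end

# Usage (assumes a `mafft` executable on PATH and an existing input file):
# aligned = run_tool("mafft", "--auto", "input.fasta")
```

Passing the command as separate arguments also avoids shell quoting problems with file paths.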

delehef avatar May 09 '24 15:05 delehef

Hi @delehef, sorry for the really late reply! Even after fixing the missing parenthesis I was still having issues, but it turned out to be due to something with the mafft binaries on the HPC I use.

In case others have similar problems with mafft, the fix was for our IT to install MAFFT as a module; when I loaded both MAFFT and ruby as modules with Lmod, it worked beautifully.
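
For anyone hitting something similar, a quick preflight check can confirm that an executable is actually visible on PATH before the script runs. A minimal sketch follows; `find_executable_in` is a hypothetical helper and the executable name `mafft` is an assumption about how it is installed on your system:

```ruby
# Hypothetical preflight check: return the full path of `name` if an
# executable file with that name exists in one of `dirs`, otherwise nil.
def find_executable_in(name, dirs = ENV.fetch("PATH", "").split(File::PATH_SEPARATOR))
  dirs.each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Usage (assumes the aligner is installed under the name "mafft"):
# abort "mafft not found on PATH; load the module first" unless find_executable_in("mafft")
```

This only shows that a binary is reachable, not that it works, so it would not have caught a broken build on its own.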

Thanks for the great program! I will close this issue as it's now resolved, but I was also wondering if it would be possible to implement a --max-identity flag when plotting. I would love to be able to plot a range rather than only above a minimum % identity, e.g. all duplicons from 70-80% identity, but none above 80% or below 70%. Thanks again for the help!
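
To illustrate the kind of range filter meant here, this is a rough sketch that pre-filters the JSON before plotting. It assumes the file has a top-level "families" array of arrays (as the script's `result["families"]` suggests) and that each duplicon carries a numeric "identity" key on the same scale as the bounds. The key name and scale are assumptions, so check your own JSON first; `filter_by_identity` is a hypothetical helper:

```ruby
require "json"

# Keep only duplicons whose identity lies in [min, max], inclusive.
# ASSUMPTION: each duplicon is a hash with a numeric "identity" key,
# expressed on the same scale as `min` and `max`.
def filter_by_identity(result, min:, max:)
  families = result["families"].map do |family|
    family.select { |sd| sd["identity"].between?(min, max) }
  end
  # Drop families left empty by the filter; the input hash is not modified.
  result.merge("families" => families.reject(&:empty?))
end

# Usage (file names are illustrative):
# data = JSON.parse(File.read("combo-x_test.json"))
# File.write("filtered.json", JSON.generate(filter_by_identity(data, min: 0.7, max: 0.8)))
```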

lawson-a-m avatar Sep 06 '24 13:09 lawson-a-m

if it would be possible to implement a --max-identity flag when plotting

That's a nice idea, I just added it!

delehef avatar Sep 11 '24 19:09 delehef