arcs-make option to specify prefix
Hello,
I'm using arcs-make and there doesn't seem to be an option to specify an output file prefix, whereas arcs standalone has the -b option to do so. Is it possible to incorporate this into arcs-make?
Hi @pdimens,
Yes, you are right - arcs-make decides on the prefix of the files based on the input scaffolds and parameters.
Are you thinking that you would like all output files to have the specified prefix, or would it be sufficient to have the final scaffolds file soft-linked to a file name with the specified prefix?
Thank you for your interest in ARCS! Lauren
Thanks for writing back so quickly! For consistency (as I see it), a prefix would specify the output names (no softlinks), and the parameters, which is what the name is derived from currently, would be preserved in the log file. The additions of things like .tigmint. etc. that come with file processing would still be in place though!
Yes, for sure. Thanks for that suggestion (I know that the current prefix names can be quite long...) - I'll take a look at making that adjustment in the next week or so and update you when I've pushed the changes!
Thanks! FWIW, I came across this as I've been trying to add it to the forthcoming assembly workflow in harpy
Hi @pdimens,
So I spent some time looking at arcs-make to see what tweaks could be made. Unfortunately, adding that prefix parameter would not be as straightforward as I hoped, for a couple of reasons:
- The current prefixes are used to run the appropriate commands for the various modes (ARCS, ARCS-long, ARKS, ARKS-long), so if the prefix can be anything, it interferes with that. I do agree that in hindsight, that was not a great design decision.
- Additionally, I would want any change to be backwards compatible (ie. the final output file to have the same name as before) - this is extra tricky because of the previous point.
I could make part of the prefix user-specified, but I doubt that this would be very useful? Ie. These are the output file names for ARCS and ARCS-long:
arcs: $(draft)_c$c_m$m_s$s_r$r_e$e_z$z_l$l_a$a.scaffolds.fa
arcs-long: $(draft)_c$c_m$m_cut$(cut)_s$s_r$r_e$e_z$z_l$l_a$a.scaffolds.fa
Up to the $(cut)_s$s part is needed to run the right rules, but we could allow the remainder to be user-specified
arcs: $(draft)_c$c_m$m_s$s_$(prefix).scaffolds.fa
arcs-long: $(draft)_c$c_m$m_cut$(cut)_s$s_$(prefix).scaffolds.fa
But again, this doesn't seem like a great improvement?
Or, I could allow the $(draft) part to be user-specified (ie. user-specified prefix if supplied, else $(draft))? The name is still pretty long, but at least then you could get the file to start with your preferred string.
The alternative, like I mentioned before, would be to soft-link the final scaffolds file to a naming scheme that you prefer.
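To make the naming scheme concrete, here is a minimal Python sketch of the arcs and arcs-long templates quoted above. It is only an illustration, not code from arcs-make: the function name `scaffolds_name`, the `params` dictionary, and the example values are hypothetical, and the optional `prefix` argument models the "user-specified prefix if supplied, else $(draft)" idea, which is a proposal rather than existing behaviour.

```python
from typing import Mapping, Optional

# Parameter order taken from the two templates quoted in this thread.
ARCS_KEYS = ["c", "m", "s", "r", "e", "z", "l", "a"]
ARCS_LONG_KEYS = ["c", "m", "cut", "s", "r", "e", "z", "l", "a"]


def scaffolds_name(draft: str, params: Mapping[str, object],
                   mode: str = "arcs", prefix: Optional[str] = None) -> str:
    """Compose a final scaffolds file name following the quoted templates.

    `prefix` is hypothetical: it replaces the $(draft) part when given.
    """
    if mode == "arcs":
        keys = ARCS_KEYS
    elif mode == "arcs-long":
        keys = ARCS_LONG_KEYS
    else:
        raise ValueError(f"unsupported mode in this sketch: {mode}")
    stem = prefix if prefix is not None else draft
    # Each parameter contributes "<name><value>", e.g. "c5" or "cut250".
    parts = [f"{key}{params[key]}" for key in keys]
    return "_".join([stem] + parts) + ".scaffolds.fa"
```

With made-up values such as c=5, m=50-10000, s=98, r=0.05, e=30000, z=3000, l=5 and a=0.3, this gives a name that starts with the draft name followed by eight parameter fields, which is why the names get long even when the leading part is replaced.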
Here are some ideas:
- intermediate steps create files with the specified $prefix, but symlinks are made at each step with the existing naming structure so that those files exist with the expected names for the pipeline
- the final output file is the one that's named with the prefix
Thanks @pdimens! I think we're thinking along the same lines with your second suggestion and my soft-link suggestion. If we just have a soft-link with the user-specified prefix pointing to the final scaffolds file, you can treat that file just the same as if the pipeline had it as its output.
Honestly, I think that would be the best option - I think that having soft-links for every output file would get quite messy, and with the current structure of the pipeline, keeping the naming scheme as-is for the various steps would be much more straightforward (we don't have a dedicated developer for ARCS anymore, so we have limited bandwidth to make changes).
I get that. I appreciate you taking the time to address this!
Always happy to get suggestions from our users! I'll try to make those changes in the next week or so - will keep you updated.
Hi again @pdimens,
I added a prefix option in PR #172 - would you be willing to try it out to see if it works for you? You wouldn't need to recompile ARCS if you don't want to - simply cloning that branch and adjusting your PATH to find that arcs-make version first would suffice.
You invoke the option by just adding prefix=my_prefix to your arcs-make command. If specified, a soft link named <prefix>.scaffolds.fa to the final scaffolds file will be created.
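For readers who want to picture what the option does, the behaviour described above can be sketched in Python roughly as follows. This is an assumption-laden illustration, not the code in arcs-make or PR #172: the function name `link_final_scaffolds` is hypothetical, and replacing an existing link is a choice made for this sketch only.

```python
import os


def link_final_scaffolds(final_scaffolds: str, prefix: str) -> str:
    """Create a soft link named <prefix>.scaffolds.fa to the final scaffolds file.

    Hypothetical helper: mirrors the described behaviour, not the real implementation.
    """
    link_name = f"{prefix}.scaffolds.fa"
    # Assumption for this sketch: an existing link or file with that name is replaced.
    if os.path.lexists(link_name):
        os.remove(link_name)
    # Link to the absolute path so the link still resolves if it is moved.
    os.symlink(os.path.abspath(final_scaffolds), link_name)
    return link_name
```

The original long-named scaffolds file stays in place, so anything relying on the existing naming scheme keeps working, while downstream tools can read the short `<prefix>.scaffolds.fa` name.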
If all is good, I'll make a new release including that change.
@lcoombe LGTM!
Awesome! I'll do a new release integrating that change this week :)
I just released v1.2.8 which includes this new option! It should be updated on conda in the next day or so.
Awesome, thanks! FWIW, arcs (and by extension tigmint/links) have been added to Harpy as of v1.10 :grin:
That's great to hear! :D