pypiper proposed change to output file naming

Currently, pypiper output files start with the pipeline name and then the file type, like PIPE_log.md and PIPE_profile.md, etc. (see docs)

I did it this way so you could find output from multiple pipelines more easily. But I've realized that 95% of the time we're running only a single pipeline, and it's annoying to have 4 files all starting with PIPE_* for tab completion:

PIPE_log.md PIPE_profile.md PIPE_status.flag PIPE_commands.md

Instead, these could be:

log_PIPE.md profile_PIPE.md status_PIPE.flag

etc. The advantage is that tab completion becomes much easier for folders running a single pipeline, which is most of the time.
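
To make the comparison concrete, here is a rough Python sketch of the two schemes. This is not pypiper's actual code; `output_names` and `completions` are hypothetical helpers for illustration, and "status" is just the placeholder used in the file names above.

```python
# Hypothetical helpers, not part of pypiper: they only illustrate the two
# naming schemes and what a shell would offer for tab completion.
FILE_KINDS = [("log", "md"), ("profile", "md"), ("status", "flag"), ("commands", "md")]


def output_names(pipeline, pipeline_first=True):
    """Build output file names in the current (pipeline first) or proposed order."""
    if pipeline_first:
        return [f"{pipeline}_{kind}.{ext}" for kind, ext in FILE_KINDS]
    return [f"{kind}_{pipeline}.{ext}" for kind, ext in FILE_KINDS]


def completions(names, typed):
    """Return the names that still match after typing `typed` and pressing tab."""
    return [name for name in names if name.startswith(typed)]


current = output_names("PIPE")
proposed = output_names("PIPE", pipeline_first=False)

print(completions(current, "P"))   # all four files still match
print(completions(proposed, "l"))  # only log_PIPE.md matches
```

With the proposed order, a single keystroke is enough to pick out the log file; with the current order, the first keystroke only completes the shared PIPE_ prefix.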

But this is kind of a big change. Anyone opposed? Ideas?

Oct 08 '16 01:10 nsheff

I am not in favor of that change. The advantage you mention is tiny compared to how disruptive that change could be.

Oct 11 '16 14:10 ghost

Ok, the advantage is only relevant if you are going through output files a lot, which you may not do, but I do.

The disruption would only come from scripts that rely on that naming scheme, and I can't think of many... maybe just flagCheck.sh. So it doesn't seem very disruptive to me... nothing reads log files, profile files, or command files, so for those the file names can be changed without effect.

Oct 11 '16 14:10 nsheff

Well, I do look at log files. And indeed it might not be as disruptive. I am not in favor because it potentially affects lots of undocumented scripts and projects without bringing any significant improvement to what Pypiper is really about.

See what other users have to say about it. I would go along with the general consensus.

Oct 11 '16 14:10 ghost

Well, if anyone has examples of scripts that rely on this naming scheme, let us know.

The only ones I can think of are the checks for the flags, which are mostly programmatic: flagCheck, and looper also checks for flags. These would be very easy to change.

I can't think of anything currently checking the log, profile, or commands files, so I think it wouldn't be too bad but maybe I'm not thinking of something.

The tab completion I think would really help for general navigation; it saves a complete step because you can just type "l" and tab for the log file, instead of "R" to get RRBS and then "l" to get log.md, see? It starts to add up when you're checking lots of log files in a project running only 1 pipeline, which is 95% of the time.

Oct 11 '16 14:10 nsheff

I have no preference for either; you can always use some sort of regex to get whatever files, no? The find command is super powerful, I use it all the time, e.g.:

find project/results_pipeline -iname "*log*.md*" | grep something | xargs less

Nov 04 '16 10:11 afrendeiro

Of course, but I think you're missing the point... I'm just talking about looking at a single sample, and saving the split second of an extra tab-completion step, because everything starts with the same string. Typing out a 'find' command is already way overkill for the use case I'm trying to describe.

Dec 20 '16 21:12 nsheff

One issue: re-runs of a currently completed pipeline would re-run everything one time because they wouldn't find the new complete flag name.
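
One way around that would be a check that accepts either name. A minimal sketch, assuming the completion flag is called PIPE_completed.flag under the current scheme and completed_PIPE.flag under the proposed one (both names, and the `is_completed` helper, are assumptions for illustration, not pypiper's real API):

```python
import os


def is_completed(outfolder, pipeline):
    """Return True if a completion flag exists under either naming scheme.

    The flag file names here are assumed for illustration only.
    """
    candidates = [
        f"completed_{pipeline}.flag",  # proposed order: file type first
        f"{pipeline}_completed.flag",  # current order: pipeline name first
    ]
    return any(os.path.isfile(os.path.join(outfolder, name)) for name in candidates)
```

A sample finished under the old naming would then still be recognized as done, so nothing would re-run just because of the rename.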

Dec 20 '16 21:12 nsheff

I think I will go ahead with it but leave the flag files unchanged, so it shouldn't disrupt anything. If anyone feels strongly that this is a bad idea, speak now! @cdietzgit you were opposed to this, are you still?

May 24 '17 14:05 nsheff
