htsjdk
htsjdk copied to clipboard
IOUtil.getPath fails when the input has spaces.
IOUtil.getPath("some file")
explodes:
Caused by: java.net.URISyntaxException: Illegal character in path at index 4: some file
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parseHierarchical(URI.java:3105)
at java.net.URI$Parser.parse(URI.java:3063)
at java.net.URI.<init>(URI.java:588)
at java.net.URI.create(URI.java:850)
... 25 more
This is due to converting it to a URI first. It should not explode.
I also think we should be using the URI
constructor and not the URI.create
method after reading the javadoc for the latter.
I had a look at this yesterday, I think we can get rid of the error if we check that the passed string is a valid URI (has scheme, no illegal characters, etc.) before invoking URI.create
. If not, then the string is treated as a local filesystem path.
Thoughts?
@jb-adams Hi Jeremy - I think you're on the right track. We have developed a more generalized solution for this in a separate repo, that we use in GATK, in the form of an interface and a concrete type that handles these conversions. It normalizes all inputs to a URI internally, and also has some other advantages. Integrating that into htsjdk is a much a larger task, but you might look at the implementation for inspiration, and there are also a boatload of test cases in there.
Hi @cmnbroad, looks great, I think it would be good to implement something similar in htsjdk. I'm pretty new to developing here, so I was curious, will htsjdk-next-beta replace htsjdk? Which repo are current versions of GATK and Picard using?
@jb-adams GATK and Picard both depend on this repo; htsjdk-next doesn't (yet) have a sufficient critical mass of functionality to warrant switching. The URI classes were designed with the idea that they would migrate their way here at some point, but I'd suggest getting maintainer buy-in before spending much time on it. @lbergelson or @tfenne any thoughts on this ? GATK has its own copy of them, which could be discarded if they lived in htsjdk, and I think it would benefit Picard and possibly other consumers as well.
@cmnbroad sounds good, I'll wait till we get the go ahead from maintainers, then I can work on this.