htsjdk icon indicating copy to clipboard operation
htsjdk copied to clipboard

IOUtil.getPath fails when the input has spaces.

Open lbergelson opened this issue 7 years ago • 6 comments

IOUtil.getPath("some file") explodes:

Caused by: java.net.URISyntaxException: Illegal character in path at index 4: some file
	at java.net.URI$Parser.fail(URI.java:2848)
	at java.net.URI$Parser.checkChars(URI.java:3021)
	at java.net.URI$Parser.parseHierarchical(URI.java:3105)
	at java.net.URI$Parser.parse(URI.java:3063)
	at java.net.URI.<init>(URI.java:588)
	at java.net.URI.create(URI.java:850)
	... 25 more

This is due to converting it to a URI first. It should not explode.

lbergelson avatar Nov 17 '17 20:11 lbergelson

I also think we should be using the URI constructor and not the URI.create method after reading the javadoc for the latter.

nh13 avatar Nov 17 '17 20:11 nh13

I had a look at this yesterday, I think we can get rid of the error if we check that the passed string is a valid URI (has scheme, no illegal characters, etc.) before invoking URI.create. If not, then the string is treated as a local filesystem path.

Thoughts?

jb-adams avatar Jun 19 '19 13:06 jb-adams

@jb-adams Hi Jeremy - I think you're on the right track. We have developed a more generalized solution for this in a separate repo, that we use in GATK, in the form of an interface and a concrete type that handles these conversions. It normalizes all inputs to a URI internally, and also has some other advantages. Integrating that into htsjdk is a much a larger task, but you might look at the implementation for inspiration, and there are also a boatload of test cases in there.

cmnbroad avatar Jun 19 '19 13:06 cmnbroad

Hi @cmnbroad, looks great, I think it would be good to implement something similar in htsjdk. I'm pretty new to developing here, so I was curious, will htsjdk-next-beta replace htsjdk? Which repo are current versions of GATK and Picard using?

jb-adams avatar Jun 19 '19 17:06 jb-adams

@jb-adams GATK and Picard both depend on this repo; htsjdk-next doesn't (yet) have a sufficient critical mass of functionality to warrant switching. The URI classes were designed with the idea that they would migrate their way here at some point, but I'd suggest getting maintainer buy-in before spending much time on it. @lbergelson or @tfenne any thoughts on this ? GATK has its own copy of them, which could be discarded if they lived in htsjdk, and I think it would benefit Picard and possibly other consumers as well.

cmnbroad avatar Jun 20 '19 11:06 cmnbroad

@cmnbroad sounds good, I'll wait till we get the go ahead from maintainers, then I can work on this.

jb-adams avatar Jun 20 '19 12:06 jb-adams