jgo
jgo copied to clipboard
Add additional check to prevent false positive endpoints
Addresses #92
This works because it specifically addresses my use case described in #92 (some string of the form A:/B:C in the arguments). I am not too happy with the solution because it opens the door to just adding more and more special case checks. Maybe a regular expression check would be better in Endpoint.is_endpoint, e.g. something along the lines of
[a-z][a-z0-9-_]+[a-z0-9](\.[a-z][a-z0-9-_]+[a-z0-9])
Unfortunately, maven is not very specific with their naming conventions
Codecov Report
Merging #93 (a5b4226) into main (fcaf307) will not change coverage. The diff coverage is
100.00%.
:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more
@@ Coverage Diff @@
## main #93 +/- ##
=======================================
Coverage 94.51% 94.51%
=======================================
Files 6 6
Lines 875 875
=======================================
Hits 827 827
Misses 48 48
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/jgo/jgo.py | 91.97% <100.00%> (ø) |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
@hanslovsky Thanks for pursuing a solution! But I would have thought the -- separator should be good enough to prevent jgo from matching subsequent arguments as potential endpoint expressions, no? The fact that using -- didn't work for your use case is a bug, isn't it? Or do I misunderstand the issue?
@ctrueden I am not sure that this is a bug and if -- is intended to be used by jgo. I did not find it at all in jgo.py. We can certainly think about adding it here, but we will also need to do some research if it already has some reserved functionality in argparse. I found it in util.main_from_endpoint, though, but the use case is a bit different: It does not need to scan the argv for the endpoint string. It converts the user-provided argv into something that jgo.main can use.
There is also a similar check to the one I added in is_endpoint:
endpoint_elements[0].startswith("-")
going back to 3c577183a24713a8f912b87a6407c805998a74e8 (2018 is 5 years ago btw :scream:)
There is also a pattern check for the entrypoint candidate:
(.*https?://.*|[a-zA-Z]:\\.*)
Maybe we should change this to
(.*(https?|grpc)://.*|[a-zA-Z]:\\.*)
or maybe even to
(.*://.*|[a-zA-Z]:\\.*)
Unfortunately, the maven naming guides are not very helpful for what we can exclude. // is probably never part of a valid groupId or artifactId. I would like to have a single pattern that we can use confidently rather than updating the check for every problem we run into.
I added grpc to pattern instead of adding a check for / in Endpoint.is_endpoint. This seems cleaner to me. It might make sense to move the pattern check into Endpoint.is_endpoint, too.
Bumping this, @ctrueden @hanslovsky. I'm running into this when using jgo with arguments that contain s3:// paths. I liked @hanslovsky original proposal of catching things with a URL shape instead of going protocol by protocol.
Maybe the best option would be to change this PR to use
(.*://.*|[a-zA-Z]:\\.*)
and investigate double dash (--) as another option to explicitly separate jgo args from program arguments.
@kephale this could be a temporary workaround: I was able to fix the issue that I had by changing the URL shape input from a positional argument of my Java CLI to an option that I could specify like
--dataset=grpc://host:port
This, of course, works only if you own the code for the CLI that you would like to run, and can change the CLI without breaking anyone else's programs.
@hanslovsky aha! I tried the generalized regex, but only now realize what you were saying with the --. I'll try that one when I run into this again (which will be soon :D)