fpath icon indicating copy to clipboard operation
fpath copied to clipboard

`is_prefix`, `find_prefix` and `rem_prefix` are inconsistent

Open rand00 opened this issue 2 years ago • 17 comments
trafficstars

I was surprised to find that rem_prefix on two equal paths returned None - but makes sense as an edgecase of not being able to construct Fpath.v "". Though I don't think it feels nice that is_prefix isn't consistent with rem_prefix - so an idea could be to return more information from rem_prefix, e.g.:

type rem_prefix = [
  | `Prefix of t
  | `Equal
]
val rem_prefix : t -> t -> rem_prefix option

rand00 avatar Feb 09 '23 11:02 rand00

An alternative would be for is_prefix to denote strict prefixes. It seems that's what I opted for here.

But I guess at that point it's a bit late to change that.

dbuenzli avatar Feb 09 '23 14:02 dbuenzli

I feel like the behaviour of String.starts_with will be what people expect, so that is in favour of the existing is_prefix - though maybe that isn't enough of an argument; people will need to read the docs anyway to be sure - otherwise it should be called something like is_prefix_strictly (to be able to autocomplete on is_prefix..)

.. or ofc one could add a ~strict param

rand00 avatar Feb 09 '23 15:02 rand00

... though it doesn't feel too nice with those last proposals for is_prefix - so I'm still leaning towards returning more info from rem_prefix

Edit: After all it's rem_prefix that has the edgecase

rand00 avatar Feb 09 '23 15:02 rand00

I feel like the behaviour of String.starts_with will be what people expect

That's not a good expectation though, as you want path segments to be respected by the prefix relation, we are not globbing here. See point 2. of the doc string I liked to.

Edit: After all it's rem_prefix that has the edgecase

I don't see any edgecase here. If you remove the prefix p from p you get nothing which is not a path, hence you get None.

dbuenzli avatar Feb 09 '23 15:02 dbuenzli

I don't see any edgecase here. If you remove the prefix p from p you get nothing which is not a path, hence you get None.

I see it as an edgecase because is_prefix p p returns true and rem_prefix p p returns None. Though another formulation could be that 2 different None states gets merged, so one looses relevant information.

I optimally don't want to write code where I need to both call rem_prefix and afterwards call equal.

rand00 avatar Feb 09 '23 15:02 rand00

Concretely: https://github.com/rand00/mtag/blob/7d0162ae5e9c9c3a411cc005dd73c167bf1d1561/src/mtag.ml#L549

rand00 avatar Feb 09 '23 15:02 rand00

I see it as an edgecase because is_prefix p p returns true and rem_prefix p p returns None.

You need to turn the problem upside down :-) If you want rem_prefix to be a good combinator on paths rather than hodgepodge your first message proposes, then the problem lies in the definition of is_prefix.

Though another formulation could be that 2 different None states gets merged, so one looses relevant information.

Again depends on your definition of prefix…

Personally I expect a remove prefix function to return a path without a given prefix or None if that's not possible. Not a path without prefix or None or another edge case.

I optimally don't want to write code where I need to both call rem_prefix and afterwards call equal.

Maybe you don't. But then you annoy all the other users who are interested in using the prefix function for testing path containments (after a suitable realpath of course).

Testing for equality in the None case if you are interested in treating that case specially seems better to me.

dbuenzli avatar Feb 09 '23 15:02 dbuenzli

So you prefer a single kind of use to be very easy (strict path containment), and other kinds of use to be more involved. I can see the point in that. Though I would have expected more users to be interested in path equality related to rem_prefix.

Otoh. concerning my ideals of interface design, I'm often skeptical of only keeping to option-results no matter what; as what often happens is the same as here, where None gets several meanings, and you loose information that the operator internally already knows.

Would be interested in seeing how annoying a more informative signature would be for the usecases you think about.

rand00 avatar Feb 09 '23 16:02 rand00

So you prefer a single kind of use to be very easy (strict path containment), and other kinds of use to be more involved.

That's not how I would put it.

I prefer to have, unlike what we have here, good definitions and a consistent sets of combinators that implement them. That's better for composition.

Hodgepodge result don't compose well and are rather unnatural. In this case it transforms rem_prefix into some frankenfunction that does both prefix removal and equality testing.

If you try to use a non-strict definition of prefix for paths you constantly run into problems with your prefix combinators because you don't have a neutral element[^1].

[^1]: Unless you accept "" as a path even if it's not. I can't say whether it was a good idea to leave it out of the strings behind Fpath.t but that's what we have.

Would be interested in seeing how annoying a more informative signature would be for the usecases you think about.

For example I couldn't write simple one liners like:

let cwd_paths ~cwd ps = List.filter_map (Fpath.rem_prefix cwd) ps 

dbuenzli avatar Feb 09 '23 17:02 dbuenzli

I also see your frankenfunction point, though I feel like each of our points have their value depending on which angle you look from. My initial focus was on what I feel is probably often natural when one uses rem_prefix in practice. E.g. some might be interested in including cwd in result of cwd_paths if it was part of ps. Also - ps might contain Fpath.(cwd / ".") anyway...

rand00 avatar Feb 09 '23 22:02 rand00

My initial focus was on what I feel is probably often natural when one uses rem_prefix in practice. E.g. some might be interested in including cwd in result of cwd_paths

I don't see what's natural in practice in what you propose.

Strict prefixes on absolute paths corresponds to a notion of file hierarchy rooted at a given prefix and it seems that's what I need most of the time. The prefix is the root, the choppable absolute paths, those paths that are therein.

The notion of path is a total mess. I have in the past tried to devise and use smarter functions giving relative segments in results but that always ended up in bugs (.e.g. not so smartly introduced "." turning files into directories) or puzzling results for users.

Nowadays when working with paths I mostly try to always make everything absolute asap via realpath, work with strict prefix functions that never introduce relative segments and treat relativisation as a rendering aspect. A lot of the mess I had is gone and the code is easier to understand.

Also - ps might contain Fpath.(cwd / ".") anyway...

Indeed that's why I wrote above:

(after a suitable realpath of course)

dbuenzli avatar Feb 09 '23 23:02 dbuenzli

Okay I'm convinved. So the primary problem is that the semantics of is_prefix can't be changed

rand00 avatar Feb 09 '23 23:02 rand00

It would be unwise to change. What we could do is add is_strict_prefix and deprecate is_prefix.

dbuenzli avatar Feb 09 '23 23:02 dbuenzli

And then also a find_strict_prefix?

rand00 avatar Feb 09 '23 23:02 rand00

The problem with is_strict_... is that user can't find the function via merlin autocomplete

rand00 avatar Feb 09 '23 23:02 rand00

And then also a find_strict_prefix?

I think find_prefix does what you want.

You just need to define it differently something like the longest path r (if any) such that both rem_prefix r p and rem_prefix r q is Some _.

dbuenzli avatar Feb 10 '23 00:02 dbuenzli

λ » Fpath.(find_prefix (v "/home") (v "/home") ) ;;
- : Fpath.t option = Some /home

.. so is not a strict prefix found

rand00 avatar Feb 10 '23 00:02 rand00