embark icon indicating copy to clipboard operation
embark copied to clipboard

Make file target finder avoid identifiers that match their filename

Open woolsweater opened this issue 2 years ago • 19 comments

I'd like to suggest that some variation of the string match condition be brought back to the file target finder to prevent essentially the same problem as https://github.com/oantolin/embark/issues/530 and https://github.com/oantolin/embark/issues/423 in other languages/file types.

For a language where filenames are conventionally the same casing as the entities they contain, for example Swift:

Foo.swift

struct Foo {
  //...
}

putting point on "Foo" and calling embark-act produces a file target first, not an identifier. The language doesn't really matter of course; you could also see this in a Python file (although the Python convention is to use lower-case filenames):

Foo.py

class Foo(object):
    pass

Since (thing-at-point 'filename) produces "Foo" as the result, the string match would also have to be anchored or it'll just find that inside the ffap result.

Naively, that would be something like (string-match-p (rx string-start (literal tap-file)) ffap-file). This doesn't prevent a file path like "/tmp/foo/bar" from being recognized as a file, since in that case tap and ffap will agree. But it does fail for a file path with spaces: "/my docs/foo/bar" if point is after the space. So I'm not sure what the perfect solution is.

woolsweater avatar Oct 20 '23 22:10 woolsweater

How about this rule: in prog modes a bare identifier which ffap thinks is a file only counts as a file if it is on the same line as the word "import", "require" or "load"? That way, your class Foo woudn't match, but import Baz would, and embark-dwim on the Baz would take you to Baz.py.

I'm not sure about the rule yet, so I implemented it in a separate branch only-file-if-import (a5825753be44323411b14d885642a486082d6a65). I'd appreciate feedback on it.

oantolin avatar Oct 24 '23 13:10 oantolin

In don't think this is a good rule. There are many more such "include" keywords. Furthermore I'd also like to act in file names appearing at other places. I have to think more about a better alternative proposal.

minad avatar Oct 24 '23 13:10 minad

Yeah, as I said I'm not convinced this is a good rule, but like @woolsweater, I don't really like that class Foo matches Foo.py.

There are many more such "include" keywords.

Are there?

. Furthermore I'd also like to act in file names appearing at other places.

The rule is only for file names which are identifiers, where else would you want them?

oantolin avatar Oct 24 '23 13:10 oantolin

The rule is only for file names which are identifiers, where else would you want them?

Okay then it is probably alright. But then I wonder why file is not prioritized lower than identifier. This would also resolve this, wouldn't it?

minad avatar Oct 24 '23 14:10 minad

But then I wonder why file is not prioritized lower than identifier.

Because with point on the foo in ~/notes/foo.txt you want the target ~/notes/foo.txt before the target foo.

oantolin avatar Oct 24 '23 14:10 oantolin

That seems like a clever solution. I'll give it a trial today.

woolsweater avatar Oct 24 '23 16:10 woolsweater

@oantolin

Because with point on the foo in ~/notes/foo.txt you want the target ~/notes/foo.txt before the target foo.

One could add two file finders with different priorities (embark-target-file-path-at-point, embark-target-guess-file-name-at-point), such that full file names (or file names with extension) win over identifiers. Iirc we've avoided adding multiple finders until now, but I must say I would find that less ugly than hardcoding matching "include", "import" or "require".

minad avatar Oct 24 '23 17:10 minad

I like that suggestion, @minad!

Iirc we've avoided adding multiple finders until now

We have mostly avoided it, but not completely: there are embark-target-library-file-at-point (for elisp libraries) and embark-target-file-at-point. So there's definitely precedent for your suggestion.

oantolin avatar Oct 24 '23 17:10 oantolin

Okay, I see. I thought libraries at point have some further special treatment, such that they do not really count as files. (EDIT: They have their own map!)

For more context - I often feel that ffap matches way too often and just makes up some file name. But I am not sure how much of this guessing is carried over to Embark. I would still like to have such guessing at low priority via cycling, but the high priority matcher should require file base name plus extension or some incomplete name plus slash.

minad avatar Oct 24 '23 17:10 minad

I thought libraries at point have some further special treatment, such that they do not really count as files. (EDIT: They have their own map!)

Yes, a symbol naming an elisp library gives you two targets: a library (that goes with that map you mentioned) and a file.

oantolin avatar Oct 24 '23 17:10 oantolin

I've also just checked embark--targets. It removes duplicate targets, such that Embark is already prepared for a setup with multiple finders of the same type. One should probably not overdo it and split up the finders in too many fine grained pieces, but in some cases like this one it seems reasonable and flexible. Did we have other similar issues in the past? Iirc there were ordering issues regarding embark-org but these have all been fixed satisfactorily by reordering the finders?

minad avatar Oct 24 '23 17:10 minad

I went with @minad's suggestion instead of hard-coding some import-like keywords. Thanks, @minad, it is much nicer this way.

oantolin avatar Oct 26 '23 18:10 oantolin

Great, thanks! It is indeed nicer now with the clean separation of file name detection with high confidence and less reliable guessing via ffap. I see there was also a bug in the target deduplication :)

minad avatar Oct 26 '23 18:10 minad

Magnificent, thank you!

woolsweater avatar Oct 26 '23 19:10 woolsweater

I see there was also a bug in the target deduplication :)

That's debatable: in the default configuration I believe I was only generating adjacent duplicates so it didn't matter whether I took the first or last among duplicates. But now that we have duplicate files with an identifier in the middle, it makes more sense to always take the first among duplicates.

oantolin avatar Oct 26 '23 19:10 oantolin

That's debatable: in the default configuration I believe I was only generating adjacent duplicates so it didn't matter whether I took the first or last among duplicates. But now that we have duplicate files with an identifier in the middle, it makes more sense to always take the first among duplicates.

Yes, a bug with respect to my interpretation in https://github.com/oantolin/embark/issues/673#issuecomment-1777688439. I had always assumed that the higher prioritized target wins, the one which comes first in the list of target finders.

minad avatar Oct 26 '23 20:10 minad

hi, i know this issue has been closed, but i think i have a corner case for this functionality. i recently started working on a new Clojure project where sometimes we have

(ns foo.core 
  (:require '[foo.bar.core :as bar]))

(bar/fn ...)

with the following directory structure

src
└── foo
    ├── bar
    │   └── core.clj
    ├── core.clj

and embark-dwim on bar/fn seems to think it is a file path to some file on the bar/ directory.

bigodel avatar May 02 '24 14:05 bigodel

Oops, thanks for reporting. I can confirm the problem. I'll reopen this as a reminder.

oantolin avatar May 02 '24 15:05 oantolin

Oops, thanks for reporting. I can confirm the problem. I'll reopen this as a reminder.

no worries. probably a very specific corner case. i've been using M-. as embark-dwim for about 2 years now and haven't had this issue until recently. do you have any idea on a possible solution? i could work on it and create a PR

bigodel avatar May 02 '24 17:05 bigodel