
Feature Request: Optimized select for selecting the binary

Open breml opened this issue 3 years ago • 18 comments

When presenting the user a list of potential files to select the correct binary from, the following improvements could be applied to improve the user experience:

  • Filter out files with extensions that are most likely not executables, or keep only files with extensions that are likely executables (e.g. .exe, .sh and of course no extension)
  • Filter by elements provided in the install url (e.g. repo name in the case of the github provider)
  • Evaluate FileInfo() information from header of archive (e.g. in tar and zip) to find files with executable bit set
  • Order by relevance
  • Inverse order, such that the files with the highest relevance are at the end of the list (and therefore closest to the prompt). This is especially important if the archive contains lots of files and the list might span multiple pages.
  • Optional: add a fuzzy search like https://github.com/junegunn/fzf or https://github.com/ktr0731/go-fuzzyfinder
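
As an illustration of the executable-bit idea from the list above, here is a minimal Go sketch for tar archives. The names executableEntries and sampleArchive are hypothetical and not taken from bin's code base; the sketch only relies on archive/tar from the standard library. Zip archives would need an analogous check on zip.FileHeader, and archives created on Windows often carry no useful permission bits, so this could only ever be one signal among several.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// executableEntries returns the names of regular files in a tar stream
// whose header mode has at least one executable bit set.
func executableEntries(r io.Reader) ([]string, error) {
	var names []string
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		// FileInfo() exposes the header as an fs.FileInfo.
		mode := hdr.FileInfo().Mode()
		if mode.IsRegular() && mode.Perm()&0o111 != 0 {
			names = append(names, hdr.Name)
		}
	}
}

// sampleArchive builds a small in-memory tar archive for the demonstration.
func sampleArchive() (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	tw := tar.NewWriter(buf)
	files := []struct {
		name string
		mode int64
	}{
		{"tool/README.md", 0o644},
		{"tool/LICENSE", 0o644},
		{"tool/tool", 0o755},
	}
	for _, f := range files {
		body := []byte("x")
		hdr := &tar.Header{
			Name:     f.name,
			Mode:     f.mode,
			Size:     int64(len(body)),
			Typeflag: tar.TypeReg,
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	buf, err := sampleArchive()
	if err != nil {
		panic(err)
	}
	names, err := executableEntries(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```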

breml avatar Mar 20 '21 12:03 breml

The current approach uses a very minimal scoring method, which includes having the repo's name in the binary name or URL's basename. This should already give priority to files that at least include the repo name. However, the other suggestions sound interesting, and especially looking at FileInfo() in .tar and .zip files and other archives that support the executable bit sounds like a quick win.

sirlatrom avatar Mar 20 '21 20:03 sirlatrom

Finding binaries by reading their MIME type would be awesome. Every time I update a binary I have to manually select it from among ~10-20 files.

Is this open to contributions?

cristiand391 avatar Mar 31 '21 02:03 cristiand391

Is this open to contributions?

Definitely!

For updating binaries I believe there's actually something better we can do. We could save the name of the original file selected the first time and then, whenever an update is triggered, check if that same file exists and fetch it without asking the user again. Additionally, I suggest we do what @sirlatrom suggests about improving the scoring method to target the files better, so we can remove the selection step altogether.

I guess we can implement this in several steps

  • Score better archive files based on OS and Arch and don't prompt the user if we have a single high scoring file
  • Save the selected file the first time and use the same name upon updates.

marcosnils avatar Mar 31 '21 02:03 marcosnils

  • Save the selected file the first time and use the same name upon updates.

This will help a lot as it's difficult to check out the same binary when updating. Sometimes I have ended up downloading the checkgen instead of the executable.

akhan4u avatar Mar 31 '21 14:03 akhan4u

  • Score better archive files based on OS and Arch and don't prompt the user if we have a single high scoring file
  • Save the selected file the first time and use the same name upon updates.

The first point is already handled for .tar(.*) and .zip archives as the same filtering/selection mechanism is used there as for 'top level' files/assets.

I'm not sure how we can handle the second idea as there can theoretically be an indefinitely long chain. Maybe we can somehow store each choice along the way and 'pop' a choice for each part of the chain?

sirlatrom avatar Mar 31 '21 14:03 sirlatrom

I'm not sure how we can handle the second idea as there can theoretically be an indefinitely long chain. Maybe we can somehow store each choice along the way and 'pop' a choice for each part of the chain?

Hmm, maybe I'm missing something here? What I had in mind is:

  • Install a .tar binary and keep the original final file name in the tar file (regardless of the final binary name) on the bin config
  • When performing an update, check the tar files again and look for a match on the initially saved file. If yes, just use that same file.

Not sure I'm missing something Sune, since I didn't quite understand the "indefinitely long chain" part.

marcosnils avatar Mar 31 '21 16:03 marcosnils

  • When performing an update, check the tar files again and look for a match on the initially saved file. If yes, just use that same file.

Not sure I'm missing something Sune, since I didn't quite understand the "indefinitely long chain" part.

@marcosnils Not very likely, so we don't need to handle it, but there could be a binary within a tar.gz within a zip etc, each with an ambiguous list of files, and we'd need to remember each choice the user made along the way.

Practically speaking, we should at least remember which top level asset was chosen, and if it's an archive then which file was chosen within that archive.
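
To make the "remember each choice along the way" idea concrete, here is a rough Go sketch. The Selection type and its Path field are hypothetical and do not reflect bin's actual config format; the assumption is simply that the chosen name at each nesting level is stored in order (top-level asset first, then the file inside the archive, and so on).

```go
package main

import "fmt"

// Selection is a hypothetical config entry: Path[0] is the top-level asset
// that was chosen, Path[1] the file chosen inside that archive, and so on.
type Selection struct {
	Path []string
}

// Pick returns the saved choice for the given depth if it is still offered.
// The boolean is false when nothing was saved for that depth or the saved
// name no longer exists, in which case the caller would fall back to
// scoring or prompting.
func (s Selection) Pick(depth int, options []string) (string, bool) {
	if depth < 0 || depth >= len(s.Path) {
		return "", false
	}
	for _, o := range options {
		if o == s.Path[depth] {
			return o, true
		}
	}
	return "", false
}

func main() {
	saved := Selection{Path: []string{"tool-linux-amd64.tar.gz", "bin/tool"}}

	assets := []string{"tool-darwin-amd64.tar.gz", "tool-linux-amd64.tar.gz"}
	if name, ok := saved.Pick(0, assets); ok {
		fmt.Println("asset:", name)
	}

	files := []string{"README.md", "LICENSE", "bin/tool"}
	if name, ok := saved.Pick(1, files); ok {
		fmt.Println("file:", name)
	}
}
```

One caveat with exact matching: top-level asset names often embed the version, so the saved asset name may not exist in a newer release even though the file inside the archive keeps its name. In that case Pick reports false for depth 0 and the normal selection would still have to run for that level.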

sirlatrom avatar Mar 31 '21 16:03 sirlatrom

I guess we can implement this in several steps

  • Score better archive files based on OS and Arch and don't prompt the user if we have a single high scoring file

I would like to emphasize once again, that I do not like the scoring part about OS and Arch. There is really no value in ever presenting the user a file, that does not match the OS or the Arch, even if such a file has the highest score, for example if there is no file available for the OS/Arch of the user. I had this situation once, where bin installed a Windows exe on my Linux and in my opinion, this should never happen. So I propose a filtering by OS and Arch and apply the scoring only to the files, that remain as options after the filtering has been applied.

breml avatar Mar 31 '21 20:03 breml

So I propose a filtering by OS and Arch and apply the scoring only to the files, that remain as options after the filtering has been applied.

I agree with this approach. I believe we're on the same page here and we're mostly discussing semantics. Files with different OS / Arch should score 0 by default and we shouldn't present that option to the user (unless eventually overridden by a flag?).
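
A minimal Go sketch of "filter first, score afterwards" could look like the following. The alias tables and the function names matchesPlatform and filterByPlatform are assumptions made for illustration, not bin's existing code, and the token matching is deliberately naive.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Hypothetical alias tables; real release names vary far more than this.
var osAliases = map[string]string{
	"linux": "linux", "darwin": "darwin", "macos": "darwin", "osx": "darwin",
	"windows": "windows", "exe": "windows",
}

var archAliases = map[string]string{
	"amd64": "amd64", "x64": "amd64", "arm64": "arm64", "aarch64": "arm64",
	"386": "386", "i386": "386",
}

// matchesPlatform reports false only when the name mentions an OS or an
// architecture that differs from the host. Names mentioning neither pass.
func matchesPlatform(name, goos, goarch string) bool {
	// Normalize x86_64 first, because tokenizing would split it at "_".
	lower := strings.ReplaceAll(strings.ToLower(name), "x86_64", "amd64")
	tokens := strings.FieldsFunc(lower, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, t := range tokens {
		if o, ok := osAliases[t]; ok && o != goos {
			return false
		}
		if a, ok := archAliases[t]; ok && a != goarch {
			return false
		}
	}
	return true
}

// filterByPlatform keeps only the names that do not contradict the host.
func filterByPlatform(names []string, goos, goarch string) []string {
	var kept []string
	for _, n := range names {
		if matchesPlatform(n, goos, goarch) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	assets := []string{
		"tool_1.2.0_linux_amd64.tar.gz",
		"tool_1.2.0_Linux_x86_64.tar.gz",
		"tool_1.2.0_linux_arm64.tar.gz",
		"tool_1.2.0_windows_amd64.exe",
		"tool_1.2.0_darwin_amd64.zip",
		"checksums.txt",
	}
	for _, n := range filterByPlatform(assets, "linux", "amd64") {
		fmt.Println(n)
	}
}
```

Scoring would then only run on what filterByPlatform returns. Note that files such as checksums.txt survive this filter because they mention no platform at all, so they would still need to be ranked low by the scoring step.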

Not very likely, so we don't need to handle it, but there could be a binary within a tar.gz within a zip etc, each with an ambiguous list of files, and we'd need to remember each choice the user made along the way.

Now I understand your original concern. I guess we can save the whole file chain, which doesn't seem very difficult to do. However, I still haven't come across a scenario with multiple nested archives to ultimately get a binary. Not sure how often this happens in practice, since it's not very standard, right?

marcosnils avatar Mar 31 '21 23:03 marcosnils

Files with different OS / Arch should score 0 by default and we shouldn't present that option to the user (unless eventually overridden by a flag?).

Currently, any asset containing the repo name gets a score of 1 to begin with, and additional points for matching the OS/arch/OS specific extension (.exe/.appimage). I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?

Not sure how often this happens in practice, since it's not very standard, right?

Agreed. We would still need to save two choices, though: Which archive, and which binary within the archive.

sirlatrom avatar Apr 01 '21 01:04 sirlatrom

I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?

My basic scoring proposal:

  • All files start with score 0
  • If file has arch and/or OS and it doesn't match the bin host, subtract 1
  • If file has arch and/or OS and does match the bin host, add +1

Given scores:

  • Single high score file, install automatically
  • Multiple files with score >= 0: prompt the user, ordered by score descending
  • Files with score < 0 don't prompt the user
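
The proposal above could be sketched in Go roughly as follows. This is only one reading of it: the Asset type with pre-detected OS and Arch fields is hypothetical, the score is applied once per dimension (OS and arch separately), and "single high score file" is interpreted as exactly one candidate holding the top score.

```go
package main

import (
	"fmt"
	"sort"
)

// Asset is a hypothetical candidate whose OS/arch were already detected.
type Asset struct {
	Name string
	OS   string // "" when the name mentions no OS
	Arch string // "" when the name mentions no architecture
}

// Scored pairs an asset with its score.
type Scored struct {
	Asset Asset
	Score int
}

// score starts at 0 and adds or subtracts 1 for each mentioned dimension.
func score(a Asset, goos, goarch string) int {
	s := 0
	for _, pair := range [][2]string{{a.OS, goos}, {a.Arch, goarch}} {
		switch {
		case pair[0] == "":
			// Not mentioned: neither reward nor penalty.
		case pair[0] == pair[1]:
			s++
		default:
			s--
		}
	}
	return s
}

// rank hides negative scores, sorts the rest by score descending and
// reports whether a single candidate holds the top score.
func rank(assets []Asset, goos, goarch string) (visible []Scored, auto bool) {
	for _, a := range assets {
		if s := score(a, goos, goarch); s >= 0 {
			visible = append(visible, Scored{Asset: a, Score: s})
		}
	}
	sort.SliceStable(visible, func(i, j int) bool {
		return visible[i].Score > visible[j].Score
	})
	auto = len(visible) == 1 ||
		(len(visible) > 1 && visible[0].Score > visible[1].Score)
	return visible, auto
}

func main() {
	assets := []Asset{
		{"tool_linux_arm64.tar.gz", "linux", "arm64"},
		{"tool_linux_amd64.tar.gz", "linux", "amd64"},
		{"tool_darwin_arm64.tar.gz", "darwin", "arm64"},
		{"checksums.txt", "", ""},
	}
	visible, auto := rank(assets, "linux", "amd64")
	for _, v := range visible {
		fmt.Println(v.Score, v.Asset.Name)
	}
	fmt.Println("install without prompting:", auto)
}
```

With these rules an asset with the right OS but a different architecture nets 0 and therefore stays in the prompt list, while one that mismatches on both dimensions is hidden.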

I'm probably missing something and there's surely a better way of doing it, I just wrote the first idea that came to my mind.

marcosnils avatar Apr 01 '21 01:04 marcosnils

* match the `bin` host

What does that mean? Do you mean the repo name? That's what we already do, but I suppose we can wait with giving that point until we've found at least one of the os/arch matches first.

sirlatrom avatar Apr 01 '21 01:04 sirlatrom

OK, I was referring to the bin host OS and Arch. I'm aware that we're doing some of those things already; I was just describing how I see the overall algorithm working.


marcosnils avatar Apr 01 '21 02:04 marcosnils

I don't think we can expect all repos to play nice and include both the OS and the arch in asset names or binary names within archive assets. What's the best way to move forward?

My basic scoring proposal:

  • All files start with score 0
  • If file has arch and/or OS and it doesn't match the bin host, subtract 1
  • If file has arch and/or OS and does match the bin host, add +1

Given scores:

  • Single high score file, install automatically
  • Multiple files with score >= 0: prompt the user, ordered by score descending
  • Files with score < 0 don't prompt the user

I'm probably missing something and there's surely a better way of doing it, I just wrote the first idea that came to my mind.

In general I like the above proposal. One downside I see is, that a file with the correct os, but the wrong arch will still get a score of 0 (+1 -1) and therefore this file remains a candidate. So I guess in order for a file to be considered a candidate, it must achieve at least a score > 0.

Additionally, I would like to work towards an algorithm that in most cases succeeds in picking an archive and performing the installation, so that the user only has to select an archive in a few exceptional cases. One step in this direction would be to put the different archive types into an ordered list (ordered by priority). This would allow us to successfully install the binary even if there are multiple archive types available (e.g. .tar.gz and .zip).
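
A small Go sketch of the ordered archive type list, assuming the candidates have already been narrowed down to the right OS and arch. The concrete order in archivePriority and the name pickByArchiveType are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// archivePriority is an assumed preference order, most preferred first.
var archivePriority = []string{".tar.gz", ".tgz", ".tar.xz", ".zip"}

// pickByArchiveType returns the first name matching the most preferred
// archive extension, or false when no name has a known extension.
func pickByArchiveType(names []string, priority []string) (string, bool) {
	for _, ext := range priority {
		for _, n := range names {
			if strings.HasSuffix(strings.ToLower(n), ext) {
				return n, true
			}
		}
	}
	return "", false
}

func main() {
	candidates := []string{
		"tool_linux_amd64.zip",
		"tool_linux_amd64.tar.gz",
	}
	if name, ok := pickByArchiveType(candidates, archivePriority); ok {
		fmt.Println(name)
	}
}
```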

I have an additional idea which I feel is worth exploring: check whether the repo contains a .goreleaser.yml file. I know this only targets Go, but I feel that goreleaser is becoming the de facto standard for releasing binaries in the Go ecosystem. The huge advantage of considering this file is that we no longer need to guess whether arch / OS are present in the file name, because based on the replacements section we know which is the correct file to download.

Example from bin:

archives:
- replacements:
    darwin: Darwin
    linux: Linux
    windows: Windows
    386: i386
    amd64: x86_64

It might be worth it to try to figure out, if there is something similar for e.g. Rust.
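
To illustrate how such a replacements section could be used, here is a rough Go sketch. The map is copied by hand from the example above instead of being parsed from YAML, the asset names in main are hypothetical, and the sketch assumes that the replaced OS and arch values appear verbatim in the asset name.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements mirrors the YAML example above, copied by hand.
var replacements = map[string]string{
	"darwin":  "Darwin",
	"linux":   "Linux",
	"windows": "Windows",
	"386":     "i386",
	"amd64":   "x86_64",
}

// replaced returns the replacement for value, or value itself if none exists.
func replaced(value string, repl map[string]string) string {
	if r, ok := repl[value]; ok {
		return r
	}
	return value
}

// matchingAssets keeps the names that contain both the replaced OS and the
// replaced arch of the host.
func matchingAssets(names []string, goos, goarch string, repl map[string]string) []string {
	osTok, archTok := replaced(goos, repl), replaced(goarch, repl)
	var out []string
	for _, n := range names {
		if strings.Contains(n, osTok) && strings.Contains(n, archTok) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Hypothetical asset names following an OS_Arch naming pattern.
	assets := []string{
		"bin_0.1.0_Linux_x86_64",
		"bin_0.1.0_Linux_i386",
		"bin_0.1.0_Darwin_x86_64",
		"bin_0.1.0_Windows_x86_64.exe",
	}
	fmt.Println(matchingAssets(assets, "linux", "amd64", replacements))
}
```

Plain substring matching is a simplification: an arch such as arm would also match arm64, so a real implementation would probably need to compare against the full name template.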

breml avatar Apr 01 '21 18:04 breml

I did a quick test with my ~50 binaries managed with bin. For a little bit more than 1/3, I found a .goreleaser.yml.

breml avatar Apr 01 '21 18:04 breml

Just for reference, this site lists the valid combinations of arch/os supported by the Go compiler: https://gist.github.com/asukakenji/f15ba7e588ac42795f421b48b8aede63

breml avatar Apr 04 '21 10:04 breml

I'd like to contribute some more example test cases that could affect this issue:

This issue might overlap with #102.

schnatterer avatar May 30 '21 12:05 schnatterer

I'd like to add the case where there are alternate binaries matching your platform, such as:

  • statically/dynamically linked
  • libc/musl

pataquets avatar May 02 '23 14:05 pataquets