exclude and include sections or exact text with special characters in queries

Open wis opened this issue 6 years ago • 7 comments

Search engines use special characters like the hyphen (-); the tilde (~) is also used by search engines for similar results, but cheat.sh uses it for keyword lookup. How about a hyphen for inclusion and an underscore (_) for exclusion, mnemonically meaning up and down respectively?

While I was working on #137, cheat.sh didn't return useful results most of the time; it returned results for bash (Bourne Again Shell) that don't work in a POSIX-compliant Bourne shell.

A section and exact-text inclusion and exclusion feature would be very useful. It could be used like -bash for section inclusion, -"bash" for exact-text inclusion, _bash to exclude results that belong to the bash section, and _"readarray" to exclude results that contain the exact text in the quotes.

Of course, you'd need to restrict where these optional operators can appear in the query, either at the beginning or at the end, in order to parse the query correctly.
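
A minimal sketch of how such a restricted parser could look, assuming the operator syntax proposed above and assuming query words are separated by spaces or `+`. The names `ParsedQuery` and `parse_query` are hypothetical and not part of cheat.sh:

```python
import re
from dataclasses import dataclass, field

# Hypothetical operator syntax from the proposal above:
#   -name / -"text"  -> include a section / an exact text
#   _name / _"text"  -> exclude a section / an exact text
_OPERATOR = re.compile(r'^([-_])(?:"([^"]+)"|([^\s"]+))$')


@dataclass
class ParsedQuery:
    keywords: list = field(default_factory=list)
    include_sections: list = field(default_factory=list)
    include_text: list = field(default_factory=list)
    exclude_sections: list = field(default_factory=list)
    exclude_text: list = field(default_factory=list)


def _tokenize(query):
    # Split on whitespace or '+', keeping an operator's quoted phrase as one token.
    return re.findall(r'[-_]"[^"]+"|[^\s+]+', query)


def parse_query(query):
    tokens = _tokenize(query)
    parsed = ParsedQuery()

    def consume(token):
        # Record the token as an operator; return False if it is a plain keyword.
        match = _OPERATOR.match(token)
        if not match:
            return False
        sign, text, section = match.groups()
        if sign == "-":
            target = parsed.include_text if text else parsed.include_sections
        else:
            target = parsed.exclude_text if text else parsed.exclude_sections
        target.append(text or section)
        return True

    # Operators are only honoured at the beginning or the end of the query,
    # so something like "-r" in the middle stays an ordinary keyword.
    start, end = 0, len(tokens)
    while start < end and consume(tokens[start]):
        start += 1
    while end > start and consume(tokens[end - 1]):
        end -= 1
    parsed.keywords = tokens[start:end]
    return parsed
```

For example, `parse_query('-sh read+file+into+array _bash')` would keep `read file into array` as keywords, include the `sh` section, and exclude the `bash` section. One limitation of this sketch: a query whose first or last keyword itself starts with `-` or `_` would be read as an operator.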

I know cheat.sh is not a search engine with a page rank algorithm, but a simple, limited implementation would be very useful. :smiley:

wis avatar May 04 '19 23:05 wis

@wis

That's indeed a good point: we need some reliable filtering mechanism that filters out all irrelevant responses, first of all the ones that are written in the wrong language.

It happens quite often with queries for some not (yet) very popular languages, such as rust, that instead of answers in rust one gets answers in Python or in Go.

So this mechanism should be developed anyway.

And what about this idea: what if we used the /sh/ namespace for POSIX-compatible answers and /bash/ for bash answers, without doing explicit filtering with -bash or -sh?

chubin avatar May 06 '19 10:05 chubin

I'm so glad you agree about the need for this feature.

And what about this idea: what if we used the /sh/ namespace for POSIX-compatible answers and /bash/ for bash answers, without doing explicit filtering with -bash or -sh?

That would be great, especially since it's hard to find POSIX-compatible Bourne shell answers using a search engine without complicated queries.

It happens quite often with queries for some not (yet) very popular languages, such as rust, that instead of answers in rust one gets answers in Python or in Go.

I'm sure you're using a StackOverflow data dump server-side, right? If a section is included in the query (cheat.sh/rust/append+to+file), do you currently use it to exclude answers that don't have that section as a tag (in this case, rust)? It can also be rust/Types or rust/Pointers.

wis avatar May 06 '19 10:05 wis

@wis Actually, detecting the programming language of an answer takes more than just the tags. Tags are sometimes not there, and sometimes they are wrong. That means that we should analyze the code on the page and guess the language from it (+ using the tags, of course).

For shells (bash/Bourne shell), it would be especially important (and especially tricky), because SO users quite often specify the bash tag for pure shell code and vice versa. So some more complex heuristics should be used here. Maybe run shellcheck and then compare the lists of errors/warnings it finds?

chubin avatar May 06 '19 16:05 chubin

I realized after posting that there are questions on SO about pointers that don't have the pointers tag. So how useful are tags as a ranking signal to pick the result to return? It doesn't look like a reliable signal, so answers to questions about pointers that have the pointers tag shouldn't be prioritized over answers to questions that don't?

linguist seems pretty accurate judging by my GitHub usage, plus if it's good enough for GitHub it should be good enough for cheat.sh.

For shells (bash/Bourne shell), it would be especially important (and especially tricky), because SO users quite often specify the bash tag for pure shell code and vice versa. So some more complex heuristics should be used here. Maybe run shellcheck and then compare the lists of errors/warnings it finds?

Ah, I can imagine that it's more difficult to differentiate between bash and sh code. Using the errors and warnings given by shellcheck is a good idea: in my experience it gives great error and warning messages for bash-specific code in .sh files (files with a #!/bin/sh shebang). Speaking of which, did you use it while writing cheat.sh? I found many warnings and, IIRC, some errors.

wis avatar May 06 '19 17:05 wis

how useful are tags as a ranking signal to pick the result to return?

In my opinion, they are not very useful. They can be used but only with some low-weight coefficient.
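
One way to read "low-weight coefficient" is a weighted sum in which the tag only nudges a score that mostly comes from analyzing the code itself. The sketch below is an illustration under that assumption; the weights, the `Answer` type, and the `detected` confidences are all hypothetical and do not describe how cheat.sh actually ranks answers:

```python
from dataclasses import dataclass

# Assumed weights: the tag is only a weak hint next to the code-based guess.
TAG_WEIGHT = 0.2
DETECTOR_WEIGHT = 0.8


@dataclass
class Answer:
    body: str
    tags: frozenset   # tags of the question the answer belongs to
    detected: dict    # language -> confidence in [0, 1] from some code detector


def language_score(answer, language):
    # Combine the weak tag signal with the stronger code-based signal.
    tag_signal = 1.0 if language in answer.tags else 0.0
    detector_signal = answer.detected.get(language, 0.0)
    return DETECTOR_WEIGHT * detector_signal + TAG_WEIGHT * tag_signal


def rank(answers, language):
    # Highest combined score first.
    return sorted(answers, key=lambda a: language_score(a, language), reverse=True)
```

With these weights, an untagged answer whose code clearly looks like rust outranks an answer that carries the rust tag but whose code looks like Python.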

I agree with you regarding linguist, and there are several other projects of that sort. That is exactly the kind of tool that we plan to use to improve answer classification and filtering.

Regarding using the shebang for the bash/sh differentiation: it is almost never possible, because the short snippets in the answers (almost) never contain it.

Another, similar problem is detecting the Python version: the best way I've found so far is to call the 2to3.py conversion script on the code and check whether there is some diff after conversion (or use ast.parse to try to parse the code).
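
The ast.parse variant can be sketched as follows, assuming the check runs under a Python 3 interpreter. The function name is hypothetical. A failed parse only says the snippet is not valid Python 3; it may be Python 2, or not Python at all, and code such as `print("x")` is valid in both versions:

```python
import ast


def parses_with_running_python(code):
    """Return True if the snippet parses under the interpreter running this check."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        # SyntaxError: not valid for this Python version.
        # ValueError: some Python versions raise it for source containing null bytes.
        return False
    return True
```

Under Python 3, a Python 2 print statement like `print 'hello'` fails to parse, while `print('hello')` parses fine.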

chubin avatar May 06 '19 20:05 chubin

I mentioned the shebang because I wasn't sure whether shellcheck can detect which shell the code is written for. I just tested it: I deleted the shebang from cheat.sh.txt (renamed to cht_) and reopened it in vim, and there were no shellcheck warnings or errors.

The server side is a black box to me, since it's not open source. Do you use a data dump of SO, or do you use the query API just in time for each cheat.sh query?

wis avatar May 07 '19 02:05 wis

My shellcheck version does not detect shell type automatically, but one can do it with a small trick:

~$ cat /tmp/1.sh 
a=(1 2 3)
echo "${a[@]}"

~$ for i in sh bash ksh; do echo $i $(shellcheck -s $i --format gcc /tmp/1.sh | wc -l); done | sort -n -k2 | head -1 | awk '{print $1}'
bash
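
The same trick can be sketched in Python, assuming shellcheck is installed and on PATH. The function names are hypothetical. The counter is passed in as a parameter so the selection logic can be tried without shellcheck:

```python
import subprocess


def shellcheck_issue_count(path, dialect):
    """Count the gcc-format diagnostic lines shellcheck prints for one dialect."""
    result = subprocess.run(
        ["shellcheck", "-s", dialect, "--format", "gcc", path],
        capture_output=True,
        text=True,
        check=False,  # shellcheck exits non-zero whenever it reports issues
    )
    return len(result.stdout.splitlines())


def guess_shell(path, dialects=("sh", "bash", "ksh"), count_issues=shellcheck_issue_count):
    """Return the dialect with the fewest diagnostics; ties go to the earlier dialect."""
    counts = {dialect: count_issues(path, dialect) for dialect in dialects}
    return min(dialects, key=lambda dialect: counts[dialect])
```

Because ties go to the first dialect listed, a snippet that is clean under every dialect is labeled `sh`, which seems like the safer default for a POSIX namespace. That tie-breaking rule is a design choice of this sketch, not something shellcheck decides.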

The only part of the server that is not open source is the stackoverflow adapter; the rest is open source, and the adapter will be open sourced soon too. There are several minor things to be fixed, and hopefully it will be done soon. Currently the query API is being used, but the data dump can become a good alternative for the offline usage scenario.

chubin avatar May 09 '19 22:05 chubin