
curl/wget (need help)

Open · 131 opened this issue 7 years ago · 6 comments

I'm trying to PR a working curl / wget (using /dev/tcp)

set -ex

function __curl() {
  # Split the URL on "/" into protocol, host[:port] and path.
  read -r proto server path <<< "${1//// }"
  DOC=/${path// //}
  HOST=${server//:*}
  PORT=${server//*:}
  [[ "${HOST}" == "${PORT}" ]] && PORT=80

  # Open a read/write TCP socket on fd 3 and send the request.
  exec 3<>"/dev/tcp/${HOST}/${PORT}"
  echo -en "GET ${DOC} HTTP/1.0\r\nHost: ${HOST}\r\n\r\n" >&3

  # Skip the response headers, then dump the body.
  (while read -r line; do
   [[ "$line" == $'\r' ]] && break
  done && cat) <&3
  exec 3>&-
}

__curl http://www.google.com/favicon.ico > mine.ico
md5sum mine.ico
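
For reference, the parsing works by replacing every "/" in the URL with a space and letting read split the result on whitespace; a quick sketch with a sample URL (not part of the function itself):

url="http://www.google.com/favicon.ico"
echo "${url//// }"             # -> http:  www.google.com favicon.ico
read -r proto server path <<< "${url//// }"
echo "$proto|$server|$path"    # -> http:|www.google.com|favicon.ico

The DOC=/${path// //} line afterwards turns any spaces left in path back into slashes, so deeper paths like /a/b/c survive the round trip.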

Yet I'm stuck on the "&& cat" part that handles the (binary) file body. I'm sure I could use a new file descriptor and echo, but my bash skill ends here 😞 I could link to and use an existing pure bash approach [1], yet I'm sure there is something more elegant to do here.

[1] https://unix.stackexchange.com/questions/83926/how-to-download-a-file-using-just-bash-and-nothing-else-no-curl-wget-perl-et

131 · Jun 16 '18 15:06

I got it working. It's a little slow as it requires two while loops. I'm going to work on making this even faster but for now it's an example. Usage is script url > file.

Example script:

#!/usr/bin/env bash
#
# Download a file in pure bash.

download() {
    IFS=/ read -r _ _ host query <<< "$1"

    # Send the HTTP request.
    exec 3<"/dev/tcp/${host}/80"; {
        printf '%s\r\n%s\r\n\r\n' \
               "GET /${query} HTTP/1.0" \
               "Host: $host"
    } >&3

    # Strip the HTTP headers.
    while IFS= read -r line; do
        [[ "$line" == $'\r' ]] && break
    done <&3

    # Output the file.
    nul='\0'
    while IFS= read -d '' -r line || { nul=""; [[ -n "$line" ]]; }; do
        printf "%s%b" "$line" "$nul"
    done <&3

    exec 3>&-
}

download "$1"

dylanaraps · Jun 16 '18 23:06

The first loop is reasonably fast, as it just drops a sane amount of headers; I can't understand why the 2nd loop (a simple cat!!) has to be so complicated (and hence slow, I guess).

131 · Jun 16 '18 23:06

Bash is slow at file IO and it doesn't handle binary data very well. I'm sure it can be optimized but I have some doubts as to whether or not this will ever be faster than wget/curl.

dylanaraps · Jun 17 '18 00:06

According to the "bash bible" (yours :p), a simple cat alternative might be

file_data="$(<"file")"

Yet I cannot make this work with my design, and I do not understand why.

131 · Jun 17 '18 00:06

cat handles binary data correctly IIRC; bash doesn't. What causes a larger problem is that bash handles binary data and null bytes differently depending on which version you're using (in 4.4+, null bytes are skipped and never reach the variable).
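
A quick way to see this (my own sketch; writes a throwaway file named blob, assumes bash 4.4+):

printf 'ab\0cd' > blob
data="$(<blob)"                # bash 4.4+ warns about an ignored null byte here
printf '%s' "$data" | od -c    # prints: a b c d -- the NUL never reached the variable

That's why the $(<file) trick can't stand in for cat when the body is binary.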

dylanaraps · Jun 17 '18 00:06

All the other examples here make sense and can often be faster than invoking another program. However, in the case of networking, I think it makes sense to depend on the binaries, both for usability and performance.

In the case of wget / curl replacements, all of these only work when you have an HTTP endpoint. This code is not going to work for HTTPS.
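
/dev/tcp only hands you a raw TCP socket, so there is no way to do the TLS handshake from bash itself; an HTTPS variant ends up leaning on an external binary anyway. A rough sketch of the same kind of request over TLS (not pure bash, just to illustrate the point):

# Let openssl handle the TLS layer; -quiet implies -ign_eof, so s_client waits
# for the server to close instead of exiting as soon as stdin hits EOF.
openssl s_client -quiet -connect www.google.com:443 \
    <<< $'GET /favicon.ico HTTP/1.0\r\nHost: www.google.com\r\n\r\n'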

darnir · Aug 09 '18 09:08