curl/wget (need help)
I'm trying to PR a working curl / wget replacement (using /dev/tcp):
set -ex

function __curl() {
    # Split the URL into protocol, server[:port] and path.
    read proto server path <<<$(echo ${1//// })
    DOC=/${path// //}
    HOST=${server//:*}
    PORT=${server//*:}
    # No explicit port: HOST and PORT end up identical, so default to 80.
    [[ x"${HOST}" == x"${PORT}" ]] && PORT=80

    # Open a read/write TCP connection on fd 3 and send the request.
    exec 3<>/dev/tcp/${HOST}/$PORT
    echo -en "GET ${DOC} HTTP/1.0\r\nHost: ${HOST}\r\n\r\n" >&3

    # Skip the response headers, then dump the body.
    (while read line; do
        [[ "$line" == $'\r' ]] && break
    done && cat) <&3
    exec 3>&-
}
__curl http://www.google.com/favicon.ico > mine.ico
md5sum mine.ico
Yet I'm stuck on the && cat used to handle the file body (binary). I'm sure I could use a new file descriptor && echo, but my bash skills end here 😞. I could link to & use a pure-bash approach [1], yet I'm sure there is something more elegant to do here.
[1] https://unix.stackexchange.com/questions/83926/how-to-download-a-file-using-just-bash-and-nothing-else-no-curl-wget-perl-et
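For reference, the most direct pure-bash stand-in for that cat that I can think of is reading the body into a variable and printing it back. This is only a sketch, and it illustrates the problem: read stops at NUL bytes and bash variables cannot hold them, so a binary body gets mangled:

    # Sketch: replace "cat" with a variable read + printf (pure bash, but
    # NOT binary-safe: read stops at the first NUL byte and the variable
    # cannot store NULs, so something like favicon.ico comes out corrupted).
    IFS= read -r -d '' body <&3
    printf '%s' "$body"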
I got it working. It's a little slow as it requires two while loops. I'm going to work on making this even faster but for now it's an example. Usage is script url > file.
Example script:
#!/usr/bin/env bash
#
# Download a file in pure bash.
download() {
    # Split the URL: "http://host/query" -> host, query.
    IFS=/ read -r _ _ host query <<< "$1"

    # Open a read/write TCP connection on fd 3 and send the HTTP request.
    exec 3<>"/dev/tcp/${host}/80"; {
        printf '%s\r\n%s\r\n\r\n' \
            "GET /${query} HTTP/1.0" \
            "Host: $host"
    } >&3

    # Strip the HTTP headers.
    while IFS= read -r line; do
        [[ "$line" == $'\r' ]] && break
    done <&3

    # Output the file. Read NUL-delimited chunks and re-emit the NUL via
    # '%b', since bash variables cannot hold NUL bytes; the final chunk
    # (no trailing NUL) is printed without one.
    nul='\0'
    while IFS= read -d '' -r line || { nul=""; [[ -n "$line" ]]; }; do
        printf "%s%b" "$line" "$nul"
    done <&3

    exec 3>&-
}
download "$1"
The first loop being slow is acceptable, since it only has to drop a handful of header lines; what I can't understand is how the second loop (a simple cat!!) can end up so complicated (and hence, I guess, slow).
Bash is slow at file IO and it doesn't handle binary data very well. I'm sure it can be optimized but I have some doubts as to whether or not this will ever be faster than wget/curl.
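A rough way to compare for yourself (hypothetical URL and filenames, timings will vary with network and bash version):

    time ./download http://example.com/file > bash.out
    time curl -so curl.out http://example.com/file
    cmp bash.out curl.out   # check that both produce the same bytes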
According to the "bash bible" (yours :p), a simple cat alternative might be
file_data="$(<"file")"
Yet I cannot make this work with my design, and I do not understand why.
cat handles binary data correctly IIRC; bash doesn't. What causes a larger problem is that bash handles binary data and null bytes differently depending on which version you're using (in 4.4+, null bytes are skipped and never reach the variable).
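A quick way to see the difference (a sketch; the exact behaviour and warnings depend on the bash version, as noted above):

    # cat passes NUL bytes through untouched:
    printf 'a\0b' | cat | od -c            # a \0 b

    # a command substitution / variable does not (4.4+ drops the NUL and
    # prints a warning; older versions behave differently):
    data="$(printf 'a\0b')"
    printf '%s' "$data" | od -c            # a b on bash 4.4+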
All the other examples here make sense and can often be faster than invoking another program. However, in the case of networking, I think it makes sense to depend on the binaries, both for usability and performance.
In the case of wget / curl replacements, all of these only work against a plain HTTP endpoint. This code is not going to work for HTTPS.
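If depending on a binary is acceptable (as argued above), one option is to let openssl handle the TLS layer while keeping the same header-stripping pattern. A sketch only, with a hypothetical fetch_https helper:

    # Sketch: HTTPS via openssl s_client (not pure bash).
    fetch_https() {
        local host=$1 path=$2
        printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$path" "$host" |
            openssl s_client -quiet -connect "${host}:443" -servername "$host" 2>/dev/null |
            { # Strip the response headers, then dump the body.
              while IFS= read -r line; do
                  [[ "$line" == $'\r' ]] && break
              done
              cat; }
    }

    fetch_https www.google.com /favicon.ico > favicon.ico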