wordpress-playground
Explore curl support
What is this PR doing?
Explores building PHP with libcurl support
CURL builds, PHP builds with the --with-curl flag, and curl_init() etc. run as expected.
However, running the following PHP snippet fails:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://wordpress.org');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
var_dump($output);
var_dump(curl_error($ch));
curl_close($ch);
Reproduction link: http://localhost:5400/website-server/?php=8.0&wp=6.4&storage=none&php-extension-bundle=kitchen-sink&url=/test-curl.php
Curl likely runs fork() internally, similarly to PHP's proc_open(). Getting it to work in Playground will require patching the curl source code to remove that fork() call and, likely, replace it with a JavaScript function call – similarly to the proc_open() patch.
To rebuild curl, run:
cd packages/php-wasm/compile
rm -rf libcurl/dist
make libcurl
cd ../../../
nx reset; npm run recompile:php:web:kitchen-sink:8.0
cc @mho22 – I spent an hour here just to get to the first roadblock. I won't be able to spend more time here for now – you're more than welcome to take over. I'd love to see a functional CURL extension!
Related resources
- https://github.com/WordPress/wordpress-playground/issues/85
- https://github.com/WordPress/wordpress-playground/pull/1093
File descriptors, fork and NTLM An application that uses libcurl and invokes fork() gets all file descriptors duplicated in the child process, including the ones libcurl created. libcurl itself uses fork() and execl() if told to use the CURLAUTH_NTLM_WB authentication method which then invokes the helper command in a child process with file descriptors duplicated. Make sure that only the trusted and reliable helper program is invoked!
https://github.com/curl/curl/blob/647e86a3efe1eea7a2a456c009cfe1eb55fe48eb/docs/libcurl/libcurl-security.md?plain=1#L452C1-L462C1
NTLM and NTLM_WB are both set to no in PHP info. I don't think that this is caused by using fork.
This message is documented in CURL. We could try to disable AsynchDNS and see if this resolves the issue.
Disabling AsynchDNS resolved the "getaddrinfo() thread failed to start" error.
Now the test request times out and, from the sound of my fans, it keeps doing something in the background. I haven't debugged it.
The verbose output has some insights:
* Trying 172.29.1.0:80...
* Could not set TCP_NODELAY: Protocol not available
* Connection timed out after 10000 milliseconds
* Closing connection 0
bool(false)
string(45) "Connection timed out after 10000 milliseconds"
I see that we use TCP_NODELAY in PHP-WASM, but I don't know where this is coming from: TCP_NODELAY: Protocol not available
curl_setopt($ch, CURLOPT_TCP_NODELAY, 0); resolves the TCP_NODELAY warning, but now I'm back to timeouts.
I need to wrap up now, this is what I found today:
- NTLM and NTLM_WB are disabled so they won't trigger fork
- AsynchDNS started a new thread and that was resolved by disabling it
- TCP_NODELAY is enabled by default, but it doesn't work (Protocol not available). Disabling it in the request works, but I'm not sure if it's required
- Requests are now timing out without any errors. I assume that the request isn't properly sent to WASM, or that the response isn't properly returned, but I wasn't able to debug this part today.
I attempted another approach on my end by trying to run php-wasm/node with curl. It does appear in my modules list when I run it. I created a file named curl.php :
<?php
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, 'http://wordpress.org' );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
var_dump( curl_version() );
var_dump( curl_getinfo( $ch ) );
$output = curl_exec( $ch );
var_dump( $output );
var_dump( curl_error( $ch ) );
curl_close( $ch );
I still get the same error of course :
bool(false)
string(37) "getaddrinfo() thread failed to start\n"
So I tried to make a comparison with built-in php, where no error occurs when running php curl.php.
curl_version() in php-wasm :
["ssl_version"]=>
string(0) ""
["libz_version"]=>
string(0) ""
whereas curl_version() in PHP 8.3 returns :
["ssl_version"]=>
string(13) "OpenSSL/3.1.4"
["libz_version"]=>
string(5) "1.3.1"
Additionally, the following information is present in PHP 8.3 but missing in php-wasm when running curl_getinfo($ch) :
["effective_method"]=>
string(3) "GET"
["capath"]=>
string(14) "/etc/ssl/certs"
["cainfo"]=>
string(17) "/etc/ssl/cert.pem"
I'm not sure if this information could be helpful, but it's something I noticed.
Thank you @mho22! I have an hour now and can take a look at it.
Good sources of information
Fixed the link https://github.com/curl/curl/blob/master/configure.ac
It looks like this will take some effort to find the correct combination of flags and link all required libraries.
For example, the scp protocol, which I assume we need, requires libssh. We need to add libssh, build it, and link it.
@mho22 feel free to take over, I'm not sure if I will have time to work more on this.
scp protocol which I assume we need
I only had the http:// and https:// support in mind for the first iteration here. That's about what the browser can support anyway. Anything beyond that would make a great follow-up effort, but I wouldn't block v1 on it.
I don't know why it produces an error here but :
cd packages/php-wasm/compile
rm -rf libcurl/dist
make libcurl
returns :
#14 17.84 CC ../lib/curl-nonblock.o
#14 17.91 CC ../lib/curl-warnless.o
#14 17.98 CC ../lib/curl-curl_ctype.o
#14 18.05 CCLD curl
#14 18.15 wasm-ld: error: duplicate symbol: curlx_strtoofft
#14 18.15 >>> defined in ../lib/curl-strtoofft.o
#14 18.15 >>> defined in ../lib/.libs/libcurl.a(libcurl_la-strtoofft.o)
#14 18.15
#14 18.15 wasm-ld: error: duplicate symbol: curlx_nonblock
#14 18.15 >>> defined in ../lib/curl-nonblock.o
#14 18.15 >>> defined in ../lib/.libs/libcurl.a(libcurl_la-nonblock.o)
...
The duplicate symbol errors prevent the script from finishing successfully.
I think that's fine, at that point libcurl.a is already created in the filesystem. This is why I put || true in this line:
RUN source /root/emsdk/emsdk_env.sh && EMCC_SKIP="-lc -lz -lcurl" EMCC_FLAGS="-sSIDE_MODULE" emmake make || true
It would be useful to have a comment in place to document that behavior.
@adamziel Ok thank you. Could it be possible that curl has no ssl and libz even if this portion of code added them [ or at least openssl ] :
RUN CPPFLAGS="-I/root/lib/include " \
LDFLAGS="-L/root/lib/lib " \
PKG_CONFIG_PATH=$PKG_CONFIG_PATH \
source /root/emsdk/emsdk_env.sh && \
emconfigure ./configure \
--build i386-pc-linux-gnu \
--target wasm32-unknown-emscripten \
--prefix=/root/install/ \
--disable-shared \
--enable-static \
--with-openssl \
--enable-https \
--enable-http
I suspect curl does not properly load openssl and zlib, as displayed using var_dump( curl_version() );
I currently don't have the tools to investigate this. But I should try :
- I suppose make libcurl will run the script from compile/Makefile of course.
- It will run the base-image, libz and libopenssl scripts before all.
- create a dist directory dist/root/lib in libcurl
- run the libcurl/Dockerfile and return the different resulting directories curl-7.69.1/libs/.libs -> libcurl/dist/root/lib/lib and curl-7.69.1/include -> ./libcurl/dist/root/lib/include.
This assumes that the Dockerfile script runs correctly. We can consider having curl with openssl [ openssl having zlib ].
- running npm run recompile:php:node:8.0 should then add curl to php thanks to this :
# Add curl if needed
RUN if [ "$WITH_CURL" = "yes" ]; \
then \
echo -n ' --with-curl=/root/lib ' >> /root/.php-configure-flags; \
echo -n ' /root/lib/lib/libcurl.a' >> /root/.emcc-php-wasm-sources; \
fi;
And in fact if we display phpinfo(), curl exists. But something is missing between php-wasm phpinfo() and php phpinfo() :
php-wasm phpinfo() :
curl
cURL Information => 7.69.1
Age => 5
IPv6 => No
libz => No
NTLM => No
SSL => No
TLS-SRP => No
HTTP2 => No
HTTPS_PROXY => No
Host => i386-pc-linux-gnu
curl.cainfo => no value => no value
php8.3 phpinfo()
curl
cURL Information => 8.6.0
Age => 10
IPv6 => Yes
libz => Yes
NTLM => Yes
SSL => Yes
TLS-SRP => Yes
HTTP2 => Yes
HTTPS_PROXY => Yes
ALTSVC => Yes
HTTP3 => No
UNICODE => No
ZSTD => No
HSTS => Yes
GSASL => No
Protocols => ftps, gophers, https, imaps, ldap, ldaps, mqtt,pop3s, smb, smbs, smtps
Host => Darwin
SSL Version => OpenSSL/3.1.4
ZLib Version => 1.3.1
curl.cainfo => .../config/php/cacert.pem
I only displayed the differences between the two curls. libz, SSL are part of the main differences.
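To make this kind of comparison repeatable, the two feature lists can be diffed with a few lines of script. This is only a minimal sketch: the helper name diffFeatures and the two object literals are illustrative, and the values are a subset copied from the phpinfo() output above.

```javascript
// Hypothetical helper: lists the keys whose values differ between two
// phpinfo()-style feature maps. A key present on only one side is reported
// with `undefined` for the missing side.
function diffFeatures(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  const diff = {};
  for (const key of keys) {
    if (a[key] !== b[key]) {
      diff[key] = { wasm: a[key], native: b[key] };
    }
  }
  return diff;
}

// Subset of the values from the two phpinfo() dumps in this thread.
const phpWasm = { 'cURL Information': '7.69.1', libz: 'No', SSL: 'No', NTLM: 'No', HTTP2: 'No' };
const php83 = { 'cURL Information': '8.6.0', libz: 'Yes', SSL: 'Yes', NTLM: 'Yes', HTTP2: 'Yes', HSTS: 'Yes' };

const featureDiff = diffFeatures(phpWasm, php83);
console.log(featureDiff);
```

Re-running it after each rebuild would show at a glance whether libz and SSL flipped to Yes.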
But what next ?
How can I be sure the problem comes from compile/libcurl/Dockerfile or maybe libcurl/dist/root/lib/lib/libcurl.a or libcurl/dist/root/lib/include/Makefile ? Where should I investigate ?
I only displayed the differences between the two curls. libz, SSL are part of the main differences.
I forgot to push it yesterday. This commit adds zlib. https://github.com/WordPress/wordpress-playground/pull/1133/commits/0c84fd1b4089c8f15019fb37ddffed832d94c68e
For SSL, we need to do something similar and provide a path with the --with-openssl flag. I'm trying this now.
Done, OpenSSL was missing the includes folder. Here is a path that will print curl info.
Requests are still timing out. As a next step, we could add some breakpoints to see if the request gets "stuck" somewhere.
I investigated a lot today. I couldn't find the answer yet but I came across a lot of information. I first had to copy files into the build process to allow printing data from them : lib/multi.c, lib/url.c and lib/connect.c :
libcurl/Dockerfile :
COPY ./libcurl/multi.c /root/$CURL_VERSION/lib/multi.c
COPY ./libcurl/url.c /root/$CURL_VERSION/lib/url.c
COPY ./libcurl/connect.c /root/$CURL_VERSION/lib/connect.c
WORKDIR /root/$CURL_VERSION
I could then inject a lot of flags (infof() log lines) to follow the behavior of our test script. So here is what I understood so far :
It begins with this file :
multi.c
- function multi_socket() called
- function curl_multi_perform() called
- function multi_runsingle() called
  - CASE CURLM_STATE_INIT entered then while
  - CASE CURLM_STATE_CONNECT entered
function Curl_connect in url.c file is called within the CURLM_STATE_CONNECT case
url.c
- function Curl_connect() called
- function Curl_setup_conn() called
function Curl_connecthost in connect.c file is called in previous Curl_setup_conn() function
connect.c
- function Curl_connecthost() called
- function singleipconnect() returns CURLE_OK
It then goes back into the multi_runsingle function mentioned above and stays in the while loop indefinitely, endlessly entering the CURLM_STATE_WAITRESOLVE case :
multi.c
- function multi_runsingle INFINITE Loop While
  - CASE CURLM_STATE_WAITRESOLVE
The error is probably coming from the singleipconnect() function.
Here are multiple results I printed :
line 1191 : result = bindlocal(conn, sockfd, addr.family, Curl_ipv6_scope((struct sockaddr*)&addr.sa_addr)); EQUALS 0
line 1212 : if(!isconnected && (conn->transport != TRNSPRT_UDP)) EQUALS TRUE
line 1251 : rc = connect(sockfd, &addr.sa_addr, addr.addrlen); EQUALS -1 = [ rc = connect( 4, 11319420, 16 ); ]
line 1273 : switch(error) -> return result = CURLE_OK from line 1285 ;
line 1303 : return result EQUALS CURLE_OK == 0
I suppose something is probably going wrong with the sockfd parameter ?
data->set.fsockopt on line 1184 is false. Should we try to add that fsockopt ?
Another thing :
php-wasm
Trying 172.29.1.0:80...
php
Trying 198.143.164.252:80...
I tried to add every option mentioned in the singleipconnect() function in the test script :
curl_setopt( $ch, CURLOPT_TCP_NODELAY, 1 );
curl_setopt( $ch, CURLOPT_TCP_KEEPALIVE, 1 );
curl_setopt( $ch, CURLOPT_TCP_FASTOPEN, 1 );
* Could not set TCP_NODELAY: Protocol not available
* Failed to set SO_KEEPALIVE on fd 4
* Failed to enable TCP Fast Open on fd 4
P.S. : if you want to add a new flag, don't forget to add a \n at the end of the message in infof( data, "message\n" ). Otherwise nothing will be displayed, and you will waste a lot of time finding out why.
This is really good debugging @mho22!
The local IP is resolved likely due to this issue:
https://github.com/WordPress/wordpress-playground/issues/400
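For context on the 172.29.1.0 address: as far as I understand it, Emscripten does not perform real DNS resolution and instead hands out placeholder addresses from a 172.29.x.x range, remembering which hostname each one stands for. The sketch below only illustrates that idea under this assumption. It is not Emscripten's code, and FakeDns and its method names are made up for the example.

```javascript
// Simplified illustration of a placeholder-address table. This is NOT
// Emscripten's implementation; the class and method names are hypothetical.
class FakeDns {
  constructor() {
    this.nextId = 1;
    this.addressByName = new Map();
    this.nameByAddress = new Map();
  }

  // Returns a stable placeholder IPv4 address for a hostname.
  lookupName(name) {
    if (this.addressByName.has(name)) {
      return this.addressByName.get(name);
    }
    const id = this.nextId++;
    // The low byte of the id goes into the third octet, so the first
    // hostname maps to 172.29.1.0, the address seen in the verbose output.
    const address = `172.29.${id & 0xff}.${(id >> 8) & 0xff}`;
    this.addressByName.set(name, address);
    this.nameByAddress.set(address, name);
    return address;
  }

  // Maps a placeholder address back to the hostname it stands for.
  lookupAddress(address) {
    return this.nameByAddress.get(address) ?? null;
  }
}

const dns = new FakeDns();
const placeholder = dns.lookupName('wordpress.org');
console.log(placeholder, '->', dns.lookupAddress(placeholder));
```

A reverse lookup of this kind would let the socket layer recover the real hostname from the placeholder IP at the moment it opens the WebSocket.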
Regardless, file_get_contents( "https://wordpress.org" ) makes Emscripten open a WebSocket, so Curl should also be able to do that.
sockfd might be the right track – Playground applies a few patches on top of Emscripten to improve fd and sockopt handling.
Here are a few questions I'm thinking of:
- Is sockfd a valid descriptor, or is it -1?
- Does Emscripten create a new WebSocket instance? In other words, is this console.log statement triggered? console.log('Called constructor()!');
- Is ___syscall_connect called in php_8_0.js? An unminified PHP build might be helpful here – to get one you could run emcc with -g2.
- If it is, where does the execution stop? Is the catch ever triggered? What's the error?
@adamziel Here are the answers :
- sockfd equals 18
- console.log('Called constructor()!') is never triggered
- ___syscall_connect is called 2 times in php_8_0.js
- The execution stops in the catch and this object is returned :
Here is a copy of the ___syscall_connect where I added console.log to link with the screenshot :
function ___syscall_connect(fd, addr, addrlen, d1, d2, d3) {
  try {
    var sock = getSocketFromFD(fd);
    console.log( sock );
    var info = getSocketAddress(addr, addrlen);
    console.log( info );
    sock.sock_ops.connect(sock, info.addr, info.port);
    return 0;
  } catch (e) {
    console.log( e );
    if (typeof FS == "undefined" || !(e.name === "ErrnoError")) throw e;
    return -e.errno;
  }
}
It seems the second time ___syscall_connect is called, we get data from our curl_exec.
I hope this is helpful. I'm uncertain about the next steps to take, so I'm looking forward to hearing your insights.
ERRNO 26 is "EINPROGRESS":
https://github.com/WordPress/wordpress-playground/blob/096a01782fc73f7d0aad3ffa8913aa0163fb03f6/packages/php-wasm/node/public/php_8_0.js#L7059
It's documented as follows:
EINPROGRESS The socket socket is non-blocking and the connection could not be established immediately. You can determine when the connection is completely established with select; see Waiting for Input or Output. Another connect call on the same socket, before the connection is completely established, will fail with EALREADY.
It would be interesting to step through the ___syscall_connect execution and get to the point where it throws that exception. I also wonder why at first the socket says 127.0.0.1:5400, and only later it says wordpress.org:80. Is it CURL reusing the same file descriptor? Or perhaps that's related to Asyncify stack rewinding? It would be interesting to console.log Asyncify.state on each of these calls and compare it to all the entries from Asyncify.State (notice the capital letter), e.g. Asyncify.State.Normal.
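One way to do that comparison is a small reverse lookup that turns the numeric Asyncify.state into its name. This is a sketch under assumptions: the helper asyncifyStateName is made up for illustration, and fakeAsyncify is a stand-in whose numeric values are not taken from the real build.

```javascript
// Reverse-looks up the name of the current state by comparing `state`
// against every entry of `State`, so no numeric value is hard-coded here.
function asyncifyStateName(asyncify) {
  for (const [name, value] of Object.entries(asyncify.State)) {
    if (value === asyncify.state) {
      return name;
    }
  }
  return `Unknown(${asyncify.state})`;
}

// Stand-in object for illustration only. In a real build you would pass the
// Asyncify object from php_8_0.js; the numeric values below are assumptions.
const fakeAsyncify = {
  State: { Normal: 0, Unwinding: 1, Rewinding: 2 },
  state: 0,
};

console.log('Asyncify state:', asyncifyStateName(fakeAsyncify));
```

Calling it at the top of ___syscall_connect on each invocation would show whether the two calls happen in the same state.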
@adamziel I am on it.
The method sock.sock_ops.connect(sock, info.addr, info.port); throws the error 26 :
connect(sock, addr, port) {
  if (sock.server) {
    throw new FS.ErrnoError(138);
  }
  if (typeof sock.daddr != "undefined" && typeof sock.dport != "undefined") {
    var dest = SOCKFS.websocket_sock_ops.getPeer(sock, sock.daddr, sock.dport);
    if (dest) {
      if (dest.socket.readyState === dest.socket.CONNECTING) {
        throw new FS.ErrnoError(7);
      } else {
        throw new FS.ErrnoError(30);
      }
    }
  }
  var peer = SOCKFS.websocket_sock_ops.createPeer(sock, addr, port);
  sock.daddr = peer.addr;
  sock.dport = peer.port;
  throw new FS.ErrnoError(26);
},
It is thrown on the last line.
And for Asyncify.state, the two syscalls are of type Asyncify.State.Normal.
Aha, so error code 26 seems like the correct outcome – there's no no-error way for that function to conclude anyway. Well, but in this case the SOCKFS.websocket_sock_ops.createPeer() method should be called, and that's where the WebSocket is created. It seems like it fails in some way and never gets to start that WS connection – I wonder why that is.
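To spell out why 26 is the expected result: a non-blocking connect reports "in progress" and leaves it to the caller to wait until the socket becomes ready. The sketch below models that contract with plain objects. startConnect, syscallConnect and connectStatus are hypothetical stand-ins that mirror the shape of the code quoted above; they are not the Emscripten functions themselves.

```javascript
const EINPROGRESS = 26; // errno value reported in this thread

// Hypothetical stand-in for the socket layer: starting a connection always
// "fails" with EINPROGRESS because the WebSocket opens asynchronously.
function startConnect(sock) {
  sock.readyState = 'connecting';
  const err = new Error('connection in progress');
  err.errno = EINPROGRESS;
  throw err;
}

// Mirrors the shape of the syscall wrapper quoted above: a thrown errno
// becomes a negative return value, anything else is re-thrown.
function syscallConnect(sock) {
  try {
    startConnect(sock);
    return 0;
  } catch (e) {
    if (typeof e.errno !== 'number') throw e;
    return -e.errno;
  }
}

// What a caller is expected to do: treat -EINPROGRESS as "pending" and then
// wait until the socket reports that it is writable before sending data.
function connectStatus(sock) {
  const rc = syscallConnect(sock);
  if (rc === 0) return 'connected';
  if (rc === -EINPROGRESS) return 'pending';
  return 'failed';
}

const sock = { readyState: 'idle' };
const status = connectStatus(sock);
console.log(status, sock.readyState);
```

The interesting part is whatever happens after "pending": something has to report the socket as writable once the WebSocket opens, otherwise the caller keeps waiting.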
@adamziel Do you have any suggestions for a comparison I could conduct to troubleshoot why our case is failing when running the ___syscall_connect method?
The var peer = SOCKFS.websocket_sock_ops.createPeer(sock, addr, port); returns a valid peer and the data from the peer is given to the sock, so after the catch, the sock should be operational.
The peer object :
This probably indicates that the websocket is correctly created. However, communication is not established.
Oh! It's the FetchWebsocketConstructor, it doesn't have a console.log() in its constructor – so it's actually created correctly! It seems like console.log('Send called with ', data); never shows up in the console so libcurl never attempts to send any data.
The problem could be with how _wasm_poll_socket implements waiting for the connection to be ready. Also, I wonder which readyState values the WS instance reports?
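A quick way to answer the readyState question is to log a readable label at a few points. The numeric values below are the ones defined by the WebSocket API; the helper names are made up for this sketch, and it assumes the FetchWebsocketConstructor instance exposes a standard readyState property.

```javascript
// WebSocket readyState values as defined by the WebSocket API.
const READY_STATE_NAMES = ['CONNECTING', 'OPEN', 'CLOSING', 'CLOSED'];

// Returns a readable label for a socket's readyState.
function describeReadyState(ws) {
  return READY_STATE_NAMES[ws.readyState] ?? `UNKNOWN(${ws.readyState})`;
}

// Hypothetical logging helper: call it wherever the peer socket is available,
// for example on dest.socket inside connect().
function logReadyState(label, ws) {
  const line = `${label}: readyState=${describeReadyState(ws)}`;
  console.log(line);
  return line;
}

// Plain object standing in for a real WebSocket instance.
logReadyState('after createPeer', { readyState: 0 });
```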
@adamziel Yes sorry, I thought you would see from my screenshot that this was a FetchWebsocketConstructor, so no console.log was necessary to find out that it probably created a socket.
It seems the _wasm_poll_socket is never called. I added a simple console.log in it and it never triggered.
I see wasm_poll_socket is called in the custom-implemented php_pollfd_for and wasm_select functions in php-wasm.c. Maybe curl_exec does not call these functions ?
Oooh I think you're right! That function is wired by patching PHP, not by replacing the libc function. That could be the root cause of this issue! There are two ways forward here:
- Patch libcurl (and run into this issue again in the future)
- Replace select with _wasm_select – I'm not sure how viable that is, though
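The difference between the two options can be pictured with a toy model. Everything below is a hypothetical stand-in written for this illustration (none of the names are Playground or Emscripten APIs): option 1 corresponds to editing each caller by hand, option 2 to swapping the shared function once.

```javascript
// All names below are hypothetical stand-ins, not Playground or Emscripten APIs.

// Sockets backed by WebSockets, keyed by file descriptor. readyState 1 = OPEN.
const sockets = new Map([[18, { readyState: 1 }]]);

// Readiness check that understands WebSocket-backed sockets.
function wasmSelect(fds) {
  return fds.filter((fd) => sockets.get(fd)?.readyState === 1);
}

// Shared "libc" table. The default select() knows nothing about WebSockets,
// so it never reports these descriptors as ready.
const libc = {
  select(fds) {
    return [];
  },
};

// Option 1: a caller that was patched by hand to use the socket-aware check.
function patchedCaller(fds) {
  return wasmSelect(fds);
}

// A caller that still goes through the shared table (libcurl in this analogy).
function unpatchedCaller(fds) {
  return libc.select(fds);
}

const before = unpatchedCaller([18]); // nothing is ever reported as ready

// Option 2: swap the shared implementation once; every caller benefits.
libc.select = wasmSelect;
const after = unpatchedCaller([18]);

console.log({ patched: patchedCaller([18]), before, after });
```

In this model the unpatched caller starts seeing fd 18 as ready only after the shared function is replaced, which is the appeal of the second option if it turns out to be feasible.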
@adamziel What do you mean by replacing select with _wasm_select ? In fact I am not sure I fully understand the first way either 😅.