Error loading - buffer event error

Open valadect opened this issue 1 year ago • 17 comments

Steps to reproduce.

  1. Navigate to gemini://gnubox.org/
  2. Throws error

More Info

This happens in both the latest release as well as when building from source.

The same capsule works fine on bombadillo and lagrange.

valadect avatar Jan 18 '24 07:01 valadect

Hello @valadect, thanks for the report. I can reproduce (with any capsule, actually) on OpenBSD using libevent2 from packages. I've been bitten again by some differences between libevent 1.x (which is in base on OpenBSD) and libevent2.

omar-polo avatar Jan 18 '24 09:01 omar-polo

Yeah, this is due to me wanting to use libtls and reaching into libevent's bufferevent abstraction. Unfortunately it'll take me a few days to fix this; it's not straightforward.

omar-polo avatar Jan 20 '24 15:01 omar-polo

Best of luck! At least on my end it's only the odd capsule here and there so take all the time you need.

valadect avatar Jan 20 '24 21:01 valadect

Actually my testing was busted. I was in a hurry and didn't notice that I was mixing libevent 1.x from base and libevent 2.x from ports.

Now that I have more time I tried again and can't reproduce. I tested on Alpine Linux using libevent 2.1.12 and libretls 3.7.0 (this was a few hours ago).

Right now the capsule seems down, but I get the same error in telescope, gg(1) and lagrange:

% gg gemini://gnubox.org
gg: handshake: handshake failed: error:02FFF036:system library:func(4095):Connection reset by peer

maybe they're doing maintenance right now.

Can you please tell me a bit more about your system? (OS, version) I'd like to replicate and understand this issue.

Thanks!

omar-polo avatar Jan 21 '24 17:01 omar-polo

Yeah seems to be down at the moment. I've been trying to find another capsule with the same issue but no luck so far.

I've run both latest git as well as latest tagged release and got the same results. I'm running telescope on Fedora Asahi Remix (Fedora 39) aarch64.

All other capsules I've come across work fine, just strange that only telescope was the one that couldn't load it previously.

valadect avatar Jan 22 '24 06:01 valadect

Maybe I finally have a clue.

After another user report, I looked closely and it seems that with OpenSSL (on Linux) I sometimes get the bufferevent error (now "read error") on some capsules due to a missing close_notify. I couldn't reproduce with gnubox.org, but gemini://gmi.noulin.net/ quite often results in the error here.

Can you reproduce it too? If not, could you please checkout the debug-tls branch, run make && ./telescope 2>>log and post the contents of the file log after reproducing the issue?

omar-polo avatar Feb 12 '24 22:02 omar-polo

Can confirm I'm getting a buffer event error on that capsule for 0.8.1.

In debug-tls it's a read error.

failure(s) in tls_read:
- error:0A000126:SSL routines::unexpected eof while reading
failure(s) in tls_close:
- error:0A000197:SSL routines::shutdown while in init

Thanks for sticking with this!

valadect avatar Feb 15 '24 11:02 valadect

  • error:0A000126:SSL routines::unexpected eof while reading

That's the missing close_notify.

Unfortunately I can't do anything about it: it depends on the TLS library used by libtls. LibreSSL seems to be a bit more permissive and reports the failure later; in fact, on OpenBSD I still manage to read these capsules, while newer OpenSSL (3.x+ I guess, I still have to double-check) is stricter.

From the point of view of a TLS library a missing close_notify matters a lot, since it means that the connection could have been abruptly interrupted.
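
To check a log captured from the debug-tls branch for this particular failure, something like the following might do. It is only a sketch: `missing_close_notify` is a hypothetical helper (not part of telescope), and it assumes the log contains OpenSSL's error text verbatim, as in the output quoted above.

```shell
# Hypothetical helper: succeed if the log read on stdin contains the
# OpenSSL error that this thread attributes to a missing close_notify.
missing_close_notify() {
    grep -q 'unexpected eof while reading'
}

# Usage (not run here):
#   ./telescope 2>>log
#   missing_close_notify < log && echo "peer closed without close_notify"
```

With LibreSSL the wording of the error may differ, so a non-match does not prove that the server sent close_notify.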

gmniserv is one of the misbehaving servers, and it's also unmaintained :(

I'll try at least to improve the logging in these cases, so it's easier to understand what's going wrong.

omar-polo avatar Feb 15 '24 15:02 omar-polo

Bummer. Thanks for checking it out, feel free to close.

valadect avatar Feb 19 '24 11:02 valadect

I have a similar problem on macOS (telescope 0.9, libretls):

$ sudo dtruss telescope gemini://gemini.omarpolo.com
...
recvmsg(0x4, 0x7FF7B6AB75B0, 0x0)		 = 0 0
write_nocancel(0x2, "telescope: \0", 0xB)		 = 11 0
write_nocancel(0x2, "connection closed\0", 0x11)		 = 17 0
write_nocancel(0x2, ": \0", 0x2)		 = 2 0
write_nocancel(0x2, "No such file or directory\n\0", 0x1A)		 = 26 0
...

telescope 0.8.1 has no such problem.

sikmir avatar Feb 23 '24 23:02 sikmir

@sikmir this seems like a different issue. That message is issued either by the main process or by the network process when the other party dies. I suspect in this case it's the network process dying. Do you have some core file lying around after it crashes? Maybe you can attach a debugger to the network process and see why it's dying?

It could also be interesting to try to bisect this. I suspect it may be related to the new event loop, so knowing if telescope as of b19b8dbca985e2f567bb3f476b116ea18c1ca9a2 works (it's the parent of 98d3e6c172747dc58042bde09a848d3e03572934, where the new event loop was first used) could be interesting.

Thanks! :)

omar-polo avatar Feb 24 '24 00:02 omar-polo

Do you have some core file lying around after it crashes?

No.

It could also be interesting to try to bisect this. I suspect it may be related to the new event loop, so knowing if telescope as of b19b8db works (it's the parent of 98d3e6c, where the new event loop was first used) could be interesting.

No, the same problem with b19b8dbca985e2f567bb3f476b116ea18c1ca9a2.

I guess something is wrong with the dependencies, since telescope 0.9 works fine on macOS if built with nix (https://github.com/NixOS/nixpkgs/pull/290955), but doesn't if built with MacPorts (https://github.com/sikmir/macports-ports/blob/telescope/net/telescope/Portfile).

sikmir avatar Feb 27 '24 14:02 sikmir

@sikmir Oh, I see. It's strange. I don't have a Mac so I can't test unfortunately, but if you have some time, a useful thing would be to make telescope start on one of the built-in pages (e.g. with echo about:new > ~/.cache/telescope/session), launch it, attach a debugger to the net process, and then attempt to open a page. If my intuition is right, it's the network process dying and you should get a backtrace.

I don't know if macOS has a working setproctitle; on OpenBSD at least, pgrep -lf telescope shows two entries: telescope:net and telescope:ui. Otherwise, if macOS doesn't randomize PIDs, the greater one will be the one for the net process.

If it's not the network process crashing somehow, then it must be the main one (the ui). In that case, running gdb telescope (or lldb) and then running telescope should similarly give you a backtrace.
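
The PID lookup described above could be scripted roughly as follows. This is only a sketch: `net_pid` is a hypothetical helper (not part of telescope), it assumes `pgrep -lf` prints one `PID command` pair per line, and the fallback to the greatest PID only makes sense if PIDs are not randomized.

```shell
# Hypothetical helper: pick the PID of telescope's network process from
# `pgrep -lf telescope` output read on stdin.
net_pid() {
    awk '
        /telescope:[ ]*net/ { titled = $1 }    # process title is visible
        $1 + 0 > max        { max = $1 + 0 }   # remember the greatest PID
        END {
            if (titled != "") print titled     # prefer the titled process
            else if (max > 0) print max        # otherwise guess by PID order
        }
    '
}

# Usage (not run here):
#   echo about:new > ~/.cache/telescope/session
#   telescope                                    # in one terminal
#   lldb -p "$(pgrep -lf telescope | net_pid)"   # in another; or: gdb -p ...
```

Any other process with "telescope" in its command line would also be matched by pgrep, so it is worth checking the PID by hand before attaching.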

Thanks :)

P.S.: I don't grok nix, but it seems that the derivation still requires libevent, which is no longer a dependency as of 0.9. Regarding the Portfile instead, why are you removing libgrapheme? if found, the bundled version is not used at all, unless there's a bug. I actually could add a --with-libgrapheme flag to assert that it must use the system version and not the bundled one.

omar-polo avatar Feb 27 '24 14:02 omar-polo

it seems that the derivation still requires libevent, which is no longer a dependency as of 0.9.

Good point, I've missed it.

Regarding the Portfile instead, why are you removing libgrapheme? if found, the bundled version is not used at all, unless there's a bug.

Yes, it's just to be sure.

sikmir avatar Feb 27 '24 15:02 sikmir

On 2024/02/27 07:56:56 -0800, Nikolay Korotkiy @.***> wrote:

Regarding the Portfile instead, why are you removing libgrapheme? if found, the bundled version is not used at all, unless there's a bug.

Yes, it's just to be sure.

Ah, good, I thought it was still built!

Thanks,

Omar Polo

omar-polo avatar Feb 27 '24 17:02 omar-polo

otool -L work/telescope-0.9/telescope
work/telescope-0.9/telescope:
	/opt/local/libexec/openssl3/lib/libssl.3.dylib (compatibility version 3.0.0, current version 3.0.0)
	/opt/local/libexec/openssl3/lib/libcrypto.3.dylib (compatibility version 3.0.0, current version 3.0.0)
	/opt/local/lib/libtls.24.dylib (compatibility version 25.0.0, current version 25.1.0)
	/opt/local/lib/libncurses.6.dylib (compatibility version 6.0.0, current version 6.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)

I guess that's the root cause: telescope requires LibreSSL, but LibreSSL and OpenSSL can't be co-installed.
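
One way to narrow this down is to list only the TLS-related libraries, first for the binary and then for libtls itself, and compare where libssl and libcrypto come from in each. The sketch below assumes the `otool -L` output format shown above; `tls_libs` is a hypothetical helper, not an existing tool.

```shell
# Hypothetical helper: from `otool -L` output on stdin, print the paths of
# the libssl, libcrypto and libtls entries, one per line.
tls_libs() {
    # Skip the "file:" header line that otool prints first.
    awk '/lib(ssl|crypto|tls)\./ && !/:$/ { print $1 }'
}

# Usage (not run here):
#   otool -L work/telescope-0.9/telescope | tls_libs
#   otool -L /opt/local/lib/libtls.24.dylib | tls_libs
```

If the two commands report libssl or libcrypto from different prefixes, two TLS implementations end up in the same process.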

sikmir avatar Mar 02 '24 22:03 sikmir

@sikmir Oh yeah, you can't mix LibreSSL and OpenSSL in the same address space. (There are some tricks, but I don't know them and wouldn't recommend them either.)

I made telescope link to libssl and libcrypto for the client certificate generation feature, and it's not possible to disable it yet.

I think the best solution would be to pick just one of the two TLS libraries (either LibreSSL or OpenSSL) and stick with it. The names are unfortunately too close for my taste, but the choices are:

  • linking with LibreSSL (which provides libtls too)
  • using libretls and OpenSSL

As far as I can see, MacPorts packages both LibreSSL and libretls+OpenSSL, so either should be viable. I'm biased towards LibreSSL, but both work (and in general it's better to choose the one that is more 'popular' on the target system).

Thanks!

omar-polo avatar Mar 02 '24 22:03 omar-polo

This issue has been fixed by @ThomasAdam in a3e4d56b6d9bcfca48f3d8c8f1e526e95b0c2f64 (https://codeberg.org/op/telescope/pulls/3), thank you!

Now telescope keeps rendering what it received and shows a W character in the modeline.

omar-polo avatar May 25 '24 07:05 omar-polo