
2.52.1 intermittent crash with Unison server failed: Invalid argument: Bytearray.blit_to_bytes

Open infraweavers opened this issue 2 years ago • 10 comments

We run unison quite a lot to sync files around on our web tier. Very rarely it blows up with the error Invalid argument: Bytearray.blit_to_bytes; retrying it is successful.

Version: unison version 2.52.1 (ocaml 4.14.0)
Environment: Server 2019, running a Windows-to-Windows sync over a LAN, so no WAN funniness going on.

Server side:

"C:\Unison-2.52\bin\unison.exe" "-socket" "874"

Client Side:

"C:\Unison-2.52\bin\unison.exe" "-force" "C:\folder\subfolder" "-batch" "-times" "-silent" "-halfduplex" "-killserver" "-ignorearchives" "C:\folder\subfolder" "socket://server2.internaldomain.name:874/C:\folder\subfolder" "-ignore" "Name NHiberateLogs" "-ignore" "Name media" "-ignore" "Name logs" "-ignore" "Name cap_car" "-ignore" "Name CapCsv" "-ignore" "Name SiteLinks";
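Since a retry reportedly succeeds, a stopgap while the crash is investigated could be a small wrapper that re-runs the client command a few times. The following is only a sketch under stated assumptions: the function name `run_with_retry`, the attempt count, and the delay are hypothetical, and it assumes the failing run exits with a nonzero status. Because the server is started fresh for each sync and may have died with the error, a real deployment would presumably have to restart the server side between attempts as well; that part is not shown.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay=5.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with status 0 or the attempts run out.

    Returns the 1-based number of the attempt that succeeded.
    Raises RuntimeError if every attempt fails.
    `runner` and `sleep` are injectable so the logic can be tested
    without launching a real process.
    """
    last_code = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        last_code = result.returncode
        if last_code == 0:
            return attempt
        if attempt < attempts:
            # Assumption: a short pause is enough before trying again.
            sleep(delay)
    raise RuntimeError(
        f"giving up after {attempts} attempts (last exit code {last_code})"
    )


# Hypothetical usage with a shortened form of the client command above:
# run_with_retry([r"C:\Unison-2.52\bin\unison.exe", "-batch", "-silent",
#                 r"C:\folder\subfolder",
#                 r"socket://server2.internaldomain.name:874/C:\folder\subfolder"])
```

This only hides the symptom; the backtrace is still what the maintainers need in order to find the cause.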

The error we see is:

Unison server failed: Invalid argument: Bytearray.blit_to_bytes
Raised at Lwt.ignore_result in file "./lwt/lwt.ml", line 135, characters 6-13
Called from Stdlib__List.iter in file "list.ml", line 110, characters 12-15
Called from Lwt.restart in file "./lwt/lwt.ml", line 31, characters 2-37
Called from Stdlib__List.iter in file "list.ml", line 110, characters 12-15
Called from Lwt_unix_impl.run in file "./lwt/win/lwt_unix_impl.ml", line 243, characters 6-51
Called from Remote.waitOnPort.(fun).handleClients in file "./remote.ml", line 2138, characters 11-69
Called from Remote.waitOnPort.(fun) in file "./remote.ml", line 2146, characters 9-25
Called from Util.convertUnixErrorsToExn in file "./ubase/util.ml", line 180, characters 6-9
Called from Main.catch_all in file "./main.ml", line 161, characters 6-10
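For readers unfamiliar with the message: OCaml's standard `Bytes.blit` raises `Invalid_argument` when the given offsets and length do not designate valid ranges in the source or destination. Assuming `Bytearray.blit_to_bytes` performs a similar range check (this thread does not confirm it), the error would point to a bad offset or length being passed to a buffer copy rather than to an I/O failure as such. The sketch below models that kind of check in Python; `blit_checked` is a hypothetical name and is not Unison code.

```python
def blit_checked(src, src_pos, dst, dst_pos, length):
    """Copy `length` bytes from src[src_pos:] into dst[dst_pos:].

    Mirrors the style of range check OCaml's Bytes.blit documents:
    any negative value, or a range running past either buffer,
    is rejected before anything is copied.
    """
    if (length < 0
            or src_pos < 0 or src_pos > len(src) - length
            or dst_pos < 0 or dst_pos > len(dst) - length):
        raise ValueError("Invalid argument: blit")
    # Same-length slice assignment, so dst keeps its size.
    dst[dst_pos:dst_pos + length] = src[src_pos:src_pos + length]


buf = bytearray(4)
blit_checked(b"abcdef", 2, buf, 1, 3)   # copies b"cde" into buf[1:4]

try:
    # Asks for 3 bytes starting at offset 1 of a 3-byte source.
    blit_checked(b"abc", 1, bytearray(4), 0, 3)
except ValueError as exc:
    print(exc)
```

Under that assumption, an intermittent failure would suggest the offset or length is occasionally computed from an unexpected buffer state, which is the sort of thing a debug build can report.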

I did notice this: https://github.com/bcpierce00/unison/issues/267 which looks similar; however, that was on an old version. This (as far as I can tell) is under the latest version.

infraweavers avatar Jul 15 '22 10:07 infraweavers

That looks like the exact same backtrace except for a frame in the middle, which probably has been simplified/fixed and isn't relevant.

gdt avatar Jul 15 '22 11:07 gdt

I've labeled this as Windows, because both reports (is the previous report from the same setup - port number matches?) are Windows and we have no non-Windows reports.

gdt avatar Jul 15 '22 11:07 gdt

I put in a random number for the port as it's injected into the script, so I copied the port number from the other issue because it "looked right" as it were.

infraweavers avatar Jul 15 '22 11:07 infraweavers

I've prepared a debug build (without the GUI) for you. Can you grab it and give it a go? https://github.com/tleedjarv/unison/actions/runs/2676841186 (the downloads will appear once all builds have finished). It is otherwise exactly the same as 2.52.1. Let me know if you need the GUI.

Once you hit the issue again, it should print out some debug info. Please post it here.

Meanwhile, could you give more information on your setup? What is the number of files kept in sync (it suffices to know if it's 10 000, 100 000 or 1 000 000) and how many files are updated during each sync? In your experience, is this issue related to how long the server has been running? Could it happen on the first sync after the server is started?

tleedjarv avatar Jul 15 '22 12:07 tleedjarv

> It is otherwise exactly the same as 2.52.1. Let me know if you need the GUI.

Thanks, I can roll that out and get it running. Is there any advantage to a specific ocaml version or should I just go for 4.14?

> Once you hit the issue again, it should print out some debug info. Please post it here.

Will do.

> What is the number of files kept in sync (it suffices to know if it's 10 000, 100 000 or 1 000 000) and how many files are updated during each sync?

In the case that blew up earlier, approx. 110 files were being synced; normally only about 10 are expected to change with each sync.

> In your experience, is this issue related to how long the server has been running?

I don't believe so. The server is freshly started each time we wish to use it (it's not a long-running daemon) and stopped afterwards.

> Could it happen on the first sync after the server is started?

See above.

infraweavers avatar Jul 15 '22 13:07 infraweavers

> is there any advantage to a specific ocaml version or should I just go for 4.14?

Just go for 4.14.

tleedjarv avatar Jul 15 '22 13:07 tleedjarv

Where are we on this? I'd like to get it turned into a high-quality bug report or closed. There's been a debug build on the table since July 15 with no response.

gdt avatar Sep 01 '22 13:09 gdt

Hiya,

It's deployed and being used; however, the problem hasn't appeared yet! The moment it does, I'll get the data out for you.

Thanks

infraweavers avatar Sep 01 '22 15:09 infraweavers

Thanks for confirming you are running the test code and are still out there. I suppose it's possible that there is a bug and it was fixed between 2.52.1 and the test build, too.

gdt avatar Sep 01 '22 17:09 gdt

It's a month later - are you free of crashes and we can declare victory? Or is it crashy? Or something else?

gdt avatar Oct 09 '22 00:10 gdt

I'm guessing this is fixed. Feel free to reopen or start over if you get crashes with 2.53.0 or later.

gdt avatar Nov 15 '22 14:11 gdt