nix copy hangs forever sometimes

Open elaforge opened this issue 4 years ago • 13 comments

Here is a process that has been hung for 6 days:

elaforge 2909 0.0 0.1 514780 17684 ? Sl Jul25 0:00 /nix/store/zg66y04g2bvmw41cgrywysr86s40g5cc-nix-2.2/bin/nix copy --from https://cache.nixos.org /nix/store/jxw2sxagx9smpjklb00qzgiqgqv1zvl6-Groq.Util.Exceptions --option allowed-impure-host-deps /groq /etc/hostname --option extra-sandbox-paths /groq/models --option substituters https://cache.nixos.org http://narpile.groq --option sandbox relaxed --option require-sigs false --option fallback true --option keep-outputs true --option max-jobs 0 --option builders-use-substitutes true --option cores 0

Note that the store path doesn't exist on cache.nixos.org, and when I run by hand, I get a "path not valid" right away. But this one hung for some reason.

eu-stack -p 2909 -s says:

PID 2909 - process
TID 2909:
#0  0x00007ff0bd80f2dd __GI___pthread_timedjoin_ex
#1  0x00007ff0bdc95fd3 std::thread::join()
#2  0x00007ff0bdf0d34b std::_Sp_counted_ptr_inplace<nix::CurlDownloader, std::allocator<nix::CurlDownloader>, (__gnu_cxx::_Lock_policy)2>::_M_dispose()
#3  0x0000000000438da6 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
#4  0x00007ff0bd686351 __run_exit_handlers
#5  0x00007ff0bd68643a exit
#6  0x00007ff0bd670b95 __libc_start_main
#7  0x00000000004323ea _start
TID 2982:
#0  0x00007ff0bd6848dc __sigtimedwait
#1  0x00007ff0bd818434 sigwait
#2  0x00007ff0bddd9955 nix::signalHandlerThread(__sigset_t)
#3  0x00007ff0bdddfb1d std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(__sigset_t), __sigset_t> > >::_M_run()
#4  0x00007ff0bdc95d7f
#5  0x00007ff0bd80def7 start_thread
#6  0x00007ff0bd74122f __clone
TID 2983:
#0  0x00007ff0bd813ee2 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00007ff0be074147 GC_wait_marker
#2  0x00007ff0be07460a GC_help_marker
#3  0x00007ff0be0746ef GC_mark_thread
#4  0x00007ff0bd80def7 start_thread
#5  0x00007ff0bd74122f __clone
TID 3009:
#0  0x00007ff0bd8171bc __lll_lock_wait
#1  0x00007ff0bd8104b5 __pthread_mutex_lock
#2  0x00007ff0bdf603eb std::unique_lock<std::mutex>::lock() [clone .constprop.405]
#3  0x00007ff0bdf61b15 nix::Store::computeFSClosure(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool, bool, bool)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(std::future<nix::ref<nix::ValidPathInfo> >)#1}::operator()(nix::ref<nix::ValidPathInfo>) const
#4  0x00007ff0bdf61d2f std::_Function_handler<void (std::future<nix::ref<nix::ValidPathInfo> >), nix::Store::computeFSClosure(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool, bool, bool)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(std::future<nix::ref<nix::ValidPathInfo> >)#1}>::_M_invoke(std::_Any_data const&, std::future<nix::ref<nix::ValidPathInfo> >&&)
#5  0x00007ff0bdfb5e15 nix::Callback<nix::ref<nix::ValidPathInfo> >::rethrow(std::__exception_ptr::exception_ptr const&) const
#6  0x00007ff0bdfb204d std::_Function_handler<void (std::future<std::shared_ptr<nix::ValidPathInfo> >), nix::Store::queryPathInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Callback<nix::ref<nix::ValidPathInfo> >)::{lambda(std::future<std::shared_ptr<nix::ValidPathInfo> >)#1}>::_M_invoke(std::_Any_data const&, std::future<std::shared_ptr<nix::ValidPathInfo> >&&)
#7  0x00007ff0bde99d85 nix::Callback<std::shared_ptr<nix::ValidPathInfo> >::rethrow(std::__exception_ptr::exception_ptr const&) const
#8  0x00007ff0bde902b9 std::_Function_handler<void (std::future<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >), nix::BinaryCacheStore::queryPathInfoUncached(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Callback<std::shared_ptr<nix::ValidPathInfo> >)::{lambda(std::future<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)#1}>::_M_invoke(std::_Any_data const&, std::future<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&)
#9  0x00007ff0bde98455 nix::Callback<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::rethrow(std::__exception_ptr::exception_ptr const&) const
#10 0x00007ff0bdf38dd2 std::_Function_handler<void (std::future<nix::DownloadResult>), nix::HttpBinaryCacheStore::getFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Callback<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)::{lambda(std::future<nix::DownloadResult>)#1}>::_M_invoke(std::_Any_data const&, std::future<nix::DownloadResult>&&)
#11 0x00007ff0bdf0ecf5 nix::Callback<nix::DownloadResult>::rethrow(std::__exception_ptr::exception_ptr const&) const
#12 0x00007ff0bdf0ef35 void nix::CurlDownloader::DownloadItem::fail<nix::DownloadError>(nix::DownloadError const&)
#13 0x00007ff0bdf12f5e nix::CurlDownloader::DownloadItem::~DownloadItem()
#14 0x0000000000438da6 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
#15 0x00007ff0bdf16596 nix::CurlDownloader::workerThreadEntry()
#16 0x00007ff0bdc95d7f
#17 0x00007ff0bd80def7 start_thread
#18 0x00007ff0bd74122f __clone

It's not just for invalid paths, here's another one:

elaforge 29718 0.0 0.1 514780 17892 ? Sl Jul23 0:00 /nix/store/zg66y04g2bvmw41cgrywysr86s40g5cc-nix-2.2/bin/nix copy --from http://narpile.groq /nix/store/5naa5ppxqz233bhwms2hkh8ifvn3f1z2-silently-1.2.5-doc --option allowed-impure-host-deps /groq /etc/hostname --option extra-sandbox-paths /groq/models --option substituters http://narpile.groq --option sandbox relaxed --option require-sigs false --option fallback true --option keep-outputs true --option max-jobs 0 --option builders-use-substitutes true --option cores 0

This path exists and when I run by hand it completes immediately (I have it locally). The stack looks similar:

PID 29718 - process
TID 29718:
#0  0x00007f9ac76d72dd __GI___pthread_timedjoin_ex
#1  0x00007f9ac7b5dfd3 std::thread::join()
#2  0x00007f9ac7dd534b std::_Sp_counted_ptr_inplace<nix::CurlDownloader, std::allocator<nix::CurlDownloader>, (__gnu_cxx::_Lock_policy)2>::_M_dispose()
#3  0x0000000000438da6 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
#4  0x00007f9ac754e351 __run_exit_handlers
#5  0x00007f9ac754e43a exit
#6  0x00007f9ac7538b95 __libc_start_main
#7  0x00000000004323ea _start
TID 29734:
#0  0x00007f9ac754c8dc __sigtimedwait
#1  0x00007f9ac76e0434 sigwait
#2  0x00007f9ac7ca1955 nix::signalHandlerThread(__sigset_t)
#3  0x00007f9ac7ca7b1d std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(__sigset_t), __sigset_t> > >::_M_run()
#4  0x00007f9ac7b5dd7f
#5  0x00007f9ac76d5ef7 start_thread
#6  0x00007f9ac760922f __clone
TID 29740:
#0  0x00007f9ac76dbee2 pthread_cond_wait@@GLIBC_2.3.2
#1  0x00007f9ac7f3c147 GC_wait_marker
#2  0x00007f9ac7f3c60a GC_help_marker
#3  0x00007f9ac7f3c6ef GC_mark_thread
#4  0x00007f9ac76d5ef7 start_thread
#5  0x00007f9ac760922f __clone
TID 29747:
#0  0x00007f9ac76df1bc __lll_lock_wait
#1  0x00007f9ac76d84b5 __pthread_mutex_lock
#2  0x00007f9ac7e283eb std::unique_lock<std::mutex>::lock() [clone .constprop.405]
#3  0x00007f9ac7e29b15 nix::Store::computeFSClosure(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool, bool, bool)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(std::future<nix::ref<nix::ValidPathInfo> >)#1}::operator()(nix::ref<nix::ValidPathInfo>) const
#4  0x00007f9ac7e29d2f std::_Function_handler<void (std::future<nix::ref<nix::ValidPathInfo> >), nix::Store::computeFSClosure(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool, bool, bool)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(std::future<nix::ref<nix::ValidPathInfo> >)#1}>::_M_invoke(std::_Any_data const&, std::future<nix::ref<nix::ValidPathInfo> >&&)
#5  0x00007f9ac7e7de15 nix::Callback<nix::ref<nix::ValidPathInfo> >::rethrow(std::__exception_ptr::exception_ptr const&) const
#6  0x00007f9ac7e7a04d std::_Function_handler<void (std::future<std::shared_ptr<nix::ValidPathInfo> >), nix::Store::queryPathInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Callback<nix::ref<nix::ValidPathInfo> >)::{lambda(std::future<std::shared_ptr<nix::ValidPathInfo> >)#1}>::_M_invoke(std::_Any_data const&, std::future<std::shared_ptr<nix::ValidPathInfo> >&&)
#7  0x00007f9ac7d61d85 nix::Callback<std::shared_ptr<nix::ValidPathInfo> >::rethrow(std::__exception_ptr::exception_ptr const&) const
#8  0x00007f9ac7d582b9 std::_Function_handler<void (std::future<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >), nix::BinaryCacheStore::queryPathInfoUncached(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Callback<std::shared_ptr<nix::ValidPathInfo> >)::{lambda(std::future<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)#1}>::_M_invoke(std::_Any_data const&, std::future<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&)
#9  0x00007f9ac7d60455 nix::Callback<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::rethrow(std::__exception_ptr::exception_ptr const&) const
#10 0x00007f9ac7e00dd2 std::_Function_handler<void (std::future<nix::DownloadResult>), nix::HttpBinaryCacheStore::getFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Callback<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)::{lambda(std::future<nix::DownloadResult>)#1}>::_M_invoke(std::_Any_data const&, std::future<nix::DownloadResult>&&)
#11 0x00007f9ac7dd6cf5 nix::Callback<nix::DownloadResult>::rethrow(std::__exception_ptr::exception_ptr const&) const
#12 0x00007f9ac7dd6f35 void nix::CurlDownloader::DownloadItem::fail<nix::DownloadError>(nix::DownloadError const&)
#13 0x00007f9ac7ddaf5e nix::CurlDownloader::DownloadItem::~DownloadItem()
#14 0x0000000000438da6 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
#15 0x00007f9ac7dde596 nix::CurlDownloader::workerThreadEntry()
#16 0x00007f9ac7b5dd7f
#17 0x00007f9ac76d5ef7 start_thread
#18 0x00007f9ac760922f __clone

This is nixStable from nixpkgs 19.03 at 5c52b25283a6cccca443ffb7a358de6fe14b4a81. The OS is the GCP image: ubuntu-1604-xenial-v20180306
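
For anyone trying to catch hung processes like the two above, here is a rough sketch of how they could be picked out by elapsed time. The function name stuck_nix_copy_pids is made up for illustration, and the usage line assumes a Linux procps `ps` that supports the `etimes` (elapsed seconds) column; eu-stack is used the same way as in the traces above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nix): reads "PID ELAPSED_SECONDS COMMAND..."
# lines on stdin and prints the PIDs of `nix copy` processes that have been
# running for at least the given number of seconds.
stuck_nix_copy_pids() {
  awk -v min="$1" '$2 + 0 >= min + 0 && /nix copy/ { print $1 }'
}

# Example use (assumes procps `ps`; not run here):
#   ps -eo pid=,etimes=,args= | stuck_nix_copy_pids 3600 |
#     while read -r pid; do eu-stack -p "$pid"; done
```

The filter only matches the literal text "nix copy" in the command line, so it would need adjusting for nix-copy-closure or wrappers.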

elaforge avatar Jul 31 '19 19:07 elaforge

I marked this as stale due to inactivity. → More info

stale[bot] avatar Feb 18 '21 07:02 stale[bot]

Still relevant

balsoft avatar Jul 11 '21 20:07 balsoft

I marked this as stale due to inactivity. → More info

stale[bot] avatar Jan 08 '22 23:01 stale[bot]

I have run into this issue multiple times, seemingly at random, in my own CI. Here is an example where it hung for over 25 minutes until I manually cancelled it. Strangely, when I did hit the cancel button, the job exited successfully, which was a surprise. https://github.com/nrdxp/nrdos/actions/runs/3357138657/jobs/5562747027

I did another run right after to try and get more information by slapping a --debug on the nix copy command. Alas it did not hang on that run: https://github.com/nrdxp/nrdos/actions/runs/3357218857/jobs/5562877628#step:3:4776

The line I link to here is usually where it hangs though, which seems to be after an ssh connection is already made.

My suspicion, since the hung job mentioned above actually exited successfully and even shows successful uploads in the logs (that portion of the log only showed up after I cancelled it), is that there is some sort of deadlock in the parallel upload code. Just a guess though.

nrdxp avatar Oct 30 '22 22:10 nrdxp

I just noticed that this seems to happen frequently when a large number of derivations are going to be copied (either because you passed them, or because the ones you passed have a lot of dependencies to copy). I also noticed that if you just wait long enough, the copying actually does start.

So now there are two possible explanations at this point: Does nix copy compress all the files beforehand? That would explain why there is such a massive stall, but it doesn't seem to line up with CPU usage, which is high even during the file transfer, which I originally assumed was because it compresses on the fly.

Only other explanation I can think of is that nix copy is just making a ton of roundtrips when trying to determine the dependency closure. Based on the behavior I observed this seems to be the most likely answer.

Update: It is definitely the number of queries being done. The stall was happening in a non-interactive environment, so there was no output, but if I run the same command locally I see Nix making a ton of queries to the cache, causing it to stall for quite some time.

nrdxp avatar Dec 12 '22 23:12 nrdxp

NIX_SSHOPTS="-oControlMaster=no" fixes this issue as far as I can tell

kittywitch avatar Jul 21 '23 17:07 kittywitch

But even then, ControlPath should probably be set to a path that doesn't exist, since the control socket will still be used regardless
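
A minimal sketch of how this workaround might be applied, under a couple of assumptions: ssh_config(5) documents `none` as a ControlPath value that disables connection sharing, so it is used here in place of a nonexistent path, and HOST and STORE_PATH are placeholders. None of this has been checked against the hang itself.

```shell
# Workaround reported in this thread: keep ssh from reusing a shared
# (multiplexed) connection when Nix invokes it.
# ControlPath=none is an assumption taken from ssh_config(5), where "none"
# disables connection sharing; the thread only suggests a nonexistent path.
export NIX_SSHOPTS="-oControlMaster=no -oControlPath=none"

# Then run the copy as usual (placeholders, not run here):
#   nix copy --to "ssh://HOST" STORE_PATH
```

This would only matter for ssh-based transports; it should have no effect on copies from https:// caches.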

kittywitch avatar Jul 21 '23 17:07 kittywitch

@kittywitch very interesting, can you say how you discovered this workaround? Is nix intentionally using ControlMaster for latency reasons, or accidentally picking it up from the ambient ssh config? Also, wouldn't this only apply to ssh:// transport? I saw these timeouts from https://, so are you sure it's the same issue?

Just as an update, we still see this, but rarely.

elaforge avatar Jul 26 '23 18:07 elaforge

NIX_SSHOPTS="-oControlMaster=no" fixes this issue as far as I can tell

For me, closing my existing ssh session to the target system worked, but I'll try this next time. Almost certainly the same root cause.

In my case, I was running colmena apply and it successfully updated one system but got stuck at Pushing system closure on the others, while running nix-copy-closure.

I tried running it manually with -vvvvv:

$ nix-copy-closure --to --include-outputs --gzip -vvvvv root@m2 /nix/store/ch3vl30026v18di49035r73xyj2dd8xf-nixos-system-m2-23.05pre-git 
OpenSSH_9.3p2, OpenSSL 3.0.9 30 May 2023
debug1: Reading configuration data /home/ivan/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 5: Applying options for *
debug1: auto-mux: Trying existing master
debug1: mux_client_request_session: master session id: 3
^Ckilling process 960563
error: interrupted by the user

My stack trace looks a bit different though (possibly because this is nix-copy-closure rather than nix copy?)

$ eu-stack -p 998166
PID 998166 - process
TID 998166:
#0  0x00007fcc9597e74c read
#1  0x00007fcc95ec011f nix::readLine[abi:cxx11](int)
#2  0x00007fcc960ce121 nix::SSHMaster::startCommand(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#3  0x00007fcc960723db nix::LegacySSHStore::openConnection()
#4  0x00007fcc96075210 std::_Function_handler<nix::ref<nix::LegacySSHStore::Connection> (), nix::LegacySSHStore::ref(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&)
#5  0x00007fcc960796c6 nix::Pool<nix::LegacySSHStore::Connection>::get()
#6  0x00007fcc960799d5 virtual thunk to nix::LegacySSHStore::queryValidPaths(std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> > const&, nix::SubstituteFlag)
#7  0x00007fcc960e20ee nix::copyPaths(nix::Store&, nix::Store&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> > const&, nix::RepairFlag, nix::CheckSigsFlag, nix::SubstituteFlag)
#8  0x00007fcc960e33d7 nix::copyPaths(nix::Store&, nix::Store&, std::set<nix::RealisedPath, std::less<nix::RealisedPath>, std::allocator<nix::RealisedPath> > const&, nix::RepairFlag, nix::CheckSigsFlag, nix::SubstituteFlag)
#9  0x00007fcc960e3d64 nix::copyClosure(nix::Store&, nix::Store&, std::set<nix::RealisedPath, std::less<nix::RealisedPath>, std::allocator<nix::RealisedPath> > const&, nix::RepairFlag, nix::CheckSigsFlag, nix::SubstituteFlag)
#10 0x000055b2a40caf1e main_nix_copy_closure(int, char**) [clone .lto_priv.0]
#11 0x000055b2a413a53e nix::mainWrapped(int, char**)
#12 0x00007fcc96259fa1 nix::handleExceptions(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>)
#13 0x000055b2a40993d7 main
#14 0x00007fcc958abace __libc_start_call_main
#15 0x00007fcc958abb89 __libc_start_main@@GLIBC_2.34
#16 0x000055b2a409f6c5 _start
TID 998168:
#0  0x00007fcc958c19ea __sigtimedwait
#1  0x00007fcc958c10bc sigwait
#2  0x00007fcc95ec7105 nix::signalHandlerThread(__sigset_t)
#3  0x00007fcc95ec38cc std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(__sigset_t), __sigset_t> > >::_M_run()
#4  0x00007fcc95c515c3 execute_native_thread_routine
#5  0x00007fcc9590ddd4 start_thread
#6  0x00007fcc9598f9b0 __clone3

aij avatar Aug 11 '23 20:08 aij

FWIW switching to compression=zstd fixed the issue for me. With the default compression method, nix copy consistently hung for certain larger outputs (tested up to 10 min before I interrupted). The same output compressed and uploaded in 5 sec after switching to zstd.

simonzkl avatar Sep 15 '23 12:09 simonzkl

FWIW switching to compression=zstd fixed the issue for me.

@simonzkl Where did you set compression=zstd? I'd like to give this a try, but I see no compression option in NixOS Options search page, or in deploy-rs docs.

boxofrox avatar Jan 04 '24 16:01 boxofrox

FWIW switching to compression=zstd fixed the issue for me.

@simonzkl Where did you set compression=zstd? I'd like to give this a try, but I see no compression option in NixOS Options search page, or in deploy-rs docs.

Where it accepts the Nix Store URL, e.g. in nix copy --to <url>.

https://nixos.org/manual/nix/unstable/command-ref/new-cli/nix3-help-stores#store-HTTP-Binary-Cache-Store-compression
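
To make that concrete, here is a small sketch of attaching the setting to a store URL as a query parameter. The helper name store_url_with and the s3://example-cache URL are made up for illustration; the assumption is that the destination is a binary cache store, which is what the linked documentation covers.

```shell
# Hypothetical helper: append a store setting to a store URL as a query
# parameter, using "?" for the first parameter and "&" for later ones.
store_url_with() {
  url=$1
  param=$2
  case $url in
    *\?*) printf '%s&%s\n' "$url" "$param" ;;
    *)    printf '%s?%s\n' "$url" "$param" ;;
  esac
}

# Example (placeholders, not run here):
#   nix copy --to "$(store_url_with 's3://example-cache' 'compression=zstd')" STORE_PATH
```

Writing the URL by hand, e.g. 's3://example-cache?compression=zstd', works just as well; the helper only avoids mixing up "?" and "&" when other settings are already present.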

simonzkl avatar Jan 04 '24 17:01 simonzkl

I am trying to copy the closure of one of my systems to another and it keeps freezing.

My command:

nix copy --verbose  --to "ssh://<MY-MACHINE>?compress=true" .#nixosConfigurations.<MY-MACHINE>.config.system.build.toplevel

imincik avatar Feb 20 '24 22:02 imincik

As I also have this problem quite often, I've noticed that for me it was always #5304: there are file locks that weren't cleaned up by previous nix commands (I suspect because they were cancelled).

To work around this(?) issue, I wrote a simple script to find the file locks (not sure if it's absolutely safe to do it like that, though):

#!/bin/sh

# Print the lock file of every store entry that has one.
for file in /nix/store/*; do
  test -f "${file}.lock" && echo "${file}.lock"
done

which lists all the stale file locks and then I remove those locks via

sudo mount -o remount,rw,bind /nix/store
sudo rm #<list of these locks>

Not optimal obviously...

Edit: As I'm just facing it again, it seems there are also locks for files/folders that don't exist (so unfortunately they aren't caught by that script). Deleting them still works around the issue, but it's a little bit cumbersome...

Edit2: Maybe to improve that workaround script, check that the *.lock files have size == 0 (I still doubt that it's safe though...)
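
A rough sketch of that Edit2 idea, assuming a `find` that supports -maxdepth (GNU and BSD find do). The function name list_empty_locks is made up, and the same safety doubts apply: it only prints candidates and cannot tell whether a running Nix process still holds one of these locks.

```shell
# Hypothetical helper: list zero-size "*.lock" files directly inside a store
# directory (default /nix/store), whether or not the locked path still exists.
# It only prints candidates and deletes nothing.
list_empty_locks() {
  find "${1:-/nix/store}" -maxdepth 1 -type f -name '*.lock' -size 0
}

# Example (not run here):
#   list_empty_locks            # inspect /nix/store
#   list_empty_locks /some/dir  # inspect another directory
```

Because it matches on the lock files themselves rather than on existing store entries, it would also show the locks for paths that no longer exist.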

Philipp-M avatar Mar 13 '24 18:03 Philipp-M