ZFS-for-SystemRescueCD
"zpool upgrade -v" aborting with "illegal hardware instruction" error
Hello KOT,
First of all, thanks for making ZoL available as SysRescueCD SRMs. Once I get it working, it will be invaluable here.
Second, I'm having the following issue:
root@sysresccd /root % depmod
root@sysresccd /root % modprobe zfs
root@sysresccd /root % zpool upgrade -v
This system supports ZFS pool feature flags.
The following features are supported:
FEAT DESCRIPTION
-------------------------------------------------------------
zsh: illegal hardware instruction zpool upgrade -v
And dmesg shows the following:
[ 99.585830] traps: zpool[2365] trap invalid opcode ip:7f144eee46c1 sp:7fff408dee60 error:0 in libc.so.6[7f144ee9b000+1af000]
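The trap line reports both the faulting instruction pointer (ip) and the address at which libc.so.6 was mapped, so the position of the offending instruction inside the library can be estimated by subtracting one from the other. A minimal sketch using the two addresses from the dmesg line; the function name `fault_offset` is only illustrative, and it assumes the mapping shown in brackets starts at file offset 0 of the library.

```shell
# Offset of the faulting instruction inside the mapped library:
# ip minus the mapping base printed in brackets after the library name.
fault_offset() {
    # $1 = instruction pointer, $2 = mapping base (hex, without 0x prefix)
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

fault_offset 7f144eee46c1 7f144ee9b000

# The instruction at that offset could then be inspected with objdump, e.g.
# (the library path is an assumption; use the libc.so.6 shipped in the SRM):
#   objdump -d --start-address=0x496c1 --stop-address=0x496d1 /lib64/libc.so.6
```

For the addresses above this gives 0x496c1. Since the fault is reported inside libc.so.6 rather than in a kernel module, disassembling around that offset may help tell which userland binary was built with an unsupported instruction.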
This is running on a VirtualBox VM booted from the standard systemrescuecd-x86-4.3.0.iso, to which I added your SRMs (current git, "latest commit c0dfcc6423") following the procedure described at http://www.sysresccd.org/Modules, section "Adding SRM modules to the ISO Image from Linux".
Thanks in advance for your help in solving this.
Cheers, Durval.
Hi @DurvalMenezes ,
that sounds like your hardware doesn't support all the instruction sets the kernel modules were built with.
Something similar also occurred on the gentoo forums: http://forums.gentoo.org/viewtopic-p-6931744.html#6931744
Could you please post the output of
cat /proc/cpuinfo
Are you, by chance, trying to run this on an AMD CPU?
if it's a multi-core CPU only the output for the first CPU or core is needed
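A small sketch of how to print only the first CPU's block: on x86, /proc/cpuinfo separates logical CPUs with an empty line, so the output can simply be cut at the first one. The function name `first_cpu` is only illustrative.

```shell
# Print the block describing the first logical CPU and stop at the
# empty line that separates it from the next one.
first_cpu() {
    # $1 = file to read (defaults to /proc/cpuinfo)
    awk '/^$/ {exit} {print}' "${1:-/proc/cpuinfo}"
}

if [ -r /proc/cpuinfo ]; then
    first_cpu
fi
```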
those were compiled with march=nocona
Sorry for causing so much confusion with the v6 (march=core2) & v8 (march=nocona, fewer required instructions) modules.
Last time I tried to delete the earlier (v6) state of the kernel modules to eliminate the confusion, so that only the latest version would remain, but GitHub wouldn't let me do it :-1:
But you ran the latest modules anyway =)
Let's see what we can do to get it running :+1:
Hi @kernelOfTruth,
that sounds like your hardware doesn't support all the instruction sets the kernel modules were built with.
Perhaps, but I will be very surprised if that's indeed the case. Processor is a Core i7 Sandybridge, very new and supports almost every Intel feature.
cat /proc/cpuinfo
Here it is, from inside the VM booted from the sysresccd 4.3.0 iso:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping : 7
cpu MHz : 2367.305
cache size : 6144 KB
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc rep_good nopl pni monitor ssse3 lahf_lm
bogomips : 4734.61
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
Are you, by chance, trying to run this on an AMD CPU?
Nope, just on the one above.
those were compiled with march=nocona
Humrmrmr... according to "man gcc"
nocona
Improved version of Intel Pentium4 CPU with 64-bit extensions, MMX, SSE, SSE2 and SSE3 instruction set support.
I do think every one of these shows up in "flags" above...
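That comparison can be done mechanically. The sketch below checks the feature list from the "man gcc" description of nocona against the flags line posted above; the function name `missing_flags` is only illustrative. Note that /proc/cpuinfo lists SSE3 under the name "pni", and "lm" (long mode) indicates the 64-bit extensions.

```shell
# Report which required CPU flags are absent from a cpuinfo "flags" line.
missing_flags() {
    # $1 = space-separated flags available, remaining args = flags required
    have=" $1 "
    shift
    for f in "$@"; do
        case "$have" in
            *" $f "*) ;;              # flag present
            *) printf '%s\n' "$f" ;;  # flag missing
        esac
    done
}

# Flags line of the VM as posted above
vm_flags="fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc rep_good nopl pni monitor ssse3 lahf_lm"

# nocona per "man gcc": 64-bit extensions, MMX, SSE, SSE2, SSE3 (= pni)
missing_flags "$vm_flags" lm mmx sse sse2 pni
```

With the flags posted above this prints nothing, i.e. none of the listed nocona features is missing in the VM.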
Let's see what we can do to get it running :+1:
Thanks very much for the help!
Cheers, Durval.
Hi @DurvalMenezes ,
do you, by chance,
use sha256 checksums ?
If yes, this could have been caused by https://github.com/zfsonlinux/zfs/pull/2351
in that case I'll leave it out, probably the same with https://github.com/zfsonlinux/zfs/pull/2672 which might not work so well with https://github.com/zfsonlinux/zfs/pull/2129 - just to be sure that the rescue environment is as stable as possible - but will also speed up transfers by a great deal
Compiling the modules with
march=generic mtune=generic
in accordance with
CONFIG_GENERIC_CPU=y
should also help in reducing future problems ...
Since there have been lots of changes in the upstream branches - I first have to see if all currently used patchsets still work on my system - this probably will be within the next few days
Hi KoT,
On Oct 28, 2014 11:23 PM, "kernelOfTruth aka. kOT, Gentoo user" wrote:
Hi @DurvalMenezes ,
do you, by chance,
use sha256 checksums ?
Nope, standard Fletcher checksums only.
In fact, at the moment the error occurred, I didn't have a single pool or dataset created; I'd just booted the VM with a blank 10 GB virtual HD and ran "zpool upgrade -v" to see what features etc. were supported (BTW, as recommended in your README).
If yes, this could have been caused by zfsonlinux/zfs#2351
in that case I'll leave it out, probably the same with zfsonlinux/zfs#2672 which might not work so well with zfsonlinux/zfs#2129 - just to be sure that the rescue environment is as stable as possible - but will also speed up transfers by a great deal
Compiling the modules with
march=generic mtune=generic
CONFIG_GENERIC_CPU=y should also help in reducing future problems ... Since there have been lots of changes in the upstream branches - I first have to see if all currently used patchsets still work - this probably will be within the next few days
Great! No hurry here, please take your time. And if I can help with anything, just let me know.
Cheers,
Durval.
-march=generic -mtune=generic
doesn't work,
right now settling for:
-march=nocona -mtune=generic
that hopefully should do it
haven't had time to test the modules yet (e.g. by importing an external pool),
modules compiled fine & uploaded in "testing" for now:
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/commit/ca50865edb6186ed3b1f9b05a1c0964a338b7a73
@kernelOfTruth I'd suggest not including any additional patches for ZFS in SystemRescueCD - if it's not released, it's not been thoroughly vetted, and any SRCD support should be optimized for recovery, not performance.
Can you create the testing modules in ca50865 also for the standard kernel? When I try to boot the alternative kernel the system is hanging. I'm having the same error (Illegal Hardware Instruction) on AMD hardware with r2 module branch.
# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 6
model name : AMD Turion(tm) II Neo N40L Dual-Core Processor
stepping : 3
cpu MHz : 800.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save
bogomips : 2995.09
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
@kOT: Thanks for making the new SRMs available. I've just tested them here (on exactly the same environment as before), and it worked: not only the initial "zpool upgrade -v" ran normally producing the expected output, but also I was able to import a pool originally created on another machine, and do lots of (read-only for now) tests with it: copying files from it, perusing directory hierarchies, checking snapshots, etc. Everything was just perfect.
In the near future I plan on doing some read-write tests too, will post here if I find any issues, but for now I think we can consider this issue as closed.
@pdf in the near future (or even with the next compile) I might leave out the 2129 patch: the latest upstream changes don't seem to play well with the ARC changes (https://github.com/zfsonlinux/zfs/pull/2129 ). These kernel modules are intended to have the best mix of features vs. stability.
2129 is supposed to fix OOM issues and low read & write transfer rates, the others also seem to address specific issues, I'm running it here on my box and haven't had any issues yet, granted I'm not running many advanced features but I'll see whether certain patches better ought to be left out due to stability concerns
thanks for your insight ! :+1:
@SenH sure thing ! I hope that the module will work with that processor, then
otherwise I might have to drop the cflags to a lower level though, since that Turion processor doesn't seem to expose the SSE3 instruction set, even though wikipedia claims it does (http://en.wikipedia.org/wiki/List_of_AMD_Turion_microprocessors#Turion_II_.2F_Turion_II_Ultra_.2F_Turion_II_Neo )
@DurvalMenezes glad it worked out for you =)
@kernelOfTruth Any estimate of when you can upload the modules for the std kernel? It's an HP Microserver N40L with an AMD Turion II Neo N40L (1.5 GHz) K625. I'm pretty sure it supports SSE3.
@SenH ETA 0 minutes :smile:
Commit: https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/commit/729895d94074b9be9a90f1a43367b2544fc05101
Branch: https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/commits/ZFS-for-SysRescCD-4.3.0-r2-testing02
lots of the additional patches since testing01 have been merged upstream meanwhile,
both the modules for the 3.14* and 3.10* kernel have been rebuilt
https://github.com/zfsonlinux/zfs/pull/2129 is left out this time (lower write & read rates, higher probability of OOM issues if your workflow or setup is prone to trigger it)
besides that it should (hopefully) work without issues
only compile-tested for now - haven't had any time to test them yet (just came home)
@kernelOfTruth ZFS modules from 729895d work great on the HP Microserver N40L (std kernel). Thanks for your effort!
Lrl c
@DurvalMenezes ??? (lrl == lol ?)
@kernelOfTruth, I'm getting this under a virtual machine (vmware 8) booted with 4.6.1 sysrcd and latest srm release
what is the solution?
"pdf commented on Nov 18, 2014 @kernelOfTruth I'd suggest not including any additional patches for ZFS in SystemRescueCD - if it's not released, it's not been thoroughly vetted, and any SRCD support should be optimized for recovery, not performance."
+1
@mailinglists35
That means that the modules apparently included some cpu instructions that are incompatible with your Core 2 Duo processor.
Honestly I don't know what could be causing this :question:
I've researched for several days in the past already and deliberately set the C- and CXXFLAGS to
-march=x86-64 -mtune=generic
to counter this, but seemingly to no avail - actually this shouldn't happen.
So far I've only observed similar behavior with e.g. WINE: binaries compiled on a Haswell processor, with flags meant to keep them compatible with IvyBridge, gave these errors when run. Deleting the folders & creating them anew (some local compilation involved ?) worked. I'm not certain how this would apply to the kernel modules, since they were already compiled specifically with GENERIC instructions in mind ...
If you could suggest a different set of C- and CXXFLAGS with the lowest common denominator to make it workable on a broad basis (including e.g. HP MicroServer with AMD processors and your box), I'd be grateful
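One way to approach the lowest common denominator is to intersect the "flags" lines of the machines that need to be supported. The sketch below does this for two flag lists; the function name `common_flags` is only illustrative, and the two sample lists are short excerpts of the Intel VM and AMD Turion flags posted earlier in this thread, not the full lines.

```shell
# Print the CPU flags present in both lists, one per line.
common_flags() {
    # $1, $2 = space-separated flag lists
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    printf '%s\n' "$1" | tr -s ' ' '\n' | sed '/^$/d' | sort -u > "$a"
    printf '%s\n' "$2" | tr -s ' ' '\n' | sed '/^$/d' | sort -u > "$b"
    comm -12 "$a" "$b"   # lines common to both sorted files
    rm -f "$a" "$b"
}

# Excerpts of the flags posted above (SIMD-related subset only)
intel_vm="mmx sse sse2 pni ssse3 lahf_lm"
amd_n40l="mmx sse sse2 pni cx16 popcnt lahf_lm sse4a"

common_flags "$intel_vm" "$amd_n40l"
```

For these excerpts, ssse3 drops out because only the Intel list has it, while mmx, sse, sse2 and pni (SSE3) remain. Any -march value whose required features stay within the intersection should, in principle, be safe on both machines.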
Thanks for your input !
It's been through the buildbots - that ensures data integrity, and as you can see most of those patches have been included into the official master releases meanwhile.
I'm making sure to only include patches that e.g. would help in the case one is stuck and can't get out of a situation (e.g. trying to delete files but can't which was addressed by Illumos 4950 https://github.com/zfsonlinux/zfs/pull/4207 ) - so clearly recovery-related
Illumos 4950, 6292, 6268, 5745 and zfs_object_mutex_size option
All of those have been thoroughly tested in production (e.g. by sempervictus , DeHacked, by myself, etc.) and been through the buildbots - so it's not that those were included to test through the SystemRescueCD modules.
I'll see to it, that no huge "experimental" patchsets like ABD (although also tested, e.g. going through buildbots, or being used by some users in production who need it) will be included into the modules, like in the past
That's great to hear, thank you! Any idea how to make the build run inside the mentioned test vm under the T5600 cpu?
On Thu, Feb 11, 2016 at 3:50 pm, kernelOfTruth aka. kOT, Gentoo user wrote: @mailinglists35 Thanks for your input !
It's been through the buildbots - that ensures data integrity, and as you can see most of those patches have been included into the official master releases meanwhile.
I'm making sure to only include patches that e.g. would help in the case one is stuck and can't get out of a situation (e.g. trying to delete files but can't which was addressed by Illumos 4950 zfsonlinux/zfs#4207 [https://github.com/zfsonlinux/zfs/pull/4207] )
Illumos 4950, 6292, 6268, 5745 and zfs_object_mutex_size option
All of those have been thoroughly tested in production (e.g. by sempervictus , DeHacked, by myself, etc.) and been through the buildbots - so it's not that those were included to test through the SystemRescueCD modules.
I'll see to it, that no huge "experimental" patchsets like ABD (although also tested, e.g. going through buildbots, or being used by some users in production who need it) will be included into the modules, like in the past
@mailinglists35 let me take a look at the documentation ...
Do you compile programs on your box ? If yes, what C-/CXXFLAGS do you use ?
hi, I don't usually compile, I use debian prebuilt binaries.
I booted the same vm using ubuntu mate xenial edition (a daily build of post 16.04 alpha 2). Then in that vm I installed zfsutils-linux from here http://packages.ubuntu.com/source/xenial/zfs-linux and it worked just fine.
inside the tar package you should be able to find a debian/ folder which should have build Makefiles, hopefully you can find the flags that were used.
@mailinglists35 please try out the following modules:
https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.6.1_nocona%2B
thanks; tried, no change http://i.imgur.com/a7EsgcL.png
Hm, I might be running into: https://forums.gentoo.org/viewtopic-p-7757350.html?sid=234b9ae8dbc1b926ae9b9e91c41a44db , https://bugs.gentoo.org/show_bug.cgi?id=528712
Argh - how I hate Intel, I want my money back and hexacore dual socket replacement :P
That thread gave me an idea:
-march=x86-64
might be enough and it could be that
-mtune=generic
is causing these troubles ...