Discussion:
[lvm-devel] [BISECT] Regression: SEGV: 9156c5d dmeventd rework locking code
Zdenek Kabelac
2017-04-01 09:25:42 UTC
Hello all,
After upgrading from el7.2 to el7.3, we started getting dmeventd segmentation
faults immediately after the update. A bisect of the lvm2 git tree shows the
first bad commit below. This bug prevents us from activating our logical
volumes unless we disable lvm2-monitor and set activation { monitoring = 0 }
in lvm.conf.
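For reference, in case anyone else hits this, the workaround in lvm.conf is
just:

activation {
    # Workaround: stop dmeventd monitoring so LVs can be activated
    monitoring = 0
}

together with disabling the lvm2-monitor service.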
I was able to get a gdb backtrace from a core dump in case that is useful;
it is also below.
Please let me know if you need additional information or have a patch that
I can test with.
Thank you for your help!
-Eric
===== GDB =====
Mar 31 12:07:06 server1.localhost kernel: dmeventd[7885]: segfault at 7f753ae4c6a8 ip 00007f7537b69617 sp 00007f753ae4c6b0 error 7 in liblvm2cmd.so.2.02[7f7537ac8000+191000]
~]# gdb /usr/sbin/dmeventd /var/coredumps/core-dmeventd-sig11-user0-group0-pid20364-time1490987932
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Reading symbols from /usr/sbin/dmeventd...Reading symbols from /usr/lib/debug/usr/sbin/dmeventd.debug...done.
done.
warning: core file may not match specified executable file.
[New LWP 20408]
[New LWP 20364]
[New LWP 20409]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `dmeventd'.
Program terminated with signal 11, Segmentation fault.
#0 _touch_memory (size=<optimized out>, mem=<optimized out>) at mm/memlock.c:141
141 size_t pagesize = lvm_getpagesize();
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.166-2.el7.x86_64 elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-33.el7.x86_64 libcap-2.22-8.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libsepol-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-30.el7_3.7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 _touch_memory (size=<optimized out>, mem=<optimized out>) at mm/memlock.c:141
#1 _allocate_memory () at mm/memlock.c:163
Hi

Hmm, a few theories - your gdb backtrace suggests it has failed in a libc
call (getpagesize()) ??
So have you upgraded all the related packages (device-mapper*, kernel*),
or is some 'mixture' in use ?
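For context, the function at the top of your backtrace is tiny. Roughly like
this - a sketch reconstructed from the lvm2 tree of that era, not verbatim
(the real code uses lvm_getpagesize(); sysconf() stands in for it here):
_allocate_memory() reserves 'reserved_stack' bytes on the current thread's
stack with alloca(), and _touch_memory() then writes one word per page:

#include <alloca.h>
#include <stddef.h>
#include <unistd.h>

static size_t _size_stack = 64 * 1024;  /* reserved_stack from lvm.conf, bytes */

static void _touch_memory(void *mem, size_t size)
{
    size_t pagesize = (size_t) sysconf(_SC_PAGESIZE); /* line 141 in your bt */
    char *pos = mem;
    char *end = pos + size - sizeof(long);

    while (pos < end) {
        *(long *) pos = 1;   /* one write per page to keep it resident */
        pos += pagesize;
    }
}

static void _allocate_memory(void)
{
    void *stack_mem;

    /* Reserve the configured stack on the *current thread's* stack. */
    if ((stack_mem = alloca(_size_stack)))
        _touch_memory(stack_mem, _size_stack);
}

int main(void)
{
    _allocate_memory();
    return 0;
}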

Also, do you have some large/changed values of 'reserved_stack' or
'reserved_memory' in your lvm.conf ?
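For reference, the shipped defaults in that section are roughly the
following (quoted from memory, so double-check your distribution's
lvm.conf):

activation {
    # Stack size in KiB to reserve for use while devices are suspended
    reserved_stack = 64
    # Memory in KiB to reserve for use while devices are suspended
    reserved_memory = 8192
}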

A recent version of lvm2 (169) added an 'extra page' as a 'stack guard' on
ppc64le - but since your report suggests you use 'x86_64', it should not
affect this arch.

Please open a BZ and attach the lvm.conf file in use and all other info
(installed packages). Is selinux enabled or disabled ?
Are some non-standard kernel options in use ?

Regards

Zdenek
Zdenek Kabelac
2017-04-05 08:05:19 UTC
Post by Zdenek Kabelac
(gdb) bt
#0 _touch_memory (size=<optimized out>, mem=<optimized out>) at mm/memlock.c:141
#1 _allocate_memory () at mm/memlock.c:163
Hi
Hmm, a few theories - your gdb backtrace suggests it has failed in a libc
call (getpagesize()) ??
So have you upgraded all the related packages (device-mapper*, kernel*),
or is some 'mixture' in use ?
Also, do you have some large/changed values of 'reserved_stack' or
'reserved_memory' in your lvm.conf ?
Yes! Actually, reserved_stack was the problem. By trial and error we found
that when reserved_stack is 290 or more, dmeventd will segfault. We tried
on servers with far fewer logical volumes and did not hit the problem, so
while I am not going to try to figure out exactly how many logical volumes
it takes to hit this stack limit, this is the problem!
Is there some kind of hard limit on reserved_stack that should be
enforced?
I seem to recall increasing these values because lvcreate (or lvchange or
something) suggested that the values were too small.
Do you still want a bugzilla report?
Hi

Yep - so this explains it. It's not clear we need a BZ yet.
I'll explain the reason for the limitation.

Dmeventd uses threads - and to minimize the RAM usage of a memlocked process
with a number of threads, we picked a 'relatively' low value for the pthread
stack, on the assumption that no one would ever need a bigger value :)

Now, lvm.conf defines the 'reserved_stack' amount - and this stack is then
'mapped' (pre-touched) in the dmeventd lvm plugin thread. However, this
happens after 'dmeventd' has already created the thread with a 128K stack
limit (dmeventd itself doesn't 'see/use' lvm.conf, so it can't create
threads with different settings).
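You can see the same failure outside of lvm2 with a minimal sketch - the
numbers below (a 128K thread stack and a 300K buffer) are assumptions
standing in for dmeventd's stack limit and a too-large reserved_stack:

#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define THREAD_STACK_SIZE (128 * 1024)  /* dmeventd's fixed per-thread stack */
#define RESERVED_STACK    (300 * 1024)  /* e.g. reserved_stack = 300 */

static void *worker(void *arg)
{
    (void) arg;
    /* A local buffer bigger than the whole thread stack - touching it
     * walks straight past the guard page, much like alloca(_size_stack)
     * followed by _touch_memory() does in dmeventd's plugin thread. */
    char reserve[RESERVED_STACK];

    memset(reserve, 1, sizeof reserve);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, THREAD_STACK_SIZE);
    pthread_create(&t, &attr, worker, NULL);
    pthread_join(t, NULL);   /* never completes - the worker gets SIGSEGV */
    printf("no crash?\n");
    return 0;
}

Compile with 'cc -pthread demo.c' and run it - on most systems the worker
thread dies with SIGSEGV before the memset() finishes, which matches the
crash inside _touch_memory() in your core dump.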

So it is clearly a logical problem we can't really solve in any easy way.
We have limited the amount of stack used by the lvm2 code itself - so the
current defaults should be good enough for almost every possible use-case.

So before we start to fix this 'catch-22' case - could you check and
describe which use-case was not working well with the 'default'
reserved_stack, so that you had to raise the value to 290 to make it work ?

Otherwise I think the best solution would be to simply limit the 'accepted'
value internally, ignore any higher setting, and just document it -
something like the sketch below.
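A hypothetical check - the identifier names and the 64 KiB ceiling are
invented for illustration, none of this is actual lvm2 code:

#include <stdio.h>

#define RESERVED_STACK_MAX_KB 64   /* assumed safe ceiling, below dmeventd's
                                    * fixed per-thread stack */

static int clamp_reserved_stack_kb(int requested_kb)
{
    if (requested_kb > RESERVED_STACK_MAX_KB) {
        fprintf(stderr, "WARNING: reserved_stack %d KiB is too large "
                "for dmeventd threads, using %d KiB.\n",
                requested_kb, RESERVED_STACK_MAX_KB);
        return RESERVED_STACK_MAX_KB;
    }
    return requested_kb;
}

int main(void)
{
    /* e.g. the reporter's setting of 290 would be clamped with a warning */
    printf("%d\n", clamp_reserved_stack_kb(290));
    return 0;
}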


Regards

Zdenek
