Using kdump and crash to analyze kernel crashes
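1. Install debuginfo and crash packages
enable the debuginfo repository and run "yum -y install kernel-debuginfo crash kexec-tools" (kexec-tools is the package that provides the kdump service)
2. enable kdump
add crashkernel=192M to the kernel parameters, or use system-config-kdump to set it graphically; the memory is only reserved after a reboot. Then enable the service:
chkconfig kdump on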
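After the reboot it is worth checking that the reservation and the capture kernel are actually in place. The following is a minimal sketch, not part of the original procedure: the function names has_crashkernel and kdump_loaded are made up for illustration, and the optional path arguments exist only so the checks can be pointed at other files.

```shell
#!/bin/sh
# Sketch: confirm that crash memory is reserved and the capture kernel is loaded.
# has_crashkernel and kdump_loaded are hypothetical helper names.

has_crashkernel() {
    # The boot parameters of the running kernel are exposed in /proc/cmdline.
    grep -q 'crashkernel=' "${1:-/proc/cmdline}" 2>/dev/null
}

kdump_loaded() {
    # The kernel reports 1 here once a crash kernel has been loaded via kexec.
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}

if has_crashkernel; then
    echo "crashkernel= is present on the kernel command line"
else
    echo "crashkernel= is missing; check the boot loader entry and reboot"
fi

if kdump_loaded; then
    echo "capture kernel is loaded"
else
    echo "capture kernel is not loaded; try: service kdump start"
fi
```

If either check fails, triggering a crash will most likely just panic the machine without producing a vmcore.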
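3. enable kernel to reboot on panic
add kernel.panic = 20 to /etc/sysctl.conf (reboot 20 seconds after a panic) and run "sysctl -p" to apply it
4. install watchdog and use /dev/watchdog as the device
yum -y install watchdog, then edit /etc/watchdog.conf and set watchdog-device = /dev/watchdog
5. trigger a crash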
With kdump enabled, your kernel-kdump image should be loaded via kexec at boot, leaving the system ready to capture a vmcore upon crashing. To test this, you can force a crash with the following commands. Run them on the console, or schedule the crash with the at utility:
at next minute
> chkconfig atd off
> echo 1 > /proc/sys/kernel/sysrq
> echo c > /proc/sysrq-trigger
> ^D
Notice I prevented atd from starting on the next boot. This is because the at job is not de-queued before the crash dump. On the next boot, the system would re-run the job, causing your system to crash continuously! Just remember to turn atd back on (chkconfig atd on) once the system has rebooted.
When the crash happens, the system will reboot and copy the crash dump file to /var/crash. The console shows something like this:
Scientific Linux release 6.1 (Carbon)
Kernel 2.6.32-131.12.1.el6.x86_64 on an x86_64
sl61.comme.ca login: SysRq : Trigger a crash
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8131aa36>] sysrq_handle_crash+0x16/0x20
PGD 37bd8067 PUD 37590067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/kernel/kexec_crash_loaded
Dumping ftrace buffer:
(ftrace buffer empty)
CPU 1
Modules linked in: autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log ppdev parport_pc parport microcode sg iTCO_wdt iTCO_vendor_support i2c_piix4 i2c_core snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ahci ata_generic pata_acpi ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: freq_table]
Modules linked in: autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log ppdev parport_pc parport microcode sg iTCO_wdt iTCO_vendor_support i2c_piix4 i2c_core snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ahci ata_generic pata_acpi ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: freq_table]
Pid: 1909, comm: bash Not tainted 2.6.32-131.12.1.el6.x86_64 #1 VirtualBox
RIP: 0010:[<ffffffff8131aa36>] [<ffffffff8131aa36>] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff88003728fe18 EFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 00000000000012e3
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff88003728fe18 R08: ffffffff81bfe340 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81af9f00 R14: 0000000000000286 R15: 0000000000000004
FS: 00007f26443d2700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000003778a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 1909, threadinfo ffff88003728e000, task ffff88003c94b540)
Stack:
ffff88003728fe68 ffffffff8131acf2 ffff88003c94b540 ffff880000000000
<0> 0000000d80df4018 0000000000000002 ffff88003d60f780 00007f26443d7000
<0> 0000000000000002 fffffffffffffffb ffff88003728fe98 ffffffff8131adae
Call Trace:
[<ffffffff8131acf2>] __handle_sysrq+0x132/0x1a0
[<ffffffff8131adae>] write_sysrq_trigger+0x4e/0x50
[<ffffffff811d509e>] proc_reg_write+0x7e/0xc0
[<ffffffff811727f8>] vfs_write+0xb8/0x1a0
[<ffffffff810d1b52>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff81173231>] sys_write+0x51/0x90
[<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: d0 88 81 a3 a2 fc 81 c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 bd c6 77 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 8d 47
RIP [<ffffffff8131aa36>] sysrq_handle_crash+0x16/0x20
RSP <ffff88003728fe18>
CR2: 0000000000000000
..MP-BIOS bug: 8254 timer not connected to IO-APIC
Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.
Shortly after, the kdump kernel will be booted and it will copy the dump to /var/crash.
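Each dump lands in its own subdirectory of /var/crash, named after the address and timestamp. To pick up the newest one without browsing by hand, something along these lines may help. It is a sketch only: latest_vmcore is a made-up helper name, and it assumes the default layout of one vmcore file per subdirectory.

```shell
#!/bin/sh
# Sketch: print the most recently written vmcore below a crash directory.
# latest_vmcore is a hypothetical helper; the directory defaults to /var/crash.

latest_vmcore() {
    dir="${1:-/var/crash}"
    # ls -t sorts by modification time, newest first. kdump directory names
    # (address-date-time) contain no whitespace, so one path per line is safe.
    ls -t "$dir"/*/vmcore 2>/dev/null | head -n 1
}

core=$(latest_vmcore)
if [ -n "$core" ]; then
    ls -lh "$core"
else
    echo "no vmcore found under /var/crash"
fi
```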

6. run crash on the dump file
> pwd
/var/crash/127.0.0.1-2011-09-10-14:40:56
> ls -lh
total 83M
-rw------- 1 root root 83M Sep 10 14:41 vmcore
> crash /usr/lib/debug/lib/modules/2.6.32-131.12.1.el6.x86_64/vmlinux vmcore
do a bt (backtrace) to see which process triggered the panic
crash> bt
PID: 1743 TASK: ffff880037556b40 CPU: 0 COMMAND: "bash"
 #0 [ffff88003d1459e0] machine_kexec at ffffffff810310cb
 #1 [ffff88003d145a40] crash_kexec at ffffffff810b6392
 #2 [ffff88003d145b10] oops_end at ffffffff814de670
 #3 [ffff88003d145b40] no_context at ffffffff81040c9b
 #4 [ffff88003d145b90] __bad_area_nosemaphore at ffffffff81040f25
 #5 [ffff88003d145be0] bad_area at ffffffff8104104e
 #6 [ffff88003d145c10] __do_page_fault at ffffffff81041773
 #7 [ffff88003d145d30] do_page_fault at ffffffff814e067e
 #8 [ffff88003d145d60] page_fault at ffffffff814dda05
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff8131aa36 RSP: ffff88003d145e18 RFLAGS: 00010096
    RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000beb
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
    RBP: ffff88003d145e18 R8: 0000000000000000 R9: 0000000000000002
    R10: ffffffff8163bfe0 R11: ffff88003796bfcf R12: 0000000000000000
    R13: ffffffff81af9f00 R14: 0000000000000286 R15: 0000000000000004
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #9 [ffff88003d145e20] __handle_sysrq at ffffffff8131acf2
#10 [ffff88003d145e70] write_sysrq_trigger at ffffffff8131adae
#11 [ffff88003d145ea0] proc_reg_write at ffffffff811d509e
#12 [ffff88003d145ef0] vfs_write at ffffffff811727f8
#13 [ffff88003d145f30] sys_write at ffffffff81173231
#14 [ffff88003d145f80] system_call_fastpath at ffffffff8100b172
    RIP: 0000003bfdad84e0 RSP: 00007ffff707a0b0 RFLAGS: 00010202
    RAX: 0000000000000001 RBX: ffffffff8100b172 RCX: 0000000000000063
    RDX: 0000000000000002 RSI: 00007f8a5ccf0000 RDI: 0000000000000001
    RBP: 00007f8a5ccf0000 R8: 000000000000000a R9: 00007f8a5cceb700
    R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000002
    R13: 0000003bfdd89780 R14: 0000000000000002 R15: 0000003bfdd89780
    ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>
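The same information can be gathered without an interactive session, which is handy when several dumps need a first look. The sketch below writes a small command file and feeds it to crash. It assumes the crash options -s (no banner) and -i (read commands from a file) behave as described in the crash man page. The helper name write_crash_cmds, the output file crash-report.txt and the VMLINUX/VMCORE variables are made up for illustration.

```shell
#!/bin/sh
# Sketch: run a fixed set of crash commands against a vmcore non-interactively.
# write_crash_cmds, crash-report.txt, VMLINUX and VMCORE are illustrative names.

write_crash_cmds() {
    # sys = system summary, bt = backtrace of the panicking task,
    # log = kernel ring buffer, ps = process list, exit = leave crash
    printf '%s\n' sys bt log ps exit > "$1"
}

VMLINUX="${VMLINUX:-/usr/lib/debug/lib/modules/2.6.32-131.12.1.el6.x86_64/vmlinux}"
VMCORE="${VMCORE:-vmcore}"
CMDS=$(mktemp)

write_crash_cmds "$CMDS"

if command -v crash >/dev/null 2>&1 && [ -r "$VMLINUX" ] && [ -r "$VMCORE" ]; then
    # Run the commands from the file and keep the output for later reading.
    crash -s -i "$CMDS" "$VMLINUX" "$VMCORE" > crash-report.txt
    echo "report written to crash-report.txt"
else
    echo "crash, vmlinux or vmcore not available; commands left in $CMDS"
fi
```

Run it from inside the dump directory, or set VMCORE to the full path of the dump. The vmlinux path must match the kernel version that crashed.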
references
http://people.redhat.com/anderson/crash_whitepaper/
http://www.dedoimedo.com/computers/kdump.html