

Using kdump and crash to analyze kernel crashes


1. Install debuginfo and crash packages

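Enable the debuginfo repository and run "yum -y install kernel-debuginfo crash kexec-tools". On EL6 the kdump service ships in the kexec-tools package; there is no package named kdump.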
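Before going further it can help to confirm that the debuginfo vmlinux matching the running kernel is really on disk, since crash needs it later. A minimal sketch, assuming the /usr/lib/debug path layout used in step 6; the function names vmlinux_path and have_debuginfo are made up for this example:

```shell
# Print where kernel-debuginfo puts the uncompressed vmlinux for a given
# kernel release (defaults to the running kernel).
vmlinux_path() {
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "${1:-$(uname -r)}"
}

# Succeed only if that vmlinux file exists.
have_debuginfo() {
    [ -f "$(vmlinux_path "$1")" ]
}
```

For example, "have_debuginfo || echo 'kernel-debuginfo is missing for the running kernel'" gives a quick check after the yum run.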

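2. Enable kdump

Add crashkernel=192M to the kernel parameters (the kernel line in /boot/grub/grub.conf), or use system-config-kdump to set it graphically. The memory reservation only takes effect after a reboot. Then enable the service at boot:
chkconfig kdump on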
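After the reboot, two quick checks show whether the setup took effect: the crashkernel= option should appear on the kernel command line, and /sys/kernel/kexec_crash_loaded should read 1 once the kdump service has loaded the capture kernel. A small sketch; the function names are hypothetical and the file arguments exist only so the checks can be pointed at test files:

```shell
# Succeed if the kernel command line (default /proc/cmdline) carries a
# crashkernel= option.
has_crashkernel() {
    grep -Eq '(^|[[:space:]])crashkernel=[^[:space:]]+' "${1:-/proc/cmdline}"
}

# Succeed if a capture kernel is loaded, i.e. the sysfs file contains 1.
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}
```

If has_crashkernel fails, the boot parameter is missing or misspelled; if only crash_kernel_loaded fails, the kdump service is probably not running.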

3. Enable the kernel to reboot on panic

Add kernel.panic = 20 to /etc/sysctl.conf so that the kernel reboots 20 seconds after a panic.
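Editing the file by hand is fine; if the step is scripted, it is worth making it idempotent so repeated runs do not pile up duplicate lines. A sketch under that assumption; set_panic_timeout is a hypothetical helper name:

```shell
# Set kernel.panic in a sysctl-style config file: replace an existing
# kernel.panic line if there is one, otherwise append a new line.
set_panic_timeout() {
    conf=$1
    secs=$2
    if grep -Eq '^[[:space:]]*kernel\.panic[[:space:]]*=' "$conf"; then
        # Rewrite through a temp file, then copy back so the original
        # file keeps its ownership and permissions.
        sed -E "s/^[[:space:]]*kernel\.panic[[:space:]]*=.*/kernel.panic = $secs/" \
            "$conf" > "$conf.new" &&
            cat "$conf.new" > "$conf" &&
            rm -f "$conf.new"
    else
        printf 'kernel.panic = %s\n' "$secs" >> "$conf"
    fi
}

# Example (as root): set_panic_timeout /etc/sysctl.conf 20 && sysctl -p
```

The pattern requires "=" right after kernel.panic (plus optional spaces), so a separate kernel.panic_on_oops line is left alone.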

4. Install watchdog and use /dev/watchdog as the device

Run "yum -y install watchdog", then edit /etc/watchdog.conf and set the watchdog device to /dev/watchdog.
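In watchdog.conf the device is set with the watchdog-device option, and the stock file usually ships with that line commented out. The sketch below prints the device that is actually active, so a still-commented line is easy to spot; watchdog_device is a hypothetical helper name:

```shell
# Print the active (uncommented) watchdog-device value from a watchdog.conf
# file. Prints nothing if the option is unset or commented out.
watchdog_device() {
    sed -n 's/^[[:space:]]*watchdog-device[[:space:]]*=[[:space:]]*//p' \
        "${1:-/etc/watchdog.conf}" | tail -n 1
}
```

After editing, the output should be /dev/watchdog.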

5. Trigger a crash

After a reboot, the kdump service should load the capture kernel via kexec, leaving the system ready to capture a vmcore when it crashes. To test this, force a crash with the commands below. Run them on the console, or schedule the crash with the at utility:
at next minute
> chkconfig atd off
> echo 1 > /proc/sys/kernel/sysrq
> echo c > /proc/sysrq-trigger 
> ^D


Notice that I stopped atd from starting on the next boot. The at job is not de-queued before the crash, so on the next boot atd would re-run it and the system would crash in a loop. Remember to turn atd back on (chkconfig atd on) once the system has rebooted.
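One way to sidestep the at queue entirely is a delayed background job, which leaves nothing on disk to re-run after the reboot. This is an alternative to the at approach above, not what the original setup used. The function name schedule_crash and its optional path arguments are made up for this sketch; the paths default to the real sysrq files:

```shell
# Crash the machine after a delay (in seconds) without leaving a queued job.
# Arguments 2 and 3 override the sysrq paths so the function can be tried
# safely against ordinary files.
schedule_crash() {
    delay=${1:-60}
    sysrq=${2:-/proc/sys/kernel/sysrq}
    trigger=${3:-/proc/sysrq-trigger}
    (
        sleep "$delay"
        echo 1 > "$sysrq"      # enable the sysrq functions
        echo c > "$trigger"    # 'c' forces a crash
    ) &
}

# Example (as root, on the console): schedule_crash 60
```

If it is started from a remote session that will be closed before the delay expires, the job may need nohup or similar to survive the logout.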

When the crash happens, the system reboots into the kdump kernel and copies the crash dump file to /var/crash. The console shows something like this:
Scientific Linux release 6.1 (Carbon)
Kernel 2.6.32-131.12.1.el6.x86_64 on an x86_64

sl61.comme.ca login: SysRq : Trigger a crash
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8131aa36>] sysrq_handle_crash+0x16/0x20
PGD 37bd8067 PUD 37590067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/kernel/kexec_crash_loaded
Dumping ftrace buffer:
   (ftrace buffer empty)
CPU 1 
Modules linked in: autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log ppdev parport_pc parport microcode sg iTCO_wdt iTCO_vendor_support i2c_piix4 i2c_core snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ahci ata_generic pata_acpi ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: freq_table]

Modules linked in: autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log ppdev parport_pc parport microcode sg iTCO_wdt iTCO_vendor_support i2c_piix4 i2c_core snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ahci ata_generic pata_acpi ata_piix virtio_pci virtio_ring virtio dm_mod [last unloaded: freq_table]
Pid: 1909, comm: bash Not tainted 2.6.32-131.12.1.el6.x86_64 #1 VirtualBox
RIP: 0010:[<ffffffff8131aa36>]  [<ffffffff8131aa36>] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff88003728fe18  EFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 00000000000012e3
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff88003728fe18 R08: ffffffff81bfe340 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81af9f00 R14: 0000000000000286 R15: 0000000000000004
FS:  00007f26443d2700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000003778a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 1909, threadinfo ffff88003728e000, task ffff88003c94b540)
Stack:
 ffff88003728fe68 ffffffff8131acf2 ffff88003c94b540 ffff880000000000
<0> 0000000d80df4018 0000000000000002 ffff88003d60f780 00007f26443d7000
<0> 0000000000000002 fffffffffffffffb ffff88003728fe98 ffffffff8131adae
Call Trace:
 [<ffffffff8131acf2>] __handle_sysrq+0x132/0x1a0
 [<ffffffff8131adae>] write_sysrq_trigger+0x4e/0x50
 [<ffffffff811d509e>] proc_reg_write+0x7e/0xc0
 [<ffffffff811727f8>] vfs_write+0xb8/0x1a0
 [<ffffffff810d1b52>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81173231>] sys_write+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: d0 88 81 a3 a2 fc 81 c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 bd c6 77 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 8d 47 
RIP  [<ffffffff8131aa36>] sysrq_handle_crash+0x16/0x20
 RSP <ffff88003728fe18>
CR2: 0000000000000000
..MP-BIOS bug: 8254 timer not connected to IO-APIC
Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with apic=debug and send a report.  Then try booting with the 'noapic' option.


Shortly after, the kdump kernel boots, writes the dump to a timestamped directory under /var/crash, and the system reboots normally.
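Each crash gets its own directory, so after a few tests it is handy to pick out the newest vmcore automatically. A minimal sketch, assuming the one-directory-per-dump layout shown in step 6; latest_vmcore is a hypothetical helper name:

```shell
# Print the most recently modified vmcore under a crash directory
# (default /var/crash). Prints nothing if no dump exists yet.
latest_vmcore() {
    ls -t "${1:-/var/crash}"/*/vmcore 2>/dev/null | head -n 1
}
```

An empty result after a test crash usually means the capture kernel never ran, which points back to the crashkernel= reservation or the kdump service.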


6. Run crash on the dump file
> pwd
/var/crash/127.0.0.1-2011-09-10-14:40:56

> ls -lh
total 83M
-rw------- 1 root root 83M Sep 10 14:41 vmcore

> crash /usr/lib/debug/lib/modules/2.6.32-131.12.1.el6.x86_64/vmlinux vmcore


Run bt (backtrace) to see which process triggered the panic. Here it is bash, which wrote to /proc/sysrq-trigger:
crash> bt
PID: 1743   TASK: ffff880037556b40  CPU: 0   COMMAND: "bash"
 #0 [ffff88003d1459e0] machine_kexec at ffffffff810310cb
 #1 [ffff88003d145a40] crash_kexec at ffffffff810b6392
 #2 [ffff88003d145b10] oops_end at ffffffff814de670
 #3 [ffff88003d145b40] no_context at ffffffff81040c9b
 #4 [ffff88003d145b90] __bad_area_nosemaphore at ffffffff81040f25
 #5 [ffff88003d145be0] bad_area at ffffffff8104104e
 #6 [ffff88003d145c10] __do_page_fault at ffffffff81041773
 #7 [ffff88003d145d30] do_page_fault at ffffffff814e067e
 #8 [ffff88003d145d60] page_fault at ffffffff814dda05
	[exception RIP: sysrq_handle_crash+22]
	RIP: ffffffff8131aa36  RSP: ffff88003d145e18  RFLAGS: 00010096
	RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000beb
	RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
	RBP: ffff88003d145e18   R8: 0000000000000000   R9: 0000000000000002
	R10: ffffffff8163bfe0  R11: ffff88003796bfcf  R12: 0000000000000000
	R13: ffffffff81af9f00  R14: 0000000000000286  R15: 0000000000000004
	ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88003d145e20] __handle_sysrq at ffffffff8131acf2
#10 [ffff88003d145e70] write_sysrq_trigger at ffffffff8131adae
#11 [ffff88003d145ea0] proc_reg_write at ffffffff811d509e
#12 [ffff88003d145ef0] vfs_write at ffffffff811727f8
#13 [ffff88003d145f30] sys_write at ffffffff81173231
#14 [ffff88003d145f80] system_call_fastpath at ffffffff8100b172
	RIP: 0000003bfdad84e0  RSP: 00007ffff707a0b0  RFLAGS: 00010202
	RAX: 0000000000000001  RBX: ffffffff8100b172  RCX: 0000000000000063
	RDX: 0000000000000002  RSI: 00007f8a5ccf0000  RDI: 0000000000000001
	RBP: 00007f8a5ccf0000   R8: 000000000000000a   R9: 00007f8a5cceb700
	R10: 00000000ffffffff  R11: 0000000000000246  R12: 0000000000000002
	R13: 0000003bfdd89780  R14: 0000000000000002  R15: 0000003bfdd89780
	ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> 
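The same inspection can be run unattended: crash accepts a file of commands through its -i option, so a first-pass report can be saved next to the dump. A sketch; the helper name write_crash_script and the particular command list are only one reasonable choice:

```shell
# Write a small crash command file: system summary, backtrace of the
# panicking task, kernel log, process list, then leave crash.
write_crash_script() {
    cat > "$1" <<'EOF'
sys
bt
log
ps
exit
EOF
}

# Example:
#   write_crash_script /tmp/triage.cmds
#   crash -i /tmp/triage.cmds \
#       /usr/lib/debug/lib/modules/2.6.32-131.12.1.el6.x86_64/vmlinux vmcore > report.txt
```

Ending the file with exit makes crash quit after the last command instead of dropping to the interactive prompt.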


References

http://people.redhat.com/anderson/crash_whitepaper/
http://www.dedoimedo.com/computers/kdump.html
