Kdump is a kernel crash dumping mechanism. It is very reliable because the crash dump is captured from the context of a freshly booted kernel, not from the context of the crashed kernel. Kdump uses kexec to boot into a second kernel whenever the system crashes. This second kernel, often called the crash kernel or capture kernel, boots with very little memory and captures the dump image.
The first kernel reserves a section of memory that the second kernel uses to boot. Kexec boots the capture kernel without going through the BIOS, so the contents of the first kernel's memory are preserved; that preserved memory is essentially the kernel crash dump.
Configuring and using kdump
[root@test ~]# yum install kexec-tools crash

Edit /boot/grub/grub.conf and add the crashkernel parameter to the kernel line:

[root@test ~]# vi /boot/grub/grub.conf

kernel /boot/vmlinuz-2.6.18-128.el5 ro root=LABEL=/ rhgb quiet crashkernel=128M@16M
initrd /boot/initrd-2.6.18-128.el5.img

The crashkernel=128M@16M parameter reserves 128 MB of memory for the crash kernel, starting at a physical offset of 16 MB.

[root@test ~]# init 6

After the reboot, choose the kernel entry that carries the crashkernel parameter. Then check whether the crash kernel memory was reserved correctly using the command below.

[root@gai-1399 ~]# cat /proc/iomem | grep Crash
01000000-08ffffff : Crash kernel

In the /etc/kdump.conf file we can specify the location where the dump file is stored. Normally the dump file is stored inside the /var/crash folder.

[root@test ~]# vi /etc/kdump.conf

path /var/crash

[root@test ~]# /etc/init.d/kdump start
Starting kdump: [ OK ]
[root@test ~]# /etc/init.d/kdump status
Kdump is operational
[root@test ~]# chkconfig --level 3 kdump on
[root@test ~]# chkconfig --list | grep kdump
kdump 0:off 1:off 2:off 3:on 4:off 5:off 6:off
[root@test ~]# cat /proc/cmdline
ro root=LABEL=/1 crashkernel=128M@16M rhgb quiet
[root@test ~]# yum install kernel-debug*
Installed:
kernel-debug.i686 0:2.6.18-308.el5 kernel-debug-devel.i686 0:2.6.18-308.el5

To analyze the crash dump we need the kernel-debuginfo package; we can then analyze the vmcore file with the crash utility.

[root@test ~]# echo 1 > /proc/sys/kernel/sysrq
[root@test ~]# echo c > /proc/sysrq-trigger

The above command makes the Linux kernel crash, and a YYYY-MM-DD-HH:MM/vmcore file is generated in the location we selected in the configuration.

/proc/sysrq-trigger: by using the echo command to write to this file, a remote root user can execute most System Request Key commands remotely, as if at the local terminal. To echo values to this file, /proc/sys/kernel/sysrq must be set to a value other than 0.
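As a quick sanity check on the reservation, the Crash kernel range reported by /proc/iomem can be converted back into the size@offset form used on the kernel command line. The sketch below is only an illustration: crash_region_mb is a hypothetical helper name (it is not part of kexec-tools), and it assumes the "start-end : Crash kernel" line format shown above.

```shell
#!/bin/bash
# Hypothetical helper (not part of kexec-tools): reads /proc/iomem-style
# text on stdin and prints the crash kernel reservation as SIZE@OFFSET,
# both in megabytes.
crash_region_mb() {
    local range start end
    # First field of the matching line is the hex range "start-end".
    range=$(awk '/Crash kernel/ { print $1; exit }')
    if [ -z "$range" ]; then
        echo "no Crash kernel region found" >&2
        return 1
    fi
    # Convert both hex bounds to decimal; the end address is inclusive.
    start=$((0x${range%-*}))
    end=$((0x${range#*-}))
    echo "$(( (end - start + 1) / 1048576 ))M@$(( start / 1048576 ))M"
}
```

Fed with the range from the example above (crash_region_mb < /proc/iomem), this should print 128M@16M, because 0x01000000 is 16 MB and the range 0x01000000-0x08ffffff spans 128 MB.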
[root@test ~]# cd /var/crash/
[root@test crash]# ls
2012-11-04-14:23
[root@test ~]# crash /var/crash/2012-11-04-14:23/vmcore /usr/lib/debug/lib/modules/2.6.18-308.16.1.el5.centos.plus/vmlinux

To display the kernel message buffer, type the log command at the interactive prompt.

crash> log

To display the kernel stack trace:

crash> bt

To display process status, use ps:

crash> ps

crash> help

*              files          mod            runq           union
alias          foreach        mount          search         vm
ascii          fuser          net            set            vtop
bt             gdb            p              sig            waitq
btop           help           ps             struct         whatis
dev            irq            pte            swap           wr
dis            kmem           ptob           sym            q
eval           list           ptov           sys
exit           log            rd             task
extend         mach           repeat         timer

crash version: 5.1.8-1.el5.centos    gdb version: 7.0
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".

crash> exit
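Because kdump writes each dump into its own timestamped directory, it can be handy to pick up the most recent vmcore automatically instead of typing the directory name. The following is a minimal sketch under stated assumptions: latest_vmcore is a hypothetical helper name, and it assumes the YYYY-MM-DD-HH:MM directory naming shown above, which sorts chronologically as plain text.

```shell
#!/bin/bash
# Hypothetical helper: print the path of the newest vmcore below a kdump
# directory (default /var/crash) whose subdirectories are named
# YYYY-MM-DD-HH:MM.
latest_vmcore() {
    local base=${1:-/var/crash} f newest=""
    # Glob results are sorted, so the last existing match is the newest.
    for f in "$base"/*/vmcore; do
        if [ -f "$f" ]; then
            newest=$f
        fi
    done
    if [ -z "$newest" ]; then
        echo "no vmcore found under $base" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}
```

The printed path can then be passed to crash together with the vmlinux from the debuginfo package, in the same way as the command shown earlier.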
Below is a sample crash analysis of my system.
[root@test 2012-11-04-15:57]# crash vmcore /usr/lib/debug/lib/modules/2.6.18-308.16.1.el5.centos.plus/vmlinux

crash 5.1.8-1.el5.centos
Copyright (C) 2002-2011 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

WARNING: kernel version inconsistency between vmlinux and dumpfile

      KERNEL: /usr/lib/debug/lib/modules/2.6.18-308.16.1.el5.centos.plus/vmlinux
    DUMPFILE: vmcore
        CPUS: 2
        DATE: Sun Nov 4 15:57:11 2012
      UPTIME: 01:28:17
LOAD AVERAGE: 0.00, 0.00, 0.00
       TASKS: 175
    NODENAME: test.net
     RELEASE: 2.6.18-308.13.1.el5
     VERSION: #1 SMP Tue Aug 21 17:10:06 EDT 2012
     MACHINE: i686 (2926 Mhz)
      MEMORY: 1.9 GB
       PANIC: "SysRq : Trigger a crashdump"
         PID: 315
     COMMAND: "bash"
        TASK: f5b9caa0 [THREAD_INFO: eff0c000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)

crash> log
Linux version 2.6.18-308.13.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Tue Aug 21 17:10:06 EDT 2012
BIOS-provided physical RAM map:
BIOS-e820: 0000000000010000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007b9a1740 (usable)
BIOS-e820: 000000007b9a1740 - 000000007b9a37a0 (ACPI NVS)
BIOS-e820: 000000007b9a37a0 - 000000007e000000 (reserved)
BIOS-e820: 00000000f4000000 - 00000000f8000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fed40000 (reserved)
BIOS-e820: 00000000fed45000 - 0000000100000000 (reserved)
1081MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f9bf0
Using x86 segment limits to approximate NX protection
On node 0 totalpages: 506273
DMA zone: 4096 pages, LIFO batch:0
Normal zone: 225280 pages, LIFO batch:31
HighMem zone: 276897 pages, LIFO batch:31
DMI 2.6 present.
DMI: Hewlett-Packard HP Compaq 6000 Pro MT PC/3048h, BIOS 786G2 v01.09 08/25/2009
Using APIC driver default
ACPI: RSDP (v000 COMPAQ ) @ 0x000e5810
ACPI: RSDT (v001 HPQOEM SLIC-BPC 0x20090825 0x00000000) @ 0x7b9c5840
ACPI: FADT (v001 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c58e8
ACPI: MADT (v001 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c595c
ACPI: ASF! (v032 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c59e0
ACPI: MCFG (v001 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c5a43
ACPI: TCPA (v001 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c5a7f
ACPI: HPET (v001 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c5c27
ACPI: DSDT (v001 COMPAQ DSDT_PRJ 0x00000001 MSFT 0x0100000e) @ 0x00000000
ACPI: PM-Timer IO Port: 0xf808
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x00] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x00] disabled)
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 80000000 (gap: 7e000000:76000000)
Detected 2926.118 MHz processor.
Built 1 zonelists. Total pages: 506273
Kernel command line: ro root=LABEL=/1 crashkernel=128M@16M rhgb quiet
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0774000 soft=c0754000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1868804k/2025092k available (2208k kernel code, 155032k reserved, 921k data, 232k init, 1107588k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
hpet0: at MMIO 0xfed00000 (virtual 0xf8800000), IRQs 2, 8, 0, 0, 0, 0, 0, 0
hpet0: 8 64-bit timers, 14318180 Hz
Using HPET for base-timer
Calibrating delay loop (skipped), value calculated using timer frequency.. 5852.23 BogoMIPS (lpj=2926118)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0408e3bd 00000000 00000001
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0408e3bd 00000000 00000001
monitor/mwait feature present.
using mwait in idle threads.
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 3072K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU: After all inits, caps: bfebf3ff 20100000 00000000 00000940 0408e3bd 00000000 00000001
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
ACPI Warning (utinit-0077): Invalid FADT value PM2_CNT_LEN=0 at offset 5A FADT=f7ff9780 [20060707]
CPU0: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz stepping 0a
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 11000
CPU 1 irqstacks, hard=c0775000 soft=c0755000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5851.96 BogoMIPS (lpj=2925983)
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0408e3bd 00000000 00000001
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0408e3bd 00000000 00000001
monitor/mwait feature present.
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 3072K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU: After all inits, caps: bfebf3ff 20100000 00000000 00000940 0408e3bd 00000000 00000001
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz stepping 0a
Total of 2 processors activated (11704.20 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
Using local APIC timer interrupts.
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
sizeof(vma)=84 bytes
sizeof(page)=32 bytes
sizeof(inode)=340 bytes
sizeof(dentry)=136 bytes
sizeof(ext3inode)=492 bytes
sizeof(buffer_head)=52 bytes
sizeof(skbuff)=176 bytes
migration_cost=36
checking if image is initramfs... it is
Freeing initrd memory: 2622k freed
HP Compaq Laptop series board detected. Selecting BIOS-method for reboots.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG
Setting up standard PCI resources
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: unlabeled traffic allowed by default
pnp: 00:0c: ioport range 0x4d0-0x4d1 has been reserved
pnp: 00:0d: ioport range 0x400-0x41f could not be reserved
pnp: 00:0d: ioport range 0x420-0x43f has been reserved
pnp: 00:0d: ioport range 0x440-0x45f has been reserved
pnp: 00:0d: ioport range 0x460-0x47f has been reserved
pnp: 00:0d: ioport range 0x800-0x87f has been reserved
pnp: 00:0d: ioport range 0x880-0x8ff has been reserved
pnp: 00:0d: ioport range 0xf800-0xf81f could not be reserved
pnp: 00:0d: ioport range 0xf820-0xf83f could not be reserved
pnp: 00:0e: iomem range 0x0-0x9ffff could not be reserved
pnp: 00:0e: iomem range 0x100000-0x7dffffff could not be reserved
pnp: 00:0e: iomem range 0xe4000-0xfffff could not be reserved
pnp: 00:0e: iomem range 0xfec01000-0xfecfffff has been reserved
PCI: Bridge: 0000:00:1c.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.1
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Setting latency timer of device 0000:00:1c.0 to 64
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 21 (level, low) -> IRQ 169
ehci_hcd 0000:00:1a.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1a.7
ehci_hcd 0000:00:1a.7: irq 209, io mem 0xf0526800
ehci_hcd 0000:00:1a.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 20 (level, low) -> IRQ 217
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 2
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: irq 217, io mem 0xf0526c00
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 6 ports detected
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt 0000:00:1a.0[A] -> GSI 20 (level, low) -> IRQ 217
PCI: Setting latency timer of device 0000:00:1a.0 to 64
uhci_hcd 0000:00:1a.0: UHCI Host Controller
uhci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1a.0: irq 217, io base 0x00001120
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1a.1[B] -> GSI 21 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:1a.1 to 64
uhci_hcd 0000:00:1a.1: UHCI Host Controller
uhci_hcd 0000:00:1a.1: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1a.1: irq 169, io base 0x00001140
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1a.2[C] -> GSI 22 (level, low) -> IRQ 209
PCI: Setting latency timer of device 0000:00:1a.2 to 64
uhci_hcd 0000:00:1a.2: UHCI Host Controller
uhci_hcd 0000:00:1a.2: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1a.2: irq 209, io base 0x00001160
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
input: ImPS/2 Logitech Wheel Mouse as /class/input/input1
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 20 (level, low) -> IRQ 217
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 6
Bluetooth: RFCOMM ver 1.8
Bluetooth: HIDP (Human Interface Emulation) ver 1.1
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
Bridge firewalling registered
[drm] Initialized drm 1.0.1 20051102
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 90
[drm] Initialized i915 1.8.0 20060929 on minor 0
set status page addr 0x01fff000
SysRq : Trigger a crashdump

crash> bt
PID: 315    TASK: f5b9caa0  CPU: 0   COMMAND: "bash"
 #0 [eff0cee8] crash_kexec at c0445cbc
 #1 [eff0cf2c] __handle_sysrq at c054ed56
 #2 [eff0cf58] uptime_read_proc at c04ad81e
 #3 [eff0cf64] proc_delete_inode at c04a8882
 #4 [eff0cf84] vfs_write at c047842e
 #5 [eff0cf9c] sys_write at c0478a55
 #6 [eff0cfb8] system_call at c0404f44
    EAX: ffffffda  EBX: 00000001  ECX: b7f72000  EDX: 00000002
    DS:  007b      ESI: 00000002  ES:  007b      EDI: b7f72000
    SS:  007b      ESP: bf88ff9c  EBP: bf88ffb8
    CS:  0073      EIP: 00b79f9e  ERR: 00000004  EFLAGS: 00000246

crash> vm
PID: 315    TASK: f5b9caa0  CPU: 0   COMMAND: "bash"
   MM        PGD       RSS    TOTAL_VM
f7882580  f07f7000    1564k     4744k
  VMA       START      END      FLAGS  FILE
f4597df4    16e000    16f000  8040075
f45934c4    48f000    499000       75  /lib/libnss_files-2.5.so
f4593ca4    499000    49b000   100073  /lib/libnss_files-2.5.so
f45979b0    a97000    ab1000      875  /lib/ld-2.5.so
f4597cf8    ab1000    ab3000   100873  /lib/ld-2.5.so
f4593a58    ab5000    c0c000       75  /lib/libc-2.5.so
f45937b8    c0c000    c0d000       70  /lib/libc-2.5.so
f45932cc    c0d000    c0e000   100071  /lib/libc-2.5.so
f459380c    c0e000    c10000   100073  /lib/libc-2.5.so
f4593374    c10000    c13000   100073
f4593a04    c40000    c43000       75  /lib/libdl-2.5.so
f4593b00    c43000    c45000   100073  /lib/libdl-2.5.so
f4597ba8   7d1f000   7d22000       75  /lib/libtermcap.so.2.0.8
f45938b4   7d22000   7d23000   100073  /lib/libtermcap.so.2.0.8
f4597f44   8047000   80f6000     1875  /bin/bash
f459795c   80f6000   80fb000   101873  /bin/bash
f4597b54   80fb000   8100000   100073
f4597ca4   9aab000   9aea000   100073
f4593f44  b7d53000  b7f53000       71  /usr/lib/locale/locale-archive
f4597710  b7f53000  b7f55000   100073
f79d0b00  b7f71000  b7f73000   100073
f7b83aac  bf87c000  bf892000   100173
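
The WARNING at the top of this session appears because the vmlinux comes from 2.6.18-308.16.1.el5.centos.plus while the dump reports RELEASE 2.6.18-308.13.1.el5. A small check before starting crash can catch such a mismatch. This is only a sketch: vmlinux_matches_release is a hypothetical helper name, and it assumes the debuginfo layout /usr/lib/debug/lib/modules/<release>/vmlinux used in the commands above.

```shell
#!/bin/bash
# Hypothetical helper: compare the kernel release embedded in a debuginfo
# vmlinux path with the release of the kernel that produced the dump.
vmlinux_matches_release() {
    local vmlinux=$1 release=$2 from_path
    # /usr/lib/debug/lib/modules/<release>/vmlinux -> <release>
    from_path=$(basename "$(dirname "$vmlinux")")
    if [ "$from_path" = "$release" ]; then
        echo "match: $release"
    else
        echo "mismatch: vmlinux is $from_path, dump is $release"
        return 1
    fi
}
```

Called with the vmlinux path and the RELEASE value from the session above, it reports a mismatch and returns a non-zero status, which suggests installing the debuginfo package that matches the crashed kernel.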