Kdump is a kernel crash dumping mechanism. It is very reliable because the crash dump is captured from the context of a freshly booted kernel, not from the context of the crashed kernel. Kdump uses kexec to boot into a second kernel whenever the system crashes. This second kernel, often called the capture kernel or crash kernel, boots with very little memory and captures the dump image.
The first (production) kernel reserves a section of memory that the second kernel uses to boot. Because kexec boots the capture kernel without going through the BIOS, the contents of the first kernel's memory are preserved, and that preserved memory image is the kernel crash dump.
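You can see the reservation described above on a running system: it appears as a "Crash kernel" entry in /proc/iomem. The sketch below is a hypothetical bash helper; the name crash_reservation_mb is invented for this example. It reads /proc/iomem-formatted text on stdin and prints the size of the reserved region in MB. It assumes the "START-END : Crash kernel" line format that appears later in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the size (in MB) of the "Crash kernel" region
# found in /proc/iomem-formatted text read on stdin.
# Prints nothing and returns 1 if there is no such region.
crash_reservation_mb() {
    local line start end
    # Entries look like "01000000-08ffffff : Crash kernel", possibly
    # indented when nested under a "System RAM" entry.
    local re='^[[:space:]]*([0-9a-fA-F]+)-([0-9a-fA-F]+)[[:space:]]*:[[:space:]]*Crash kernel'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            start=$((16#${BASH_REMATCH[1]}))
            end=$((16#${BASH_REMATCH[2]}))
            # The end address is inclusive, hence the +1.
            echo $(( (end - start + 1) / 1024 / 1024 ))
            return 0
        fi
    done
    return 1
}
```

Run it as root, for example `crash_reservation_mb < /proc/iomem`. For the 01000000-08ffffff region shown in this post it prints 128.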
Configuring and using kdump
[root@test ~]# yum install kexec-tools crash
[root@test ~]# vi /boot/grub/grub.conf
Add crashkernel=128M@16M to the kernel line of the boot entry:
kernel /boot/vmlinuz-2.6.18-128.el5 ro root=LABEL=/ rhgb quiet crashkernel=128M@16M
initrd /boot/initrd-2.6.18-128.el5.img
The crashkernel parameter has the form SIZE@OFFSET. Here it reserves 128 MB of memory for the capture kernel, starting at physical address 16 MB.
[root@test ~]# init 6
Reboot into the entry you just edited. Then check that the crash kernel memory was reserved:
[root@test ~]# cat /proc/iomem | grep Crash
01000000-08ffffff : Crash kernel
In the /etc/kdump.conf file we can specify the location where the dump file is stored. By default, dump files are stored under /var/crash.
[root@test ~]# vi /etc/kdump.conf
path /var/crash
[root@test ~]# /etc/init.d/kdump start
Starting kdump: [ OK ]
[root@test ~]# /etc/init.d/kdump status
Kdump is operational
[root@test ~]# chkconfig --level 3 kdump on
[root@test ~]# chkconfig --list | grep kdump
kdump 0:off 1:off 2:off 3:on 4:off 5:off 6:off
[root@test ~]# cat /proc/cmdline
ro root=LABEL=/1 crashkernel=128M@16M rhgb quiet
[root@test ~]# yum install kernel-debug*
Installed: kernel-debug.i686 0:2.6.18-308.el5 kernel-debug-devel.i686 0:2.6.18-308.el5
To analyze the vmcore file with the crash utility, we need the kernel-debuginfo package. It provides the vmlinux with debug symbols under /usr/lib/debug/lib/modules/. Note that kernel-debug, installed above, is a separate debug-enabled kernel and not the debuginfo package. The debuginfo version must exactly match the kernel that crashed.
[root@test ~]# echo 1 > /proc/sys/kernel/sysrq
[root@test ~]# echo c > /proc/sysrq-trigger
The above commands make the Linux kernel crash. The machine then boots twice:
- first into the capture kernel (via kexec), which saves the YYYY-MM-DD-HH:MM/vmcore file under the location selected in the configuration;
- then back into the normal kernel.
/proc/sysrq-trigger: by writing to this file with echo, a root user (even one logged in remotely) can execute most System Request Key commands as if at the local console. To write values to this file, /proc/sys/kernel/sysrq must be set to a value other than 0.
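Here is a worked check of the crashkernel=SIZE@OFFSET arithmetic. 128M@16M means 128 MB starting at 16 MB. That gives the range 0x01000000 to 0x01000000 + 0x08000000 - 1 = 0x08ffffff, which is exactly what /proc/iomem reported above.

The bash sketch below does the same calculation. Its helper names (crashkernel_value, to_bytes, crashkernel_range) are hypothetical:
- crashkernel_value pulls the crashkernel value out of a kernel command line;
- crashkernel_range turns a plain SIZE@OFFSET value into that range.

It only handles the simple SIZE@OFFSET form with K/M/G suffixes. Other crashkernel syntaxes accepted by newer kernels are not parsed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the crashkernel=SIZE@OFFSET boot parameter.
# Only the simple SIZE@OFFSET form with K/M/G suffixes is handled.

# Print the crashkernel= value from a kernel command line string
# (for example the contents of /proc/cmdline); return 1 if absent.
crashkernel_value() {
    local -a words
    local word
    read -ra words <<< "$1"
    for word in "${words[@]}"; do
        if [[ $word == crashkernel=* ]]; then
            printf '%s\n' "${word#crashkernel=}"
            return 0
        fi
    done
    return 1
}

# Convert a size such as 128M, 16M, 1G, 64K or a plain byte count to bytes.
# 10# forces base 10 so a leading zero is not read as octal.
to_bytes() {
    local n=${1%[KkMmGg]}
    [[ $n =~ ^[0-9]+$ ]] || return 1
    case ${1: -1} in
        K|k) echo $(( 10#$n * 1024 )) ;;
        M|m) echo $(( 10#$n * 1024 * 1024 )) ;;
        G|g) echo $(( 10#$n * 1024 * 1024 * 1024 )) ;;
        *)   echo $(( 10#$n )) ;;
    esac
}

# Turn SIZE@OFFSET into the inclusive START-END physical range that
# /proc/iomem lists for the "Crash kernel" entry.
crashkernel_range() {
    [[ $1 == *@* ]] || return 1
    local size_b offset_b
    size_b=$(to_bytes "${1%@*}") || return 1
    offset_b=$(to_bytes "${1#*@}") || return 1
    printf '%08x-%08x\n' "$offset_b" $(( offset_b + size_b - 1 ))
}
```

For example, run `crashkernel_range "$(crashkernel_value "$(cat /proc/cmdline)")"` on the system in this post. It should print 01000000-08ffffff, matching the `grep Crash /proc/iomem` output.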
[root@test ~]# cd /var/crash/
[root@test crash]# ls
2012-11-04-14:23
[root@test crash]# crash /var/crash/2012-11-04-14:23/vmcore /usr/lib/debug/lib/modules/2.6.18-308.16.1.el5.centos.plus/vmlinux
To display the kernel message buffer, type the log command at the interactive prompt:
crash> log
To display the kernel stack backtrace:
crash> bt
To display process status:
crash> ps
To list the available commands:
crash> help

*              files          mod            runq           union
alias          foreach        mount          search         vm
ascii          fuser          net            set            vtop
bt             gdb            p              sig            waitq
btop           help           ps             struct         whatis
dev            irq            pte            swap           wr
dis            kmem           ptob           sym            q
eval           list           ptov           sys
exit           log            rd             task
extend         mach           repeat         timer

crash version: 5.1.8-1.el5.centos   gdb version: 7.0
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".
crash> exit
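The manual steps above can be scripted: look up the dump path, pick the newest YYYY-MM-DD-HH:MM directory, and pair its vmcore with the debuginfo vmlinux. The bash sketch below does this; its function names are invented for illustration. It rests on a few assumptions:
- the `path` directive in /etc/kdump.conf names a local directory;
- each crash gets its own dated directory, as in this post;
- the vmlinux lives at /usr/lib/debug/lib/modules/<release>/vmlinux, the layout used here.

It only prints the crash command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: find the newest kdump vmcore and print (not run)
# the crash command that would open it.

# Print the "path" directive from a kdump.conf file, or the default
# /var/crash when the file or the directive is missing.
kdump_path() {
    local conf=${1:-/etc/kdump.conf} key value rest
    if [[ -r $conf ]]; then
        while read -r key value rest; do
            if [[ $key == path && -n $value ]]; then
                printf '%s\n' "$value"
                return 0
            fi
        done < "$conf"
    fi
    echo /var/crash
}

# Print the newest DIR/<YYYY-MM-DD-HH:MM>/vmcore. Directory names in this
# fixed-width format sort chronologically as strings, so keep the largest
# name that actually contains a vmcore.
latest_vmcore() {
    local dir=${1:-/var/crash} newest="" d
    for d in "$dir"/*/; do
        if [[ -f ${d}vmcore && ( -z $newest || $d > $newest ) ]]; then
            newest=$d
        fi
    done
    [[ -n $newest ]] || return 1
    printf '%svmcore\n' "$newest"
}

# Print the crash command for the newest dump, paired with the debuginfo
# vmlinux of the given kernel release (the release that crashed).
crash_command() {
    local release=$1 conf=$2 vmcore
    [[ -n $release ]] || return 1
    vmcore=$(latest_vmcore "$(kdump_path "$conf")") || return 1
    printf 'crash %s %s\n' "$vmcore" "/usr/lib/debug/lib/modules/$release/vmlinux"
}
```

Pass the release of the kernel that crashed, not necessarily the one running now. For example, `crash_command 2.6.18-308.13.1.el5` targets the release reported by the sample dump below.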
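Below is a sample crash analysis of my system. Note the WARNING near the top about a kernel version mismatch. The vmlinux passed to crash (2.6.18-308.16.1.el5.centos.plus) does not match the kernel that crashed (RELEASE: 2.6.18-308.13.1.el5). As a result, some symbols that crash reports, such as function names in the bt output, may be wrong. Use the debuginfo vmlinux whose version exactly matches the crashed kernel.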
[root@test 2012-11-04-15:57]# crash vmcore /usr/lib/debug/lib/modules/2.6.18-308.16.1.el5.centos.plus/vmlinux
crash 5.1.8-1.el5.centos
Copyright (C) 2002-2011 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
WARNING: kernel version inconsistency between vmlinux and dumpfile
KERNEL: /usr/lib/debug/lib/modules/2.6.18-308.16.1.el5.centos.plus/vmlinux
DUMPFILE: vmcore
CPUS: 2
DATE: Sun Nov 4 15:57:11 2012
UPTIME: 01:28:17
LOAD AVERAGE: 0.00, 0.00, 0.00
TASKS: 175
NODENAME: test.net
RELEASE: 2.6.18-308.13.1.el5
VERSION: #1 SMP Tue Aug 21 17:10:06 EDT 2012
MACHINE: i686 (2926 Mhz)
MEMORY: 1.9 GB
PANIC: "SysRq : Trigger a crashdump"
PID: 315
COMMAND: "bash"
TASK: f5b9caa0 [THREAD_INFO: eff0c000]
CPU: 0
STATE: TASK_RUNNING (SYSRQ)
crash> log
Linux version 2.6.18-308.13.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Tue Aug 21 17:10:06 EDT 2012
BIOS-provided physical RAM map:
BIOS-e820: 0000000000010000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007b9a1740 (usable)
BIOS-e820: 000000007b9a1740 - 000000007b9a37a0 (ACPI NVS)
BIOS-e820: 000000007b9a37a0 - 000000007e000000 (reserved)
BIOS-e820: 00000000f4000000 - 00000000f8000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fed40000 (reserved)
BIOS-e820: 00000000fed45000 - 0000000100000000 (reserved)
1081MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f9bf0
Using x86 segment limits to approximate NX protection
On node 0 totalpages: 506273
DMA zone: 4096 pages, LIFO batch:0
Normal zone: 225280 pages, LIFO batch:31
HighMem zone: 276897 pages, LIFO batch:31
DMI 2.6 present.
DMI: Hewlett-Packard HP Compaq 6000 Pro MT PC/3048h, BIOS 786G2 v01.09 08/25/2009
Using APIC driver default
ACPI: RSDP (v000 COMPAQ ) @ 0x000e5810
ACPI: RSDT (v001 HPQOEM SLIC-BPC 0x20090825 0x00000000) @ 0x7b9c5840
ACPI: FADT (v001 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c58e8
ACPI: MADT (v001 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c595c
ACPI: ASF! (v032 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c59e0
ACPI: MCFG (v001 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c5a43
ACPI: TCPA (v001 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c5a7f
ACPI: HPET (v001 COMPAQ EAGLLAKE 0x00000001 0x00000000) @ 0x7b9c5c27
ACPI: DSDT (v001 COMPAQ DSDT_PRJ 0x00000001 MSFT 0x0100000e) @ 0x00000000
ACPI: PM-Timer IO Port: 0xf808
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x00] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x00] disabled)
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 80000000 (gap: 7e000000:76000000)
Detected 2926.118 MHz processor.
Built 1 zonelists. Total pages: 506273
Kernel command line: ro root=LABEL=/1 crashkernel=128M@16M rhgb quiet
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0774000 soft=c0754000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1868804k/2025092k available (2208k kernel code, 155032k reserved, 921k data, 232k init, 1107588k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
hpet0: at MMIO 0xfed00000 (virtual 0xf8800000), IRQs 2, 8, 0, 0, 0, 0, 0, 0
hpet0: 8 64-bit timers, 14318180 Hz
Using HPET for base-timer
Calibrating delay loop (skipped), value calculated using timer frequency.. 5852.23 BogoMIPS (lpj=2926118)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0408e3bd 00000000 00000001
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0408e3bd 00000000 00000001
monitor/mwait feature present.
using mwait in idle threads.
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 3072K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU: After all inits, caps: bfebf3ff 20100000 00000000 00000940 0408e3bd 00000000 00000001
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
ACPI Warning (utinit-0077): Invalid FADT value PM2_CNT_LEN=0 at offset 5A FADT=f7ff9780 [20060707]
CPU0: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz stepping 0a
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 11000
CPU 1 irqstacks, hard=c0775000 soft=c0755000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5851.96 BogoMIPS (lpj=2925983)
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0408e3bd 00000000 00000001
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0408e3bd 00000000 00000001
monitor/mwait feature present.
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 3072K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU: After all inits, caps: bfebf3ff 20100000 00000000 00000940 0408e3bd 00000000 00000001
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz stepping 0a
Total of 2 processors activated (11704.20 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
Using local APIC timer interrupts.
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
sizeof(vma)=84 bytes
sizeof(page)=32 bytes
sizeof(inode)=340 bytes
sizeof(dentry)=136 bytes
sizeof(ext3inode)=492 bytes
sizeof(buffer_head)=52 bytes
sizeof(skbuff)=176 bytes
migration_cost=36
checking if image is initramfs... it is
Freeing initrd memory: 2622k freed
HP Compaq Laptop series board detected. Selecting BIOS-method for reboots.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG
Setting up standard PCI resources
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: unlabeled traffic allowed by default
pnp: 00:0c: ioport range 0x4d0-0x4d1 has been reserved
pnp: 00:0d: ioport range 0x400-0x41f could not be reserved
pnp: 00:0d: ioport range 0x420-0x43f has been reserved
pnp: 00:0d: ioport range 0x440-0x45f has been reserved
pnp: 00:0d: ioport range 0x460-0x47f has been reserved
pnp: 00:0d: ioport range 0x800-0x87f has been reserved
pnp: 00:0d: ioport range 0x880-0x8ff has been reserved
pnp: 00:0d: ioport range 0xf800-0xf81f could not be reserved
pnp: 00:0d: ioport range 0xf820-0xf83f could not be reserved
pnp: 00:0e: iomem range 0x0-0x9ffff could not be reserved
pnp: 00:0e: iomem range 0x100000-0x7dffffff could not be reserved
pnp: 00:0e: iomem range 0xe4000-0xfffff could not be reserved
pnp: 00:0e: iomem range 0xfec01000-0xfecfffff has been reserved
PCI: Bridge: 0000:00:1c.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.1
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Setting latency timer of device 0000:00:1c.0 to 64
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 21 (level, low) -> IRQ 169
ehci_hcd 0000:00:1a.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1a.7
ehci_hcd 0000:00:1a.7: irq 209, io mem 0xf0526800
ehci_hcd 0000:00:1a.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 20 (level, low) -> IRQ 217
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 2
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: irq 217, io mem 0xf0526c00
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 6 ports detected
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt 0000:00:1a.0[A] -> GSI 20 (level, low) -> IRQ 217
PCI: Setting latency timer of device 0000:00:1a.0 to 64
uhci_hcd 0000:00:1a.0: UHCI Host Controller
uhci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1a.0: irq 217, io base 0x00001120
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1a.1[B] -> GSI 21 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:1a.1 to 64
uhci_hcd 0000:00:1a.1: UHCI Host Controller
uhci_hcd 0000:00:1a.1: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1a.1: irq 169, io base 0x00001140
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1a.2[C] -> GSI 22 (level, low) -> IRQ 209
PCI: Setting latency timer of device 0000:00:1a.2 to 64
uhci_hcd 0000:00:1a.2: UHCI Host Controller
uhci_hcd 0000:00:1a.2: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1a.2: irq 209, io base 0x00001160
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
input: ImPS/2 Logitech Wheel Mouse as /class/input/input1
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 20 (level, low) -> IRQ 217
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 6
Bluetooth: RFCOMM ver 1.8
Bluetooth: HIDP (Human Interface Emulation) ver 1.1
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
Bridge firewalling registered
[drm] Initialized drm 1.0.1 20051102
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 90
[drm] Initialized i915 1.8.0 20060929 on minor 0
set status page addr 0x01fff000
SysRq : Trigger a crashdump
crash> bt
PID: 315 TASK: f5b9caa0 CPU: 0 COMMAND: "bash"
#0 [eff0cee8] crash_kexec at c0445cbc
#1 [eff0cf2c] __handle_sysrq at c054ed56
#2 [eff0cf58] uptime_read_proc at c04ad81e
#3 [eff0cf64] proc_delete_inode at c04a8882
#4 [eff0cf84] vfs_write at c047842e
#5 [eff0cf9c] sys_write at c0478a55
#6 [eff0cfb8] system_call at c0404f44
EAX: ffffffda EBX: 00000001 ECX: b7f72000 EDX: 00000002
DS: 007b ESI: 00000002 ES: 007b EDI: b7f72000
SS: 007b ESP: bf88ff9c EBP: bf88ffb8
CS: 0073 EIP: 00b79f9e ERR: 00000004 EFLAGS: 00000246
crash> vm
PID: 315 TASK: f5b9caa0 CPU: 0 COMMAND: "bash"
MM PGD RSS TOTAL_VM
f7882580 f07f7000 1564k 4744k
VMA START END FLAGS FILE
f4597df4 16e000 16f000 8040075
f45934c4 48f000 499000 75 /lib/libnss_files-2.5.so
f4593ca4 499000 49b000 100073 /lib/libnss_files-2.5.so
f45979b0 a97000 ab1000 875 /lib/ld-2.5.so
f4597cf8 ab1000 ab3000 100873 /lib/ld-2.5.so
f4593a58 ab5000 c0c000 75 /lib/libc-2.5.so
f45937b8 c0c000 c0d000 70 /lib/libc-2.5.so
f45932cc c0d000 c0e000 100071 /lib/libc-2.5.so
f459380c c0e000 c10000 100073 /lib/libc-2.5.so
f4593374 c10000 c13000 100073
f4593a04 c40000 c43000 75 /lib/libdl-2.5.so
f4593b00 c43000 c45000 100073 /lib/libdl-2.5.so
f4597ba8 7d1f000 7d22000 75 /lib/libtermcap.so.2.0.8
f45938b4 7d22000 7d23000 100073 /lib/libtermcap.so.2.0.8
f4597f44 8047000 80f6000 1875 /bin/bash
f459795c 80f6000 80fb000 101873 /bin/bash
f4597b54 80fb000 8100000 100073
f4597ca4 9aab000 9aea000 100073
f4593f44 b7d53000 b7f53000 71 /usr/lib/locale/locale-archive
f4597710 b7f53000 b7f55000 100073
f79d0b00 b7f71000 b7f73000 100073
f7b83aac bf87c000 bf892000 100173
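crash can send a command's output to a file with shell-style redirection, for example `crash> vm > vm.txt`; check `help output` in your crash version. The hypothetical bash helpers below post-process such saved output:
- bt_functions lists the function name from each numbered frame of a bt backtrace;
- vma_total_kb adds up END - START over the VMA lines printed by vm.

Applied to the vm output above, the 22 VMAs sum to 4744k, matching the TOTAL_VM column. Both helpers assume the column layout of this crash 5.1.8 sample.

```shell
#!/usr/bin/env bash
# Hypothetical post-processing helpers for saved crash output.

# Print the function name of each "#N [frame] function at address" line
# of bt output read on stdin; register dump and PID lines are ignored.
bt_functions() {
    local num frame func rest
    while read -r num frame func rest; do
        if [[ $num =~ ^#[0-9]+$ ]]; then
            printf '%s\n' "$func"
        fi
    done
}

# Succeed if the argument is a lowercase hex number as crash prints it.
is_hex() {
    [[ $1 =~ ^[0-9a-f]+$ ]]
}

# Sum END - START over the VMA lines of vm output read on stdin and print
# the total in KB. PID, MM/PGD and header lines fail the hex checks.
vma_total_kb() {
    local vma start end flags rest total=0
    while read -r vma start end flags rest; do
        if is_hex "$vma" && is_hex "$start" && is_hex "$end" && is_hex "$flags"; then
            total=$(( total + 16#$end - 16#$start ))
        fi
    done
    echo $(( total / 1024 ))
}
```

For example, `bt_functions < bt.txt` lists crash_kexec, __handle_sysrq and the other frames in order. `vma_total_kb < vm.txt` gives a quick cross-check of TOTAL_VM.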