网硕互联技术交流社区

CentOS crashes frequently

Posted by an administrator (管理员) on 2017-9-6 14:17:48

1. Check the dmesg log


    $ grep -E "error|Error|ERROR|fail|Fail|FAIL" dmesg
    [Hardware Error]: This system BIOS has enabled interrupt remapping
    ERST: Error Record Serialization Table (ERST) support is initialized.
    ACPI Error: No handler for Region [IPMI] (ffff88042a610300) [IPMI] (20090903/evregion-319)
    ACPI Error: Region IPMI(7) has no handler (20090903/exfldio-295)
    ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PMI0._GHL] (Node ffff88082a7aeab0), AE_NOT_EXIST
    ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PMI0._PMC] (Node ffff88082a7aeb00), AE_NOT_EXIST
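The three spellings in the pattern can be folded into one with grep's -i flag. A minimal sketch, assuming the log has been copied into the current directory as in this post; the function name scan_log is made up for illustration. Note that -i is slightly broader than the original pattern, since it also matches mixed-case spellings.

```shell
# scan_log FILE: print lines that mention "error" or "fail", ignoring case.
# scan_log is a hypothetical helper name, not part of any package.
scan_log() {
    grep -iE 'error|fail' "$1"
}

# Example: scan_log dmesg
```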
2. Check the messages log


    $ grep -E "error|Error|ERROR|fail|Fail|FAIL" messages
    Mar 15 11:32:07 rsyslogd: UDP message reception disabled due to error logged in last message.
    Mar 15 11:32:07 kernel: [Hardware Error]: This system BIOS has enabled interrupt remapping
    Mar 15 11:32:07 kernel: ERST: Error Record Serialization Table (ERST) support is initialized.
    Mar 15 11:32:07 kernel: ACPI Error: No handler for Region [IPMI] (ffff88042a610300) [IPMI] (20090903/evregion-319)
    Mar 15 11:32:07 kernel: ACPI Error: Region IPMI(7) has no handler (20090903/exfldio-295)
    Mar 15 11:32:07 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PMI0._GHL] (Node ffff88082a7aeab0), AE_NOT_EXIST
    Mar 15 11:32:07 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PMI0._PMC] (Node ffff88082a7aeb00), AE_NOT_EXIST
    Mar 15 11:32:14 mcelog: failed to prefill DIMM database from DMI data
    Mar 15 11:32:14 snmpd[20522]: /etc/snmp/snmpd.conf: line 61: Error: ERROR: This output format has been deprecated - Please use the 'extend' directive instead
    Mar 15 11:32:14 snmpd[20522]: /etc/snmp/snmpd.conf: line 62: Error: ERROR: This output format has been deprecated - Please use the 'extend' directive instead
    Mar 15 11:32:14 snmpd[20522]: /etc/snmp/snmpd.conf: line 63: Error: ERROR: This output format has been deprecated - Please use the 'extend' directive instead
    Mar 15 11:32:14 snmpd[20522]: /etc/snmp/snmpd.conf: line 64: Error: ERROR: This output format has been deprecated - Please use the 'extend' directive instead
    Mar 15 11:32:14 snmpd[20522]: /etc/snmp/snmpd.conf: line 65: Error: ERROR: This output format has been deprecated - Please use the 'extend' directive instead
    Mar 15 11:32:14 snmpd[20522]: /etc/snmp/snmpd.conf: line 66: Error: ERROR: This output format has been deprecated - Please use the 'extend' directive instead
    Mar 15 11:32:14 snmpd[20522]: /etc/snmp/snmpd.conf: line 67: Error: ERROR: This output format has been deprecated - Please use the 'extend' directive instead
    Mar 15 11:32:14 snmpd[20522]: /etc/snmp/snmpd.conf: line 68: Error: ERROR: This output format has been deprecated - Please use the 'extend' directive instead
    Mar 15 11:32:14 snmpd[20522]: net-snmp: 8 error(s) in config file(s)
    Mar 15 11:32:37 kernel: oddjobd[26628]: segfault at 3700000000 ip 00000037975292b0 sp 00007fff5397ef28 error 4 in libc-2.12.so[3797400000+18a000]
    Mar 15 11:32:37 oddjobd: oddjobd startup failed
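Much of the output above is the same snmpd warning repeated, so it can help to count matches per source before reading them one by one. A rough sketch, assuming the line layout shown in this post, where the fourth whitespace-separated field is the program tag (a syslog line that includes a hostname would put the tag in the fifth field instead); count_by_source is a made-up name.

```shell
# count_by_source FILE: count error/fail lines per program tag, largest first.
# Assumes field 4 is the tag, e.g. "kernel:" or "snmpd[20522]:".
count_by_source() {
    grep -iE 'error|fail' "$1" |
        awk '{
                 tag = $4
                 sub(/\[[0-9]+\]/, "", tag)   # drop the PID, if any
                 sub(/:$/, "", tag)           # drop the trailing colon
                 n[tag]++
             }
             END { for (t in n) print n[t], t }' |
        sort -rn
}

# Example: count_by_source messages
```

On the excerpt above, the snmpd configuration warnings form the largest group, which suggests most of the matches are noise rather than anything related to the crash.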



3. Check the kdump log file


    $ grep -E "error|Error|ERROR|fail|Fail|FAIL" vmcore-dmesg.txt
    <4>[Hardware Error]: This system BIOS has enabled interrupt remapping
    <6>ERST: Error Record Serialization Table (ERST) support is initialized.
    <4>ACPI Error: No handler for Region [IPMI] (ffff88042a610300) [IPMI] (20090903/evregion-319)
    <4>ACPI Error: Region IPMI(7) has no handler (20090903/exfldio-295)
    <4>ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PMI0._GHL] (Node ffff88082a7aeab0), AE_NOT_EXIST
    <4>ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PMI0._PMC] (Node ffff88082a7aeb00), AE_NOT_EXIST
    <6>oddjobd[26656]: segfault at 3700000000 ip 00000037975292b0 sp 00007fffb3fd00d8 error 4 in libc-2.12.so[3797400000+18a000]
    <6>chntf[42911]: segfault at 10 ip 000000000045afb4 sp 00007f374386b940 error 4 in chntf[400000+12e000]
    <6>dmagic93[26512] general protection ip:3797476065 sp:7f20c803b500 error:0 in libc-2.12.so[3797400000+18a000]
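To see at a glance which processes faulted before the crash, the segfault and general protection lines can be reduced to name, PID and fault type. A small sketch, assuming the vmcore-dmesg.txt layout shown above (an optional `<N>` priority prefix followed by `name[pid]`); list_faults is a made-up name.

```shell
# list_faults FILE: print "name pid fault" for each user-space fault line.
# Matches lines such as "<6>chntf[42911]: segfault at ..." and
# "<6>dmagic93[26512] general protection ...".
list_faults() {
    sed -nE 's/^(<[0-9]+>)?([^[ ]+)\[([0-9]+)\]:? (segfault|general protection).*/\2 \3 \4/p' "$1"
}

# Example: list_faults vmcore-dmesg.txt
```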



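Analysis:

Interrupt remapping is enabled in the BIOS (it is a complicated topic that could be explained simply, but I will skip that here). An error occurred during the ERST check (ERST is the ACPI Error Record Serialization Table, which stores hardware error records), so ACPI (Advanced Configuration and Power Interface) could not handle requests from the IPMI driver. Prefilling data into memory then failed, and the kernel ended up chasing a null pointer.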


Solution:

Add intremap=off or intremap=no_x2apic_optout to the kernel boot parameters in grub.conf.
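The edit can also be scripted. A minimal sketch, assuming a GRUB legacy grub.conf (on CentOS 6 this is typically /boot/grub/grub.conf) in which each boot entry has a line beginning with `kernel`; the function name add_intremap_off is made up for illustration, and it is worth trying on a copy of the file first.

```shell
# add_intremap_off FILE: append intremap=off to every "kernel" line in FILE
# that does not already carry an intremap= parameter.
# The file as it was before this run is saved as FILE.bak.
add_intremap_off() {
    sed -i.bak -E \
        '/^[[:space:]]*kernel[[:space:]]/{/intremap=/!s/[[:space:]]*$/ intremap=off/;}' \
        "$1"
}

# Example (on a copy):
#   cp /boot/grub/grub.conf /tmp/grub.conf.test
#   add_intremap_off /tmp/grub.conf.test
```

The change only takes effect after a reboot. On systems that ship grubby, `grubby --update-kernel=ALL --args="intremap=off"` should achieve the same result without editing the file by hand.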

intremap={on,off,nosid,no_x2apic_optout}

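    on (default): enable interrupt remapping; it is also enabled by default in the BIOS

    off: disable interrupt remapping

    nosid: do not check the SID (Source ID) when remapping

    no_x2apic_optout: ignore the BIOS request to opt out of x2APIC, so the kernel may still use x2APIC; mainly used to work around faults caused by BIOSes with defective x2APIC support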
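After rebooting, it is worth confirming that the parameter actually reached the kernel. A small sketch that reads the value out of a kernel command line string; intremap_setting is a made-up name, and falling back to "on" when the parameter is absent simply mirrors the default listed above.

```shell
# intremap_setting CMDLINE: print the last intremap= value found in CMDLINE,
# or "on" when no intremap= parameter is present.
intremap_setting() {
    value=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^intremap=//p' | tail -n 1)
    printf '%s\n' "${value:-on}"
}

# Example: intremap_setting "$(cat /proc/cmdline)"
```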

