[Stubborn electronic ailment] Ubuntu 24 freezes at random; requesting a consult from the cyber-specialists
Last edited by WiiGe on 2025-6-11 07:46
Log from the most recent freeze: https://share.wiige.top/upload/er5yBd
=================================================================================
Update: lightly sanitized log: https://share.wiige.top/p/aAAYza
=================================================================================
I set up an HP Z8G4 workstation as a home server running Ubuntu 24 LTS. Every application is dockerized and isolated from the host; mainly Nextcloud, Komga, Plex, GitLab and the like.
Configuration:
Z8G4: https://support.hp.com/us-en/dri ... orkstation/16449803
128 GB of DDR4 ECC memory (empty-slot entries omitted):
$ sudo dmidecode -t memory
# dmidecode 3.5
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.
Handle 0x000F, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Single-bit ECC
Maximum Capacity: 4608 GB
Error Information Handle: Not Provided
Number Of Devices: 12
Handle 0x0010, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x000F
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 64 GB
Form Factor: DIMM
Set: None
Locator: CPU0-DIMM1
Bank Locator: CPU0
Type: DDR4
Type Detail: Synchronous LRDIMM
Speed: 2666 MT/s
Manufacturer: Samsung
Serial Number: 398E97A7
Asset Tag: Not Specified
Part Number: M386A8K40BM2-CTD
Rank: 4
Configured Memory Speed: 2666 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x001D, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Single-bit ECC
Maximum Capacity: 4608 GB
Error Information Handle: Not Provided
Number Of Devices: 12
Handle 0x001E, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x001D
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 64 GB
Form Factor: DIMM
Set: None
Locator: CPU1-DIMM1
Bank Locator: CPU1
Type: DDR4
Type Detail: Synchronous LRDIMM
Speed: 2666 MT/s
Manufacturer: Samsung
Serial Number: 398F3D61
Asset Tag: Not Specified
Part Number: M386A8K40BM2-CTD
Rank: 4
Configured Memory Speed: 2666 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
A few SSDs:
$ sudo smartctl --all /dev/nvme0
=== START OF INFORMATION SECTION ===
Model Number: INTEL SSDPED1D280GA
Serial Number: PHMB75160057280CGN
Firmware Version: E2010480
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 0
NVMe Version: <1.2
Number of Namespaces: 1
Namespace 1 Size/Capacity: 280,065,171,456
Namespace 1 Formatted LBA Size: 512
Local Time is: Sun Jun 1 01:58:46 2025 CST
Firmware Updates (0x02): 1 Slot
Optional Admin Commands (0x0007): Security Format Frmw_DL
Optional NVM Commands (0x0006): Wr_Unc DS_Mngmt
Log Page Attributes (0x0a): Cmd_Eff_Lg Telmtry_Lg
Maximum Data Transfer Size: 32 Pages
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
$ sudo smartctl --all /dev/nvme1
=== START OF INFORMATION SECTION ===
Model Number: MEMBLAZE P6530CH0384M00
Serial Number: SH220600806
Firmware Version: 0A101Z00
PCI Vendor/Subsystem ID: 0x1c5f
IEEE OUI Identifier: 0x00e004
Total NVM Capacity: 3,840,755,982,336
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 3,840,755,982,336
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 38b19e 7418032601
Local Time is: Sun Jun 1 01:59:07 2025 CST
Firmware Updates (0x17): 3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x001f): Security Format Frmw_DL NS_Mngmt Self_Test
Optional NVM Commands (0x0054): DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 128 Pages
Warning  Comp. Temp. Threshold: 70 Celsius
Critical Comp. Temp. Threshold: 77 Celsius
Namespace 1 Features (0x08): No_ID_Reuse
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W       -        -    0  0  0  0        0       0
 1 +    14.00W       -        -    0  0  0  0        0       0
 2 +    13.00W       -        -    0  0  0  0        0       0
 3 +    12.00W       -        -    0  0  0  0        0       0
 4 +    11.00W       -        -    0  0  0  0        0       0
 5 +    10.00W       -        -    0  0  0  0        0       0
 6 +     9.00W       -        -    0  0  0  0        0       0
 7 +     8.00W       -        -    0  0  0  0        0       0
 8 +     7.00W       -        -    0  0  0  0        0       0
 9 +     6.00W       -        -    0  0  0  0        0       0
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         0
 2 -     512       8         2
 3 -    4096       8         0
 4 -    4096      64         0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
$ sudo smartctl --all /dev/nvme2
=== START OF INFORMATION SECTION ===
Model Number: Fanxiang S500Pro 2TB
Serial Number: FXS500Pro243954233
Firmware Version: SN12517
PCI Vendor/Subsystem ID: 0x1e4b
IEEE OUI Identifier: 0x000000
Total NVM Capacity: 2,048,408,248,320
Unallocated NVM Capacity: 0
Controller ID: 0
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,048,408,248,320
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 000000 0243954233
Local Time is: Sun Jun 1 01:59:09 2025 CST
Firmware Updates (0x1a): 5 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x02): Cmd_Eff_Lg
Maximum Data Transfer Size: 128 Pages
Warning  Comp. Temp. Threshold: 90 Celsius
Critical Comp. Temp. Threshold: 95 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.50W       -        -    0  0  0  0        0       0
 1 +     5.80W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.7460W       -        -    3  3  3  3     5000   10000
 4 -   0.7260W       -        -    4  4  4  4     8000   45000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
$ sudo smartctl --all /dev/nvme3
=== START OF INFORMATION SECTION ===
Model Number: MEMBLAZE P6530CH0384M00
Serial Number: SH220600832
Firmware Version: 0A101Z00
PCI Vendor/Subsystem ID: 0x1c5f
IEEE OUI Identifier: 0x00e004
Total NVM Capacity: 3,840,755,982,336
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 3,840,755,982,336
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 38b19e 7418034001
Local Time is: Sun Jun 1 01:59:11 2025 CST
Firmware Updates (0x17): 3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x001f): Security Format Frmw_DL NS_Mngmt Self_Test
Optional NVM Commands (0x0054): DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 128 Pages
Warning  Comp. Temp. Threshold: 70 Celsius
Critical Comp. Temp. Threshold: 77 Celsius
Namespace 1 Features (0x08): No_ID_Reuse
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W       -        -    0  0  0  0        0       0
 1 +    14.00W       -        -    0  0  0  0        0       0
 2 +    13.00W       -        -    0  0  0  0        0       0
 3 +    12.00W       -        -    0  0  0  0        0       0
 4 +    11.00W       -        -    0  0  0  0        0       0
 5 +    10.00W       -        -    0  0  0  0        0       0
 6 +     9.00W       -        -    0  0  0  0        0       0
 7 +     8.00W       -        -    0  0  0  0        0       0
 8 +     7.00W       -        -    0  0  0  0        0       0
 9 +     6.00W       -        -    0  0  0  0        0       0
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         0
 2 -     512       8         2
 3 -    4096       8         0
 4 -    4096      64         0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
The CPUs are cheap second-hand server pulls, Xeon Gold 6138:
$ cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
80 Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
I added a blower-style 3070 8G to run a few small models:
$ lspci | grep -i nvidia
Symptoms:
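The machine runs 24/7 but freezes every so often. The interval between freezes ranges from a dozen or so hours to several months. By my count, 70% of the freezes happen between 00:00 and 09:00 and the remaining 30% are scattered randomly across the rest of the day. The main activity between 00:00 and 09:00 is Plex background work such as sonic (audio fingerprint) analysis and media analysis.
During a freeze there is no blinking cursor on the directly attached monitor, the directly attached keyboard gets no response, and CapsLock and ScrLock do not toggle. SSH over the LAN does not respond, and neither does Intel AMT.
No fault LEDs are lit inside the chassis (the inside really is good-looking, by the way).
During a freeze the fans audibly ramp up to at least 80% speed; it is obvious because the Z8G4 is normally very quiet.
Apart from a hard reset, nothing I have tried has ever actually resolved it.
It started roughly 1-2 months after I got this build and has now been going on for more than a year.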
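What I've tried:
Checked temperatures inside the chassis: the peak is only 62°C. Average CPU utilization is below 47%, though some individual cores sit at 100% for long stretches (understandable given the background transcoding).
Suspected the GPU was overheating and dropping display output; nvitop shows it staying under 60°C.
Suspected swap filling up and locking the system; found no OOM entries in the kernel log, and growing swap to 64G changed nothing.
Suspected the NIC was overheating and dropping off, but the display output freezes too; when this machine dies, everything dies together.
Suspected the room was too hot; moved the machine to a bedroom with air conditioning running 24/7. Same symptoms, and the freeze frequency is as unpredictable as ever.
Suspected the no-name "rice cooker" power cord couldn't handle the 1800W PSU; swapped in a genuine spare cord, no change.
Suspected unstable memory; ran the memory test in the BIOS for 3 hours, all green.
Suspected the PSU; looked up the price of a genuine spare and was scared off.
Suspected the motherboard, but I have no evidence. The seller's answer: keep it cool and stop guessing.
Scratching my head: could a single core pegged at 100% really cook itself to death?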
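Similar cases:
https://h30471.www3.hp.com/t5/tai-shi-dian-nao/z8g4jing-chang-si-ji/m-p/1134658 -> I don't have the fault LED that poster had
https://h30471.www3.hp.com/t5/tai-shi-dian-nao/HP-Z8G4-928-PCIE-yan-zhong-chao-shi-si-ji-zhong-qi/m-p/1261763 -> I don't get the POST error that poster had
https://h30471.www3.hp.com/t5/tai-shi-dian-nao/hui-puZ8G4gong-zuo-zhanCPU-wen-du-guo-gao-jing-chang-si-ji/m-p/1264789 -> I checked, and my temperatures are nowhere near that high (data from cockpit)
https://h30471.www3.hp.com/t5/tai-shi-dian-nao/HPZ8G4-pin-fan-si-ji/m-p/1077573 -> Very similar to my situation; I looked and there is no dust buildup hurting cooling either
I'm out of ideas, so I'm posting here to pool some wisdom and see whether anyone has a good way to track down the cause. By all accounts this machine has strong cooling and a stable design, built purely for sustained performance, so why does it keep freezing for no reason? I don't get it (in a Fenghua accent).
I can post the output of dmesg and journalctl -k if needed, but it is long and tedious, so I doubt anyone wants to read it...?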
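Install Windows and check for WHEA errors; if there are any, replace the PSU.

The services are still running, so can I avoid installing another OS...? Can this be diagnosed directly with dmesg or something similar? If your guess is the PSU, I did see a few ACPI errors in dmesg: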
[ 7.234940] i2c i2c-0: Systems with more than 8 memory slots not supported yet, not instantiating SPD
[ 7.234950] ioatdma 0000:00:04.1: enabling device (0004 -> 0006)
[ 7.259027] ioatdma 0000:00:04.2: enabling device (0004 -> 0006)
[ 7.280643] ioatdma 0000:00:04.3: enabling device (0004 -> 0006)
[ 7.298799] ioatdma 0000:00:04.4: enabling device (0004 -> 0006)
[ 7.316247] ioatdma 0000:00:04.5: enabling device (0004 -> 0006)
[ 7.330358] ioatdma 0000:00:04.6: enabling device (0004 -> 0006)
[ 7.342621] ioatdma 0000:00:04.7: enabling device (0004 -> 0006)
[ 7.345615] gnss: GNSS driver registered with major 236
[ 7.347248] ACPI Error: Needed , found 00000000bbfdd616 (20230628/exresop-557)
[ 7.348108] ACPI Error: AE_AML_OPERAND_TYPE, While resolving operands for (20230628/dswexec-433)
[ 7.348125]
Initialized Local Variables for Method :
[ 7.348127] Local1: 00000000b7f14b0c <Obj> Buffer(12) 00 00 00 00 00 00 00 00
[ 7.348140] Initialized Arguments for Method :(2 arguments defined for method invocation)
[ 7.348142] Arg0: 000000009fb42d8e <Obj> Integer 0000000000000004
[ 7.348147] Arg1: 00000000bbfdd616 <Obj> Integer 0000000000000000
[ 7.348156] ACPI Error: Aborting method \_SB.WMIV.WVPO due to previous error (AE_AML_OPERAND_TYPE) (20230628/psparse-529)
[ 7.348166] ACPI Error: Aborting method \_SB.WMIV.WMPV due to previous error (AE_AML_OPERAND_TYPE) (20230628/psparse-529)
[ 7.357943] ACPI Error: Needed , found 000000004f09eef4 (20230628/exresop-557)
[ 7.358106] ioatdma 0000:80:04.0: enabling device (0004 -> 0006)
[ 7.359974] ACPI Error: AE_AML_OPERAND_TYPE, While resolving operands for (20230628/dswexec-433)
[ 7.361874]
Initialized Local Variables for Method :
[ 7.361879] Local1: 00000000ea3c534d <Obj> Buffer(12) 00 00 00 00 00 00 00 00
[ 7.361921] Initialized Arguments for Method :(2 arguments defined for method invocation)
[ 7.361926] Arg0: 00000000ac004b87 <Obj> Integer 0000000000000004
[ 7.361943] Arg1: 000000004f09eef4 <Obj> Integer 0000000000000000
[ 7.361972] ACPI Error: Aborting method \_SB.WMIV.WVPO due to previous error (AE_AML_OPERAND_TYPE) (20230628/psparse-529)
[ 7.363854] ACPI Error: Aborting method \_SB.WMIV.WMPV due to previous error (AE_AML_OPERAND_TYPE) (20230628/psparse-529)
[ 7.369425] ACPI Error: Needed , found 00000000305152d8 (20230628/exresop-557)
[ 7.370762] ACPI Error: AE_AML_OPERAND_TYPE, While resolving operands for (20230628/dswexec-433)
[ 7.372121]
Initialized Local Variables for Method :
[ 7.372124] Local1: 0000000051f950d8 <Obj> Buffer(136) 00 00 00 00 00 00 00 00
[ 7.372162] Initialized Arguments for Method :(2 arguments defined for method invocation)
[ 7.372167] Arg0: 000000007cce02b3 <Obj> Integer 0000000000000080
[ 7.372183] Arg1: 00000000305152d8 <Obj> Integer 0000000000000000
[ 7.372206] ACPI Error: Aborting method \_SB.WMIV.WVPO due to previous error (AE_AML_OPERAND_TYPE) (20230628/psparse-529)
[ 7.372276] workqueue: work_for_cpu_fn hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[ 7.373520] ACPI Error: Aborting method \_SB.WMIV.WMPV due to previous error (AE_AML_OPERAND_TYPE) (20230628/psparse-529)
[ 7.374961] input: HP WMI hotkeys as /devices/virtual/input/input9
[ 7.378132] ACPI Error: Needed , found 00000000e263359f (20230628/exresop-557)
[ 7.379034] ACPI Error: AE_AML_OPERAND_TYPE, While resolving operands for (20230628/dswexec-433)
[ 7.379951]
Initialized Local Variables for Method :
[ 7.379953] Local1: 00000000c51b5c99 <Obj> Buffer(136) 00 00 00 00 00 00 00 00
[ 7.379973] Initialized Arguments for Method :(2 arguments defined for method invocation)
[ 7.379975] Arg0: 00000000e5261626 <Obj> Integer 0000000000000080
[ 7.379983] Arg1: 00000000e263359f <Obj> Integer 0000000000000000
[ 7.379995] ACPI Error: Aborting method \_SB.WMIV.WVPO due to previous error (AE_AML_OPERAND_TYPE) (20230628/psparse-529)
[ 7.380921] ACPI Error: Aborting method \_SB.WMIV.WMPV due to previous error (AE_AML_OPERAND_TYPE) (20230628/psparse-529)
[ 7.389494] ioatdma 0000:80:04.1: enabling device (0004 -> 0006)
[ 7.405383] ioatdma 0000:80:04.2: enabling device (0004 -> 0006)
[ 7.419566] ioatdma 0000:80:04.3: enabling device (0004 -> 0006)
[ 7.429298] ice: Intel(R) Ethernet Connection E800 Series Linux Driver
[ 7.429301] ice: Copyright (c) 2018, Intel Corporation.
[ 7.432372] ioatdma 0000:80:04.4: enabling device (0004 -> 0006)
[ 7.433469] Creating 1 MTD partitions on "0000:00:1f.5":
[ 7.433498] 0x000000000000-0x000002000000 : "BIOS"
[ 7.444463] ioatdma 0000:80:04.5: enabling device (0004 -> 0006)
[ 7.456213] ioatdma 0000:80:04.6: enabling device (0004 -> 0006)
[ 7.467575] ioatdma 0000:80:04.7: enabling device (0004 -> 0006)
[ 7.468405] snd_hda_intel 0000:00:1f.3: enabling device (0140 -> 0142)
[ 7.469170] snd_hda_intel 0000:2d:00.1: enabling device (0140 -> 0142)
[ 7.469332] snd_hda_intel 0000:2d:00.1: Disabling MSI
[ 7.469397] snd_hda_intel 0000:2d:00.1: Handle vga_switcheroo audio client
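Can kernel logs like these pin it down as a PSU problem?

Is there nothing suspicious or common across the logs just before each freeze? Look at the state right before a freeze first and try to reproduce it. If you're truly out of leads, I'd also consider switching to another OS, or at least trying a different kernel; dockerized apps should be easy to migrate.

Last edited by WiiGe on 2025-6-1 12:53
indtability posted on 2025-6-1 12:11
Is there nothing suspicious or common across the logs just before each freeze? Look at the state right before a freeze first and try to reproduce it. If you're truly out of leads, I'd also consider switching to another OS ...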
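I'll post journalctl -p err for you to look at. I can't reproduce the freeze reliably; it's just that the symptoms are identical every time, which makes me think it is one and the same problem.
By the way, are there other tools for looking at historical logs? journalctl and dmesg are all I know...
Doctor, I can post whatever you need.

Run hardware tests first, such as a memory test,
then pin down software problems.

The log download is only 8k and I can't open it. That said, if the logs show no obvious errors and no pattern to the freezes, there isn't much else to try; I can only suggest switching to a more upstream kernel or a different distro and seeing whether you get lucky.

chachi posted on 2025-6-1 13:16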
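Run hardware tests first, such as a memory test,
then pin down software problems.
The memory test already passed, all green.

Try running a GPU stress test?

The ACPI errors reported at boot don't mean much. See whether journalctl -b -1 -k captured the scene of the last freeze.
(sent from HONOR PTP-AN10, Android 15, 鹅球 v3.5.99)

Last edited by WiiGe on 2025-6-1 23:24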
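calmer posted on 2025-6-1 17:47
The ACPI errors reported at boot don't mean much. See whether journalctl -b -1 -k captured the scene of the last freeze.
(sent from HONOR PTP-AN10, A ...
You're right, I also feel these ACPI errors aren't the key issue. However, journalctl -b -1 -k only reaches back to May 26, so I ran journalctl > all.log instead and got a 500M log. Here are some observations: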
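The most frequent error in the log looks like this:
Feb 26 00:23:47 z8g4-portal dockerd: time="2025-02-25T16:23:47.091392377Z" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:42370" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:42370->127.0.0.53:53: i/o timeout" question=";tracker.tiny-**.com.\tIN\t A"
After one particular reboot there are more than 40,000 lines of it (line numbers 1858635-1890899).
This kind of error has occurred many times; after almost every reboot it shows up in bulk once the system has been running for a while, interspersed with entries such as: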
May 31 01:02:49 z8g4-portal dockerd: time="2025-05-31T01:02:49.706566899+08:00" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:49889" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:49889->127.0.0.53:53: i/o timeout" question=";share.rayser.cn.\tIN\t A"
May 31 01:03:13 z8g4-portal systemd-timesyncd: Timed out waiting for reply from 198.18.26.128:123 (ntp.ubuntu.com).
May 31 01:05:01 z8g4-portal CRON: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 01:05:01 z8g4-portal CRON: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 31 01:05:01 z8g4-portal CRON: pam_unix(cron:session): session closed for user root
May 31 01:05:35 z8g4-portal dockerd: time="2025-05-31T01:05:35.012524229+08:00" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:42567" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:42567->127.0.0.53:53: i/o timeout" question=";tracker.bt4g.com.\tIN\t A"
or
May 31 00:57:13 z8g4-portal dockerd: time="2025-05-31T00:57:13.910263128+08:00" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:46779" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:46779->127.0.0.53:53: i/o timeout" question=";share.rayser.cn.\tIN\t A"
May 31 00:58:13 z8g4-portal systemd: Starting pmie_check.service - Check PMIE instances are running...
May 31 00:58:13 z8g4-portal systemd: Starting pmie_farm_check.service - Check and migrate non-primary pmie farm instances...
May 31 00:58:13 z8g4-portal systemd: Started pmie_check.service - Check PMIE instances are running.
May 31 00:58:13 z8g4-portal systemd: Started pmie_farm_check.service - Check and migrate non-primary pmie farm instances.
May 31 00:58:13 z8g4-portal systemd: pmie_farm_check.service: Deactivated successfully.
May 31 00:58:13 z8g4-portal systemd: pmie_check.service: Deactivated successfully.
May 31 01:00:01 z8g4-portal systemd: Starting sysstat-collect.service - system activity accounting tool...
May 31 01:00:01 z8g4-portal systemd: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
May 31 01:00:01 z8g4-portal systemd: sysstat-collect.service: Deactivated successfully.
May 31 01:00:01 z8g4-portal systemd: Finished sysstat-collect.service - system activity accounting tool.
May 31 01:00:01 z8g4-portal systemd: systemd-tmpfiles-clean.service: Deactivated successfully.
May 31 01:00:01 z8g4-portal systemd: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
May 31 01:01:04 z8g4-portal dockerd: time="2025-05-31T01:01:04.350097803+08:00" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:59390" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:59390->127.0.0.53:53: i/o timeout" question=";sparkle.ghostchu.com.\tIN\t A"
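These look like an NTP problem. The log contains 29 entries of the form -- Boot XXXXXXXXXXXXXXX --, meaning journalctl recorded 29 reboots. In 14 of them, about 48%, the reboot immediately follows a failed to query external DNS server record. I suspect this may be related to the freezes I keep hitting? The pattern looks roughly like this: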
Mar 04 10:18:24 z8g4-portal dockerd: time="2025-03-04T02:18:24.008409267Z" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:47004" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:47004->127.0.0.53:53: i/o timeout" question=";seeders-paradise.org.\tIN\t AAAA"
Mar 04 10:18:24 z8g4-portal dockerd: time="2025-03-04T02:18:24.008511054Z" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:57097" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:57097->127.0.0.53:53: i/o timeout" question=";seeders-paradise.org.\tIN\t A"
Mar 04 10:18:29 z8g4-portal dockerd: time="2025-03-04T02:18:29.053045662Z" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:34618" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:34618->127.0.0.53:53: i/o timeout" question=";highteahop.top.\tIN\t AAAA"
Mar 04 10:18:29 z8g4-portal dockerd: time="2025-03-04T02:18:29.053119935Z" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:50855" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:50855->127.0.0.53:53: i/o timeout" question=";highteahop.top.\tIN\t A"
Mar 04 10:20:04 z8g4-portal systemd: Starting sysstat-collect.service - system activity accounting tool...
Mar 04 10:20:04 z8g4-portal systemd: sysstat-collect.service: Deactivated successfully.
Mar 04 10:20:04 z8g4-portal systemd: Finished sysstat-collect.service - system activity accounting tool.
-- Boot b2563ed2eccc4cb0b982bd5928adb2a7 --
Mar 04 13:07:50 z8g4-portal kernel: Linux version 6.8.0-54-generic (buildd@lcy02-amd64-083) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #56-Ubuntu SMP PREEMPT_DYNAMIC Sat Feb8 00:37:57 UTC 2025 (Ubuntu 6.8.0-54.56-generic 6.8.12)
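The tally above can also be produced mechanically instead of by hand. The following is only a minimal sketch, assuming the journal was exported as plain text with journalctl > all.log and assuming that a clean shutdown always leaves a Journal stopped line directly before the next -- Boot ... -- marker. The names classify_boots and Boot are made up for illustration, and the sample lines are abbreviated from the excerpts in this thread.

```python
import re
from dataclasses import dataclass

# Matches the separator journalctl prints between boots, e.g.
# "-- Boot b2563ed2eccc4cb0b982bd5928adb2a7 --".
BOOT_MARKER = re.compile(r"^-- Boot ([0-9a-f]+) --$")


@dataclass
class Boot:
    boot_id: str
    clean: bool      # True if the previous boot ended with "Journal stopped"
    last_line: str   # last log line written before this boot started


def classify_boots(lines):
    """Return one Boot per marker that has at least one log line before it."""
    boots = []
    previous = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = BOOT_MARKER.match(line)
        if match is None:
            previous = line
        elif previous:
            # Assumption: "Journal stopped" right before the marker means
            # the machine was shut down on purpose.
            boots.append(Boot(match.group(1), "Journal stopped" in previous, previous))
    return boots


# Abbreviated lines taken from the excerpts in this thread.
SAMPLE = """\
Mar 04 10:20:04 z8g4-portal systemd: Finished sysstat-collect.service - system activity accounting tool.
-- Boot b2563ed2eccc4cb0b982bd5928adb2a7 --
Mar 04 13:07:50 z8g4-portal kernel: Linux version 6.8.0-54-generic
Mar 12 07:13:59 z8g4-portal systemd-journald: Journal stopped
-- Boot 440da79b7a1d4f44b1fbc5356f736634 --
Mar 12 07:21:35 z8g4-portal kernel: Linux version 6.8.0-55-generic
"""

for boot in classify_boots(SAMPLE.splitlines()):
    kind = "clean shutdown" if boot.clean else "UNCLEAN (possible freeze)"
    print(f"{boot.boot_id[:8]}  {kind}  last line: {boot.last_line[:60]}")
```

For the real log, the sample can be swapped for the file itself, for example classify_boots(open("all.log", errors="replace")). The timestamps of the last line before each unclean boot then give the freeze times directly.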
Reboots preceded by Journal stopped I treat as normal maintenance reboots and do not count as freeze scenes. There are 16 of those, about 55%:
Mar 12 07:13:44 z8g4-portal systemd-shutdown: Syncing filesystems and block devices.
Mar 12 07:13:44 z8g4-portal kernel: BTRFS info (device dm-0): last unmount of filesystem 533af4a8-6aaf-49f7-8d01-5336db1d8307
Mar 12 07:13:59 z8g4-portal systemd-shutdown: Sending SIGTERM to remaining processes...
Mar 12 07:13:59 z8g4-portal systemd-journald: Journal stopped
-- Boot 440da79b7a1d4f44b1fbc5356f736634 --
Mar 12 07:21:35 z8g4-portal kernel: Linux version 6.8.0-55-generic (buildd@lcy02-amd64-095) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #57-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 12 23:42:21 UTC 2025 (Ubuntu 6.8.0-55.57-generic 6.8.12)
Mar 12 07:21:35 z8g4-portal kernel: Command line: BOOT_IMAGE=/vmlinuz-6.8.0-55-generic root=/dev/mapper/ubuntu--vg-lv--0 ro
Mar 12 07:21:35 z8g4-portal kernel: KERNEL supported cpus:
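Mar 12 07:21:35 z8g4-portal kernel: Intel GenuineIntel

A failure to connect to 127.0.0.53 might mean Ubuntu's systemd-resolved is misconfigured? Though after running this long, it shouldn't be. And it shouldn't freeze the whole machine either, unless it's a NIC hardware problem (I once hit this on Win11: a reliable freeze about 10 minutes after every reboot, and I noticed the network always went down first; after reseating the NIC it never happened again).

Last edited by WiiGe on 2025-6-2 00:16
Before some of these failed to query external DNS server errors I see OOM errors; in the other cases I couldn't really work out what happened. The OOM case shows up roughly 7-8 times and looks like this: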
May 02 11:06:54 z8g4-portal dockerd: time="2025-05-02T03:06:54.074625447Z" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:35975" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:35975->127.0.0.53:53: i/o timeout" question=";taciturn-shadow.spb.ru.\tIN\t A"
May 02 11:09:11 z8g4-portal kernel: Plex Commercial invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
May 02 11:09:11 z8g4-portal kernel: CPU: 0 PID: 287380 Comm: Plex Commercial Tainted: P OE 6.8.0-58-generic #60-Ubuntu
May 02 11:09:11 z8g4-portal kernel: Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v02.95 11/21/2024
May 02 11:09:11 z8g4-portal kernel: Call Trace:
May 02 11:09:11 z8g4-portal kernel:<TASK>
May 02 11:09:11 z8g4-portal kernel:dump_stack_lvl+0x76/0xa0
May 02 11:09:11 z8g4-portal kernel:dump_stack+0x10/0x20
May 02 11:09:11 z8g4-portal kernel:dump_header+0x47/0x1f0
May 02 11:09:11 z8g4-portal kernel:oom_kill_process+0x118/0x280
May 02 11:09:11 z8g4-portal kernel:? oom_evaluate_task+0x143/0x1e0
May 02 11:09:11 z8g4-portal kernel:out_of_memory+0x103/0x350
May 02 11:09:11 z8g4-portal kernel:__alloc_pages_may_oom+0x10c/0x1d0
May 02 11:09:11 z8g4-portal kernel:__alloc_pages_slowpath.constprop.0+0x420/0x9f0
May 02 11:09:11 z8g4-portal kernel:__alloc_pages+0x31f/0x350
May 02 11:09:11 z8g4-portal kernel:alloc_pages_mpol+0x91/0x210
May 02 11:09:11 z8g4-portal kernel:alloc_pages+0x5b/0xd0
May 02 11:09:11 z8g4-portal kernel:folio_alloc+0x15/0x40
May 02 11:09:11 z8g4-portal kernel:filemap_alloc_folio+0xf4/0x100
May 02 11:09:11 z8g4-portal kernel:__filemap_get_folio+0x195/0x2d0
May 02 11:09:11 z8g4-portal kernel:filemap_fault+0x15c/0x8e0
May 02 11:09:11 z8g4-portal kernel:? set_pte_range+0x116/0x3f0
May 02 11:09:11 z8g4-portal kernel:__do_fault+0x3a/0x190
May 02 11:09:11 z8g4-portal kernel:do_read_fault+0x133/0x200
May 02 11:09:11 z8g4-portal kernel:do_fault+0xf0/0x260
May 02 11:09:11 z8g4-portal kernel:handle_pte_fault+0x114/0x1d0
May 02 11:09:11 z8g4-portal kernel:__handle_mm_fault+0x654/0x800
May 02 11:09:11 z8g4-portal kernel:handle_mm_fault+0x18a/0x380
May 02 11:09:11 z8g4-portal kernel:do_user_addr_fault+0x169/0x670
May 02 11:09:11 z8g4-portal kernel:exc_page_fault+0x83/0x1b0
May 02 11:09:11 z8g4-portal kernel:asm_exc_page_fault+0x27/0x30
May 02 11:09:11 z8g4-portal kernel: RIP: 0033:0x709ad07aaffa
May 02 11:09:11 z8g4-portal kernel: Code: Unable to access opcode bytes at 0x709ad07aafd0.
May 02 11:09:11 z8g4-portal kernel: RSP: 002b:00007ffdb9994370 EFLAGS: 00010202
May 02 11:09:11 z8g4-portal kernel: RAX: 0000709ad07ab1c0 RBX: 0000709aceb65080 RCX: 0000000000000010
May 02 11:09:11 z8g4-portal kernel: RDX: 0000000000000002 RSI: ffffffffffffffec RDI: 0000709ad00c7580
May 02 11:09:11 z8g4-portal kernel: RBP: 0000000000000006 R08: 0000709acea212f0 R09: 0000709acea212ec
May 02 11:09:11 z8g4-portal kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000709ad00c7580
May 02 11:09:11 z8g4-portal kernel: R13: 0000000000001800 R14: 0000709acebb7dc0 R15: 0000709aceb913c0
May 02 11:09:11 z8g4-portal kernel:</TASK>
May 02 11:09:11 z8g4-portal kernel: Mem-Info:
May 02 11:09:11 z8g4-portal kernel: active_anon:6681869 inactive_anon:24607315 isolated_anon:0
active_file:30644 inactive_file:20137 isolated_file:0
unevictable:11192 dirty:13 writeback:0
slab_reclaimable:254395 slab_unreclaimable:296998
mapped:64449 shmem:34779 pagetables:220975
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:115427 free_pcp:14330 free_cma:0
May 02 11:09:11 z8g4-portal kernel: Node 0 active_anon:7669552kB inactive_anon:53187884kB active_file:91524kB inactive_file:73772kB unevictable:27528kB isolated(anon):0kB isolated(file):0kB mapped:147512kB dirty:52kB writeback:0kB shmem:25020kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:29912kB pagetables:580788kB sec_pagetables:0kB all_unreclaimable? no
May 02 11:09:11 z8g4-portal kernel: Node 1 active_anon:19057924kB inactive_anon:45241376kB active_file:31052kB inactive_file:6776kB unevictable:17240kB isolated(anon):0kB isolated(file):0kB mapped:110284kB dirty:0kB writeback:0kB shmem:114096kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:35848kB pagetables:303112kB sec_pagetables:0kB all_unreclaimable? no
May 02 11:09:11 z8g4-portal kernel: Node 0 DMA free:11264kB boost:0kB min:8kB low:20kB high:32kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
May 02 11:09:11 z8g4-portal kernel: lowmem_reserve[]: 0 1595 63998 63998 63998
May 02 11:09:11 z8g4-portal kernel: Node 0 DMA32 free:247020kB boost:0kB min:1116kB low:2748kB high:4380kB reserved_highatomic:0KB active_anon:244352kB inactive_anon:1181220kB active_file:1356kB inactive_file:1204kB unevictable:0kB writepending:32kB present:1767796kB managed:1701228kB mlocked:0kB bounce:0kB free_pcp:6252kB local_pcp:0kB free_cma:0kB
May 02 11:09:11 z8g4-portal kernel: lowmem_reserve[]: 0 0 62402 62402 62402
May 02 11:09:11 z8g4-portal kernel: Node 0 Normal free:133464kB boost:257320kB min:301076kB low:364972kB high:428868kB reserved_highatomic:0KB active_anon:7425236kB inactive_anon:52006628kB active_file:89964kB inactive_file:72276kB unevictable:27528kB writepending:20kB present:65011712kB managed:63909016kB mlocked:27528kB bounce:0kB free_pcp:29556kB local_pcp:248kB free_cma:0kB
May 02 11:09:11 z8g4-portal kernel: lowmem_reserve[]: 0 0 0 0 0
May 02 11:09:11 z8g4-portal kernel: Node 1 Normal free:71976kB boost:79872kB min:125096kB low:191136kB high:257176kB reserved_highatomic:0KB active_anon:19057924kB inactive_anon:45241376kB active_file:31052kB inactive_file:6776kB unevictable:17240kB writepending:0kB present:67108864kB managed:66040244kB mlocked:17240kB bounce:0kB free_pcp:19500kB local_pcp:0kB free_cma:0kB
May 02 11:09:11 z8g4-portal kernel: lowmem_reserve[]: 0 0 0 0 0
May 02 11:09:11 z8g4-portal kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
May 02 11:09:11 z8g4-portal kernel: Node 0 DMA32: 722*4kB (UM) 1063*8kB (UME) 902*16kB (UME) 820*32kB (UME) 505*64kB (UME) 325*128kB (UME) 158*256kB (UME) 81*512kB (UME) 37*1024kB (UM) 1*2048kB (M) 0*4096kB = 247840kB
May 02 11:09:11 z8g4-portal kernel: Node 0 Normal: 4166*4kB (UME) 4475*8kB (UME) 2479*16kB (UME) 1276*32kB (UME) 22*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 134368kB
May 02 11:09:11 z8g4-portal kernel: Node 1 Normal: 261*4kB (UME) 303*8kB (UME) 1411*16kB (UME) 1401*32kB (UME) 0*64kB 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71004kB
May 02 11:09:11 z8g4-portal kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
May 02 11:09:11 z8g4-portal kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
May 02 11:09:11 z8g4-portal kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
May 02 11:09:11 z8g4-portal kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
May 02 11:09:11 z8g4-portal kernel: 116714 total pagecache pages
May 02 11:09:11 z8g4-portal kernel: 26954 pages in swap cache
May 02 11:09:11 z8g4-portal kernel: Free swap= 0kB
May 02 11:09:11 z8g4-portal kernel: Total swap = 8388604kB
May 02 11:09:11 z8g4-portal kernel: 33476090 pages RAM
May 02 11:09:11 z8g4-portal kernel: 0 pages HighMem/MovableOnly
May 02 11:09:11 z8g4-portal kernel: 559628 pages reserved
May 02 11:09:11 z8g4-portal kernel: 0 pages hwpoisoned
May 02 11:09:11 z8g4-portal kernel: Tasks state (memory values in pages):
May 02 11:09:11 z8g4-portal kernel: [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
May 02 11:09:11 z8g4-portal kernel: [ 1203] 0 1203 35180 160 160 0 0 139264 0 -250 systemd-journal
May 02 11:09:11 z8g4-portal kernel: [ 1234] 0 1234 20117 6080 3520 2560 0 98304 0 -1000 dmeventd
[... very, very long Tasks state list omitted ...]
May 02 11:09:11 z8g4-portal kernel: [ 478703] 0 478703 5326 1280 0 1280 0 73728 0 0 curl
May 02 11:09:11 z8g4-portal kernel: [ 478719] 1000 478719 3909 0 0 0 0 81920 0 0 sudo
May 02 11:09:11 z8g4-portal kernel: [ 478726] 999 478726 55568 1760 0 1440 320 172032 329 0 postgres
May 02 11:09:11 z8g4-portal kernel: [ 478747] 999 478747 55073 160 0 160 0 126976 329 0 postgres
May 02 11:09:11 z8g4-portal kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=docker-9ebf9b26996e7c9e659ad15d77607dc4bf5a53052370014be92d3a108c675cd2.scope,mems_allowed=0-1,global_oom,task_memcg=/system.slice/docker-82e65ed77c11c0b981e0f273c0193944b9c35d4e93f39d24ba7b5182352c1085.scope,task=ffmpeg,pid=443109,uid=0
May 02 11:09:11 z8g4-portal kernel: Out of memory: Killed process 443109 (ffmpeg) total-vm:118242576kB, anon-rss:116685432kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:228776kB oom_score_adj:0
May 02 11:09:11 z8g4-portal systemd: docker-82e65ed77c11c0b981e0f273c0193944b9c35d4e93f39d24ba7b5182352c1085.scope: A process of this unit has been killed by the OOM killer.
May 02 11:09:12 z8g4-portal dockerd: time="2025-05-02T03:09:11.892772400Z" level=warning msg="Health check for container 9ebf9b26996e7c9e659ad15d77607dc4bf5a53052370014be92d3a108c675cd2 error: timed out starting health check for container 9ebf9b26996e7c9e659ad15d77607dc4bf5a53052370014be92d3a108c675cd2"
May 02 11:09:12 z8g4-portal dockerd: time="2025-05-02T03:09:11.892973734Z" level=warning msg="Health check for container 0fb159a16e740859ef9d5cec397b0bf6588e2868c7444f87caa95fb47b9ee1ca error: timed out starting health check for container 0fb159a16e740859ef9d5cec397b0bf6588e2868c7444f87caa95fb47b9ee1ca"
May 02 11:09:12 z8g4-portal dockerd: time="2025-05-02T03:09:11.893194928Z" level=warning msg="Health check for container e69453d2b9babaf835e77c238e0d3ad151701d79fd03e3b15404d6ff803fb9d3 error: timed out starting health check for container e69453d2b9babaf835e77c238e0d3ad151701d79fd03e3b15404d6ff803fb9d3"
May 02 11:09:19 z8g4-portal kernel: oom_reaper: reaped process 443109 (ffmpeg), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
May 02 11:09:26 z8g4-portal dockerd: time="2025-05-02T03:09:26.710334282Z" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:35892" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:35892->127.0.0.53:53: i/o timeout" question=";console.[redacted].\tIN\t AAAA"
May 02 11:09:26 z8g4-portal dockerd: time="2025-05-02T03:09:26.710435489Z" level=error msg=" failed to query external DNS server" client-addr="udp:127.0.0.1:33238" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:33238->127.0.0.53:53: i/o timeout" question=";console.[redacted].\tIN\t A"
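To see at a glance how much memory the killed process was holding, the Out of memory: Killed process line and the pages RAM line can be parsed and converted to GiB. This is a rough sketch rather than a complete OOM report parser: the regular expressions only cover the two line shapes that appear in the excerpt above, the 4 KiB page size is an assumption, and summarize_oom is a made-up name.

```python
import re

# Covers the kill line as it appears in the excerpt above.
KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)
RAM_RE = re.compile(r"kernel: (?P<pages>\d+) pages RAM")
PAGE_KB = 4  # assumption: 4 KiB pages, the usual x86_64 default


def kb_to_gib(kb):
    return kb / (1024 * 1024)


def summarize_oom(lines):
    """Return (ram_gib, kills); kills holds one dict per kill line."""
    ram_gib = None
    kills = []
    for line in lines:
        ram = RAM_RE.search(line)
        if ram:
            ram_gib = kb_to_gib(int(ram["pages"]) * PAGE_KB)
        kill = KILL_RE.search(line)
        if kill:
            kills.append({
                "pid": int(kill["pid"]),
                "name": kill["name"],
                "total_vm_gib": kb_to_gib(int(kill["total_vm"])),
                "anon_rss_gib": kb_to_gib(int(kill["anon_rss"])),
            })
    return ram_gib, kills


# Two lines copied from the May 02 excerpt.
SAMPLE = """\
May 02 11:09:11 z8g4-portal kernel: 33476090 pages RAM
May 02 11:09:11 z8g4-portal kernel: Out of memory: Killed process 443109 (ffmpeg) total-vm:118242576kB, anon-rss:116685432kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:228776kB oom_score_adj:0
"""

ram_gib, kills = summarize_oom(SAMPLE.splitlines())
for k in kills:
    print(f"{k['name']} (pid {k['pid']}): anon-rss {k['anon_rss_gib']:.1f} GiB "
          f"of {ram_gib:.1f} GiB RAM")
```

On the excerpt above this works out to roughly 111 GiB of anonymous memory held by the single ffmpeg process, on a machine with about 128 GiB of RAM.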
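I doubt my reading is very thorough, so doctors with other tools are welcome to join the consultation. I set up a microbin on a file-sharing host and uploaded the lightly sanitized log output: https://share.wiige.top/p/aAAYza

posthoc posted on 2025-6-1 23:06
A failure to connect to 127.0.0.53 might mean Ubuntu's systemd-resolved is misconfigured? Though after running this long, it shouldn't be. And it shouldn't ...
Indeed, failed to query external DNS server looks like the result of some other fault rather than the trigger, but I really can't follow this log in full.

I looked around and there seems to be a similar report on reddit: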
https://www.reddit.com/r/Ubuntu/comments/1dbt03y/ubuntu_2404_lts_freezes_randomly_sometimes/
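How about switching to 22 first and seeing what happens?

Swap looks completely full. Maybe dig in that direction?
Is the DNS error a configuration problem, or something that goes wrong at runtime? I'm not familiar with docker, but going by my podman experience, some setups do need DNS configured separately.

Most likely the software ate all the memory and swap, leaving the system so slow to respond that it looks like a hang.
You could try adding a memory limit in the docker configuration (docker's -m / --memory option) so that all containers together stay below total memory, then see whether the hang still occurs.

Last edited by phorcys02 on 2025-6-2 03:00
Swap is exhausted, there is not much free memory left, and there are also messages about the kernel failing to allocate memory.
This is most likely caused by OOM, because if a service can drive the system into OOM, it won't happen just once; it will happen over and over.
Log memory usage over time and check which of your services is using this much memory, or whether one of them has a memory leak or is configured with an oversized cache.
Either fix the leak, reduce memory usage, or add more memory.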
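
As a rough illustration of the "log memory usage" advice above, here is a minimal Python sketch that samples /proc/meminfo at a fixed interval and appends one line per sample to a file. The output file name, the interval, and the choice of fields are assumptions made for this example, not anything taken from the thread; atop, sar, or Prometheus would do the same job more thoroughly.

```python
import time

# Fields most relevant to memory pressure (an assumed selection).
FIELDS = ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def format_sample(info, timestamp):
    """Build one log line, converting kB to MiB."""
    cells = [f"{name}={info.get(name, 0) // 1024}MiB" for name in FIELDS]
    return f"{timestamp} " + " ".join(cells)


def run_sampler(count, interval_s=10, src="/proc/meminfo", dst="meminfo.log"):
    """Take `count` samples, one every `interval_s` seconds, appending to `dst`."""
    for i in range(count):
        with open(src) as f:
            info = parse_meminfo(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(dst, "a") as out:
            out.write(format_sample(info, stamp) + "\n")
        if i + 1 < count:
            time.sleep(interval_s)
```

Calling run_sampler(360, interval_s=10) would cover roughly one hour. Comparing the last few lines written before a hang with the normal baseline should show whether MemAvailable was collapsing at that moment.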
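
痴货 posted on 2025-6-2 00:55
I looked around, and there seems to be a similar report on Reddit:
Whoa, that does look like the same thing, doesn't it? Now I'm wavering.

Last edited by WiiGe on 2025-6-2 06:13
phorcys02 posted on 2025-6-2 02:55
Swap is exhausted, there is not much free memory left, and there are also messages about the kernel failing to allocate memory.
This is most likely caused by OOM, because if a service can ...
How did you spot that? Which keywords should I look for to see that the kernel failed to allocate memory? Please teach me.
Is it this one: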
Apr 24 13:06:13 z8g4-portal kernel: workqueue: work_for_cpu_fn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
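Or this one:
Apr 27 17:21:24 z8g4-portal kernel: workqueue: vmstat_update hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
Or maybe this one?
Apr 28 18:00:07 z8g4-portal kernel: workqueue: drain_vmap_area_work hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
By logging memory usage, do you mean watching it with sar? I don't have much experience in this area. Or should I settle it once and for all by finding an old router and setting up Prometheus on it?
What you and the two posters above say matches my intuition, but I have no real idea how to track down a memory leak or cut memory usage, so I'm hoping the doctors can write me a more detailed prescription.
Adding memory is the easy part, though. Reading logs doesn't stop me from placing an order, and the board has enough empty slots, so how about going up to 512G first to see whether that changes how often the hangs occur?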
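
On the keyword question, a small Python sketch may help. It tags log lines with the markers that usually accompany memory exhaustion: the "killed by the OOM killer" and "oom_reaper" strings appear in the log excerpts in this thread, while "invoked oom-killer", "Out of memory" and "page allocation failure" are standard kernel messages added here as assumptions about what else the full log may contain. As far as I know, the "workqueue: ... hogged CPU" lines are a different kind of message: they report a work item that ran long, not a failed allocation, so the sketch leaves them untagged.

```python
import re

# Marker patterns, grouped by what they indicate. The list is an assumption
# for illustration and can be extended.
PATTERNS = {
    "oom_kill": re.compile(
        r"invoked oom-killer|Out of memory|killed by the OOM killer", re.IGNORECASE
    ),
    "oom_reaper": re.compile(r"oom_reaper: reaped process"),
    "alloc_failure": re.compile(r"page allocation failure"),
}


def classify(line):
    """Return the names of all marker groups that match this log line."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(line)]


def scan(lines):
    """Return (tags, line) pairs for every line that matches at least one marker."""
    hits = []
    for line in lines:
        tags = classify(line)
        if tags:
            hits.append((tags, line.rstrip("\n")))
    return hits
```

Feeding it the exported journal, for example scan(open("journal.txt", errors="replace")) with a hypothetical file name, gives a short list of timestamps to compare against the times of the hangs.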
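
Memory blew up and ffmpeg got killed.
Can't you just run fewer containers first and see whether it still hangs?

WiiGe posted on 2025-6-2 05:50
How did you spot that? Which keywords should I look for to see that the kernel failed to allocate memory? Please teach me.
Is it this one:
Or this one:
Simple tools that record on a schedule, such as atop or sar, will do; set the sampling frequency a bit higher and let them run.
Or going straight to Prometheus is fine too...
My feeling is that the services you listed shouldn't be able to fill up memory; most likely it's a configuration problem of some kind.

If you're not going to shut containers down to test stability, then put CPU and memory limits on every container.

moondigi posted on 2025-6-2 12:45
If you're not going to shut containers down to test stability, then put CPU and memory limits on every container.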
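I start everything with docker compose. Casually adding entries like
deploy:
  resources:
    limits / reservations:
      memory:
      cpus:
makes containers fail in all sorts of bizarre ways, which has honestly given me a bit of PTSD. Is there a good way to introduce them?

phorcys02 posted on 2025-6-2 12:35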
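Simple tools that record on a schedule, such as atop or sar, will do; set the sampling frequency a bit higher and let them run.
Or going straight to Prometheus is fine too...
My feeling is that what you listed ...
Does the veteran expert have any ideas for checking the configuration? Or should I wait a while until I have memory usage logs?

tsubasa9 posted on 2025-6-2 11:48
Memory blew up and ffmpeg got killed.
Can't you just run fewer containers first and see whether it still hangs?
Well... there's no pattern, so I don't know which one to kill first. Anyway, how about I axe ffmpeg first and see what happens?

WiiGe posted on 2025-6-2 23:54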
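I start everything with docker compose. Casually adding entries like
resources:
limits/reservations:
I use podman, not docker. Check the official documentation to see whether something is misconfigured, and first start the containers from the command line with the CPU and memory parameters and test for a while.

Last edited by tsubasa9 on 2025-6-3 09:41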
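If there's no pattern, you should start by leaving it idle with nothing running. You must have learned about controlling variables at some point, right?
An office workstation of mine that ran services showed much the same symptoms.
A fresh install of a minimal Arch Linux behaved the same way.
It was only fixed after I replaced the memory.
I'm fairly convinced it was a memory compatibility problem.
Memory tests and logs showed nothing wrong at all.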
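
moondigi posted on 2025-6-3 09:29
I use podman, not docker. Check the official documentation to see whether something is misconfigured, and first start the containers from the command line with the CPU and memory parameters and test for a while ...
Just curious: is podman basically a non-commercial docker? From the official site it seems close to k8s. Does it use containerd too? I only have some experience writing YAML for k3s and have never used podman.

tsubasa9 posted on 2025-6-3 09:39
If there's no pattern, you should start by leaving it idle with nothing running. You must have learned about controlling variables at some point, right?
An office workstation of mine that ran services showed much the same symptoms.
A fresh install of a minima ...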
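Yes, yes, exactly. Problems that no test can catch but that never run smoothly are the most annoying kind.
My original idea for controlling variables was to get a second Z8G4, split the configuration evenly into two machines with 256G of memory each, and load-balance between them.
But the budget doesn't stretch that far, so I figured I'd ask on the forum first in case someone has an idea.
Running it idle really isn't an option since these services are essentials for me, so it comes back to the budget not being enough, haha.

For machines running in production, the usual recommendation is to get rid of swap, so that even when something OOMs it is less likely to drag everything else down with it.
Also, do you have any out-of-band hardware monitoring?

ChengChung posted on 2025-6-3 10:38
For machines running in production, the usual recommendation is to get rid of swap, so that even when something OOMs it is less likely to drag everything else down with it.
Also, do you have any out-of-band hardware monitoring? ...
Good question. HP workstations only have Intel AMT, that lousy vPro thing, and I don't know whether it really counts as a BMC.

Last edited by WiiGe on 2025-6-11 10:52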
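Improvements I made over the past week:
- Took ChengChung's advice and disabled swap
- Took moondigi's advice and added limits to compose.yaml
- Took tsubasa9's advice and ran the machine unloaded for 48 hours (no anomalies seen)
- Bought a second-hand mini PC to host Prometheus, but it is still in the courier's hands
- Added two 64G memory modules, but one is not detected and is being returned for exchange; the machine currently has three 64G modules installed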
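
One way to sanity-check the per-container limits mentioned in the list is to add them up and compare the total against installed memory, which is what the earlier suggestion of keeping all containers together below total memory amounts to. The Python sketch below does that arithmetic. The service names, the limit values, and the 10% headroom are hypothetical; only the 192g total (three 64G modules) comes from the post.

```python
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a size string such as '512m', '8g' or '8gb' into bytes."""
    s = text.strip().lower()
    # Accept the two-letter forms 'kb', 'mb', 'gb' by dropping the trailing 'b'.
    if len(s) > 1 and s.endswith("b") and s[-2] in "kmg":
        s = s[:-1]
    if s and s[-1] in UNITS:
        return int(float(s[:-1]) * UNITS[s[-1]])
    return int(s)


def check_budget(limits, total, headroom=0.10):
    """Return (sum_of_limits, allowed_bytes, fits) for a dict of per-service limits.

    `headroom` is the fraction of total memory kept free for the host itself.
    """
    used = sum(parse_size(value) for value in limits.values())
    allowed = int(parse_size(total) * (1 - headroom))
    return used, allowed, used <= allowed


# Hypothetical limits for the services named in the first post.
example_limits = {"nextcloud": "8g", "komga": "4g", "plex": "16g", "gitlab": "16g"}
used, allowed, fits = check_budget(example_limits, "192g")
print(f"limits sum to {used / 1024**3:.0f} GiB, budget {allowed / 1024**3:.1f} GiB, fits={fits}")
```

If the sum fits with room to spare and the machine still hangs, that would point away from plain container memory growth and toward something outside the limits, such as page cache pressure, a driver, or hardware.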
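Yesterday afternoon the machine hung once more; the kernel log has been updated in the first post.
Update: just now, at 2025-06-11 10:40, it hung again. The machine was pulling the vllm image from Docker Hub while I opened a BDMV folder over WebDAV, and Plex was running a library import at the same time. Then it hung.
Could this be related to network or filesystem I/O?
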
Last edited by WiiGe on 2025-6-11 13:34
sar results attached: sar -f /var/log/sysstat/sa04
Linux 6.8.0-60-generic (z8g4-portal) 06/04/2025 _x86_64_ (80 CPU)
12:00:22 AM CPU %user %nice %system %iowait %steal %idle
12:10:13 AM all 0.45 0.20 0.43 0.23 0.00 98.69
12:20:12 AM all 0.44 0.18 0.45 0.14 0.00 98.79
12:30:24 AM all 0.38 0.00 0.14 0.00 0.00 99.48
12:40:06 AM all 0.38 0.00 0.14 0.00 0.00 99.48
12:50:04 AM all 0.37 0.00 0.13 0.00 0.00 99.49
01:00:23 AM all 0.35 0.00 0.12 0.00 0.00 99.53
01:10:12 AM all 0.53 0.00 0.22 0.00 0.00 99.24
01:20:10 AM all 0.36 0.00 0.13 0.00 0.00 99.51
01:30:02 AM all 0.48 0.00 0.21 0.00 0.00 99.31
01:40:17 AM all 0.41 0.00 0.18 0.00 0.00 99.41
01:50:06 AM all 0.49 0.00 0.36 0.00 0.00 99.15
02:00:05 AM all 0.47 0.00 0.32 0.00 0.00 99.21
02:10:21 AM all 0.53 0.00 0.39 0.01 0.00 99.07
02:20:02 AM all 0.45 0.00 0.29 0.00 0.00 99.26
02:30:15 AM all 0.43 0.00 0.24 0.00 0.00 99.33
02:40:24 AM all 0.45 0.00 0.28 0.00 0.00 99.27
02:50:06 AM all 0.37 0.00 0.15 0.00 0.00 99.48
03:00:16 AM all 0.34 0.00 0.12 0.00 0.00 99.54
03:10:02 AM all 0.68 0.00 0.43 0.00 0.00 98.88
03:20:05 AM all 0.58 0.00 0.32 0.00 0.00 99.10
03:30:24 AM all 0.37 0.00 0.16 0.00 0.00 99.47
03:40:14 AM all 0.37 0.00 0.15 0.00 0.00 99.48
03:50:24 AM all 0.36 0.00 0.12 0.00 0.00 99.52
04:00:02 AM all 0.35 0.00 0.12 0.00 0.00 99.53
04:10:24 AM all 0.37 0.00 0.12 0.00 0.00 99.51
04:20:04 AM all 0.36 0.00 0.12 0.00 0.00 99.52
04:30:22 AM all 0.35 0.00 0.12 0.00 0.00 99.53
04:40:14 AM all 0.49 0.00 0.24 0.03 0.00 99.24
04:50:12 AM all 0.52 0.00 0.17 0.01 0.00 99.30
05:00:16 AM all 0.36 0.00 0.12 0.00 0.00 99.51
05:10:02 AM all 0.66 0.00 0.24 0.04 0.00 99.06
05:20:08 AM all 0.49 0.00 0.19 0.07 0.00 99.25
05:30:13 AM all 0.36 0.00 0.12 0.00 0.00 99.51
05:40:22 AM all 0.37 0.00 0.13 0.00 0.00 99.50
05:50:01 AM all 0.35 0.00 0.12 0.00 0.00 99.52
06:00:12 AM all 0.35 0.00 0.12 0.00 0.00 99.54
06:10:24 AM all 0.38 0.00 0.12 0.00 0.00 99.49
06:20:02 AM all 0.35 0.00 0.12 0.00 0.00 99.54
06:30:24 AM all 0.35 0.00 0.12 0.00 0.00 99.53
06:40:24 AM all 0.39 0.00 0.13 0.00 0.00 99.47
06:50:10 AM all 0.39 0.00 0.13 0.00 0.00 99.48
07:00:24 AM all 0.40 0.00 0.13 0.00 0.00 99.46
07:10:02 AM all 0.40 0.00 0.13 0.00 0.00 99.47
07:20:08 AM all 0.35 0.00 0.12 0.00 0.00 99.53
07:30:17 AM all 0.35 0.00 0.12 0.00 0.00 99.54
07:40:22 AM all 0.39 0.00 0.12 0.00 0.00 99.49
07:50:12 AM all 0.35 0.00 0.11 0.00 0.00 99.53
08:00:00 AM all 0.35 0.00 0.12 0.00 0.00 99.53
08:10:11 AM all 0.38 0.00 0.12 0.00 0.00 99.49
08:20:10 AM all 0.35 0.00 0.11 0.00 0.00 99.54
08:30:00 AM all 0.36 0.00 0.12 0.00 0.00 99.51
08:40:24 AM all 0.36 0.00 0.12 0.00 0.00 99.52
08:50:02 AM all 0.36 0.00 0.12 0.00 0.00 99.51
09:00:24 AM all 0.36 0.00 0.14 0.00 0.00 99.50
09:10:24 AM all 0.88 0.52 0.38 0.08 0.00 98.14
09:20:01 AM all 0.36 2.08 0.35 0.00 0.00 97.21
09:30:24 AM all 0.35 2.06 0.35 0.00 0.00 97.24
09:40:12 AM all 0.36 2.06 0.36 0.00 0.00 97.21
09:50:05 AM all 0.37 2.07 0.36 0.00 0.00 97.20
10:00:02 AM all 0.35 2.07 0.36 0.00 0.00 97.21
10:00:02 AM CPU %user %nice %system %iowait %steal %idle
10:10:03 AM all 0.93 0.26 0.40 0.08 0.00 98.34
10:20:05 AM all 0.37 0.00 0.13 0.00 0.00 99.50
10:30:12 AM all 0.37 0.00 0.13 0.00 0.00 99.51
10:40:06 AM all 0.37 0.00 0.13 0.00 0.00 99.49
10:50:02 AM all 0.36 0.00 0.12 0.00 0.00 99.51
11:00:24 AM all 0.36 0.00 0.12 0.00 0.00 99.52
11:10:07 AM all 0.38 0.00 0.13 0.00 0.00 99.49
11:20:11 AM all 0.37 0.00 0.13 0.00 0.00 99.50
11:30:17 AM all 0.37 0.00 0.12 0.00 0.00 99.50
11:40:02 AM all 0.37 0.00 0.14 0.00 0.00 99.49
11:50:04 AM all 0.37 0.00 0.12 0.00 0.00 99.50
12:00:24 PM all 0.36 0.00 0.12 0.00 0.00 99.51
12:10:22 PM all 0.37 0.00 0.13 0.00 0.00 99.50
12:20:12 PM all 0.36 0.00 0.12 0.00 0.00 99.51
12:30:08 PM all 0.37 0.00 0.12 0.00 0.00 99.50
12:40:24 PM all 0.37 0.00 0.12 0.00 0.00 99.51
12:50:02 PM all 0.37 0.00 0.12 0.00 0.00 99.51
01:00:14 PM all 0.37 0.00 0.12 0.00 0.00 99.51
01:10:13 PM all 0.38 0.00 0.12 0.00 0.00 99.49
01:20:11 PM all 0.35 0.00 0.12 0.00 0.00 99.52
01:30:24 PM all 0.37 0.00 0.13 0.01 0.00 99.49
01:40:12 PM all 0.37 0.00 0.13 0.01 0.00 99.50
01:50:07 PM all 0.36 0.00 0.12 0.00 0.00 99.52
02:00:02 PM all 0.36 0.00 0.12 0.00 0.00 99.51
02:10:23 PM all 0.39 0.00 0.12 0.00 0.00 99.49
02:20:22 PM all 0.36 0.00 0.12 0.00 0.00 99.52
02:30:12 PM all 0.37 0.00 0.12 0.00 0.00 99.50
02:40:24 PM all 0.37 0.00 0.12 0.00 0.00 99.51
02:50:02 PM all 0.38 0.00 0.13 0.00 0.00 99.48
03:00:15 PM all 0.53 0.00 0.21 0.09 0.00 99.17
03:10:01 PM all 0.59 0.00 0.16 0.00 0.00 99.25
03:20:21 PM all 0.53 0.00 0.15 0.00 0.00 99.32
03:30:24 PM all 0.38 0.00 0.15 0.00 0.00 99.47
03:40:02 PM all 0.46 0.00 0.15 0.00 0.00 99.39
03:50:24 PM all 0.40 0.00 0.14 0.00 0.00 99.46
04:00:24 PM all 0.43 0.00 0.15 0.00 0.00 99.42
04:10:22 PM all 0.38 0.00 0.14 0.00 0.00 99.48
04:20:24 PM all 0.41 0.00 0.15 0.00 0.00 99.44
04:30:07 PM all 0.39 0.00 0.14 0.00 0.00 99.47
04:40:16 PM all 0.43 0.00 0.15 0.00 0.00 99.42
04:50:08 PM all 0.38 0.00 0.14 0.00 0.00 99.48
05:00:14 PM all 0.40 0.00 0.15 0.00 0.00 99.45
05:10:09 PM all 0.44 0.00 0.15 0.00 0.00 99.41
05:20:12 PM all 0.39 0.00 0.13 0.00 0.00 99.48
05:30:24 PM all 0.38 0.00 0.14 0.00 0.00 99.48
05:40:02 PM all 0.41 0.00 0.14 0.00 0.00 99.45
05:50:22 PM all 0.41 0.00 0.17 0.06 0.00 99.35
06:00:00 PM all 0.44 0.00 0.14 0.00 0.00 99.42
06:10:12 PM all 0.47 0.00 0.14 0.00 0.00 99.39
06:20:06 PM all 0.47 0.00 0.18 0.00 0.00 99.35
06:30:02 PM all 0.44 0.00 0.16 0.00 0.00 99.40
06:40:05 PM all 0.46 0.00 0.15 0.00 0.00 99.38
06:50:17 PM all 0.44 0.00 0.12 0.00 0.00 99.43
07:00:06 PM all 0.45 0.00 0.14 0.00 0.00 99.41
07:10:20 PM all 0.44 0.00 0.14 0.00 0.00 99.42
07:20:02 PM all 0.41 0.00 0.13 0.00 0.00 99.45
07:30:09 PM all 0.41 0.00 0.13 0.00 0.00 99.45
07:40:24 PM all 0.42 0.00 0.14 0.00 0.00 99.43
07:50:11 PM all 0.37 0.00 0.13 0.00 0.00 99.49
08:00:24 PM all 0.37 0.00 0.13 0.00 0.00 99.50
08:00:24 PM CPU %user %nice %system %iowait %steal %idle
08:10:12 PM all 0.37 0.00 0.13 0.00 0.00 99.50
08:20:07 PM all 30.58 1.26 0.71 0.01 0.00 67.44
08:30:02 PM all 33.38 12.77 5.61 0.08 0.00 48.15
08:40:24 PM all 27.96 7.33 2.78 0.01 0.00 61.92
08:50:16 PM all 23.59 0.99 0.48 0.00 0.00 74.94
09:00:20 PM all 23.15 0.90 0.49 0.00 0.00 75.46
09:10:24 PM all 23.69 0.96 0.53 0.00 0.00 74.82
09:20:07 PM all 23.88 0.99 0.57 0.00 0.00 74.55
09:30:24 PM all 24.19 1.00 0.56 0.00 0.00 74.25
09:40:02 PM all 24.29 0.96 0.58 0.00 0.00 74.17
09:50:18 PM all 17.12 0.55 0.41 0.00 0.00 81.91
10:00:24 PM all 0.36 0.00 0.15 0.00 0.00 99.49
10:10:12 PM all 0.37 0.00 0.14 0.00 0.00 99.49
10:20:24 PM all 0.40 0.00 0.17 0.00 0.00 99.43
10:30:02 PM all 0.40 0.00 0.14 0.00 0.00 99.46
10:40:03 PM all 0.40 0.00 0.17 0.00 0.00 99.42
10:50:24 PM all 0.38 0.00 0.16 0.00 0.00 99.46
11:00:12 PM all 0.38 0.00 0.15 0.00 0.00 99.47
11:10:07 PM all 0.38 0.00 0.16 0.00 0.00 99.46
11:20:02 PM all 0.39 0.00 0.16 0.00 0.00 99.45
11:30:01 PM all 0.43 0.00 0.16 0.00 0.00 99.40
11:40:24 PM all 0.39 0.00 0.16 0.00 0.00 99.45
11:50:05 PM all 0.39 0.00 0.15 0.00 0.00 99.46
Average: all 2.13 0.27 0.24 0.01 0.00 97.34
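
Tables this long are easier to read once the busy intervals are pulled out. The Python sketch below parses sar CPU output in the 12-hour layout shown above and lists the intervals whose %idle drops below a threshold. The 90% threshold is an arbitrary assumption, and the parser only handles the "all" rows of this particular layout.

```python
COLUMNS = ("user", "nice", "system", "iowait", "steal", "idle")


def parse_sar_cpu(text):
    """Parse 'HH:MM:SS AM|PM all ...' rows of sar CPU output into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 9 fields; header, Average and RESTART lines do not qualify.
        if len(fields) != 9 or fields[1] not in ("AM", "PM") or fields[2] != "all":
            continue
        try:
            values = [float(x) for x in fields[3:]]
        except ValueError:
            continue
        row = {"time": f"{fields[0]} {fields[1]}"}
        row.update(zip(COLUMNS, values))
        rows.append(row)
    return rows


def busy_intervals(rows, max_idle=90.0):
    """Return the rows whose %idle is below max_idle."""
    return [row for row in rows if row["idle"] < max_idle]
```

Run against the sa04 output above, this would single out the block from 08:20 PM to 09:50 PM, which can then be matched against whatever job was running in that window.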
sar -f /var/log/sysstat/sa10
Linux 6.8.0-60-generic (z8g4-portal) 06/10/2025 _x86_64_ (80 CPU)
12:00:04 AM CPU %user %nice %system %iowait %steal %idle
12:10:11 AM all 20.56 1.25 2.31 0.05 0.00 75.83
12:20:17 AM all 20.28 1.09 2.39 0.05 0.00 76.18
12:30:09 AM all 19.09 0.06 2.50 0.33 0.00 78.01
12:40:07 AM all 18.94 0.00 2.30 0.04 0.00 78.72
12:50:14 AM all 18.86 0.00 2.32 0.04 0.00 78.78
01:00:08 AM all 18.84 0.00 2.45 0.05 0.00 78.66
01:10:04 AM all 18.88 0.00 2.55 0.06 0.00 78.50
01:20:08 AM all 18.70 0.00 2.49 0.06 0.00 78.74
01:30:02 AM all 18.67 0.00 2.56 0.07 0.00 78.70
01:40:04 AM all 18.50 0.00 2.67 0.07 0.00 78.76
01:50:04 AM all 18.48 0.00 2.66 0.08 0.00 78.78
02:00:01 AM all 18.59 0.00 2.63 0.08 0.00 78.70
02:10:04 AM all 18.60 0.00 2.60 0.07 0.00 78.73
02:20:04 AM all 18.67 0.00 2.53 0.06 0.00 78.74
02:30:15 AM all 18.59 0.00 2.63 0.07 0.00 78.72
02:40:17 AM all 18.45 0.00 2.69 0.07 0.00 78.79
02:50:14 AM all 18.54 0.00 2.61 0.07 0.00 78.79
03:00:12 AM all 18.86 0.00 2.35 0.06 0.00 78.73
03:10:00 AM all 19.15 0.00 2.59 0.06 0.00 78.19
03:20:12 AM all 18.81 0.00 2.62 0.06 0.00 78.51
03:30:17 AM all 18.45 0.00 2.68 0.06 0.00 78.81
03:40:07 AM all 18.41 0.00 2.74 0.06 0.00 78.79
03:50:01 AM all 18.47 0.00 2.67 0.06 0.00 78.79
04:00:04 AM all 18.51 0.00 2.67 0.07 0.00 78.75
04:10:17 AM all 18.47 0.00 2.65 0.06 0.00 78.82
04:20:17 AM all 18.44 0.00 2.69 0.06 0.00 78.80
04:30:13 AM all 18.52 0.00 2.61 0.06 0.00 78.80
04:40:10 AM all 18.53 0.00 2.59 0.07 0.00 78.81
04:50:14 AM all 18.51 0.00 2.63 0.06 0.00 78.80
05:00:08 AM all 18.72 0.00 2.70 0.07 0.00 78.51
05:10:17 AM all 18.76 0.00 2.45 0.06 0.00 78.74
05:20:09 AM all 18.55 0.00 2.68 0.06 0.00 78.71
05:30:13 AM all 18.50 0.00 2.69 0.06 0.00 78.75
05:40:05 AM all 18.66 0.00 2.56 0.06 0.00 78.73
05:50:17 AM all 18.71 0.00 2.50 0.06 0.00 78.73
06:00:04 AM all 18.77 0.00 2.53 0.06 0.00 78.64
06:10:07 AM all 18.84 0.00 2.64 0.06 0.00 78.46
06:20:17 AM all 18.87 0.00 2.55 0.06 0.00 78.52
06:30:14 AM all 18.76 0.00 2.47 0.06 0.00 78.72
06:40:17 AM all 18.66 0.00 2.61 0.06 0.00 78.68
06:50:02 AM all 18.66 0.00 2.59 0.06 0.00 78.70
07:00:14 AM all 18.68 0.00 2.54 0.06 0.00 78.72
07:10:12 AM all 18.58 0.00 2.54 0.05 0.00 78.83
07:20:14 AM all 18.63 0.00 2.20 0.05 0.00 79.12
07:30:14 AM all 1.32 0.00 0.18 0.00 0.00 98.50
07:40:04 AM all 0.23 0.00 0.10 0.00 0.00 99.67
07:50:17 AM all 0.23 0.00 0.10 0.00 0.00 99.67
08:00:04 AM all 0.27 0.00 0.10 0.00 0.00 99.62
08:10:08 AM all 0.24 0.00 0.10 0.00 0.00 99.66
08:20:17 AM all 0.24 0.00 0.10 0.00 0.00 99.65
08:30:00 AM all 0.24 0.00 0.10 0.00 0.00 99.65
08:40:01 AM all 0.24 0.00 0.10 0.00 0.00 99.66
08:50:17 AM all 0.24 0.00 0.11 0.00 0.00 99.65
09:00:13 AM all 0.33 0.00 0.19 0.00 0.00 99.47
09:10:04 AM all 0.74 1.21 0.42 0.08 0.00 97.56
09:20:02 AM all 0.26 2.06 0.34 0.00 0.00 97.34
09:30:14 AM all 0.26 2.06 0.35 0.00 0.00 97.33
09:40:04 AM all 0.24 2.03 0.37 0.00 0.00 97.36
09:50:03 AM all 0.23 2.03 0.37 0.00 0.00 97.36
10:00:10 AM all 0.54 1.58 0.38 0.13 0.00 97.38
10:00:10 AM CPU %user %nice %system %iowait %steal %idle
10:10:11 AM all 0.49 0.00 0.17 0.19 0.00 99.15
10:20:17 AM all 0.25 0.00 0.11 0.00 0.00 99.63
10:30:06 AM all 0.25 0.00 0.11 0.00 0.00 99.64
10:40:17 AM all 0.26 0.00 0.11 0.00 0.00 99.62
10:50:04 AM all 0.26 0.00 0.11 0.00 0.00 99.63
11:00:13 AM all 0.31 0.00 0.12 0.00 0.00 99.56
11:10:17 AM all 0.27 0.00 0.11 0.00 0.00 99.62
11:20:14 AM all 0.39 0.42 0.34 0.16 0.00 98.68
11:30:15 AM all 0.45 3.57 0.69 0.21 0.00 95.08
11:40:04 AM all 0.25 2.06 0.30 0.05 0.00 97.35
11:50:04 AM all 0.24 2.10 0.30 0.05 0.00 97.31
12:00:13 PM all 0.33 2.08 0.35 0.04 0.00 97.20
12:10:17 PM all 0.28 2.08 0.31 0.03 0.00 97.30
12:20:17 PM all 0.25 2.11 0.31 0.04 0.00 97.30
12:30:04 PM all 0.24 2.08 0.32 0.05 0.00 97.31
12:40:17 PM all 0.25 2.09 0.30 0.05 0.00 97.32
12:50:04 PM all 0.25 2.10 0.30 0.04 0.00 97.31
01:00:15 PM all 0.30 2.09 0.31 0.04 0.00 97.24
01:10:17 PM all 0.25 2.08 0.30 0.05 0.00 97.32
01:20:14 PM all 0.25 2.09 0.29 0.01 0.00 97.37
01:30:14 PM all 0.27 2.12 0.49 0.40 0.00 96.73
01:40:17 PM all 0.25 2.11 0.27 0.01 0.00 97.37
01:50:00 PM all 0.48 7.19 2.88 0.16 0.00 89.29
02:00:14 PM all 1.42 3.94 1.77 0.47 0.00 92.40
02:10:14 PM all 1.03 4.71 1.40 0.75 0.00 92.11
02:20:17 PM all 0.73 3.71 1.33 1.48 0.00 92.75
02:30:04 PM all 0.27 1.96 0.32 0.02 0.00 97.43
02:40:17 PM all 0.26 2.07 0.36 0.04 0.00 97.26
02:50:17 PM all 0.28 2.07 0.33 0.01 0.00 97.32
03:00:14 PM all 0.30 2.07 0.34 0.00 0.00 97.28
03:10:17 PM all 0.27 2.08 0.35 0.00 0.00 97.30
03:20:04 PM all 0.26 2.10 0.33 0.00 0.00 97.31
03:30:15 PM all 0.27 2.06 0.28 0.01 0.00 97.39
03:40:17 PM all 0.28 2.16 0.25 0.00 0.00 97.31
03:50:03 PM all 0.31 2.18 0.30 0.01 0.00 97.20
04:00:13 PM all 0.37 2.25 0.31 0.02 0.00 97.05
04:10:04 PM all 0.26 2.34 0.24 0.00 0.00 97.15
04:20:17 PM all 0.25 2.33 0.28 0.00 0.00 97.14
04:30:17 PM all 0.26 2.27 0.29 0.01 0.00 97.18
04:40:09 PM all 0.24 2.17 0.24 0.02 0.00 97.33
04:50:04 PM all 0.25 2.28 0.26 0.12 0.00 97.09
05:00:14 PM all 0.31 2.28 0.26 0.00 0.00 97.15
05:10:17 PM all 0.38 2.68 0.43 0.08 0.00 96.43
Average: all 8.24 0.98 1.31 0.08 0.00 89.39
05:30:58 PM LINUX RESTART (80 CPU)
05:40:07 PM CPU %user %nice %system %iowait %steal %idle
05:50:37 PM all 1.06 0.00 0.63 0.03 0.00 98.28
06:00:07 PM all 0.70 0.00 0.41 0.01 0.00 98.88
06:10:07 PM all 0.54 0.00 0.24 0.01 0.00 99.22
06:20:09 PM all 0.43 0.00 0.17 0.01 0.00 99.40
06:30:13 PM all 0.45 0.00 0.24 0.01 0.00 99.30
06:40:06 PM all 0.41 0.00 0.25 0.21 0.00 99.13
06:50:07 PM all 0.37 0.00 0.11 0.00 0.00 99.51
07:00:34 PM all 0.34 0.00 0.10 0.00 0.00 99.55
07:10:13 PM all 0.39 0.00 0.11 0.00 0.00 99.50
07:20:27 PM all 0.36 0.00 0.10 0.00 0.00 99.54
07:30:06 PM all 0.37 0.00 0.10 0.00 0.00 99.53
07:40:17 PM all 0.37 0.00 0.21 0.02 0.00 99.39
07:50:21 PM all 0.39 0.00 0.33 0.01 0.00 99.26
08:00:07 PM all 0.38 0.00 0.11 0.00 0.00 99.51
08:10:27 PM all 0.99 0.53 0.14 0.00 0.00 98.34
08:20:00 PM all 7.23 3.58 0.44 0.01 0.00 88.74
08:30:17 PM all 4.43 1.58 0.38 0.02 0.00 93.59
08:40:31 PM all 3.73 1.75 0.30 0.00 0.00 94.22
08:50:07 PM all 4.20 2.93 0.34 0.00 0.00 92.53
09:00:14 PM all 3.50 2.32 0.32 0.00 0.00 93.85
09:10:06 PM all 1.14 0.63 0.16 0.00 0.00 98.07
09:20:27 PM all 0.68 0.24 0.13 0.00 0.00 98.95
09:30:12 PM all 0.33 0.00 0.10 0.00 0.00 99.57
09:40:07 PM all 0.36 0.00 0.10 0.00 0.00 99.54
09:50:08 PM all 0.34 0.00 0.10 0.00 0.00 99.56
10:00:21 PM all 0.34 0.00 0.10 0.00 0.00 99.56
10:10:16 PM all 0.38 0.00 0.11 0.00 0.00 99.51
10:20:12 PM all 0.33 0.00 0.10 0.00 0.00 99.56
10:30:17 PM all 0.33 0.00 0.11 0.00 0.00 99.55
10:40:19 PM all 0.34 0.00 0.11 0.00 0.00 99.54
10:50:37 PM all 0.34 0.00 0.10 0.00 0.00 99.55
11:00:11 PM all 0.38 0.00 0.11 0.00 0.00 99.52
11:10:04 PM all 0.36 0.00 0.11 0.00 0.00 99.53
11:20:08 PM all 0.34 0.00 0.10 0.00 0.00 99.56
11:30:24 PM all 0.33 0.00 0.10 0.00 0.00 99.57
11:40:07 PM all 0.33 0.00 0.10 0.00 0.00 99.57
11:50:27 PM all 0.34 0.00 0.10 0.00 0.00 99.56
Average: all 1.01 0.36 0.19 0.01 0.00 98.43

Maybe just boot Windows from WinToGo and see how that runs?

If you suspect a disk problem, have you tried fsck?