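Copy Fail: A High-Severity Linux Kernel Vulnerability Present Since 2017 (with a Temporary Mitigation) | CVE-2026-31431

A Linux kernel vulnerability that has existed for nearly 10 years was disclosed recently. I tried it on a machine I have been using lately, and privilege escalation succeeded right away:

~$ whoami
songtianlun
~$ python3 test.py
# whoami
root

Temporary workaround: security infrastructure such as SSH and OpenSSL almost always relies on its own user-space crypto libraries, so AF_ALG can simply be disabled as a temporary mitigation (for reference only): ...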
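Whatever mitigation is applied, it helps to confirm whether an unprivileged process can still reach AF_ALG. The sketch below is an assumption-laden check, not the mitigation from the post (which is elided in this excerpt): the helper name `af_alg_available` is hypothetical, and it only tests whether an AF_ALG socket can be created from Python.

```python
import socket


def af_alg_available() -> bool:
    """Return True if this process can create an AF_ALG socket.

    Hypothetical helper for illustration; it does not apply any mitigation.
    """
    family = getattr(socket, "AF_ALG", None)  # attribute is missing on non-Linux builds
    if family is None:
        return False
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError:
        # Raised when the address family is unsupported, blocked, or denied.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    print("AF_ALG reachable:", af_alg_available())
```

Running it before and after a change gives a quick indication of whether the interface is still exposed to ordinary users. Note that on some kernels the first socket creation may itself load the corresponding module.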

April 30, 2026 | 1 min read | 230 words | Tianlun Song
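Hermes Agent: A Guide to Running on K3s / K8s

Based on the official Docker documentation, this article migrates Hermes Agent to a Kubernetes / K3s environment and uses a StatefulSet to manage the persistent workload.

1. Prerequisites
- A K3s or K8s cluster is ready (this article uses K3s as the example)
- containerd is present on the nodes (built into K3s by default)
- Installing nerdctl as the container management tool is recommended (see: Installing and Using nerdctl on K3s Nodes)
- Image: nousresearch/hermes-agent:latest

2. Initial configuration (persistent data directory)
Before the first run, execute the Setup Wizard once so that API keys and other settings are written to a host directory, which is then mounted into the container. ...

April 27, 2026 | 2 min read | 814 words | Tianlun Song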
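Installing and Using nerdctl on K3s Nodes

Use case: K3s does not ship with nerdctl, but its built-in containerd is fully compatible with it. This tutorial explains how to install nerdctl on a K3s node at minimal cost and point it at the K3s containerd socket correctly, without installing containerd or CNI a second time. ...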
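As a rough illustration of "pointing nerdctl at the K3s containerd socket", the sketch below builds a nerdctl command line with explicit `--address` and `--namespace` options. The socket path `/run/k3s/containerd/containerd.sock` and the `k8s.io` namespace are the usual K3s defaults and are assumed here; the helper name `nerdctl_argv` is hypothetical, and the tutorial itself may configure this differently (its steps are elided in the excerpt).

```python
import shlex

# Assumed K3s defaults; adjust if the node uses a custom data directory.
K3S_CONTAINERD_SOCKET = "/run/k3s/containerd/containerd.sock"
K8S_NAMESPACE = "k8s.io"


def nerdctl_argv(*args, address=K3S_CONTAINERD_SOCKET, namespace=K8S_NAMESPACE):
    """Build an argv list that runs nerdctl against the given containerd socket."""
    return ["nerdctl", "--address", address, "--namespace", namespace, *args]


if __name__ == "__main__":
    # Print the command instead of running it, since it normally needs root.
    print(shlex.join(nerdctl_argv("ps", "-a")))
```

The resulting list can be passed to `subprocess.run` on a node where nerdctl is installed.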

April 27, 2026 | 4 min read | 1541 words | Tianlun Song
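Mouser: A Lightweight, Open-Source Alternative to the Logitech Mouse Driver

The following is compiled from: https://meta.appinn.net/t/topic/83933
Project page: https://github.com/TomBadash/Mouser

I previously shared tjsky/logi-options-plus-mini, a script that installs only the Logitech Options+ features you need, which at best treats the symptoms. In practice, even when trimmed to the bare minimum, Options+ still has a chronic problem: every week it automatically downloads a few hundred MB of update files but never deletes the old copies, turning the disk into its private dumping ground. After three months of real use, the related files had grown to a startling 2 GB. ...

April 20, 2026 | 4 min read | 1542 words | Tianlun Song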
Claude Opus 4.7: A Summary of Pros, Cons, and Evaluation Information

The following is reposted from: https://linux.do/t/topic/1984117

Basic information
Official announcement: https://www.anthropic.com/news/claude-opus-4-7
Official documentation: https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7
Official model card: https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf ...

April 17, 2026 | 3 min read | 1173 words | Tianlun Song
openFuyao NPU-Operator Troubleshooting

Describe output of the failing pod

[root@master1 ~]# kubectl -n kube-system describe pod ascend-device-plugin-ll46f
Name:                 ascend-device-plugin-ll46f
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      ascend-device-plugin-sa
Node:                 master1/10.17.30.131
Start Time:           Mon, 30 Mar 2026 11:08:32 +0800
Labels:               app.kubernetes.io/managed-by=npu-operator
                      controller-revision-hash=7df5dcb887
                      helm.sh/chart=npu-operator-0.15.0
                      name=ascend-device-plugin-ds
                      pod-template-generation=1
Annotations:          cni.projectcalico.org/containerID: c1f2adcaeaaf2bdcf0a6e09730f68231a293074e31d58f61997f714dfb520878
                      cni.projectcalico.org/podIP: 192.168.137.118/32
                      cni.projectcalico.org/podIPs: 192.168.137.118/32
                      scheduler.alpha.kubernetes.io/critical-pod:
                      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:               Running
IP:                   192.168.137.118
IPs:
  IP:           192.168.137.118
Controlled By:  DaemonSet/ascend-device-plugin
Init Containers:
  init-permission:
    Container ID:  containerd://4406968a522bea48dfefebae81ec53644312762af4781c25de689952ed6c2d27
    Image:         cr.openfuyao.cn/openfuyao/busybox:1.36.1
    Image ID:      cr.openfuyao.cn/openfuyao/busybox@sha256:4b8407fadd8100c61b097d63efe992b2c033e7d371c9117f7a9462fe87e31176
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      chown 9000:9000 /var/log/mindx-dl /var/log/mindx-dl/devicePlugin
      chmod 750 /var/log/mindx-dl/devicePlugin
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 30 Mar 2026 15:28:32 +0800
      Finished:     Mon, 30 Mar 2026 15:28:32 +0800
    Ready:          True
    Restart Count:  1
    Environment:    <none>
    Mounts:
      /var/log/mindx-dl/devicePlugin from log-path (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfldg (ro)
Containers:
  device-plugin-01:
    Container ID:  containerd://fcc0c4742285847e2621a9a9217502307fc7e28644fbf86b32f9c11d67a2c0ab
    Image:         cr.openfuyao.cn/openfuyao/ascend-image/ascend-k8sdeviceplugin:v6.0.0
    Image ID:      cr.openfuyao.cn/openfuyao/ascend-image/ascend-k8sdeviceplugin@sha256:a5b9612b21bcd35384f9f19a05b2d7915b865e7b2be6a30bfd7806a9b8a86f58
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      --
    Args:
      device-plugin -useAscendDocker=true -volcanoType=false -logFile=/var/log/mindx-dl/devicePlugin/devicePlugin.log -logLevel=0
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 31 Mar 2026 10:28:58 +0800
      Finished:     Tue, 31 Mar 2026 10:28:58 +0800
    Ready:          False
    Restart Count:  274
    Limits:
      cpu:     500m
      memory:  500Mi
    Requests:
      cpu:     500m
      memory:  500Mi
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /tmp from tmp (rw)
      /usr/local/Ascend/driver from hiai-driver (ro)
      /var/lib/kubelet/device-plugins from device-plugin (rw)
      /var/lib/kubelet/pod-resources from pod-resource (rw)
      /var/log/mindx-dl/devicePlugin from log-path (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfldg (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  device-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:
  pod-resource:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pod-resources
    HostPathType:
  hiai-driver:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/local/Ascend/driver
    HostPathType:
  log-path:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/mindx-dl/devicePlugin
    HostPathType:  DirectoryOrCreate
  tmp:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp
    HostPathType:
  kube-api-access-gfldg:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    Optional:                false
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  openfuyao.com/npu.present=
Tolerations:     CriticalAddonsOnly op=Exists
                 device-plugin=v2:NoSchedule
                 huawei.com/Ascend910:NoSchedule op=Exists
                 node-role.kubernetes.io/control-plane:NoSchedule
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Normal   Pulled   16m (x205 over 18h)    kubelet  (combined from similar events): Successfully pulled image "cr.openfuyao.cn/openfuyao/ascend-image/ascend-k8sdeviceplugin:v6.0.0" in 403ms (403ms including waiting). Image size: 48017174 bytes.
  Warning  BackOff  2m47s (x5216 over 18h) kubelet  Back-off restarting failed container device-plugin-01 in pod ascend-device-plugin-ll46f_kube-system(8edcd384-ab2d-4998-8077-5ac58801c79e)
  Normal   Pulling  66s (x227 over 19h)    kubelet  Pulling image "cr.openfuyao.cn/openfuyao/ascend-image/ascend-k8sdeviceplugin:v6.0.0"

/dev check inside the failing pod

[root@master1 fuyao-26.3-rc3]# kubectl -n kube-system exec -it daemonsets/ascend-device-plugin -- ls /dev
Defaulted container "device-plugin-01" out of: device-plugin-01, init-permission (init)
autofs           null               tty10  tty34  tty58    vcs5
bsg              ppp                tty11  tty35  tty59    vcs6
btrfs-control    ptmx               tty12  tty36  tty6     vcsa
bus              pts                tty13  tty37  tty60    vcsa1
core             random             tty14  tty38  tty61    vcsa2
cpu_dma_latency  raw                tty15  tty39  tty62    vcsa3
cuse             relationship_ctrl  tty16  tty4   tty63    vcsa4
davinci0         rfkill             tty17  tty40  tty7     vcsa5
davinci_manager  rtc0               tty18  tty41  tty8     vcsa6
devmm_svm        sda                tty19  tty42  tty9     vcsu
dri              sda1               tty2   tty43  ttyAMA0  vcsu1
fb0              sda2               tty20  tty44  ttyS0    vcsu2
fd               sg0                tty21  tty45  ttyS1    vcsu3
full             sg1                tty22  tty46  ttyS2    vcsu4
fuse             sg2                tty23  tty47  ttyS3    vcsu5
hidraw0          shm                tty24  tty48  uhid     vcsu6
hidraw1          snapshot           tty25  tty49  uinput   vfio
hisi_hdc         sr0                tty26  tty5   urandom  vga_arbiter
hwrng            sr1                tty27  tty50  usbmon0  vhost-net
input            stderr             tty28  tty51  usbmon1  vhost-vsock
kmsg             stdin              tty29  tty52  usbmon2  vport2p1
loop-control     stdout             tty3   tty53  vcs      zero
mapper           termination-log    tty30  tty54  vcs1
mem              tty                tty31  tty55  vcs2
mqueue           tty0               tty32  tty56  vcs3
net              tty1               tty33  tty57  vcs4

Driver check inside the failing pod

[root@master1 fuyao-26.3-rc3]# kubectl -n kube-system exec -it daemonsets/ascend-device-plugin -- ls -lha /usr/local/Ascend/driver
Defaulted container "device-plugin-01" out of: device-plugin-01, init-permission (init)
total 44K
drwxr-xr-x  8 root root 4.0K Mar 27 08:03 .
drwxr-xr-x  3 root root 4.0K Mar 31 02:34 ..
drwxr-xr-x  2 root root 4.0K Mar 27 08:01 bin
-r--r--r--  1 root root   20 Mar 27 08:01 build.info
dr-xr-x---  2 root root 4.0K Mar 27 08:01 device
dr-x------ 41 root root 4.0K Mar 27 08:01 kernel
drwxr-xr-x  6 root root 4.0K Mar 27 08:01 lib64
-r--r-----  1 root root   56 Mar 27 08:01 scene.info
dr-xr-x---  2 root root 4.0K Mar 27 08:01 script
drwxr-xr-x  2 root root 4.0K Mar 27 08:01 tools
-r--r--r--  1 root root  352 Mar 27 08:03 version.info

Logs of the failing pod

[root@master1 ~]# kubectl -n kube-system logs daemonsets/ascend-device-plugin --previous
Defaulted container "device-plugin-01" out of: device-plugin-01, init-permission (init)
[INFO] 2026/03/31 06:46:54.593254 1 hwlog/api.go:108 devicePlugin.log's logger init success
[INFO] 2026/03/31 06:46:54.593449 1 main.go:187 ascend device plugin starting and the version is v6.0.0_linux-aarch64
[INFO] 2026/03/31 06:46:54.593494 1 main.go:188 ascend device plugin starting scene is center
[INFO] 2026/03/31 06:46:54.787930 1 devmanager/devmanager.go:104 the dcmi version is 24.1.rc3
[ERROR] 2026/03/31 06:46:54.788019 1 devmanager/devmanager.go:211 get error card quantity: 0
[ERROR] 2026/03/31 06:46:54.788052 1 devmanager/devmanager.go:195 get card list failed for init
[ERROR] 2026/03/31 06:46:54.788101 1 main.go:203 init devmanager failed, err: auto init failed, err: get card list failed for init

Driver check inside the failing pod

[root@master1 ~]# kubectl -n kube-system exec -it daemonsets/ascend-device-plugin -- bash -c 'find /usr/local/Ascend/driver -name libdcmi.so 2>/dev/null; echo $LD_LIBRARY_PATH'
Defaulted container "device-plugin-01" out of: device-plugin-01, init-permission (init)
/usr/local/Ascend/driver/lib64/driver/libdcmi.so
command terminated with exit code 137
[root@master1 ~]# ps -ef | grep -E 'dmp_daemon|slogd' | grep -v grep
root 21578 1 0 Mar30 ? 00:00:19 /usr/sbin/rsyslogd -n -i/var/run/rsyslogd.pid

Check service status?

[root@master1 ~]# systemctl status ascend-dmi
Unit ascend-dmi.service could not be found.
[root@master1 ~]# systemctl status ascend-dkms
Unit ascend-dkms.service could not be found.
[root@master1 ~]# systemctl status npu-smi
Unit npu-smi.service could not be found.
[root@master1 ~]# find / -name dmp_daemon 2>/dev/null
[root@master1 ~]# find / -name slogd 2>/dev/null
[root@master1 ~]# ls -l /var/dmp_daemon /var/slogd 2>/dev/null
[root@master1 ~]#

dcmi issue; hardware-level troubleshooting required ...
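When a container restarts hundreds of times, pulling just the ERROR lines out of the previous container's log narrows things down quickly. The following is a minimal sketch that assumes the line layout seen in the device plugin log above (`[LEVEL] date time N file:line message`); the regular expression, the meaning of the numeric field, and the helper name `error_entries` are illustrative assumptions, not part of the device plugin.

```python
import re

# Layout assumed from the log excerpt: [LEVEL] date time N source:line message
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\]\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<time>[\d:.]+)\s+"
    r"(?P<tag>\d+)\s+(?P<source>\S+:\d+)\s+(?P<message>.*)$"
)


def error_entries(log_text):
    """Return (source, message) pairs for every ERROR line in the log text."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            entries.append((match.group("source"), match.group("message")))
    return entries


SAMPLE = """\
[INFO] 2026/03/31 06:46:54.787930 1 devmanager/devmanager.go:104 the dcmi version is 24.1.rc3
[ERROR] 2026/03/31 06:46:54.788019 1 devmanager/devmanager.go:211 get error card quantity: 0
[ERROR] 2026/03/31 06:46:54.788052 1 devmanager/devmanager.go:195 get card list failed for init
"""

if __name__ == "__main__":
    for source, message in error_entries(SAMPLE):
        print(f"{source}: {message}")
```

In this case the first error, a card quantity of 0 even though /dev/davinci0 is visible in the container, is the one the later failures follow from.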

April 13, 2026 | 5 min read | 2277 words | Tianlun Song
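openFuyao 2603 Community Test Report

Related links
Feature list: https://gitcode.com/openFuyao/release-management/blob/main/openFuyao-26.03/release-plan.md
Guide to the pre-installation environment check tool: https://gitcode.com/openFuyao/sig-installation/blob/master/docs/zh/user_guide/cluster_installation_deployment/environment_pre_check_tool_guide.md

Test environment
CPU: Kunpeng-920
OS: openEuler 24.03 LTS SP3 aarch64
Fuyao Version: v26.03 rc3
docker: 2:18.09.0-346.oe2403sp3

Features tested
Online deployment; offline package preparation; offline deployment; pre-installation check tool; NPU Operator; AI inference suite

Suggested improvements
- Environment check tool: verify that the default iptables policy allows traffic; if it does not, the cluster may be unreachable even after a successful deployment. The default firewall policy is FORWARD DROP, which creates potential problems for running and accessing the cluster.
- Before running the cli, check that the required commands exist and raise an error promptly. Check that tar / unzip are installed: they are used in many places during installation, and a failure produces no clear extraction error, which makes the problem hard to locate.
- The installation command has changed; should backward and forward compatibility be considered?

Scenario records
Offline deployment of the management-plane and workload-plane clusters
CPU: Kunpeng-920
OS: openEuler 24.03 LTS SP3 aarch64
Fuyao Version: v26.03 rc3
docker: 2:18.09.0-346.oe2403sp3
Why does building the offline artifact package in an arm64 environment execute amd64 binaries ...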
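The two preflight suggestions above (confirm tar / unzip exist, and inspect the default FORWARD policy) can be sketched as follows. This is only an illustration of the idea, not the openFuyao check tool: the helper names are hypothetical, and `forward_policy` expects text in the format printed by `iptables -S FORWARD`, whose first line has the form `-P FORWARD <policy>`.

```python
import shutil


def missing_tools(tools=("tar", "unzip")):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def forward_policy(rules_text):
    """Extract the default FORWARD policy from `iptables -S FORWARD` output.

    Returns the policy name (for example "DROP" or "ACCEPT"), or None if no
    policy line is present.
    """
    for line in rules_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "-P" and parts[1] == "FORWARD":
            return parts[2]
    return None


if __name__ == "__main__":
    # Sample text is used here because running iptables normally needs root.
    sample = "-P FORWARD DROP\n-A FORWARD -j KUBE-FORWARD\n"
    print("missing tools:", missing_tools())
    print("FORWARD policy:", forward_policy(sample))
```

A real check would feed `forward_policy` the actual command output and fail early, with a clear message, when a tool is missing or the policy is DROP.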

April 13, 2026 | 16 min read | 7872 words | Tianlun Song
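openFuyao InferNex AI Inference Integrated Deployment on 310P (300I Pro): Environment Issues and Fixes

AI inference integrated deployment (InferNex) is an end-to-end integrated deployment solution designed to optimize AI inference services in cloud-native environments. It is built on the Kubernetes Gateway API Inference Extension (GIE) and the mainstream LLM technology stack, and uses a Helm chart to integrate its core acceleration modules: an open-source gateway, intelligent routing, a high-performance inference backend, global KVCache management, a scaling decision framework, and an inference observability stack. It covers the full acceleration path from request ingress and dynamic routing through inference execution to resource management and monitoring, with the aim of raising inference throughput, lowering TTFT/TPOT latency, and providing a one-stop, efficient AI service deployment experience. ...

April 13, 2026 | 24 min read | 11831 words | Tianlun Song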
Ascend 310P + openFuyao + NPU-Operator Troubleshooting

April 1, 2026 | 5 min read | 2278 words | Tianlun Song
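KDE Plasma 6: Disabling the Global Menu and Restoring Normal Application Menus

Background
At some point, KDE Plasma started enabling a macOS-style global application menu by default. That is, the menu is no longer shown below the application window's title bar and is instead moved into the "Global Menu" widget in the top menu bar. The problem is that the Linux desktop ecosystem is complex: with X11, Wayland, Qt, GTK, and other technologies all in play, it is hard to guarantee that commonly used software displays the global menu correctly. ...

April 1, 2026 | 1 min read | 431 words | Tianlun Song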