This article was first published on 🌱 煎茶. Please credit the source when reposting.

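Recently I noticed that disk usage for the KVM virtual machines in our environment was being reported inaccurately. Checking a disk with virsh gave this result: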

# virsh domblkinfo 20 vda --human
Capacity:       2.000 GiB
Allocation:     2.000 GiB
Physical:       2.000 GiB

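This is clearly wrong: normally the three values should not all be the same, and inside the guest the disk usage is only about 2%. So I set out to check whether the reported values are correct by reading the source code. Environment: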

  • libvirt: 5.6.0
  • OS: CentOS

Trace log

First, find the PID of libvirtd:

ps -aux | grep libvirtd
root      1907  0.0  0.0 1385796 25116 ?       Ssl  Aug26   5:22 /usr/sbin/libvirtd --timeout 120

Then attach GDB to it:

gdb libvirtd 1907

Next, search the whole source tree for the keyword domblkinfo to find the function that implements the command: tools/virsh-domain-monitor.c -> cmdDomblkinfo
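Searching for the command string works because virsh registers each command in a static table (arrays of vshCmdDef in the virsh sources) that pairs the command name with its handler function. The sketch below illustrates that kind of name-to-handler dispatch; the struct cmd_def, the stand-in handlers and find_cmd are hypothetical simplifications, not virsh's real definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified command descriptor: a name plus a handler. */
typedef bool (*cmd_handler)(const char *arg);

struct cmd_def {
    const char *name;
    cmd_handler handler;
};

/* Stand-in handlers; a real handler would talk to libvirt. */
static bool cmd_domblkinfo(const char *arg) { return arg != NULL; }
static bool cmd_domblklist(const char *arg) { (void)arg; return true; }

/* Static table terminated by a NULL name. */
static const struct cmd_def cmd_table[] = {
    { "domblkinfo", cmd_domblkinfo },
    { "domblklist", cmd_domblklist },
    { NULL, NULL },
};

/* Return the descriptor whose name matches, or NULL if there is none. */
static const struct cmd_def *find_cmd(const char *name)
{
    for (const struct cmd_def *c = cmd_table; c->name; c++) {
        if (strcmp(c->name, name) == 0)
            return c;
    }
    return NULL;
}
```

With a table like this, running a command reduces to find_cmd(name)->handler(...), so the command string in the source sits right next to the function that implements it.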

Reading the source, the function that actually fetches the information is src/libvirt-domain.c -> virDomainGetBlockInfo:

if (virDomainGetBlockInfo(dom, device, &info, 0) < 0)
    goto cleanup;

if (!cmdDomblkinfoGet(ctl, &info, &cap, &alloc, &phy, human))
    goto cleanup;
vshPrint(ctl, "%-15s %s\n", _("Capacity:"), cap);
vshPrint(ctl, "%-15s %s\n", _("Allocation:"), alloc);
vshPrint(ctl, "%-15s %s\n", _("Physical:"), phy);

The info struct here holds everything we need, so let's look at the implementation of virDomainGetBlockInfo, which fills it in, by following it with GDB.
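The three vshPrint calls print each label left-aligned in a 15-character field (%-15s). With --human, the byte count is additionally scaled to a binary unit, matching the 2.000 GiB shown earlier. Below is a minimal sketch of that formatting, assuming 1024-based units and three decimals; format_capacity and print_field are hypothetical helpers, not virsh's own functions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scale a byte count to B/KiB/MiB/GiB/TiB/PiB/EiB
 * (powers of 1024) and print it with three decimals into buf. */
static void format_capacity(unsigned long long bytes, char *buf, size_t len)
{
    static const char *const units[] = { "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" };
    double value = (double)bytes;
    size_t i = 0;

    /* Divide by 1024 until the value fits the current unit. */
    while (value >= 1024.0 && i + 1 < sizeof(units) / sizeof(units[0])) {
        value /= 1024.0;
        i++;
    }
    snprintf(buf, len, "%.3f %s", value, units[i]);
}

/* Print one "Label:          value" line in the same layout as vshPrint above. */
static void print_field(const char *label, unsigned long long bytes)
{
    char buf[32];

    format_capacity(bytes, buf, sizeof(buf));
    printf("%-15s %s\n", label, buf);
}
```

For example, print_field("Capacity:", 2147483648ULL) prints a line ending in 2.000 GiB, and the wr_highest_offset value seen later in the trace (32870912 bytes) formats as 31.348 MiB.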

Tracing src/libvirt-domain.c -> virDomainGetBlockInfo

First, set a breakpoint:

(gdb) break virDomainGetBlockInfo
Breakpoint 1 at 0x7f4d4394a760: file libvirt-domain.c, line 6094.

Then open another terminal and run:

[root@compute-01 ~]# virsh list
 Id   Name                State
-----------------------------------
 2    instance-000001b6   running
 3    instance-000001b8   running
 4    instance-000001b9   running

[root@compute-01 ~]# virsh domblkinfo 4 vda

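The terminal now appears to hang. Switching back to GDB shows that the breakpoint has stopped libvirtd, so let's single-step through it: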

[Switching to Thread 0x7f4d32ef0700 (LWP 1918)]

Breakpoint 1, virDomainGetBlockInfo (domain=domain@entry=0x7f4cfc00aeb0, disk=0x7f4cfc00cc60 "vda", 
    info=info@entry=0x7f4d32eefac0, flags=0) at libvirt-domain.c:6094
6094    {
(gdb) n
6097        VIR_DOMAIN_DEBUG(domain, "info=%p, flags=0x%x", info, flags);
(gdb) n
6094    {
(gdb) n
6097        VIR_DOMAIN_DEBUG(domain, "info=%p, flags=0x%x", info, flags);
(gdb) n
6099        virResetLastError();
(gdb) n
6101        if (info)
(gdb) n
6102            memset(info, 0, sizeof(*info));
(gdb) n
6104        virCheckDomainReturn(domain, -1);
(gdb) n
6105        virCheckNonEmptyStringArgGoto(disk, error);
(gdb) n
6106        virCheckNonNullArgGoto(info, error);
(gdb) n
6110        if (conn->driver->domainGetBlockInfo) {
(gdb) n
6112            ret = conn->driver->domainGetBlockInfo(domain, disk, info, flags);
(gdb) s
qemuDomainGetBlockInfo (dom=0x7f4cfc00aeb0, path=0x7f4cfc00cc60 "vda", info=0x7f4d32eefac0, flags=0)
    at qemu/qemu_driver.c:12413
12413   {

At line 6112, execution steps into another function, qemuDomainGetBlockInfo. Let's keep tracing it.
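The jump at line 6112 happens because the public API does not perform the query itself: after validating its arguments it calls through a function pointer in the connection's driver table (conn->driver->domainGetBlockInfo), which in this trace points at qemuDomainGetBlockInfo. A minimal sketch of that dispatch pattern follows; every struct and function name in it is a hypothetical simplification, not one of libvirt's real types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for libvirt's types. */
struct block_info {
    unsigned long long capacity;
    unsigned long long allocation;
    unsigned long long physical;
};

struct hv_driver {
    const char *name;
    /* Hypervisor-specific implementation; NULL if unsupported. */
    int (*domain_get_block_info)(const char *disk, struct block_info *info);
};

struct connection {
    const struct hv_driver *driver;
};

/* Public entry point: validate, clear the output, then dispatch. */
static int get_block_info(struct connection *conn, const char *disk,
                          struct block_info *info)
{
    if (!conn || !disk || !*disk || !info)
        return -1;
    memset(info, 0, sizeof(*info));

    if (conn->driver && conn->driver->domain_get_block_info)
        return conn->driver->domain_get_block_info(disk, info);
    return -1; /* the driver does not support this operation */
}

/* Example backend standing in for qemuDomainGetBlockInfo; the numbers are
 * taken from the stats printed in the GDB session of this post. */
static int fake_qemu_get_block_info(const char *disk, struct block_info *info)
{
    (void)disk;
    info->capacity = 2147483648ULL;
    info->allocation = 32870912ULL;
    info->physical = 2147483648ULL;
    return 0;
}

static const struct hv_driver fake_qemu_driver = { "QEMU", fake_qemu_get_block_info };
```

The indirection lets one public function serve every hypervisor backend, which is also why stepping with s at the dispatch line lands in a different source file.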

Tracing src/qemu/qemu_driver.c -> qemuDomainGetBlockInfo

(gdb) n
12421       virCheckFlags(0, -1);
(gdb) n
12413   {
(gdb) n
12414       virQEMUDriverPtr driver = dom->conn->privateData;
(gdb) n
12421       virCheckFlags(0, -1);
(gdb) n
12419       qemuBlockStatsPtr entry = NULL;
(gdb) n
12414       virQEMUDriverPtr driver = dom->conn->privateData;
(gdb) n
12421       virCheckFlags(0, -1);
(gdb) n
12423       if (!(vm = qemuDomObjFromDomain(dom)))
(gdb) n
12426       cfg = virQEMUDriverGetConfig(driver);
(gdb) n
12428       if (virDomainGetBlockInfoEnsureACL(dom->conn, vm->def) < 0)
(gdb) n
12431       if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) < 0)
(gdb) n
12434       if (!(disk = virDomainDiskByName(vm->def, path, false))) {
(gdb) n
12440       if (virStorageSourceIsEmpty(disk->src)) {
(gdb) n
12448       if (!virDomainObjIsActive(vm)) {
(gdb) n
12460       if (qemuDomainBlocksStatsGather(driver, vm, path, true, &entry) < 0)
(gdb) n
12463       if (!entry->wr_highest_offset_valid) {
(gdb) n
12466           if (virStorageSourceGetActualType(disk->src) == VIR_STORAGE_TYPE_FILE &&
(gdb) n
12468               info->allocation = entry->physical;
(gdb) n
12466           if (virStorageSourceGetActualType(disk->src) == VIR_STORAGE_TYPE_FILE &&
(gdb) p info->allocation
$2 = 0
(gdb) n
12470               info->allocation = entry->wr_highest_offset;
(gdb) n
12484       if (entry->physical == 0 || info->allocation == 0 ||
(gdb) p info->allocation
$3 = 32870912
(gdb) p entry->wr_highest_offset
$4 = 32870912

At this point we know that info->allocation gets its value from entry->wr_highest_offset. Looking at the source, entry->wr_highest_offset should be filled in by this call:

if (qemuDomainBlocksStatsGather(driver, vm, path, true, &entry) < 0)
  goto endjob;
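Condensed into plain C, the branch the trace walked through looks roughly like the sketch below. It is a hypothetical reconstruction from the lines GDB printed: the body of the !entry->wr_highest_offset_valid branch (line 12464) and the rest of the condition starting on line 12466 were not shown, so the sketch assumes the former falls back to entry->physical and models the latter as a single boolean parameter. The struct and function names are illustrative. Note also that info->allocation was still 0 right after GDB displayed line 12468, which suggests that line was not actually executed; in an optimized build GDB often hops back and forth between source lines like this.

```c
#include <stdbool.h>

/* Hypothetical subset of the per-disk stats entry seen in the trace. */
struct blk_stats {
    unsigned long long physical;
    unsigned long long wr_highest_offset;
    bool wr_highest_offset_valid;
};

/* Sketch of the branch traced at lines 12463-12470.
 * special_case stands for the full condition that starts on line 12466;
 * only its first part (the VIR_STORAGE_TYPE_FILE check) is visible. */
static unsigned long long pick_allocation(const struct blk_stats *entry,
                                          bool special_case)
{
    if (!entry->wr_highest_offset_valid)
        return entry->physical;        /* assumed body of line 12464 */
    if (special_case)
        return entry->physical;        /* line 12468 */
    return entry->wr_highest_offset;   /* line 12470 */
}
```

With the values from the trace (physical = 2147483648, wr_highest_offset = 32870912, valid = true) and the special case not taken, the result is 32870912, matching what p info->allocation printed.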

Next, set a breakpoint on qemuDomainBlocksStatsGather to see where entry->wr_highest_offset gets its value.

Tracing src/qemu/qemu_driver.c -> qemuDomainBlocksStatsGather

Delete the previous breakpoint and set a new one:

(gdb) info breakpoints 
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007f4d4394a760 in virDomainGetBlockInfo at libvirt-domain.c:6094
        breakpoint already hit 1 time
(gdb) delete 1
(gdb) break qemuDomainBlocksStatsGather
Breakpoint 2 at 0x7f4d208e3700: file qemu/qemu_driver.c, line 11427.

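Then run continue in GDB and keep pressing Enter until libvirtd is running normally again. After that, run the disk-info command once more and continue tracing.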

Breakpoint 1, qemuDomainBlocksStatsGather (driver=driver@entry=0x7f4d180f99b0, vm=0x7f4d100b8890, 
    path=path@entry=0x7f4d0c00ae50 "vda", capacity=capacity@entry=true, retstats=retstats@entry=0x7f4d336f0980)
    at qemu/qemu_driver.c:11427
11427   {
(gdb) n
11428       qemuDomainObjPrivatePtr priv = vm->privateData;
(gdb) 
11429       bool blockdev = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV);
(gdb) 
11439       if (*path) {
(gdb) 
11440           if (!(disk = virDomainDiskByName(vm->def, path, false))) {
(gdb) 
11445           if (blockdev) {
(gdb) 
11448               if (!disk->info.alias) {
(gdb) 
11458       qemuDomainObjEnterMonitor(driver, vm);
(gdb) 
11459       nstats = qemuMonitorGetAllBlockStatsInfo(priv->mon, &blockstats, false);
(gdb) 
11461       if (capacity && nstats >= 0) {
(gdb) 
11465               rc = qemuMonitorBlockStatsUpdateCapacity(priv->mon, blockstats, false);
(gdb) 
11468       if (qemuDomainObjExitMonitor(driver, vm) < 0 || nstats < 0 || rc < 0)
(gdb) 
11471       if (VIR_ALLOC(*retstats) < 0)
(gdb) 
11474       if (entryname) {
(gdb) 
11475           if (!(stats = virHashLookup(blockstats, entryname))) {
(gdb) 
11481           **retstats = *stats;
(gdb) p stats
$12 = (qemuBlockStats *) 0x7f4d0c001000
(gdb) p *stats
$13 = {rd_req = 712, rd_bytes = 17435136, wr_req = 130, wr_bytes = 418816, rd_total_times = 527027278, 
  wr_total_times = 86798718, flush_req = 20, flush_total_times = 94396427, capacity = 2147483648, 
  physical = 2147483648, wr_highest_offset = 32870912, wr_highest_offset_valid = true, write_threshold = 0}
(gdb) c
Continuing.

Looking at this call sequence, the retstats variable we are tracking is copied from stats, which is filled in on this line:

11475           if (!(stats = virHashLookup(blockstats, entryname))) {

Its value is the result of a hash table lookup of entryname in blockstats, and that hash table is populated by these two lines:

11459       nstats = qemuMonitorGetAllBlockStatsInfo(priv->mon, &blockstats, false);
11465       rc = qemuMonitorBlockStatsUpdateCapacity(priv->mon, blockstats, false);
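The lookup-and-copy step can be pictured with the sketch below: a table keyed by the disk's entry name holds one stats record per device, and the caller receives a freshly allocated copy (the **retstats = *stats line), so later changes to the table do not affect it. A plain array with linear search stands in for libvirt's virHashLookup; the struct, the key "virtio-disk0" and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the qemuBlockStats fields printed by GDB. */
struct blk_stats {
    unsigned long long capacity;
    unsigned long long physical;
    unsigned long long wr_highest_offset;
    bool wr_highest_offset_valid;
};

struct stats_entry {
    const char *name;       /* key, e.g. a disk alias */
    struct blk_stats stats; /* value */
};

/* Linear-search stand-in for a hash table lookup; NULL if absent. */
static const struct blk_stats *lookup_stats(const struct stats_entry *table,
                                            size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i].stats;
    }
    return NULL;
}

/* Allocate *ret and copy the matching record into it.
 * Returns 0 on success, -1 if the name is unknown or malloc fails. */
static int gather_stats(const struct stats_entry *table, size_t n,
                        const char *name, struct blk_stats **ret)
{
    const struct blk_stats *stats = lookup_stats(table, n, name);

    *ret = NULL;
    if (!stats)
        return -1;
    if (!(*ret = malloc(sizeof(**ret))))
        return -1;
    **ret = *stats; /* struct copy, as in **retstats = *stats */
    return 0;
}
```

The caller owns the returned copy and must free it, which mirrors how qemuDomainBlocksStatsGather hands entry back to qemuDomainGetBlockInfo.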

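From here on we can follow the source code. After some digging, I found that both of them end up calling the same function to fetch device information from QEMU, namely src/qemu/qemu_monitor_json.c -> qemuMonitorJSONQueryBlock. Here is its implementation: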

static virJSONValuePtr
qemuMonitorJSONQueryBlock(qemuMonitorPtr mon)
{
    virJSONValuePtr cmd;
    virJSONValuePtr reply = NULL;
    virJSONValuePtr devices = NULL;

    if (!(cmd = qemuMonitorJSONMakeCommand("query-block", NULL)))
        return NULL;

    if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0 ||
        qemuMonitorJSONCheckReply(cmd, reply, VIR_JSON_TYPE_ARRAY) < 0)
        goto cleanup;

    devices = virJSONValueObjectStealArray(reply, "return");

 cleanup:
    virJSONValueFree(cmd);
    virJSONValueFree(reply);
    return devices;
}

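Digging further shows that libvirt talks to QEMU here through QMP (the QEMU Machine Protocol), using the query-block command. Note that in QEMU's QMP schema the wr_highest_offset field belongs to the per-device statistics (BlockDeviceStats) returned by the related query-blockstats command, not to the query-block reply.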
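QMP messages are JSON objects, so the number that ends up in wr_highest_offset arrives as a field inside a JSON reply. libvirt parses replies with its own virJSONValue helpers, as the function above shows; the sketch below only illustrates the idea with a naive string search. qmp_make_command, qmp_find_u64 and the sample reply used with them are hypothetical and simplified, the sample is hand-written rather than real QEMU output, and the search is not a real JSON parser (it ignores nesting and escaping).

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build a QMP command without arguments, e.g. {"execute": "query-block"}.
 * Returns false if buf is too small. */
static bool qmp_make_command(const char *cmd, char *buf, size_t len)
{
    int n = snprintf(buf, len, "{\"execute\": \"%s\"}", cmd);
    return n >= 0 && (size_t)n < len;
}

/* Naive lookup of "key": <unsigned number> anywhere in a JSON text.
 * Illustration only: ignores nesting, escaping and duplicate keys. */
static bool qmp_find_u64(const char *json, const char *key,
                         unsigned long long *out)
{
    char pattern[128];
    const char *p;
    char *end;
    int n = snprintf(pattern, sizeof(pattern), "\"%s\"", key);

    if (n < 0 || n >= (int)sizeof(pattern))
        return false;
    if (!(p = strstr(json, pattern)))
        return false;
    p += n;
    /* Skip whitespace and the colon between key and value. */
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == ':')
        p++;
    if (*p < '0' || *p > '9')
        return false;
    *out = strtoull(p, &end, 10);
    return end != p;
}
```

Because the pattern includes the closing quote, searching for wr_highest_offset does not accidentally match a longer key such as wr_highest_offset_valid.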

Putting it all together gives the following call-stack diagram of how libvirt queries disk usage:

https://imagehost-cdn.frytea.com/images/2021/09/02/domblkinfoac4ecdcf5caa1926.png

Going any further would mean tracing the QEMU source code; see you in the next post.

Attachments

libvirt-domblkinfo-命令源码调用栈 .xmind