Stress-testing processor-to-memory latency and bandwidth with Intel's MLC tool

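A key factor in application performance is the time the application spends fetching data from the processor caches and from the memory subsystem. Intel Memory Latency Checker (Intel MLC) is a tool for measuring memory latency and bandwidth, and how they change as the load on the system increases. It supports Linux and Windows. To measure latency under load, MLC creates as many threads as the host has logical processors minus one and uses them to generate load traffic, while the one remaining logical CPU runs a thread that measures latency.
The Memory Latency Checker download is a prebuilt binary, so it can be run as soon as it is downloaded.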
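
The thread layout described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not MLC's own code; the function name `mlc_thread_split` is made up for this example.

```python
import os


def mlc_thread_split(logical_cpus):
    """Return (load_threads, latency_threads) for the layout described above.

    Hypothetical helper for illustration: all logical CPUs but one generate
    load traffic, and the remaining one runs the latency-measuring thread.
    """
    if logical_cpus < 2:
        raise ValueError("need at least 2 logical CPUs to separate load from measurement")
    return logical_cpus - 1, 1


if __name__ == "__main__":
    # os.cpu_count() may return None or 1; fall back to 2 so the demo still runs.
    n = max(os.cpu_count() or 2, 2)
    load, latency = mlc_thread_split(n)
    print(f"{n} logical CPUs: {load} load-generating threads + {latency} latency thread")
```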

Bandwidth

Use the command mlc --bandwidth_matrix -W5 to measure throughput between processors and memory; -W5 sets the read-to-write ratio of the test traffic to 1:1

#> Intel(R) Memory Latency Checker - v3.9a
#> Command line parameters: --bandwidth_matrix -W5
#>
#> Using buffer size of 100.000MiB/thread for reads and an additional 100.000MiB/thread for writes
#> Measuring Memory Bandwidths between nodes within system
#> Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
#> Using all the threads from each core if Hyper-threading is enabled
#>                 Numa node
#> Numa node            0       1
#>        0        33327.9 20848.6
#>        1        21141.6 32996.9
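Reading the results: the test measures throughput between each processor and its local memory, and between each processor and the memory attached to the other processor, at the selected read-write ratio. Rows are the NUMA node of the requesting processor and columns are the NUMA node of the memory. Units are MB/sec.
33327.9  # throughput between the first processor and its local memory
20848.6  # throughput between the first processor and the second processor's memory
21141.6  # throughput between the second processor and the first processor's memory
32996.9  # throughput between the second processor and its local memory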
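
To compare local and remote bandwidth directly, the matrix can be parsed and each remote value divided by the local one. The sketch below assumes the output layout shown above (a header row of node numbers followed by one row per node); the helper names `parse_numa_matrix` and `remote_to_local_ratio` are made up for this example and are not part of MLC.

```python
SAMPLE = """\
#>                 Numa node
#> Numa node            0       1
#>        0        33327.9 20848.6
#>        1        21141.6 32996.9
"""


def parse_numa_matrix(text):
    """Parse an MLC-style NUMA matrix into {cpu_node: {mem_node: value}}."""
    columns = None
    matrix = {}
    for raw in text.splitlines():
        # Drop the "#>" prefix used in this article, then surrounding spaces.
        line = raw.strip().lstrip("#>").strip()
        if line.startswith("Numa node"):
            ids = line[len("Numa node"):].split()
            if ids:  # the second header line carries the column node numbers
                columns = [int(x) for x in ids]
            continue
        fields = line.split()
        if columns and len(fields) == len(columns) + 1 and fields[0].isdigit():
            matrix[int(fields[0])] = {
                col: float(val) for col, val in zip(columns, fields[1:])
            }
    return matrix


def remote_to_local_ratio(matrix):
    """Return {(cpu_node, mem_node): remote / local} for every remote pair."""
    ratios = {}
    for cpu, row in matrix.items():
        local = row[cpu]
        for mem, value in row.items():
            if mem != cpu:
                ratios[(cpu, mem)] = value / local
    return ratios


if __name__ == "__main__":
    for pair, ratio in remote_to_local_ratio(parse_numa_matrix(SAMPLE)).items():
        print(f"cpu node {pair[0]} -> memory node {pair[1]}: {ratio:.0%} of local bandwidth")
```

On the sample above, remote bandwidth comes out at roughly 63% (node 0) and 64% (node 1) of local bandwidth.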

The read-write ratio of the test traffic can be changed with the -W option to match the actual workload:

2  - 2:1 read-write ratio
3  - 3:1 read-write ratio
4  - 3:2 read-write ratio
5  - 1:1 read-write ratio
12 - 4:1 read-write ratio
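
When scripting several runs, the table above can be turned into a small lookup that builds the command line. This is a sketch: the function name `mlc_command` and the assumption that the binary is reachable as `mlc` are mine, and only the -W values listed above are covered.

```python
# -W values taken from the list above, keyed by read:write ratio.
RW_RATIO_TO_W = {"2:1": 2, "3:1": 3, "3:2": 4, "1:1": 5, "4:1": 12}


def mlc_command(mode, ratio=None, binary="mlc"):
    """Build an argv list such as ["mlc", "--bandwidth_matrix", "-W5"].

    `binary` is an assumed path to the MLC executable; adjust it to wherever
    the downloaded binary lives.
    """
    argv = [binary, mode]
    if ratio is not None:
        if ratio not in RW_RATIO_TO_W:
            raise ValueError(f"no -W value listed for ratio {ratio!r}")
        argv.append(f"-W{RW_RATIO_TO_W[ratio]}")
    return argv


if __name__ == "__main__":
    for ratio in RW_RATIO_TO_W:
        print(" ".join(mlc_command("--bandwidth_matrix", ratio)))
```

The returned list can be passed to `subprocess.run` to launch each test in turn.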

Maximum bandwidth

Use the command mlc --max_bandwidth to measure the maximum throughput (bandwidth) between processors and memory

#> Intel(R) Memory Latency Checker - v3.9a
#> Command line parameters: --max_bandwidth
#>
#> Using buffer size of 100.000MiB/thread for reads and an additional 100.000MiB/thread for writes
#>
#> Measuring Maximum Memory Bandwidths for the system
#> Will take several minutes to complete as multiple injection rates will be tried to get the best bandwidth
#> Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
#> Using all the threads from each core if Hyper-threading is enabled
#> Using traffic with the following read-write ratios
#> ALL Reads        :      80152.58
#> 3:1 Reads-Writes :      71421.09
#> 2:1 Reads-Writes :      44202.35
#> 1:1 Reads-Writes :      34350.70
#> Stream-triad like:      62450.57
Reading the results: the maximum bandwidth between processors and memory is measured under several different read-write traffic mixes
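
To see how much each traffic mix costs relative to the read-only case, the result lines can be parsed and normalized against "ALL Reads". The sketch below assumes the "label : value" layout shown above; the helper names are hypothetical.

```python
SAMPLE = """\
#> ALL Reads        :      80152.58
#> 3:1 Reads-Writes :      71421.09
#> 2:1 Reads-Writes :      44202.35
#> 1:1 Reads-Writes :      34350.70
#> Stream-triad like:      62450.57
"""


def parse_max_bandwidth(text):
    """Return {traffic mix label: bandwidth in MB/sec}."""
    results = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("#>").strip()
        # Split on the LAST colon, because labels such as "3:1" contain one too.
        label, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            results[label.strip()] = float(value)
        except ValueError:
            continue  # not a result line, e.g. "Command line parameters: ..."
    return results


def relative_to_all_reads(results):
    """Express every mix as a fraction of the read-only bandwidth."""
    base = results["ALL Reads"]
    return {label: value / base for label, value in results.items()}


if __name__ == "__main__":
    for label, frac in relative_to_all_reads(parse_max_bandwidth(SAMPLE)).items():
        print(f"{label:<18} {frac:.0%} of read-only bandwidth")
```

On this machine the 1:1 mix reaches about 43% of the read-only bandwidth.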

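Latency

Memory latency: the time the system waits for memory to respond before a data access can proceed

Use the command mlc --latency_matrix to measure memory latency between each processor and its local memory, and between each processor and the other processor's memory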

#> Intel(R) Memory Latency Checker - v3.9a
#> Command line parameters: --latency_matrix
#>
#> Using buffer size of 200.000MiB
#> Measuring idle latencies (in ns)...
#>                 Numa node
#> Numa node            0       1
#>        0          87.4   139.3
#>        1         143.1    83.0
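Reading the results: 0 and 1 are the NUMA node numbers of the two processors; values are in nanoseconds (ns)
87.4   memory latency between processor 0 and its local memory
139.3  memory latency between processor 0 and the memory attached to processor 1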
...
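
The cost of a remote access can be read off the matrix as the extra nanoseconds and as a ratio over the local latency. The sketch below hard-codes the four values from the output above; `numa_penalty` is a made-up helper name.

```python
# Idle latencies in ns, copied from the --latency_matrix output above:
# outer key = processor (NUMA node), inner key = memory NUMA node.
IDLE_LATENCY_NS = {
    0: {0: 87.4, 1: 139.3},
    1: {0: 143.1, 1: 83.0},
}


def numa_penalty(latency):
    """Return {(cpu_node, mem_node): (extra_ns, ratio)} for each remote pair."""
    penalty = {}
    for cpu, row in latency.items():
        local = row[cpu]
        for mem, remote in row.items():
            if mem != cpu:
                penalty[(cpu, mem)] = (remote - local, remote / local)
    return penalty


if __name__ == "__main__":
    for (cpu, mem), (extra, ratio) in numa_penalty(IDLE_LATENCY_NS).items():
        print(f"processor {cpu} -> memory {mem}: +{extra:.1f} ns ({ratio:.2f}x local)")
```

For this machine, a remote access costs about 52 to 60 ns more than a local one, or roughly 1.6x to 1.7x the local latency.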

Cache latency

Use the command mlc --c2c_latency to measure the L2 cache-to-cache transfer latency within a processor and across processors

#> Intel(R) Memory Latency Checker - v3.9a
#> Command line parameters: --c2c_latency
#>
#> Measuring cache-to-cache transfer latency (in ns)...
#> Local Socket L2->L2 HIT  latency        38.5
#> Local Socket L2->L2 HITM latency        42.3
#> Remote Socket L2->L2 HITM latency (data address homed in writer socket)
#>                         Reader Numa Node
#> Writer Numa Node     0       1
#>             0        -   106.5
#>             1    106.1       -
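Reading the results: an L2-to-L2 transfer within the same processor takes about 40 ns (38.5 ns HIT, 42.3 ns HITM); a transfer between the L2 caches of different processors takes about 106 ns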

Combined stress test

Use the command mlc --loaded_latency -W5 to observe how memory throughput and access latency change as the load varies

#> Intel(R) Memory Latency Checker - v3.9a
#> Command line parameters: --loaded_latency -W5
#>
#> Using buffer size of 100.000MiB/thread for reads and an additional 100.000MiB/thread for writes
#>
#> Measuring Loaded Latencies for the system
#> Using all the threads from each core if Hyper-threading is enabled
#> Inject  Latency Bandwidth
#> Delay   (ns)    MB/sec
#> ==========================
#>  00000  383.60    67827.6
#>  00002  393.39    67658.7
#>  00008  419.90    67830.0
#>  00015  382.61    67648.1
#>  00050  377.18    67606.7
#>  00100  368.72    68086.5
#>  00200  396.66    68070.5
#>  00300  390.89    68146.6
#>  00400  359.15    67834.4
#>  00500  301.56    67378.4
#>  00700  143.45    54382.3
#>  01000  124.01    38702.9
#>  01300  116.79    30065.3
#>  01700  113.90    23264.8
#>  02500  111.55    16044.9
#>  03500  114.57    11659.9
#>  05000  113.40     8356.1
#>  09000  111.34     4912.8
#>  20000  110.11     2537.8
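
In the table above, bandwidth stays flat at roughly 67,000 to 68,000 MB/sec for inject delays up to 500 while latency sits between about 300 and 420 ns; from a delay of 700 onward, latency falls to about 143 ns and below as bandwidth drops. One way to use such a table is to pick the highest bandwidth that still meets a latency budget. The sketch below does that; the 150 ns budget and the helper names are assumptions chosen for illustration.

```python
import re

SAMPLE = """\
 00000  383.60    67827.6
 00002  393.39    67658.7
 00008  419.90    67830.0
 00015  382.61    67648.1
 00050  377.18    67606.7
 00100  368.72    68086.5
 00200  396.66    68070.5
 00300  390.89    68146.6
 00400  359.15    67834.4
 00500  301.56    67378.4
 00700  143.45    54382.3
 01000  124.01    38702.9
 01300  116.79    30065.3
 01700  113.90    23264.8
 02500  111.55    16044.9
 03500  114.57    11659.9
 05000  113.40     8356.1
 09000  111.34     4912.8
 20000  110.11     2537.8
"""

# inject delay, latency (ns), bandwidth (MB/sec)
ROW = re.compile(r"^(\d+)\s+([\d.]+)\s+([\d.]+)$")


def parse_loaded_latency(text):
    """Return a list of (inject_delay, latency_ns, bandwidth_mb_s) tuples."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip().lstrip("#>").strip()
        match = ROW.match(line)
        if match:
            rows.append((int(match.group(1)), float(match.group(2)), float(match.group(3))))
    return rows


def best_under_latency(rows, max_latency_ns):
    """Return the row with the highest bandwidth whose latency fits the budget."""
    candidates = [row for row in rows if row[1] <= max_latency_ns]
    return max(candidates, key=lambda row: row[2]) if candidates else None


if __name__ == "__main__":
    rows = parse_loaded_latency(SAMPLE)
    peak = max(rows, key=lambda row: row[2])
    print(f"peak bandwidth: {peak[2]} MB/sec at {peak[1]} ns")
    # 150 ns is a hypothetical latency budget, not a value from the article.
    print("best point under 150 ns:", best_under_latency(rows, 150.0))
```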

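Important note: the author's time, perspective, and knowledge are limited, so this article inevitably contains errors and omissions. Corrections and discussion from readers and practitioners are very welcome, so that we can all improve together!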