java基准测试工具JMH

最后更新:2019-06-10

通过JMH实现基准测试,很早以前的学习笔记,直接复制过来没有排版

Hello World

@Benchmark
public void wellHelloThere() {
	// this method was intentionally left blank.
}

public static void main(String[] args) throws RunnerException {

	Options opt = new OptionsBuilder()
			.include(JMHSample_01_HelloWorld.class.getSimpleName())
			.forks(1)
			.build();
	new Runner(opt).run();
}

eclipse和Idea都需要安装相应的插件,才能直接使用IDE运行

输出结果如下

# JMH version: 1.19
# VM version: JDK 1.8.0_121, VM 25.121-b13
# VM invoker: D:\Program Files\Java\jdk1.8.0_121\jre\bin\java.exe
# VM options: -Didea.launcher.port=7533 -Didea.launcher.bin.path=D:\Program Files (x86)\JetBrains\IntelliJ IDEA 15.0.1\bin -Dfile.encoding=UTF-8
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.edgar.jmh.JMHSample_01_HelloWorld.wellHelloThere

# Run progress: 0.00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration   1: 3334587130.932 ops/s
# Warmup Iteration   2: 3372276836.066 ops/s
# Warmup Iteration   3: 3384310844.194 ops/s
# Warmup Iteration   4: 3387476909.671 ops/s
# Warmup Iteration   5: 3380642559.445 ops/s
# Warmup Iteration   6: 3393149632.385 ops/s
# Warmup Iteration   7: 3378950192.071 ops/s
# Warmup Iteration   8: 3393519457.265 ops/s
# Warmup Iteration   9: 3382872378.666 ops/s
# Warmup Iteration  10: 3368376265.110 ops/s
# Warmup Iteration  11: 3391371337.236 ops/s
# Warmup Iteration  12: 3383800118.631 ops/s
# Warmup Iteration  13: 3383415802.274 ops/s
# Warmup Iteration  14: 3371278558.012 ops/s
# Warmup Iteration  15: 3366417297.267 ops/s
# Warmup Iteration  16: 3374286593.526 ops/s
# Warmup Iteration  17: 3389915878.583 ops/s
# Warmup Iteration  18: 3405257506.150 ops/s
# Warmup Iteration  19: 3408615811.184 ops/s
# Warmup Iteration  20: 3406636073.194 ops/s
Iteration   1: 3399227672.975 ops/s
Iteration   2: 3404088125.346 ops/s
Iteration   3: 3386996791.875 ops/s
Iteration   4: 3386906925.560 ops/s
Iteration   5: 3386609807.965 ops/s
Iteration   6: 3381818133.096 ops/s
Iteration   7: 3406243496.806 ops/s
Iteration   8: 3406180039.465 ops/s
Iteration   9: 3399250423.656 ops/s
Iteration  10: 3390079424.132 ops/s
Iteration  11: 3389738822.993 ops/s
Iteration  12: 3394852882.096 ops/s
Iteration  13: 3396847962.170 ops/s
Iteration  14: 3399010497.882 ops/s
Iteration  15: 3391950932.885 ops/s
Iteration  16: 3398861290.412 ops/s
Iteration  17: 3399068792.245 ops/s
Iteration  18: 3396571548.943 ops/s
Iteration  19: 3388550184.040 ops/s
Iteration  20: 3407022297.611 ops/s


​	
​	Result "com.edgar.jmh.JMHSample_01_HelloWorld.wellHelloThere":
​	  3395493802.608 ±(99.9%) 6423090.888 ops/s [Average]
​	  (min, avg, max) = (3381818133.096, 3395493802.608, 3407022297.611), stdev = 7396841.019
​	  CI (99.9%): [3389070711.719, 3401916893.496] (assumes normal distribution)


​	
​	# Run complete. Total time: 00:00:40
​	
Benchmark                                Mode  Cnt           Score         Error  Units
JMHSample_01_HelloWorld.wellHelloThere  thrpt   20  3395493802.608 ± 6423090.888  ops/s

从上面的描述可以看到wellHelloThere方法的吞吐量

  • 第一部分描述了一些参数设置,然后预热迭代执行,上述方法以秒为单位执行20次迭代,并且使用一个线程同步执行
  • 第二部分描述了正常的迭代执行
  • 第三部分描述了基准测试的结果, ``` 3395493802.608 ±(99.9%) 6423090.888 ops/s [Average]表示20次迭代里平均每次可以执行3395493802.608次操作。

(min, avg, max) = (3381818133.096, 3395493802.608, 3407022297.611), stdev = 7396841.019 表示最小值,平均值,最大值,标准偏差

CI (99.9%): [3389070711.719, 3401916893.496] (assumes normal distribution) 假设结果是正常分布的,基于该样本大小,该方法的真正执行次数在3395493802.608-6423090.8883395493802.608+6423090.888之间

## 基本概念
### Mode

Mode 表示 JMH 进行 Benchmark 时所使用的模式。通常是测量的维度不同,或是测量的方式不同。目前 JMH 共有四种模式:

- Throughput: 整体吞吐量,例如“1秒内可以执行多少次调用”。
- AverageTime: 调用的平均时间,例如“每次调用平均耗时xxx毫秒”。
- SampleTime: 随机取样,最后输出取样结果的分布,例如“99%的调用在xxx毫秒以内,99.99%的调用在xxx毫秒以内”
- SingleShotTime: 以上模式都是默认一次 iteration 是 1s,唯有 SingleShotTime 是只运行一次。往往同时把 warmup 次数设为0,用于测试冷启动时的性能。

### Iteration

Iteration 是 JMH 进行测试的最小单位。在大部分模式下,一次 iteration 代表的是一秒,JMH 会在这一秒内不断调用需要 benchmark 的方法,然后根据模式对其采样,计算吞吐量,计算平均执行时间等。
### Warmup

Warmup 是指在实际进行 benchmark 前先进行预热的行为。为什么需要预热?因为 JVM 的 JIT 机制的存在,如果某个函数被调用多次之后,JVM 会尝试将其编译成为机器码从而提高执行速度。所以为了让 benchmark 的结果更加接近真实情况就需要进行预热。

**因为JIT的存在,一定设置合适的预热,并使用充足的样本进行测量**

## 测试模式 BenchmarkMode

测试方法上@BenchmarkMode注解表示使用特定的测试模式:

- Mode.Throughput 	计算一个时间单位内操作数量
- Mode.AverageTime 	计算平均运行时间
- Mode.SampleTime 	计算一个方法的运行时间(包括百分位)
- Mode.SingleShotTime 	方法仅运行一次(用于冷测试模式)。或者特定批量大小的迭代多次运行(具体查看后面的“`@Measurement“`注解)——这种情况下JMH将计算批处理运行时间(一次批处理所有调用的总时间)
- 上述模式的任意组合,可以指定这些模式的任意组合——该测试运行多次(取决于请求模式的数量)
- Mode.All 	所有模式依次运行

## 时间单位

使用@OutputTimeUnit指定时间单位,它需要一个标准Java类型java.util.concurrent.TimeUnit作为参数。可是如果在一个测试中指定了多种测试模式,给定的时间单位将用于所有的测试(比如,测试SampleTime适宜使用纳秒,但是throughput使用更长的时间单位测量更合适)。

示例:

public static void main(String[] args) throws RunnerException {

Options opt = new OptionsBuilder()
		.include(JMHSample_02_BenchmarkModes.class.getSimpleName())
		.warmupIterations(5)
		.measurementIterations(5)
		.forks(1)
		.build();
new Runner(opt).run(); } ``` ### Throughput SECONDS ```   @Benchmark   @BenchmarkMode(Mode.Throughput)   @OutputTimeUnit(TimeUnit.SECONDS)   public void measureThroughput() throws InterruptedException {
TimeUnit.MILLISECONDS.sleep(100);   } ``` 输出: ``` Result "com.edgar.jmh.JMHSample_02_BenchmarkModes.measureThroughput":   10.002 ±(99.9%) 0.009 ops/s [Average]   (min, avg, max) = (10.000, 10.002, 10.006), stdev = 0.002   CI (99.9%): [9.993, 10.011] (assumes normal distribution)

Run complete. Total time: 00:00:11

Benchmark Mode Cnt Score Error Units JMHSample_02_BenchmarkModes.measureThroughput thrpt 5 10.002 ± 0.009 ops/s

与第一个例子类似


### AverageTime MICROSECONDS

@Benchmark @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.MICROSECONDS) public void measureAvgTime() throws InterruptedException { TimeUnit.MILLISECONDS.sleep(100); }

输出

Result “com.edgar.jmh.JMHSample_02_BenchmarkModes.measureAvgTime”: 99974.534 ±(99.9%) 148.158 us/op [Average] (min, avg, max) = (99929.317, 99974.534, 100013.300), stdev = 38.476 CI (99.9%): [99826.377, 100122.692] (assumes normal distribution)

Run complete. Total time: 00:00:11

Benchmark Mode Cnt Score Error Units JMHSample_02_BenchmarkModes.measureAvgTime avgt 5 99974.534 ± 148.158 us/op

可以看到,每个方法平均需要99974.534微秒(100毫秒左右)


### SampleTime MICROSECONDS

@Benchmark @BenchmarkMode(Mode.SampleTime) @OutputTimeUnit(TimeUnit.MICROSECONDS) public void measureSamples() throws InterruptedException { TimeUnit.MILLISECONDS.sleep(100); }

输出

Result “com.edgar.jmh.JMHSample_02_BenchmarkModes.measureSamples”: N = 55 mean = 99881.630 ±(99.9%) 78.483 us/op

Histogram, us/op: [ 99000.000, 99125.000) = 0 [ 99125.000, 99250.000) = 1 [ 99250.000, 99375.000) = 1 [ 99375.000, 99500.000) = 2 [ 99500.000, 99625.000) = 1 [ 99625.000, 99750.000) = 0 [ 99750.000, 99875.000) = 0 [ 99875.000, 100000.000) = 33 [100000.000, 100125.000) = 16 [100125.000, 100250.000) = 0 [100250.000, 100375.000) = 1 [100375.000, 100500.000) = 0 [100500.000, 100625.000) = 0 [100625.000, 100750.000) = 0 [100750.000, 100875.000) = 0

Percentiles, us/op: p(0.0000) = 99221.504 us/op p(50.0000) = 99876.864 us/op p(90.0000) = 100007.936 us/op p(95.0000) = 100007.936 us/op p(99.0000) = 100270.080 us/op p(99.9000) = 100270.080 us/op p(99.9900) = 100270.080 us/op p(99.9990) = 100270.080 us/op p(99.9999) = 100270.080 us/op p(100.0000) = 100270.080 us/op

Run complete. Total time: 00:00:11

Benchmark Mode Cnt Score Error Units JMHSample_02_BenchmarkModes.measureSamples sample 55 99881.630 ± 78.483 us/op JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.00 sample 99221.504 us/op JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.50 sample 99876.864 us/op JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.90 sample 100007.936 us/op JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.95 sample 100007.936 us/op JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.99 sample 100270.080 us/op JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.999 sample 100270.080 us/op JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.9999 sample 100270.080 us/op JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p1.00 sample 100270.080 us/op

N = 55:表示总共运行了55次方法

mean =  99881.630 ±(99.9%) 78.483 us/op:表示了方法的平均运行时间

Histogram标明了方法运行时间的分布

Percentile表示了直方图的百分位(还不太懂,个人理解是在整个样本中,<=99221.504us的方法占比为0%,<=100007.936 us的方法占比95%之类的意思)

### SingleShotTime MICROSECONDS

@Benchmark @BenchmarkMode(Mode.SingleShotTime) @OutputTimeUnit(TimeUnit.MICROSECONDS) public void measureSingleShot() throws InterruptedException { TimeUnit.MILLISECONDS.sleep(100); }

输出结果与SampleTime基本类似,只是N=5

Result “com.edgar.jmh.JMHSample_02_BenchmarkModes.measureSingleShot”: N = 5 mean = 99684.424 ±(99.9%) 1626.577 us/op

Histogram, us/op: [ 99100.000, 99200.000) = 1 [ 99200.000, 99300.000) = 0 [ 99300.000, 99400.000) = 1 [ 99400.000, 99500.000) = 0 [ 99500.000, 99600.000) = 0 [ 99600.000, 99700.000) = 0 [ 99700.000, 99800.000) = 0 [ 99800.000, 99900.000) = 0 [ 99900.000, 100000.000) = 1

Percentiles, us/op: p(0.0000) = 99113.046 us/op p(50.0000) = 99952.386 us/op p(90.0000) = 100003.584 us/op p(95.0000) = 100003.584 us/op p(99.0000) = 100003.584 us/op p(99.9000) = 100003.584 us/op p(99.9900) = 100003.584 us/op p(99.9990) = 100003.584 us/op p(99.9999) = 100003.584 us/op p(100.0000) = 100003.584 us/op

Run complete. Total time: 00:00:01

Benchmark Mode Cnt Score Error Units JMHSample_02_BenchmarkModes.measureSingleShot ss 5 99684.424 ± 1626.577 us/op

### 组合和所有

@Benchmark @BenchmarkMode({Mode.Throughput, Mode.AverageTime, Mode.SampleTime, Mode.SingleShotTime}) @OutputTimeUnit(TimeUnit.MICROSECONDS) public void measureMultiple() throws InterruptedException { TimeUnit.MILLISECONDS.sleep(100); }

@Benchmark @BenchmarkMode(Mode.All) @OutputTimeUnit(TimeUnit.MICROSECONDS) public void measureAll() throws InterruptedException { TimeUnit.MILLISECONDS.sleep(100); }

输出的结果是上面结果的一些组合

## 测试参数状态 State

测试方法可能接收参数。这需要提供单个的参数类,这个类遵循以下4条规则:

    有无参构造函数(默认构造函数)
    是公共类
    内部类应该是静态的
    该类必须使用@State注解

@State注解定义了给定类实例的可用范围。JMH可以在多线程同时运行的环境测试,因此需要选择正确的状态。

- Scope.Thread 	默认状态。实例将分配给运行给定测试的每个线程。
- Scope.Benchmark 	运行相同测试的所有线程将共享实例。可以用来测试状态对象的多线程性能(或者仅标记该范围的基准)。
- Scope.Group 	实例分配给每个线程组(查看后面的线程组部分)

除了将单独的类标记@State,也可以将你自己的benchmark类使用@State标记。上面所有的规则对这种情况也适用。

示例:

public static void main(String[] args) throws RunnerException { Options opt = new OptionsBuilder() .include(JMHSample_03_States.class.getSimpleName()) .warmupIterations(5) .measurementIterations(5) .threads(4) .forks(1) .build(); new Runner(opt).run(); }

### Scope.Thread

@State(Scope.Thread) public static class ThreadState { volatile double x = Math.PI; }

@Benchmark public void measureUnshared(ThreadState state) { // All benchmark threads will call in this method. // // However, since ThreadState is the Scope.Thread, each thread // will have it’s own copy of the state, and this benchmark // will measure unshared case. state.x++; }

输出结果:

Result “com.edgar.jmh.JMHSample_03_States.measureUnshared”: 593250331.949 ±(99.9%) 46541586.909 ops/s [Average] (min, avg, max) = (572439316.076, 593250331.949, 604079278.138), stdev = 12086702.321 CI (99.9%): [546708745.040, 639791918.858] (assumes normal distribution)

Run complete. Total time: 00:00:10

Benchmark Mode Cnt Score Error Units JMHSample_03_States.measureUnshared thrpt 5 593250331.949 ± 46541586.909 ops/s

### Benchmark

@State(Scope.Benchmark) public static class BenchmarkState { volatile double x = Math.PI; }

@Benchmark public void measureShared(BenchmarkState state) { // All benchmark threads will call in this method. // // Since BenchmarkState is the Scope.Benchmark, all threads // will share the state instance, and we will end up measuring // shared case. state.x++; }

输出结果

Result “com.edgar.jmh.JMHSample_03_States.measureShared”: 42884141.141 ±(99.9%) 3669011.681 ops/s [Average] (min, avg, max) = (41324568.655, 42884141.141, 43864532.886), stdev = 952830.682 CI (99.9%): [39215129.461, 46553152.822] (assumes normal distribution)

Run complete. Total time: 00:00:10

Benchmark Mode Cnt Score Error Units JMHSample_03_States.measureShared thrpt 5 42884141.141 ± 3669011.681 ops/s

通过两个比较可以看到共享示例由于存在临界资源state所以要慢一些(仅仅是参考)

### 标记自己的benchmark类

@State(Scope.Thread) public class JMHSample_04_DefaultState {

double x = Math.PI;

@Benchmark
public void measure() {
	x++;
}

}

## 状态设置和清理

与JUnit测试类似,使用@Setup和@TearDown注解标记状态类的方法(这些方法在JMH文档中称为fixtures)。setup/teardown方法的数量是任意的。这些方法不会影响测试时间(但是Level.Invocation可能影响测量精度)。

@Setup/@TearDown注解使用Level参数来指定何时调用fixture:

- Level.Trial 	默认level。全部benchmark运行(一组迭代)之前/之后
- Level.Iteration 	一次迭代之前/之后(一组调用)
- Level.Invocation 	每个方法调用之前/之后(不推荐使用,除非你清楚这样做的目的)

示例

@State(Scope.Thread) public class JMHSample_05_StateFixtures {

double x;

@Setup
public void prepare() {
	x = Math.PI;
}

@TearDown
public void check() {
	assert x > Math.PI : "Nothing changed?";
}

@Benchmark
public void measureRight() {
	x++;
}

 */
@Benchmark
public void measureWrong() {
	double x = 0;
	x++;
}

}

prepare方法在方法调用先设置了x的值=PI,check方法在方法调用后检查x的值是否正确

### level.Iteration

@TearDown(Level.Iteration) public void check() { assert x > Math.PI : “Nothing changed?”; }

### Level.Invocation

下面的示例对两个线程池做了基准测试,NormalState会在每次迭代开始时启动线程池,结束后关闭线程池,LaggingState除了上述功能外还会在每个方法运行前先停顿10毫秒

@OutputTimeUnit(TimeUnit.MICROSECONDS) public class JMHSample_07_FixtureLevelInvocation {

@State(Scope.Benchmark)
public static class NormalState {
	ExecutorService service;

	@Setup(Level.Trial)
	public void up() {
		service = Executors.newCachedThreadPool();
	}

​ @TearDown(Level.Trial) public void down() { service.shutdown(); }

}

public static class LaggingState extends NormalState {

	public static final int SLEEP_TIME = Integer.getInteger("sleepTime", 10);

	@Setup(Level.Invocation)
	public void lag() throws InterruptedException {
		TimeUnit.MILLISECONDS.sleep(SLEEP_TIME);
	}

} ​	
@Benchmark
@BenchmarkMode(Mode.AverageTime)
public double measureHot(NormalState e, final Scratch s) throws ExecutionException, InterruptedException {
	return e.service.submit(new Task(s)).get();
}

@Benchmark
@BenchmarkMode(Mode.AverageTime)
public double measureCold(LaggingState e, final Scratch s) throws ExecutionException, InterruptedException {
	return e.service.submit(new Task(s)).get();
} ​	
@State(Scope.Thread)
public static class Scratch {

	private double p;

	public double doWork() {
		p = Math.log(p);
		return p;
	}

}

public static class Task implements Callable<Double> {

	private Scratch s;

	public Task(Scratch s) {
		this.s = s;
	}

	@Override
	public Double call() {
		return s.doWork();
	}

}

public static void main(String[] args) throws RunnerException {
	Options opt = new OptionsBuilder()
			.include(JMHSample_07_FixtureLevelInvocation.class.getSimpleName())
			.warmupIterations(5)
			.measurementIterations(5)
			.forks(1)
			.build();
	new Runner(opt).run();
} ​	 } ``` ## 冗余代码 冗余代码消除是microbenchmark中众所周知的问题。通常的解决方法是以某种方式使用计算结果。JMH本身不会实施对冗余代码的消除。**但是如果你想消除冗余代码——要做到测试程序返回值不为void。永远返回你的计算结果**。JMH将完成剩余的工作。

如果测试程序需要返回多个值,将所有这些返回值使用省时操作结合起来(省时是指相对于获取到所有结果所做操作的开销),或者使用BlackHole作为方法参数,将所有的结果放入其中(注意某些情况下BlockHole.consume可能比手动将结果组合起来开销更大)。BlackHole是一个thread-scoped类:

@GenerateMicroBenchmark
public void testSomething( BlackHole bh )
{
	bh.consume( Math.sin( state_field ));
	bh.consume( Math.cos( state_field ));
}

示例:

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_08_DeadCode {

	private double x = Math.PI;

	@Benchmark
	public void baseline() {
		// do nothing, this is a baseline
	}

	@Benchmark
	public void measureWrong() {
		// This is wrong: result is not used and the entire computation is optimized away.
		Math.log(x);
	}

	@Benchmark
	public double measureRight() {
		// This is correct: the result is being used.
		return Math.log(x);
	}

}

从结果可以看出,measureWrong被优化了,运行时间和baseline差不多

# Run complete. Total time: 00:00:31

Benchmark                           Mode  Cnt   Score   Error  Units
JMHSample_08_DeadCode.baseline      avgt    5   0.297 ± 0.003  ns/op
JMHSample_08_DeadCode.measureRight  avgt    5  22.166 ± 0.271  ns/op
JMHSample_08_DeadCode.measureWrong  avgt    5   0.295 ± 0.005  ns/op

Blackhole

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class JMHSample_09_Blackholes {

	double x1 = Math.PI;

	double x2 = Math.PI * 2;


​	
	@Benchmark
	public double baseline() {
		return Math.log(x1);
	}

	@Benchmark
	public double measureWrong() {
		Math.log(x1);
		return Math.log(x2);
	}

	@Benchmark
	public double measureRight_1() {
		return Math.log(x1) + Math.log(x2);
	}


​	
	@Benchmark
	public void measureRight_2(Blackhole bh) {
		bh.consume(Math.log(x1));
		bh.consume(Math.log(x2));
	}

	public static void main(String[] args) throws RunnerException {
		Options opt = new OptionsBuilder()
				.include(JMHSample_09_Blackholes.class.getSimpleName())
				.warmupIterations(5)
				.measurementIterations(5)
				.forks(1)
				.build();
		new Runner(opt).run();
	}
}

从结果可以看出 measureWrong进行了优化,运行时间和baseline差不多

# Run complete. Total time: 00:00:41

Benchmark                               Mode  Cnt   Score   Error  Units
JMHSample_09_Blackholes.baseline        avgt    5  22.177 ± 0.156  ns/op
JMHSample_09_Blackholes.measureRight_1  avgt    5  41.199 ± 0.124  ns/op
JMHSample_09_Blackholes.measureRight_2  avgt    5  43.811 ± 0.131  ns/op
JMHSample_09_Blackholes.measureWrong    avgt    5  22.204 ± 0.536  ns/op

常量处理

如果计算结果是可预见的并且不依赖于状态对象,它可能被JIT优化。因此,最好总是从状态对象读取测试的输入并且返回计算的结果。这条规则大体上用于单个返回值的情形。使用BlackHole对象JVM更难优化它(但不是不可能被优化)。下面测试的所有方法都不会被优化

private double x = Math.PI;
  
@GenerateMicroBenchmark
public void bhNotQuiteRight( BlackHole bh )
{
	bh.consume( Math.sin( Math.PI ));
	bh.consume( Math.cos( Math.PI ));
}
  
@GenerateMicroBenchmark
public void bhRight( BlackHole bh )
{
	bh.consume( Math.sin( x ));
	bh.consume( Math.cos( x ));
}

返回单个值的情形更加复杂。下面的测试不会被优化,但是如果使用Math.log替换Math.sin,那么testWrong方法将被常量值替换。

private double x = Math.PI;
  
@GenerateMicroBenchmark
public double testWrong()
{
	return Math.sin( Math.PI );
}
  
@GenerateMicroBenchmark
public double testRight()
{
	return Math.sin( x );
}

因此,为使测试更可靠要严格遵守以下规则:永远从状态对象读取测试输入并返回计算的结果

示例:

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_10_ConstantFold {

	private double x = Math.PI;

	private final double wrongX = Math.PI;
​	
	@Benchmark

	public double baseline() {
		// simply return the value, this is a baseline
		return Math.PI;
	}
​	
	@Benchmark
	public double measureWrong_1() {
		// This is wrong: the source is predictable, and computation is foldable.
		return Math.log(Math.PI);

	}
​	
	@Benchmark
	public double measureWrong_2() {
		// This is wrong: the source is predictable, and computation is foldable.
		return Math.log(wrongX);
	}
​	
	@Benchmark
	public double measureRight() {
		// This is correct: the source is not predictable.
		return Math.log(x);
	}

}

从结果可以看出measureWrong_1和measureWrong_2都进行了优化,因为他们的结果是可以预测的。

	# Run complete. Total time: 00:00:41
	
	Benchmark                                 Mode  Cnt   Score   Error  Units
	JMHSample_10_ConstantFold.baseline        avgt    5   2.737 ± 0.133  ns/op
	JMHSample_10_ConstantFold.measureRight    avgt    5  22.194 ± 0.172  ns/op
	JMHSample_10_ConstantFold.measureWrong_1  avgt    5   2.666 ± 0.150  ns/op
	JMHSample_10_ConstantFold.measureWrong_2  avgt    5   2.651 ± 0.025  ns/op

循环

不要在测试中使用循环。JIT非常聪明,在循环中经常出现不可预料的处理。要测试真实的计算,让JMH处理剩余的部分。

在非统一开销操作情况下(比如测试处理列表的时间,这个列表在每个测试后有所增加),你可能使用@BenchmarkMode(Mode.SingleShotTime) 和@Measurement(batchSize = N)。但是不允许你自己实现测试的循环。

示例

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_11_Loops {

	int x = 1;

	int y = 2;

	@Benchmark
	public int measureRight() {
		return (x + y);
	}

	private int reps(int reps) {
		int s = 0;
		for (int i = 0; i < reps; i++) {
			s += (x + y);
		}
		return s;
	}

	@Benchmark
	@OperationsPerInvocation(1)
	public int measureWrong_1() {
		return reps(1);
	}
​	
	@Benchmark
	@OperationsPerInvocation(10)
	public int measureWrong_10() {
		return reps(10);
	}

	@Benchmark
	@OperationsPerInvocation(100)
	public int measureWrong_100() {
		return reps(100);
	}
​	
	@Benchmark
	@OperationsPerInvocation(1000)
	public int measureWrong_1000() {
		return reps(1000);
	}
​	
	@Benchmark
	@OperationsPerInvocation(10000)
	public int measureWrong_10000() {
		return reps(10000);
	}
​	
	@Benchmark
	@OperationsPerInvocation(100000)
	public int measureWrong_100000() {
		return reps(100000);
	}

}

输出结果(尚未明白,可能JMH将循环直接做了优化)

# Run complete. Total time: 00:01:12

Benchmark                               Mode  Cnt  Score    Error  Units
JMHSample_11_Loops.measureRight         avgt    5  2.652 ±  0.051  ns/op
JMHSample_11_Loops.measureWrong_1       avgt    5  2.654 ±  0.062  ns/op
JMHSample_11_Loops.measureWrong_10      avgt    5  0.312 ±  0.004  ns/op
JMHSample_11_Loops.measureWrong_100     avgt    5  0.038 ±  0.001  ns/op
JMHSample_11_Loops.measureWrong_1000    avgt    5  0.041 ±  0.001  ns/op
JMHSample_11_Loops.measureWrong_10000   avgt    5  0.025 ±  0.001  ns/op
JMHSample_11_Loops.measureWrong_100000  avgt    5  0.021 ±  0.001  ns/op

分支

默认JMH为每个试验(迭代集合)fork一个新的java进程。这样可以防止前面收集的“资料”——其他被加载类以及它们执行的信息对当前测试的影响。比如,实现了相同接口的两个类,测试它们的性能,那么第一个实现(目标测试类)可能比第二个快,因为JIT发现第二个实现类后就把第一个实现的直接方法调用替换为接口方法调用。

因此,不要把forks设为0,除非你清楚这样做的目的。

极少数情况下需要指定JVM分支数量时,使用@Fork对方法注解,就可以设置分支数量,预热(warmup)迭代数量和JVM分支的其他参数。

可能通过JMH API调用来指定JVM分支参数也有优势——可以使用一些JVM -XX:参数,通过JMH API访问不到它。这样就可以根据你的代码自动选择最佳的JVM设置(new Runner(opt).run()以简便的形式返回了所有的测试结果)。

示例:

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_12_Forking {

	public interface Counter {
		int inc();
	}

	public class Counter1 implements Counter {

		private int x;

		@Override
		public int inc() {
			return x++;
		}
	}

	public class Counter2 implements Counter {

		private int x;

		@Override
		public int inc() {
			return x++;
		}
	}

	public int measure(Counter c) {

		int s = 0;

		for (int i = 0; i < 10; i++) {
			s += c.inc();
		}
		return s;
	}

	Counter c1 = new Counter1();

	Counter c2 = new Counter2();

	@Benchmark
	@Fork(0)
	public int measure_1_c1() {
		return measure(c1);
	}

	@Benchmark
	@Fork(0)
	public int measure_2_c2() {
		return measure(c2);
	}

	@Benchmark
	@Fork(0)
	public int measure_3_c1_again() {
		return measure(c1);
	}

	@Benchmark
	@Fork(1)
	public int measure_4_forked_c1() {
		return measure(c1);
	}

	@Benchmark
	@Fork(1)
	public int measure_5_forked_c2() {
		return measure(c2);
	}

	public static void main(String[] args) throws RunnerException {
		Options opt = new OptionsBuilder()
				.include(JMHSample_12_Forking.class.getSimpleName())
				.warmupIterations(5)
				.measurementIterations(5)
				.build();
		new Runner(opt).run();
	}
​	
}

从结果(不是太懂,需要补下JIT的知识)

# Run complete. Total time: 00:00:50

Benchmark                                 Mode  Cnt   Score   Error  Units
JMHSample_12_Forking.measure_1_c1         avgt    5   2.643 ± 0.033  ns/op
JMHSample_12_Forking.measure_2_c2         avgt    5  19.041 ± 0.387  ns/op
JMHSample_12_Forking.measure_3_c1_again   avgt    5  18.471 ± 0.262  ns/op
JMHSample_12_Forking.measure_4_forked_c1  avgt    5   4.465 ± 0.089  ns/op
JMHSample_12_Forking.measure_5_forked_c2  avgt    5   4.436 ± 0.050  ns/op

不对称测试

上面所有的基础测试都是对称的:所有的线程执行相同的代码。但是有时候我们也需要执行不对称测试。

——比如测试“读取——写入”场景时,读线程数通常高于写线程数量。JMH使用线程组来应对这种情形。

通过@Group注解,我们可以将多个方法组合在一起并且所有的线程都分布在这些方法上

为设置测试组,需要:

使用@Group(name)注解标记所有的测试方法,为同一个组中的所有测试设置相同的名称(否则这些测试将独立运行——没有任何警告提示!)
使用@GroupThreads(threadsNumber)注解标记每个测试,指定运行给定方法的线程数量。

JMH将启动给定组的所有@GroupThreads,并发运行相同实验中同一组的所有测试。组和每个方法的结果将单独给出

示例

@State(Scope.Group)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_15_Asymmetric {

	private AtomicInteger counter;
​	
	@Setup
	public void up() {
		counter = new AtomicInteger();
	}
	
	@Benchmark
	@Group("g")
	@GroupThreads(3)
	public int inc() {
		return counter.incrementAndGet();
	}
​	
	@Benchmark
	@Group("g")
	@GroupThreads(1)
	public int get() {
		return counter.get();
	}
}

输出结果

......
# Threads: 4 threads (1 group; 1x "get", 3x "inc" in each group), will synchronize iterations
......

# Run complete. Total time: 00:00:11

Benchmark                      Mode  Cnt   Score    Error  Units
JMHSample_15_Asymmetric.g      avgt    5  60.361 ±  1.180  ns/op
JMHSample_15_Asymmetric.g:get  avgt    5  21.648 ± 26.019  ns/op
JMHSample_15_Asymmetric.g:inc  avgt    5  73.266 ±  8.852  ns/op

编译器提示

可以为JIT提供关于如何使用测试程序中任何方法的提示。“任何方法”是指任何的方法——不仅仅是@GenerateMicroBenchmark注解的方法。使用@CompilerControl模式(还有更多模式,但是我不确定它们的有用程度):

  • CompilerControl.Mode.DONT_INLINE 该方法不能被内嵌。用于测量方法调用开销和评估是否该增加JVM的inline阈值
  • CompilerControl.Mode.INLINE 要求编译器内嵌该方法。通常与“Mode.DONT_INLINE“联合使用,检查内嵌的利弊。
  • CompilerControl.Mode.EXCLUDE 不编译该方法——解释它。在该JIT有多好的圣战中作为有用的参数:)

现在还不太明白,需要补下JIT的知识

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_16_CompilerControl {

	public void target_blank() {

	}
​	
	@CompilerControl(CompilerControl.Mode.DONT_INLINE)
	public void target_dontInline() {
		// this method was intentionally left blank
	}
​	
	@CompilerControl(CompilerControl.Mode.INLINE)
	public void target_inline() {
		// this method was intentionally left blank
	}
​	
	@CompilerControl(CompilerControl.Mode.EXCLUDE)
	public void target_exclude() {
		// this method was intentionally left blank
	}
​	
	/*

	 * These method measures the calls performance.

	 */

	@Benchmark
	public void baseline() {
		// this method was intentionally left blank
	}

	@Benchmark
	public void blank() {
		target_blank();
	}

	@Benchmark
	public void dontinline() {
		target_dontInline();
	}

	@Benchmark
	public void inline() {
		target_inline();
	}
​	
	@Benchmark
	public void exclude() {
		target_exclude();
	}

}

注解

通过注解指定JMH参数。这些注解用在类或者方法上。方法注解总是优先于类的注解。

  • @Fork 需要运行的试验(迭代集合)数量。每个试验运行在单独的JVM进程中。也可以指定(额外的)JVM参数。
  • @Measurement 提供真正的测试阶段参数。指定迭代的次数,每次迭代的运行时间和每次迭代测试调用的数量(通常使用@BenchmarkMode(Mode.SingleShotTime)测试一组操作的开销——而不使用循环)
  • @Warmup 与@Measurement相同,但是用于预热阶段
  • @Threads 该测试使用的线程数。默认是Runtime.getRuntime().availableProcessors()

示例

@State(Scope.Thread)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(1)
public class JMHSample_20_Annotations {

	double x1 = Math.PI;

	@Benchmark
	@Warmup(iterations = 5, time = 100, timeUnit = TimeUnit.MILLISECONDS)
	@Measurement(iterations = 5, time = 100, timeUnit = TimeUnit.MILLISECONDS)
	public double measure() {
		return Math.log(x1);
	}

	public static void main(String[] args) throws RunnerException {

		Options opt = new OptionsBuilder()
				.include(JMHSample_20_Annotations.class.getSimpleName())
				.build();
		new Runner(opt).run();
	}

}

CPU消耗

有时测试消耗一定CPU周期。通过静态的BlackHole.consumeCPU(tokens)方法来实现。Token是一些CPU指令。这样编写方法代码就可以达到运行时间依赖于该参数的目的(不被任何JIT/CPU优化)。

示例:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_21_ConsumeCPU {

	@Benchmark
	public void consume_0000() {
		Blackhole.consumeCPU(0);
	}
​	
	@Benchmark
	public void consume_0001() {
		Blackhole.consumeCPU(1);
	}
​	
	@Benchmark
	public void consume_0002() {
		Blackhole.consumeCPU(2);
	}
​	
	@Benchmark
	public void consume_0004() {
		Blackhole.consumeCPU(4);
	}
​	
	@Benchmark
	public void consume_0008() {
		Blackhole.consumeCPU(8);
	}

	@Benchmark
	public void consume_0016() {
		Blackhole.consumeCPU(16);
	}

	@Benchmark
	public void consume_0032() {
		Blackhole.consumeCPU(32);
	}

	@Benchmark
	public void consume_0064() {
		Blackhole.consumeCPU(64);
	}
​	
	@Benchmark
	public void consume_0128() {
		Blackhole.consumeCPU(128);
	}
​	
	@Benchmark
	public void consume_0256() {
		Blackhole.consumeCPU(256);
	}
​	
	@Benchmark
	public void consume_0512() {
		Blackhole.consumeCPU(512);
	}
​	
	@Benchmark
	public void consume_1024() {
		Blackhole.consumeCPU(1024);
	}

}

多线程——伪共享字段访问

你可能知道这样一个事实,大多数现代x86 CPU有64字节的cache line(缓存行)。CPU缓存提高了数据读取速率,但同时,如果你需要从多个线程同时读写两个邻近的字段,也会产生性能瓶颈。这种情况称为“伪共享”——字段似乎是独立访问的,但是实际上它们在硬件层面的相互竞争。

这个问题通常的解决方案是两边都增加至少128字节的虚拟数据。因为JVM可以将类的字段重排序,在相同的类内部增加可能不能正确运行。

更加健壮的方法是使用类层次——JVM通常将属于同一个类的字段放在一起。比如,定义类A有一个只读字段,类B继承类A且定义16个long字段,类C继承类B定义可写字段,最后类D继承类C定义另一个16个long字段——这就防止了被分配在下一个内存中对象的写变量竞争。

以防读写的字段类型相同,也可以使用两个数据位置相互距离很远的稀疏数组。在前面的情况中不要使用数组——它们是对象特定类型,仅需要增加4或8字节(取决于JVM设置)。

这个问题的另一种解决方法是如果你已经用到了Java 8:在可写字段上使用@sun.misc.Contended以及-XX:-RestrictContended的JVM设置。更多细节,参见Aleksey Shipilev的说明。

JMH是如何解决竞争字段访问的呢?它在两边都增加了@State对象,但是这并不能在单一对象内部对个别的字段增加——需要自己来处理。

不是很懂:JMHSample_22_FalseSharing

批处理

@Warmup(iterations = 5, batchSize = 5000)
@Measurement(iterations = 5, batchSize = 5000)

Batch size: number of benchmark method calls per operation

示例:

@State(Scope.Thread)
public class JMHSample_26_BatchSize {
​	
	List<String> list = new LinkedList<>();
​	
	@Benchmark
	@Warmup(iterations = 5, time = 1)
	@Measurement(iterations = 5, time = 1)
	@BenchmarkMode(Mode.AverageTime)
	public List<String> measureWrong_1() {
		list.add(list.size() / 2, "something");
		return list;
	}

	@Benchmark
	@Warmup(iterations = 5, time = 5)
	@Measurement(iterations = 5, time = 5)
	@BenchmarkMode(Mode.AverageTime)
	public List<String> measureWrong_5() {
		list.add(list.size() / 2, "something");
		return list;
	}

	@Benchmark
	@Warmup(iterations = 5, batchSize = 5000)
	@Measurement(iterations = 5, batchSize = 5000)
	@BenchmarkMode(Mode.SingleShotTime)
	public List<String> measureRight() {
		list.add(list.size() / 2, "something");
		return list;
	}

	@Setup(Level.Iteration)
	public void setup(){
		list.clear();
	}

}

多参数的测试运行

很多情况下测试代码包含多个参数集合。幸运的是,要测试不同参数集合时JMH不会要求写多个测试方法。或者准确来说,测试参数是基本类型,基本包装类型或者String时,JMH提供了解决方法。

程序需要完成:

定义@State对象
在其中定义所有的参数字段
每个字段都使用@Param注解

@Param注解使用String数组作为参数。这些字符串在任何@Setup方法被调用前转换为字段类型。然而,JMH文档中声称这些字段值在@Setup方法中不能被访问。

JMH使用所有@Param字段的输出结果。因此,如果第一个字段有2个参数,第二个字段有5个参数,测试将运行2 * 5 * Forks次。

示例

	@BenchmarkMode(Mode.AverageTime)
	@OutputTimeUnit(TimeUnit.NANOSECONDS)
	@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
	@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
	@Fork(1)
	@State(Scope.Benchmark)
	public class JMHSample_27_Params {
	
	    @Param({"1", "31", "65", "101", "103"})
	    public int arg;

	
	    @Param({"0", "1", "2", "4", "8", "16", "32"})
	    public int certainty;
​	
	    @Benchmark
	    public boolean bench() {
	        return BigInteger.valueOf(arg).isProbablePrime(certainty);
	    }
	}

还有很多JMH的功能目前用的少,也不是太懂,可以阅读官方提供的例子,以后用到再补充

JMeter

由于不知道怎么用JMH对Vert.x的异步方法做基准测试,所以对基于Vert.x实现的一些组件的基准测试使用JMeter测试

  1. 引入下列依赖 ```
org.apache.jmeterApacheJMeter_core3.2 org.apache.jmeterApacheJMeter_java3.2
2. 实现JavaSamplerClient接口,可以继承AbstractJavaSamplerClient

DEMO1

public class AddNumberTest extends AbstractJavaSamplerClient { public SampleResult runTest(JavaSamplerContext javaSamplerContext) { String var1 = javaSamplerContext.getParameter(“var1”); String var2 = javaSamplerContext.getParameter(“var2”); SampleResult result = new SampleResult(); result.sampleStart(); result.setSampleLabel(“Test Sample”); // Test Code

AddNumbers addNumbers = new AddNumbers();
if (addNumbers.addTwoNumbers(Integer.valueOf(var1), Integer.valueOf(var2)) == 2) {
  result.sampleEnd();
  result.setResponseCode("200");
  result.setResponseMessage("OK");
  result.setSuccessful(true);
} else {
  result.sampleEnd();
  result.setResponseCode("500");
  result.setResponseMessage("NOK");
  result.setSuccessful(false);
}
return result;

}

@Override public Arguments getDefaultParameters() { Arguments defaultParameters = new Arguments(); defaultParameters.addArgument(“var1”, “1”); defaultParameters.addArgument(“var2”, “2”); return defaultParameters; }

}

DEMO2

public class JavaRequestSamplerDemo extends AbstractJavaSamplerClient {

public SampleResult runTest(JavaSamplerContext ctx) { JMeterVariables vars = JMeterContextService.getContext().getVariables(); vars.put(“demo”, “demoVariableContent”);

SampleResult sampleResult = new SampleResult(); sampleResult.setSuccessful(true); sampleResult.setResponseCodeOK(); sampleResult.setResponseData(“haha”.getBytes()); sampleResult.setResponseMessageOK(); return sampleResult; } } ```

  1. 将工程打包后将jar放入 jmeter/lib/ext之后创建java请求的取样,进行测试

参考资料

http://www.importnew.com/12548.html

http://java-performance.info/jmh/

http://java-performance.info/introduction-jmh-profilers/

http://blog.dyngr.com/blog/2016/10/29/introduction-of-jmh/

Edgar

Edgar
一个略懂Java的小菜比