覚えたら書く (Write It Down Once You've Learned It)

A developer's notes on things learned day to day while working in IT. twitter: @yyoshikaw

JMH - Benchmarking with multiple threads

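By default, JMH runs the benchmarked code on a single thread when measuring,
but the number of threads can be set through an option.
This makes it possible to benchmark code as it behaves when run on multiple threads.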

In code, the thread count can be specified in either of the following ways:

  • The parameter of the @Threads annotation
  • The OptionsBuilder#threads method
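
The @Threads form appears in the samples later in this article. For the OptionsBuilder#threads form, a minimal sketch might look like the following. The class name ThreadOptionSketch and the include pattern are placeholders chosen for illustration; the sketch assumes JMH is on the classpath and that a benchmark class such as MapBenchmark1 (shown later in this article) exists.

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Hypothetical launcher class; only the threads(...) call is the point here.
public class ThreadOptionSketch {

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include("MapBenchmark1") // placeholder pattern matched against benchmark names
                .threads(2)               // run each benchmark method on 2 threads
                .build();
        new Runner(opt).run();
    }
}
```

Setting the value in the launcher keeps the benchmark class itself unchanged, which may be convenient when the same class is to be measured with several thread counts.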

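Below, measurements are taken with 1, 2, and 4 threads to see how the throughput changes.
(The results depend in part on the execution environment, so treat the values as reference figures only.)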
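
Since the results depend on the machine, it can help to check how many logical processors the JVM sees before choosing thread counts. The following is a small sketch using only the standard library; the class ThreadCountCheck and its helper method are hypothetical and not part of JMH.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for picking thread counts; not part of JMH.
final class ThreadCountCheck {

    /** Returns the requested thread counts that do not exceed the processor count. */
    static List<Integer> usableThreadCounts(int processors, int... requested) {
        List<Integer> usable = new ArrayList<>();
        for (int threads : requested) {
            if (threads <= processors) {
                usable.add(threads);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        // Number of logical processors available to this JVM
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors: " + processors);
        System.out.println("Thread counts within that limit: "
                + usableThreadCounts(processors, 1, 2, 4));
    }
}
```

Running a benchmark with more threads than processors is still possible, but the extra threads then compete for CPU time, which makes the numbers harder to interpret.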


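Thread count = 1

This measures the performance of the get method of Hashtable and ConcurrentHashMap. (The same measurement is made for the other thread counts.)

■ Sample code

When no thread count is specified explicitly, the measurement runs with 1 thread.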

import java.util.Hashtable;
import java.util.concurrent.ConcurrentHashMap;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

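@Warmup(iterations=15)          // 15 warmup iterations before measuring
@BenchmarkMode(Mode.Throughput) // report operations per unit of time
@Fork(1)                        // run in a single forked JVM
// No @Threads annotation here, so the default of 1 thread is used
public class MapBenchmark1 {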

    private static final Hashtable<Integer, String> hashTable = new Hashtable<>();

    private static final ConcurrentHashMap<Integer, String> conHashMap = new ConcurrentHashMap<>();

    static {
        hashTable.put(1, "val1");
        hashTable.put(2, "val2");
        hashTable.put(3, "val3");
        hashTable.put(4, "val4");
        hashTable.put(5, "val5");

        conHashMap.put(1, "val1");
        conHashMap.put(2, "val2");
        conHashMap.put(3, "val3");
        conHashMap.put(4, "val4");
        conHashMap.put(5, "val5");
    }

    // Note: the values looked up in the benchmark methods are not used afterwards.
    @Benchmark
    public void hashTable_get() {
        String a = hashTable.get(1);
        String b = hashTable.get(5);
        String c = hashTable.get(4);
    }

    @Benchmark
    public void concurrentHashMap_get() {
        String a = conHashMap.get(1);
        String b = conHashMap.get(5);
        String c = conHashMap.get(4);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MapBenchmark1.class.getName())
                .build();
        new Runner(opt).run();
    }
}
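
The benchmark methods above assign the results of get to local variables that are never read. A common JMH practice is to return such values or pass them to a Blackhole so that the JIT compiler cannot treat the calls as dead code. The following is a sketch of that variant for the ConcurrentHashMap case only. The class name MapBenchmarkBlackholeSketch is hypothetical, and this is not the code that produced the results shown in this article.

```java
import java.util.concurrent.ConcurrentHashMap;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

// Hypothetical variant of the sample; not used for the measurements in this article.
@State(Scope.Benchmark) // one shared instance, so all benchmark threads read the same map
public class MapBenchmarkBlackholeSketch {

    private final ConcurrentHashMap<Integer, String> conHashMap = new ConcurrentHashMap<>();

    @Setup
    public void fill() {
        // Same five entries as in the sample above
        for (int i = 1; i <= 5; i++) {
            conHashMap.put(i, "val" + i);
        }
    }

    @Benchmark
    public void concurrentHashMap_get(Blackhole bh) {
        // consume(...) keeps each lookup observable to the JIT compiler
        bh.consume(conHashMap.get(1));
        bh.consume(conHashMap.get(5));
        bh.consume(conHashMap.get(4));
    }
}
```

Scores from such a variant would not be directly comparable with the figures in this article, because the Blackhole calls add a small cost of their own.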

■ Results

(partially omitted)

# Warmup: 15 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jmh.sample.thread.MapBenchmark1.concurrentHashMap_get

# Run progress: 0.00% complete, ETA 00:01:10
# Fork: 1 of 1
# Warmup Iteration   1: 12125191.678 ops/s
# Warmup Iteration   2: 11969736.770 ops/s
# Warmup Iteration   3: 14488469.529 ops/s
# Warmup Iteration   4: 14212100.297 ops/s
# Warmup Iteration   5: 14825562.172 ops/s
# Warmup Iteration   6: 14721540.699 ops/s
# Warmup Iteration   7: 14504045.791 ops/s
# Warmup Iteration   8: 14564234.119 ops/s
# Warmup Iteration   9: 14693212.129 ops/s
# Warmup Iteration  10: 14612430.190 ops/s
# Warmup Iteration  11: 14736692.239 ops/s
# Warmup Iteration  12: 14696557.259 ops/s
# Warmup Iteration  13: 14503798.052 ops/s
# Warmup Iteration  14: 14720710.471 ops/s
# Warmup Iteration  15: 14411043.296 ops/s
Iteration   1: 14465114.978 ops/s
Iteration   2: 14433028.262 ops/s
Iteration   3: 14427684.718 ops/s
Iteration   4: 14682441.253 ops/s
Iteration   5: 14515157.706 ops/s
Iteration   6: 14577843.337 ops/s
Iteration   7: 14845393.937 ops/s
Iteration   8: 14660424.677 ops/s
Iteration   9: 14609385.208 ops/s
Iteration  10: 14818804.469 ops/s
Iteration  11: 14549040.411 ops/s
Iteration  12: 14258251.971 ops/s
Iteration  13: 14413074.131 ops/s
Iteration  14: 14629134.922 ops/s
Iteration  15: 14607855.668 ops/s
Iteration  16: 14559264.588 ops/s
Iteration  17: 14641362.342 ops/s
Iteration  18: 14281674.626 ops/s
Iteration  19: 14291162.857 ops/s
Iteration  20: 13864847.804 ops/s


Result "concurrentHashMap_get":
  14506547.393 ±(99.9%) 191410.592 ops/s [Average]
  (min, avg, max) = (13864847.804, 14506547.393, 14845393.937), stdev = 220428.723
  CI (99.9%): [14315136.801, 14697957.985] (assumes normal distribution)


# Warmup: 15 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jmh.sample.thread.MapBenchmark1.hashTable_get

# Run progress: 50.00% complete, ETA 00:00:36
# Fork: 1 of 1
# Warmup Iteration   1: 4651673.639 ops/s
# Warmup Iteration   2: 4669567.411 ops/s
# Warmup Iteration   3: 5526670.552 ops/s
# Warmup Iteration   4: 5089100.502 ops/s
# Warmup Iteration   5: 6073023.421 ops/s
# Warmup Iteration   6: 6156319.643 ops/s
# Warmup Iteration   7: 6497317.995 ops/s
# Warmup Iteration   8: 6366948.162 ops/s
# Warmup Iteration   9: 6500571.758 ops/s
# Warmup Iteration  10: 6521461.256 ops/s
# Warmup Iteration  11: 6545226.404 ops/s
# Warmup Iteration  12: 6503068.601 ops/s
# Warmup Iteration  13: 6542618.657 ops/s
# Warmup Iteration  14: 6583631.950 ops/s
# Warmup Iteration  15: 6567973.309 ops/s
Iteration   1: 6487030.869 ops/s
Iteration   2: 6478218.016 ops/s
Iteration   3: 6509613.038 ops/s
Iteration   4: 6536562.548 ops/s
Iteration   5: 6540437.657 ops/s
Iteration   6: 6519850.548 ops/s
Iteration   7: 6493737.236 ops/s
Iteration   8: 6481500.279 ops/s
Iteration   9: 6403582.954 ops/s
Iteration  10: 4189113.278 ops/s
Iteration  11: 5425486.716 ops/s
Iteration  12: 5607032.339 ops/s
Iteration  13: 5122803.521 ops/s
Iteration  14: 4855234.511 ops/s
Iteration  15: 5576280.748 ops/s
Iteration  16: 6109519.679 ops/s
Iteration  17: 5766541.700 ops/s
Iteration  18: 5712743.131 ops/s
Iteration  19: 5773255.631 ops/s
Iteration  20: 5771087.033 ops/s


Result "hashTable_get":
  5917981.572 ±(99.9%) 575860.877 ops/s [Average]
  (min, avg, max) = (4189113.278, 5917981.572, 6540437.657), stdev = 663162.243
  CI (99.9%): [5342120.694, 6493842.449] (assumes normal distribution)


# Run complete. Total time: 00:01:13

Benchmark                             Mode  Cnt         Score        Error  Units
MapBenchmark1.concurrentHashMap_get  thrpt   20  14506547.393 ± 191410.592  ops/s
MapBenchmark1.hashTable_get          thrpt   20   5917981.572 ± 575860.877  ops/s

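ConcurrentHashMap shows higher throughput than Hashtable: about 14.5M ops/s versus about 5.9M ops/s, roughly 2.5 times as high. The Hashtable run was also noticeably noisier, with several measurement iterations dipping well below 6M ops/s.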
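
The "±(99.9%)" error figures in the output can be cross-checked against the reported standard deviations. The values are consistent with a half-width of t × stdev / sqrt(n), where t is the two-sided 99.9% Student's t quantile for n - 1 = 19 degrees of freedom (about 3.883 in standard tables). That JMH computes the figure exactly this way is an assumption here; the sketch below only shows that the arithmetic lines up. The class ErrorMarginCheck is hypothetical.

```java
// Hypothetical cross-check of the error column; not part of JMH.
final class ErrorMarginCheck {

    // Approximate two-sided 99.9% Student's t quantile for 19 degrees of freedom
    // (table value; using it here is an assumption about how the error is derived).
    static final double T_999_DF19 = 3.883;

    /** Half-width of the confidence interval for a mean of n samples. */
    static double halfWidth(double stdev, int n) {
        return T_999_DF19 * stdev / Math.sqrt(n);
    }

    public static void main(String[] args) {
        // stdev values and reported errors are taken from the output above (n = 20)
        System.out.printf("concurrentHashMap_get: %.3f (reported 191410.592)%n",
                halfWidth(220428.723, 20));
        System.out.printf("hashTable_get:         %.3f (reported 575860.877)%n",
                halfWidth(663162.243, 20));
    }
}
```

The computed values differ slightly from the reported ones because the quantile is rounded to three decimal places.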


Thread count = 2

This checks the behavior when 2 is passed as the parameter of the @Threads annotation.

■ Sample code

import java.util.Hashtable;
import java.util.concurrent.ConcurrentHashMap;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Warmup(iterations=15)
@BenchmarkMode(Mode.Throughput)
@Fork(1)
@Threads(2) // run each benchmark method on 2 threads
public class MapBenchmark2 {

    private static final Hashtable<Integer, String> hashTable = new Hashtable<>();

    private static final ConcurrentHashMap<Integer, String> conHashMap = new ConcurrentHashMap<>();

    static {
        hashTable.put(1, "val1");
        hashTable.put(2, "val2");
        hashTable.put(3, "val3");
        hashTable.put(4, "val4");
        hashTable.put(5, "val5");

        conHashMap.put(1, "val1");
        conHashMap.put(2, "val2");
        conHashMap.put(3, "val3");
        conHashMap.put(4, "val4");
        conHashMap.put(5, "val5");
    }

    @Benchmark
    public void hashTable_get() {
        String a = hashTable.get(1);
        String b = hashTable.get(5);
        String c = hashTable.get(4);
    }

    @Benchmark
    public void concurrentHashMap_get() {
        String a = conHashMap.get(1);
        String b = conHashMap.get(5);
        String c = conHashMap.get(4);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MapBenchmark2.class.getName())
                .build();
        new Runner(opt).run();
    }
}

■ Results

(partially omitted)

# Warmup: 15 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jmh.sample.thread.MapBenchmark2.concurrentHashMap_get

# Run progress: 0.00% complete, ETA 00:01:10
# Fork: 1 of 1
# Warmup Iteration   1: 25842568.058 ops/s
# Warmup Iteration   2: 25021723.019 ops/s
# Warmup Iteration   3: 29500516.838 ops/s
# Warmup Iteration   4: 29670410.244 ops/s
# Warmup Iteration   5: 29643449.178 ops/s
# Warmup Iteration   6: 29461010.450 ops/s
# Warmup Iteration   7: 29650859.923 ops/s
# Warmup Iteration   8: 29687691.806 ops/s
# Warmup Iteration   9: 29674896.671 ops/s
# Warmup Iteration  10: 29639584.302 ops/s
# Warmup Iteration  11: 29692633.414 ops/s
# Warmup Iteration  12: 29696576.470 ops/s
# Warmup Iteration  13: 29668948.784 ops/s
# Warmup Iteration  14: 29679513.397 ops/s
# Warmup Iteration  15: 29674566.733 ops/s
Iteration   1: 29683765.547 ops/s
Iteration   2: 29661471.751 ops/s
Iteration   3: 29643261.462 ops/s
Iteration   4: 29689456.217 ops/s
Iteration   5: 29638663.633 ops/s
Iteration   6: 29704778.940 ops/s
Iteration   7: 29685274.386 ops/s
Iteration   8: 29691408.573 ops/s
Iteration   9: 29648490.388 ops/s
Iteration  10: 29666833.905 ops/s
Iteration  11: 29668629.507 ops/s
Iteration  12: 29730650.465 ops/s
Iteration  13: 29662059.179 ops/s
Iteration  14: 29686658.486 ops/s
Iteration  15: 29654433.215 ops/s
Iteration  16: 29679599.362 ops/s
Iteration  17: 29705428.564 ops/s
Iteration  18: 29667339.827 ops/s
Iteration  19: 29680417.218 ops/s
Iteration  20: 29685491.049 ops/s


Result "concurrentHashMap_get":
  29676705.584 ±(99.9%) 19699.510 ops/s [Average]
  (min, avg, max) = (29638663.633, 29676705.584, 29730650.465), stdev = 22685.985
  CI (99.9%): [29657006.074, 29696405.094] (assumes normal distribution)


# Warmup: 15 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jmh.sample.thread.MapBenchmark2.hashTable_get

# Run progress: 50.00% complete, ETA 00:00:36
# Fork: 1 of 1
# Warmup Iteration   1: 1873131.787 ops/s
# Warmup Iteration   2: 2018130.843 ops/s
# Warmup Iteration   3: 1687737.525 ops/s
# Warmup Iteration   4: 1686080.405 ops/s
# Warmup Iteration   5: 1676040.733 ops/s
# Warmup Iteration   6: 1685789.811 ops/s
# Warmup Iteration   7: 1678814.113 ops/s
# Warmup Iteration   8: 1681282.310 ops/s
# Warmup Iteration   9: 1680023.009 ops/s
# Warmup Iteration  10: 1688306.188 ops/s
# Warmup Iteration  11: 1707166.628 ops/s
# Warmup Iteration  12: 1687699.170 ops/s
# Warmup Iteration  13: 1672897.551 ops/s
# Warmup Iteration  14: 1678347.343 ops/s
# Warmup Iteration  15: 1672376.495 ops/s
Iteration   1: 1677785.701 ops/s
Iteration   2: 1670938.686 ops/s
Iteration   3: 1691468.097 ops/s
Iteration   4: 1694764.807 ops/s
Iteration   5: 1683808.326 ops/s
Iteration   6: 1684295.849 ops/s
Iteration   7: 1679099.811 ops/s
Iteration   8: 1682863.892 ops/s
Iteration   9: 1684642.390 ops/s
Iteration  10: 1685923.050 ops/s
Iteration  11: 1687660.431 ops/s
Iteration  12: 1703005.425 ops/s
Iteration  13: 1685495.190 ops/s
Iteration  14: 1701577.471 ops/s
Iteration  15: 1685312.560 ops/s
Iteration  16: 1673806.303 ops/s
Iteration  17: 1665778.773 ops/s
Iteration  18: 1670327.254 ops/s
Iteration  19: 1676521.247 ops/s
Iteration  20: 1677474.186 ops/s


Result "hashTable_get":
  1683127.472 ±(99.9%) 8461.572 ops/s [Average]
  (min, avg, max) = (1665778.773, 1683127.472, 1703005.425), stdev = 9744.358
  CI (99.9%): [1674665.901, 1691589.044] (assumes normal distribution)


# Run complete. Total time: 00:01:12

Benchmark                             Mode  Cnt         Score       Error  Units
MapBenchmark2.concurrentHashMap_get  thrpt   20  29676705.584 ± 19699.510  ops/s
MapBenchmark2.hashTable_get          thrpt   20   1683127.472 ±  8461.572  ops/s

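With 2 threads, the throughput of ConcurrentHashMap is roughly double the 1-thread value (about 29.7M ops/s versus about 14.5M ops/s). Hashtable, by contrast, drops to about 1.68M ops/s from about 5.9M ops/s with 1 thread, so adding a second thread makes it slower, not faster. This is consistent with the two threads contending for the single lock taken by Hashtable's synchronized get method.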
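
One factor behind the different behavior of the two maps under multiple threads is locking: Hashtable#get is declared synchronized, while ConcurrentHashMap#get is not. The following sketch confirms this through reflection using only the standard library. The class GetLockingCheck is hypothetical and exists only for this illustration.

```java
import java.lang.reflect.Modifier;
import java.util.Hashtable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical check; inspects the declared modifiers of get(Object).
final class GetLockingCheck {

    /** Returns true if the public get(Object) method of the class is declared synchronized. */
    static boolean isGetSynchronized(Class<?> mapClass) {
        try {
            int modifiers = mapClass.getMethod("get", Object.class).getModifiers();
            return Modifier.isSynchronized(modifiers);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(mapClass + " has no get(Object) method", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Hashtable.get synchronized:         "
                + isGetSynchronized(Hashtable.class));
        System.out.println("ConcurrentHashMap.get synchronized: "
                + isGetSynchronized(ConcurrentHashMap.class));
    }
}
```

A synchronized get means that every reader of the same Hashtable instance has to acquire the same monitor, so concurrent reads are serialized.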


Thread count = 4

This checks the behavior when 4 is passed as the parameter of the @Threads annotation.

■ Sample code

import java.util.Hashtable;
import java.util.concurrent.ConcurrentHashMap;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Warmup(iterations=15)
@BenchmarkMode(Mode.Throughput)
@Fork(1)
@Threads(4) // run each benchmark method on 4 threads
public class MapBenchmark4 {

    private static final Hashtable<Integer, String> hashTable = new Hashtable<>();

    private static final ConcurrentHashMap<Integer, String> conHashMap = new ConcurrentHashMap<>();

    static {
        hashTable.put(1, "val1");
        hashTable.put(2, "val2");
        hashTable.put(3, "val3");
        hashTable.put(4, "val4");
        hashTable.put(5, "val5");

        conHashMap.put(1, "val1");
        conHashMap.put(2, "val2");
        conHashMap.put(3, "val3");
        conHashMap.put(4, "val4");
        conHashMap.put(5, "val5");
    }

    @Benchmark
    public void hashTable_get() {
        String a = hashTable.get(1);
        String b = hashTable.get(5);
        String c = hashTable.get(4);
    }

    @Benchmark
    public void concurrentHashMap_get() {
        String a = conHashMap.get(1);
        String b = conHashMap.get(5);
        String c = conHashMap.get(4);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MapBenchmark4.class.getName())
                .build();
        new Runner(opt).run();
    }
}

■ Results

(partially omitted)

# Warmup: 15 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jmh.sample.thread.MapBenchmark4.concurrentHashMap_get

# Run progress: 0.00% complete, ETA 00:01:10
# Fork: 1 of 1
# Warmup Iteration   1: 54437219.421 ops/s
# Warmup Iteration   2: 49592081.728 ops/s
# Warmup Iteration   3: 55875981.260 ops/s
# Warmup Iteration   4: 55954051.829 ops/s
# Warmup Iteration   5: 56072978.542 ops/s
# Warmup Iteration   6: 56156417.579 ops/s
# Warmup Iteration   7: 56054470.410 ops/s
# Warmup Iteration   8: 54810295.106 ops/s
# Warmup Iteration   9: 55497137.594 ops/s
# Warmup Iteration  10: 56120932.145 ops/s
# Warmup Iteration  11: 55851336.195 ops/s
# Warmup Iteration  12: 55529829.860 ops/s
# Warmup Iteration  13: 56084509.179 ops/s
# Warmup Iteration  14: 55767644.626 ops/s
# Warmup Iteration  15: 55925342.323 ops/s
Iteration   1: 56153722.127 ops/s
Iteration   2: 56469107.240 ops/s
Iteration   3: 57092804.712 ops/s
Iteration   4: 58138097.112 ops/s
Iteration   5: 58538012.388 ops/s
Iteration   6: 58414872.033 ops/s
Iteration   7: 57982101.349 ops/s
Iteration   8: 58110952.108 ops/s
Iteration   9: 58521465.645 ops/s
Iteration  10: 58548385.133 ops/s
Iteration  11: 58197851.577 ops/s
Iteration  12: 58047931.198 ops/s
Iteration  13: 58466481.895 ops/s
Iteration  14: 57998124.405 ops/s
Iteration  15: 58039137.803 ops/s
Iteration  16: 58245449.901 ops/s
Iteration  17: 58135760.113 ops/s
Iteration  18: 58334737.109 ops/s
Iteration  19: 58336232.012 ops/s
Iteration  20: 58422634.477 ops/s


Result "concurrentHashMap_get":
  58009693.017 ±(99.9%) 576773.668 ops/s [Average]
  (min, avg, max) = (56153722.127, 58009693.017, 58548385.133), stdev = 664213.414
  CI (99.9%): [57432919.349, 58586466.685] (assumes normal distribution)


# Warmup: 15 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.jmh.sample.thread.MapBenchmark4.hashTable_get

# Run progress: 50.00% complete, ETA 00:00:36
# Fork: 1 of 1
# Warmup Iteration   1: 1689978.836 ops/s
# Warmup Iteration   2: 1713131.149 ops/s
# Warmup Iteration   3: 1683767.408 ops/s
# Warmup Iteration   4: 1653290.060 ops/s
# Warmup Iteration   5: 1663497.061 ops/s
# Warmup Iteration   6: 1666790.611 ops/s
# Warmup Iteration   7: 1657101.840 ops/s
# Warmup Iteration   8: 1670162.562 ops/s
# Warmup Iteration   9: 1653576.564 ops/s
# Warmup Iteration  10: 1661106.987 ops/s
# Warmup Iteration  11: 1664045.958 ops/s
# Warmup Iteration  12: 1651216.654 ops/s
# Warmup Iteration  13: 1664788.959 ops/s
# Warmup Iteration  14: 1665190.619 ops/s
# Warmup Iteration  15: 1654041.929 ops/s
Iteration   1: 1670081.075 ops/s
Iteration   2: 1665161.159 ops/s
Iteration   3: 1663520.877 ops/s
Iteration   4: 1647759.509 ops/s
Iteration   5: 1664505.883 ops/s
Iteration   6: 1655435.675 ops/s
Iteration   7: 1663570.133 ops/s
Iteration   8: 1663732.140 ops/s
Iteration   9: 1654726.280 ops/s
Iteration  10: 1653693.726 ops/s
Iteration  11: 1696082.458 ops/s
Iteration  12: 1712846.993 ops/s
Iteration  13: 1659906.755 ops/s
Iteration  14: 1659803.498 ops/s
Iteration  15: 1670639.238 ops/s
Iteration  16: 1656648.491 ops/s
Iteration  17: 1665964.611 ops/s
Iteration  18: 1659343.337 ops/s
Iteration  19: 1671318.989 ops/s
Iteration  20: 1656193.931 ops/s


Result "hashTable_get":
  1665546.738 ±(99.9%) 12923.470 ops/s [Average]
  (min, avg, max) = (1647759.509, 1665546.738, 1712846.993), stdev = 14882.687
  CI (99.9%): [1652623.268, 1678470.208] (assumes normal distribution)


# Run complete. Total time: 00:01:13

Benchmark                             Mode  Cnt         Score        Error  Units
MapBenchmark4.concurrentHashMap_get  thrpt   20  58009693.017 ± 576773.668  ops/s
MapBenchmark4.hashTable_get          thrpt   20   1665546.738 ±  12923.470  ops/s

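Compared with 2 threads, the throughput of ConcurrentHashMap improves further (about 58.0M ops/s versus about 29.7M ops/s),
while the throughput of Hashtable is almost unchanged from the 2-thread case (about 1.67M ops/s versus about 1.68M ops/s) and has plateaued.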
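
To put the three runs side by side, the scores from the summary tables can be turned into speedup factors relative to the 1-thread score. The following is a small sketch of that arithmetic; the class ScalingCheck and its method names are hypothetical, and the input values are copied from the results above.

```java
// Hypothetical helper that compares multi-thread scores with the 1-thread score.
final class ScalingCheck {

    /** Throughput with N threads divided by throughput with 1 thread. */
    static double speedup(double scoreN, double score1) {
        return scoreN / score1;
    }

    /** Speedup divided by the thread count; 1.0 would mean perfectly linear scaling. */
    static double efficiency(double scoreN, double score1, int threads) {
        return speedup(scoreN, score1) / threads;
    }

    public static void main(String[] args) {
        // Scores (ops/s) from the summary tables for 1, 2, and 4 threads
        double[] chm = {14506547.393, 29676705.584, 58009693.017};
        double[] ht = {5917981.572, 1683127.472, 1665546.738};
        int[] threads = {1, 2, 4};

        for (int i = 0; i < threads.length; i++) {
            System.out.printf("threads=%d  ConcurrentHashMap x%.2f (eff %.2f)  Hashtable x%.2f (eff %.2f)%n",
                    threads[i],
                    speedup(chm[i], chm[0]), efficiency(chm[i], chm[0], threads[i]),
                    speedup(ht[i], ht[0]), efficiency(ht[i], ht[0], threads[i]));
        }
    }
}
```

With these inputs, ConcurrentHashMap comes out at roughly 2 times and 4 times its 1-thread score, that is, close to linear scaling, while Hashtable stays below 0.3 times its 1-thread score for both 2 and 4 threads.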


