Serialization benchmark: Kryo, Hessian, and Jackson

Preface

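I noticed that many places in NetEase Cloud Music use jackson as the default serialization method, so I recently decided to run a benchmark to check how well it actually performs and to look into some tuning questions. My machine is a Mac with 4 cores and 16 GB of RAM. The library versions tested: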

  • kryo: 3.0.3
  • hessian: 4.0.35
  • jackson: 2.6.5

Main benchmark code

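import java.io.Serializable;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Kryo3MetaSerializer, Hessian2MetaSerializer and JacksonMetaSerializer are
// in-house wrapper classes; their imports are omitted here.

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class SerializeBenchmark {

    // The object being serialized; JMH creates one instance per benchmark thread.
    @State(Scope.Thread)
    public static class Person implements Serializable {
        // fields...
    }

    private Kryo3MetaSerializer kryo3MetaSerializer = new Kryo3MetaSerializer();

    private Hessian2MetaSerializer hessian2MetaSerializer = new Hessian2MetaSerializer();

    private JacksonMetaSerializer jacksonMetaSerializer = new JacksonMetaSerializer();

    // Each benchmark measures one full round trip: serialize, then deserialize.
    @Benchmark
    public void kryo(Person p) {
        try {
            byte[] bytes = kryo3MetaSerializer.toBytes(p);
            kryo3MetaSerializer.fromBytes(bytes, Person.class);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Benchmark
    public void hessian(Person p) {
        try {
            byte[] bytes = hessian2MetaSerializer.toBytes(p);
            hessian2MetaSerializer.fromBytes(bytes, Person.class);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Benchmark
    public void jackson(Person p) {
        try {
            byte[] bytes = jacksonMetaSerializer.toBytes(p);
            jacksonMetaSerializer.fromBytes(bytes, Person.class);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(SerializeBenchmark.class.getSimpleName())
                .forks(2)                  // two forked JVM processes
                .warmupIterations(10)      // per fork
                .measurementIterations(10) // per fork
                .build();

        new Runner(opt).run();
    }
}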

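The main code is fairly simple. The xxxMetaSerializer classes are existing wrapper implementations, so their code is not shown here. The benchmark runs in two forked JVM processes (forks(2) forks processes, not threads), with 10 warmup iterations and 10 measurement iterations per fork, which is why Cnt is 20 in the result tables.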
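Since the wrapper classes are not shown, here is a minimal sketch of the shape such a wrapper might take, inferred only from the toBytes(p) and fromBytes(bytes, Person.class) calls in the benchmark. The MetaSerializer interface, the JdkMetaSerializer class, and the Point type are all hypothetical names, and the implementation uses plain JDK serialization as a stand-in so it runs without any dependency. It is not one of the three benchmarked libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializerSketch {

    /** Hypothetical wrapper API, guessed from the calls made in the benchmark. */
    interface MetaSerializer {
        byte[] toBytes(Object obj) throws IOException;

        <T> T fromBytes(byte[] bytes, Class<T> type) throws IOException, ClassNotFoundException;
    }

    /** Stand-in implementation on top of JDK serialization (an assumption, not the real wrappers). */
    static final class JdkMetaSerializer implements MetaSerializer {
        @Override
        public byte[] toBytes(Object obj) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            return buffer.toByteArray();
        }

        @Override
        public <T> T fromBytes(byte[] bytes, Class<T> type) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return type.cast(in.readObject());
            }
        }
    }

    /** Tiny payload type used only for this demo. */
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** One serialize + deserialize round trip, the unit of work each benchmark method measures. */
    static Point roundTrip(Point original) {
        MetaSerializer serializer = new JdkMetaSerializer();
        try {
            byte[] bytes = serializer.toBytes(original);
            return serializer.fromBytes(bytes, Point.class);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println("x=" + copy.x + ", y=" + copy.y);
    }
}
```

Putting all three libraries behind one interface like this keeps the benchmark methods identical apart from the serializer instance, so differences in score come from the library rather than from the harness.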

Small class

Let's start with the throughput test for a small class. The meta class is as follows:

@State(Scope.Thread)
public static class Person implements Serializable {

    private static final long serialVersionUID = -6378930680486048168L;

    String name = "ziyuan";

    String address = "网商路599号网易大厦";

    Integer age = 25;

    List<Integer> phoneNumber = Arrays.asList(1, 8, 7, 1, 8, 3, 1, 7, 9, 8, 2);

    Boolean marry = false;

    Long[] gone = new Long[]{100L, 998456L, Long.MAX_VALUE, (long) Integer.MAX_VALUE};
}

Note that the getter/setter methods are omitted here; they need to be added when running the test. The results are as follows:

Benchmark                    Mode  Cnt       Score      Error  Units
SerializeBenchmark.hessian  thrpt   20   18316.590 ±  151.534  ops/s
SerializeBenchmark.jackson  thrpt   20  175164.888 ± 9199.818  ops/s
SerializeBenchmark.kryo     thrpt   20  157208.428 ± 6399.210  ops/s

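Note that this measures throughput, so a higher score is better. jackson has the highest throughput, kryo comes second, and hessian is the lowest by a wide margin.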

Medium-sized class

The medium-sized class adds one level of nesting. The meta objects:

public static class Company implements Serializable {

    private static final long serialVersionUID = -6453017601947707917L;

    String n = "杭州网易云音乐";

    String phone = "62804567";

    Integer colleage = 25000;

    Map<String, Integer> mm = new ConcurrentHashMap<>();

    {
        mm.put("liujia", 28);
        mm.put("taoge", 30);
        mm.put("qiandahua", 27);
    }

}

@State(Scope.Thread)
public static class Person implements Serializable {

    private static final long serialVersionUID = -6378930680486048168L;

    // One level of nesting: Person holds a Company.
    Company company = new Company();

    String name = "ziyuan";

    String address = "网商路599号网易大厦";

    Integer age = 25;

    List<Integer> phoneNumber = Arrays.asList(1, 8, 7, 1, 8, 3, 1, 7, 9, 8, 2);

    Boolean marry = false;

    Map<String, String> colleague = new HashMap<>();

    Long[] gone = new Long[]{100L, 998456L, Long.MAX_VALUE, (long) Integer.MAX_VALUE};

    // ConcurrentHashSet is not a JDK class; it comes from a third-party library.
    Set<Float> floats = new ConcurrentHashSet<>();

    {
        colleague.put("1", "xingtaoabc");
        colleague.put("2", "yuanhuaabc");
        colleague.put("3", "dezhiabc");
        colleague.put("4", "niaanajieabc");

        floats.add(213132.21312F);
        floats.add(789993132.21312F);
        floats.add(213134652.211312F);
        floats.add(3.15926535F);
        floats.add(76542.98789F);
    }

}

The results are as follows:

Benchmark                    Mode  Cnt       Score      Error  Units
SerializeBenchmark.hessian  thrpt   20   15251.806 ±  757.440  ops/s
SerializeBenchmark.jackson  thrpt   20   84019.107 ± 4498.575  ops/s
SerializeBenchmark.kryo     thrpt   20   83959.289 ± 4803.940  ops/s

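Here jackson and kryo are neck and neck, and hessian is still far behind. There is one interesting point, though: as object complexity increases, the throughput of jackson and kryo is roughly halved compared with the small class, while hessian stays quite stable.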

Complex class

The meta classes:

public static class City implements Serializable {

    City() {
    }

    City(String name) {
        this.name = name;
    }

    String name = "hangzhou";

    String addr = "中国浙江省杭州市";

    Integer personNum = 5000000;

    Integer youbian = 31000;

    Queue<Double> q = new LinkedBlockingDeque<>();

    {
        q.add(3.14159);
        q.add(128784523.8787);
        q.add(899.8787);
        q.add(98.6892223);
    }
}

public static class Company implements Serializable {

    private static final long serialVersionUID = -6453017601947707917L;

    City c1 = new City();

    City c2 = new City();

    String n = "杭州网易云音乐";

    String phone = "62804567";

    Integer colleage = 25000;

    Map<String, Integer> mm = new ConcurrentHashMap<>();

    {
        mm.put("liujia", 28);
        mm.put("taoge", 30);
        mm.put("qiandahua", 27);
    }
}

@State(Scope.Thread)
public static class Person implements Serializable {

    private static final long serialVersionUID = -6378930680486048168L;

    Company company = new Company();

    String name = "ziyuan";

    String address = "网商路599号网易大厦";

    Integer age = 25;

    List<Integer> phoneNumber = Arrays.asList(1, 8, 6, 5, 8, 2, 0, 7, 3, 8, 4);

    Boolean marry = false;

    Map<String, String> colleague = new HashMap<>();

    Map<String, City> cities = new HashMap<>();

    Long[] gone = new Long[]{100L, 998456L, Long.MAX_VALUE, (long) Integer.MAX_VALUE};

    Set<Float> floats = new ConcurrentHashSet<>();

    Map<Float, City> cityMap = new ConcurrentHashMap<>();

    {
        colleague.put("1", "jiangxignabc");
        colleague.put("2", "yuanhuaabc");
        colleague.put("3", "dezhiabc");
        colleague.put("4", "niaanjieabc");

        floats.add(213132.21312F);
        floats.add(789993132.21312F);
        floats.add(213134652.211312F);
        floats.add(3.15926535F);
        floats.add(76542.98789F);

        cities.put("hangzhou", new City());
        cities.put("shanghai", new City());
        cities.put("beijing", new City());

        cityMap.put(23567.1234F, new City("heilongjiang"));
        cityMap.put(2.31415926535F, new City("yantai"));
        cityMap.put(128.132134F, new City("heibeilangfang"));
    }
}

The results are as follows:

Benchmark                    Mode  Cnt       Score      Error  Units
SerializeBenchmark.hessian  thrpt   20   11704.161 ±  587.963  ops/s
SerializeBenchmark.jackson  thrpt   20   24553.787 ± 1291.997  ops/s
SerializeBenchmark.kryo     thrpt   20   34815.832 ± 1958.810  ops/s

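Here jackson and kryo keep falling off a cliff, and kryo now overtakes jackson. hessian holds up well by comparison: its relative drop in throughput is much smaller.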
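To put numbers on "falling off a cliff" versus "holding up", the relative drop between object sizes can be computed directly from the three result tables. The sketch below only does that arithmetic; the class and method names are made up for illustration, and the scores are copied from the tables above.

```java
final class ThroughputDrop {

    /** Percentage by which throughput fell going from `before` to `after` (both in ops/s). */
    static double dropPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {"hessian", "jackson", "kryo"};
        // Scores from the result tables, in order: small, medium, complex.
        double[][] scores = {
            {18316.590, 15251.806, 11704.161},
            {175164.888, 84019.107, 24553.787},
            {157208.428, 83959.289, 34815.832},
        };
        for (int i = 0; i < names.length; i++) {
            double small = scores[i][0];
            double medium = scores[i][1];
            double complex = scores[i][2];
            System.out.printf(
                "%-8s small->medium %.1f%%, medium->complex %.1f%%, small->complex %.1f%%%n",
                names[i],
                dropPercent(small, medium),
                dropPercent(medium, complex),
                dropPercent(small, complex));
        }
    }
}
```

From small to complex, hessian loses a bit over a third of its throughput, while jackson and kryo each lose more than three quarters. Even so, hessian's absolute score in the complex case remains the lowest of the three.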

Conclusion

If this trend continues, we can draw a simple conclusion:

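  • jackson: as a text-based serialization format, it performs well on small objects, but its throughput drops sharply as objects become more complex.
  • kryo: performs well, but is still not stable enough.
  • hessian: the most stable throughput, but the overall throughput is low.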

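Note that this does not mean hessian or kryo should always be used. Besides serialization/deserialization speed, the size of the serialized data also matters, because it affects transfer efficiency. But judging by serialization/deserialization behavior alone, hessian is still a very good choice.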
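Serialized size was not measured in this benchmark. As a rough illustration of why the format matters for size, the sketch below encodes the same two fields once as a hand-built JSON-like string and once as a compact binary layout using the JDK's DataOutputStream. Both encoders are toy stand-ins written for this example, and they do not reproduce the actual output of jackson, kryo, or hessian.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class EncodedSizeDemo {

    /** Toy text encoding: a hand-built JSON-like string as UTF-8 (no escaping, demo only). */
    static byte[] encodeAsText(String name, int age) {
        String json = "{\"name\":\"" + name + "\",\"age\":" + age + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /** Toy binary encoding: a length-prefixed string followed by a 4-byte int, no field names. */
    static byte[] encodeAsBinary(String name, int age) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(name);
            out.writeInt(age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("text bytes:   " + encodeAsText("ziyuan", 25).length);
        System.out.println("binary bytes: " + encodeAsBinary("ziyuan", 25).length);
    }
}
```

The text form repeats every field name in every message, which is where most of the extra bytes come from. For a real comparison, the same idea applies to the benchmarked wrappers: record the length of the byte[] returned by each toBytes call for the same Person object.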

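Also, no library-specific optimizations were applied here. For example, kryo can improve serialization efficiency by registering classes up front, but class registration is inherently prone to bugs in distributed scenarios, which I won't go into here. So the benchmark above reflects plain, out-of-the-box results.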
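The post does not say which bug it has in mind. One commonly cited pitfall with ID-based class registration is that the writer and the reader must agree on the class-to-ID mapping, which is easy to break when nodes register classes in a different order or run different code versions. The sketch below is a toy model of that failure, assuming sequential IDs assigned in registration order. ToyRegistry is a made-up class for illustration and is not Kryo's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RegistrationOrderDemo {

    /** Toy registry: hands out sequential integer IDs in registration order. */
    static final class ToyRegistry {
        private final Map<Class<?>, Integer> idsByClass = new HashMap<>();
        private final List<Class<?>> classesById = new ArrayList<>();

        void register(Class<?> type) {
            // Only assign a new ID the first time a class is registered.
            if (idsByClass.putIfAbsent(type, classesById.size()) == null) {
                classesById.add(type);
            }
        }

        int idOf(Class<?> type) {
            Integer id = idsByClass.get(type);
            if (id == null) {
                throw new IllegalArgumentException("not registered: " + type);
            }
            return id;
        }

        Class<?> classOf(int id) {
            return classesById.get(id);
        }
    }

    /** The class the receiver resolves when the sender writes the ID of `sent`. */
    static Class<?> decodeOnReceiver(ToyRegistry sender, ToyRegistry receiver, Class<?> sent) {
        return receiver.classOf(sender.idOf(sent));
    }

    public static void main(String[] args) {
        ToyRegistry nodeA = new ToyRegistry();
        nodeA.register(String.class);
        nodeA.register(Integer.class);

        // Node B registers the same classes, but in the opposite order.
        ToyRegistry nodeB = new ToyRegistry();
        nodeB.register(Integer.class);
        nodeB.register(String.class);

        System.out.println("A sends String, B resolves: "
            + decodeOnReceiver(nodeA, nodeB, String.class).getSimpleName());
    }
}
```

Writing a small ID instead of a full class name is what makes registration faster and more compact, and it is also why every node has to share exactly the same registration table.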