Arthas 偶现 ByteBuf LEAK

Arthas 偶现 ByteBuf LEAK

背景

上一篇文章中,我们遇到了YGC的问题,排查过程中使用了阿里的arthas,排查过程中我们在arthas的日志中偶然发现了一些有意思的日志:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
01 2019-05-10 11:21:00.968 ERROR [nioEventLoopGroup-2-2:i.n.u.ResourceLeakDetector] LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
Created at:
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147)
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:628)
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:563)
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:745)

显示有泄露。当时这个问题只是记录了一下,本周开始排查。

排查过程

Netty有泄露级别,我们可以通过调整泄露级别来追踪我们每个ByteBuf的申请和释放情况。这里把我们的级别调整为最高,追踪每一个ByteBuf,记录每一个使用过程。

压测过后发现很奇怪的是,我们的log中并没有搜索到LEAK的内容。考虑到我们使用了arthas,所以开启arthas随便抓了几个类的调用情况,重新看log,还是没发现任何异常的地方。

系统log中没有,再看下arthas的日志,发现果然有问题:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
2019-05-14 11:31:52.545 ERROR [nioEventLoopGroup-2-4:i.n.u.ResourceLeakDetector] LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
#1:
io.netty.buffer.AdvancedLeakAwareByteBuf.getBytes(AdvancedLeakAwareByteBuf.java:244)
io.termd.core.telnet.netty.TelnetChannelHandler.channelRead(TelnetChannelHandler.java:45)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:628)
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:563)
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:745)
#2:
Hint: 'TelnetChannelHandler#0' will handle the message from this point.
io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:116)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:345)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:628)
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:563)
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:745)
#3:
Hint: 'DefaultChannelPipeline$HeadContext#0' will handle the message from this point.
io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:116)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:345)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:628)
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:563)
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:745)
#4:
io.netty.buffer.AdvancedLeakAwareByteBuf.writeBytes(AdvancedLeakAwareByteBuf.java:634)
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347)
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:628)
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:563)
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:745)
Created at:
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147)
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:628)
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:563)
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:745)

由于Netty泄露检测是全局参数,所以也把arthas内置的Netty的泄露检测开启了,这里我们发现一个奇怪的类:io.termd.core.telnet.netty.TelnetChannelHandler,这个并不是Netty自己的类,google一把发现是alibaba/termd中的一个类,看下代码:

1
2
3
4
5
6
7
public void channelRead(ChannelHandlerContext ctx, Object msg) {
ByteBuf buf = (ByteBuf) msg;
int size = buf.readableBytes();
byte[] data = new byte[size];
buf.getBytes(0, data);
conn.receive(data);
}

channelRead里,分配的ByteBuf既没有释放,也没有在Pipeline上继续传递。导致这个ByteBuf申请之后没有release

修改一下:

1
2
3
4
5
6
7
8
9
10
11
public void channelRead(ChannelHandlerContext ctx, Object msg) {
ByteBuf buf = (ByteBuf) msg;
try {
int size = buf.readableBytes();
byte[] data = new byte[size];
buf.getBytes(0, data);
conn.receive(data);
} finally {
buf.release();
}
}

排查过程中收获的一些知识点

  1. Netty默认使用的是池化内存,且是直接内存。如果是android系统,默认使用的是非池化的ByteBuf
  2. TelnetChannelHandler继承自ChannelInboundHandlerAdapter,这个类的是不会自动释放ByteBuf的,而SimpleChannelInboundHandler是自动释放ByteBuf的,继承SimpleChannelInboundHandler需要实现channelRead0方法,看下SimpleChannelInboundHandler的代码:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
boolean release = true;
try {
if (acceptInboundMessage(msg)) {
@SuppressWarnings("unchecked")
I imsg = (I) msg;
channelRead0(ctx, imsg);
} else {
release = false;
ctx.fireChannelRead(msg);
}
} finally {
if (autoRelease && release) {
ReferenceCountUtil.release(msg);
}
}
}

被处理过的msg都会在finally里释放(默认autoRelease = true)。

  1. Netty自带的编解码器,是会自动释放ByteBuf的,比如MessageToByteEncoderLengthFieldBasedFrameDecoder
  2. Pipeline的尾节点TailContext也会释放ByteBuf,有些需要释放的ByteBuf我们也可以通过fireChannelRead让它继续走到Pipeline的尾节点中。