Another day in the operations arena. Everything was working correctly, and performance counters had great values. CPU usage was near 0%… system idle. Suddenly a phone call and an angry voice from the other side. What??? The system seems down???

Introducing the case

The most interesting fact here was that there was no activity in the analyzed process. Normally, this should indicate a lock issue, but this behavior was lasting a lot of time. Well… perhaps a deadlock

0:000> .time
Debug session time: Mon Dec 21 10:25:58.000 2015 (UTC + 1:00)
System Uptime: 16 days 23:18:00.314
Process Uptime: 0 days 0:25:00.000
  Kernel time: 0 days 0:03:00.000
  User time: 0 days 0:14:52.000
0:000> .load C:\Windows\Microsoft.NET\Framework64\v2.0.50727\SOS.dll
0:000> !threadpool
CPU utilization 0%
Worker Thread: Total: 8 Running: 1 Idle: 7 MaxLimit: 2000 MinLimit: 8
Work Request in Queue: 0
--------------------------------------
Number of Timers: 3
--------------------------------------
Completion Port Thread:Total: 159 Free: 2 MaxFree: 16 CurrentLimit: 160 MaxLimit: 1000 MinLimit: 8

The CPU was on vacation, as you can see

The first thing to do when you suspect threads are locked, is to inspect all the “lock tree”.

0:000> !SyncBlk
Index         SyncBlock MonitorHeld Recursion Owning Thread Info          SyncBlock Owner
 7748 000000000913e308            1         1 0000000009464cd0  1f74  79   000000014033da70 System.Object
		Waiting threads:
 8591 0000000008cd3a40            1         1 000000000539bbc0  1628  25   0000000100544688 System.Object
		Waiting threads:
 8655 0000000008c7ea10            1         1 00000000092c0c40   df4 127   00000000e039e980 System.Object
		Waiting threads:
 8815 0000000008be1e00            1         1 000000000dfff8f0  1464  88   0000000100497100 System.Object
		Waiting threads:
 9313 0000000008fe1f48            1         1 000000000e419e30  1fb8  59   00000000e044fbf0 System.Object
		Waiting threads:
 9320 000000000914ef98            1         1 000000000924db20  1760  35   00000000a0272d78 System.Object
		Waiting threads:
 9446 0000000009132520            1         1 000000000db84e60  211c 160   00000001201b5040 System.Object
		Waiting threads:
 9943 0000000008e22e10            1         1 000000000db84890  1d94 165   000000010045bb10 System.Object
		Waiting threads:
 9944 0000000008e11e20            1         1 000000000db882b0  23b0 158   000000010045ba68 System.Object
		Waiting threads:
 9948 0000000009136758            1         1 00000000052153d0  1758 126   0000000080270378 System.Object
		Waiting threads:
 9955 0000000009405dc8            1         1 000000000da79350  1458 148   00000001402ddc38 System.Object
		Waiting threads:
10541 0000000008e11a78            1         1 000000000da79920   c14 149   00000001201b4f98 System.Object
		Waiting threads:
10551 0000000008a8da60            1         1 000000000da7aa90  1fa0 151   00000000e039ef40 System.Object
		Waiting threads:
10593 0000000008e29580            1         1 000000000e178c40  1aa0 143   000000010045aa88 System.Object
		Waiting threads:
10604 0000000008fd3e20            1         1 000000000e1797e0  195c 119   00000000e039e590 System.Object
		Waiting threads:
10608 0000000008c80490            1         1 000000000da75360   420 146   00000001601e0ef8 System.Object

... Lots of similar entries ...

		Waiting threads:
-----------------------------
Total           18154
CCW             4
RCW             433
ComClassFactory 1
Free            16028

Look, there were a lot of threads locked, but there wasn’t a clear locking thread. What kind of locks were there?

0:000> !mlocks
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
Could not find symbol ntdll!RtlCriticalSectionList.

ClrThread  DbgThread  OsThread    LockType    Lock              LockLevel
------------------------------------------------------------------------------
0xaa       -1         0xffffffff  RWLockSlim  0000000160125d70  Writer        
0xa6       150        0x11c       thinlock    0000000080270878  (recursion:0)
0xa6       150        0x11c       thinlock    0000000140402dd0  (recursion:0)
0x112      53         0x158       SyncBlock   0000000008e22750                
0x112      53         0x158       thinlock    0000000140343ab8  (recursion:0)
0x13e      71         0x340       SyncBlock   0000000008a82e30                
0x13e      71         0x340       thinlock    00000001000b2550  (recursion:0)
0xca       146        0x420       SyncBlock   0000000008c80490                
0xca       146        0x420       thinlock    0000000080409640  (recursion:0)
0x131      122        0x4b4       SyncBlock   0000000008e20580                
0x131      122        0x4b4       thinlock    00000000c0238a88  (recursion:0)
0x104      52         0x5f0       thinlock    000000016029d250  (recursion:0)
0x104      52         0x5f0       thinlock    000000016029e858  (recursion:0)

... and many, many, many more ...

The command

mlocks

from sosex gives a perspective on which lock types are holding threads

We see locks of three types:

  • SyncBlock: These are regular “monitor” objects. Usually, a “lock” on a variable gives you a SyncBlock.

  • Thinlock: It’s an optimization over regular monitors. To simplify, let’s say they are lighter primitives than SyncBlocks. When you use a lock and only a single thread owns this lock, a thinlock is created. Later, when more threads request the same lock, this thinlock promotes to a SyncBlock. In our dump, there were a lot of threads gaining some kind of lock, but not blocking any other thread.

  • RWLockSlim: It is a special object to synchronize access of “readers” and “writer” threads. The theory behind this artifact is quite easy: reader threads only block writer threads, and writer threads block any other thread.

But… look! What the hell is a -1 thread???

It is holding a writer lock on a RWLockSlim… so, it is a writer thread (remember it will be blocking any other reader or writer thread requesting the same lock)

A positive look to a negative thread

Let’s look what’s happening on this -1 mysterious thread

0:000> ~-1 e !mk
        ^ Syntax error in '~-1 e !mk'

OK, it is really weird, isn’t it?

Well.. perhaps it is not so weird if it look at the RWLockSlim hold by this strange thread

The command

rwlock

from sosex tells you which threads are waiting to read or write on a rwlock

0:000> !rwlock 0000000160125d70
*** WARNING: Unable to verify checksum for System.ni.dll
WriteLockOwnerThread:             0xaa (DEAD)
UpgradableReadLockOwnerThread:    None
ReaderCount:                      0
ReaderThreadIds:                  None
WaitingReaderCount:               151
WaitingReaderThreadIds:           0x2e,0x3e,0x2b,0x1f,0xc,0x9,0x44,0x4b,0x4e,0x2a,0x6d,0x99,0x57,0x62,0x3b,0xa7,0xb3,0xb5,0xc0,0xce,0x24,0x41,0xe2,0xf0,0xf2,0x101,0x104,0x112,0x120,0x121,0xf3,0x94,0xd6,0x98,0x97,0x36,0x126,0x128,0x12b,0x136,0x137,0x139,0x13c,0x13e,0x142,0x145,0x14b,0x14c,0x144,0x12e,0xef,0xe6,0xf6,0x88,0x13f,0x13b,0x135,0x12a,0x129,0x13,0x4a,0x5e,0x9d,0xbf,0xc6,0xcf,0x5a,0x70,0x125,0x124,0x123,0x122,0x11e,0x11a,0x116,0x115,0x10c,0x10b,0x106,0xed,0x91,0x5c,0xe,0x119,0x127,0x12d,0x13a,0x52,0xd5,0xf8,0x12c,0x131,0x133,0x11d,0x13d,0x146,0x143,0x130,0x59,0xa1,0xbc,0xcb,0xfc,0x6f,0xc9,0xba,0x29,0x3,0x1a,0x3d,0x4,0x74,0x76,0x80,0xa0,0xca,0xc7,0xad,0xab,0xa6,0xa2,0x84,0x81,0x7f,0x7d,0x38,0x40,0x39,0x58,0x64,0x60,0x4c,0x7,0x21,0x32,0x117,0x114,0x111,0xfa,0x10f,0x10e,0x10d,0x109,0x108,0x107,0x105,0x103,0x102,0x100,0xff,0xfe
WaitingWriterCount:               151
WaitingWriterThreadIds:           None
WaitingUpgradableReaderCount:     0
WaitingUpgradableReaderThreadIds: None
WaitingWriterUpgradeCount:        0
WaitingWriterUpgradeThreadIds:    None
*This lock's writer lock is orphaned.

Here you have the answer… thread 0xAA died while it was holding the ReaderWriterLock with a write lock.

But is it possible?

Of course it is possible, and it happens as you can see… so, remember:

  • Lock sentences are easier to use in C#, because locks are automatically released in case of an exception (it is a pattern similar to the using statement)
  • If you use more complex primitives like the RWLockSlim, whenever you request a lock, be sure it is always released on a finally block.

My mind is focused on summer vacation, and I don’t know when and on what am I going to talk on next post. Perhaps something on LOH or some basic article with a simple guide for dummies to diagnose application crashes with debugdiag… it would be great if every developer on the earth use debugdiag before asking for help to debug a crash!