It was a promising day… A new shiny load test machine, the latest version of our software… We put all running together, and after some minutes the process got stuck. CPU usage was near 0%, we knew lots of requests were pending, but there was no productive work. Throughput 0 and an angry boss staring to your computer screen. What’s going go?

It looks like some kind of deadlock, doesn’t it? Well, there is no pain with a crash dump and WinDbg!

Let’s start setting up the environment. We have to setup symbol search paths and helper extensions

0:000> .sympath+ SRV*C:\symcache*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*C:\symcache*http://msdl.microsoft.com/download/symbols
Expanded Symbol search path is: srv*c:\symcache*http://msdl.microsoft.com/download/symbols
0:000> .sympath+ C:\Users\GorkaE\OneDrive\WinDBG\Symdownload
Symbol search path is: SRV*C:\symcache*http://msdl.microsoft.com/download/symbols;C:\Users\GorkaE\OneDrive\WinDBG\Symdownload
Expanded Symbol search path is: srv*c:\symcache*http://msdl.microsoft.com/download/symbols;c:\users\gorkae\onedrive\windbg\symdownload
0:000> .load C:\Windows\Microsoft.NET\Framework64\v4.0.30319\SOS.dll
0:000> .load sosex
This dump has no SOSEX heap index.
The heap index makes searching for references and roots much faster.
To create a heap index, run !bhi
0:000> .load psscor2

Now, lets proof our perception: is the process really stuck?

0:000> !threadpool
Work Request in Queue: 24
--------------------------------------
Number of Timers: 3
--------------------------------------
CPU utilization 0%
--------------------------------------
Worker Thread: Total: 17 Running: 17 Idle: 0 MaxLimit: 2000 MinLimit: 8
Completion Port Thread:Total: 332 Free: 1 MaxFree: 16 CurrentLimit: 333 MaxLimit: 1000 MinLimit: 8

0% CPU and a lot of threads… hmm, this looks really strange. I love sosex’s mwaits command: it will do the work and tell us which processes are locked


0:000> !mwaits
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...

blah, blah blah...
more warnings and boring stuff...

ClrThread  DbgThread  OsThread    LockType    Lock              LockLevel
------------------------------------------------------------------------------
0x13a      277        0x78        SyncBlock   000000000e8a74d0                
0x49       165        0x90        RWLockSlim  000000008003fbb8  Reader        
0x16b      312        0x11c       SyncBlock   0000000007711f38                
0x16a      311        0x160       SyncBlock   000000000e8aba20                
0xfc       157        0x170       SyncBlock   00000000094bc050                

... more ...

0x1f       116        0x1140      SyncBlock   000000000786e190                
0x64       83         0x114c      RWLockSlim  000000008003fbb8  Reader        
0xe7       148        0x1160      RWLockSlim  000000008003fbb8  Reader        
0x47       30         0x1174      SyncBlock   0000000007871130                
0xb3       228        0x11c0      SyncBlock   000000000e8a74d0                

... and more ...

0xc1       78         0x131c      RWLockSlim  000000008003fbb8  Reader        
0x3a       111        0x1324      RWLockSlim  000000008003fbb8  Reader        
0x24       146        0x132c      RWLockSlim  000000008003fbb8  Reader        
0xfe       198        0x1330      SyncBlock   0000000009563e08                
0x16e      352        0x1338      SyncBlock   00000000077122a0                
0xc7       330        0x1364      SyncBlock   0000000007871130                
0x165      307        0x137c      SyncBlock   000000000e8aec48                
0x2f       347        0x13ac      SyncBlock   00000000097de268                
0x9        114        0x13b4      RWLockSlim  000000008003fbb8  Reader        

... well believe me, there were a lot of them ...

It looks like there are many threads waiting as readers for a ReaderWriterLockSlim, and some threads waiting for different locks around the code. Who has gained the access to the ReaderWriterLockSlim?

0:000> !mlocks

... blah, blah, blah ....

ClrThread  DbgThread  OsThread    LockType    Lock              LockLevel
------------------------------------------------------------------------------

... a lot of threads ...

0x1f       116        0x1140      RWLockSlim  000000008003fbb8  Writer        
0xc1       78         0x131c      SyncBlock   000000000786e190                
0xc1       78         0x131c      thinlock    00000001205768b8  (recursion:0)

... more, more, more ...

0x3a       111        0x1324      thinlock    00000000c0158c10  (recursion:0)
0x3a       111        0x1324      thinlock    00000000e00e76a0  (recursion:0)
0x24       146        0x132c      SyncBlock   0000000007629150                
0x24       146        0x132c      thinlock    00000000e0232cc8  (recursion:0)
0x9        114        0x13b4      SyncBlock   000000000ef9aff8                
0x9        114        0x13b4      thinlock    00000000a04c0848  (recursion:0)

... tons of threads owning locks ...

Look! Thread 116 is the owner of the ReaderWriterLockSlim, but it appears in both lists… Thread 78 also appears in both lists! It really looks like a deadlock.

Sosex has a great command specific to detect deadlocks

0:000> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
Could not find symbol ntdll!RtlCriticalSectionList.
Scanning for threads waiting on SyncBlocks...
Scanning for threads waiting on ReaderWriterLock locks...
Scanning for threads waiting on ReaderWriterLocksSlim locks...
Scanning for threads waiting on CriticalSections...
*DEADLOCK DETECTED*
CLR thread 0x1f holds the Writer lock on ReaderWriterLockSlim 000000008003fbb8
...and is waiting for the lock on SyncBlock 000000000786e190 OBJ:00000000e0151330[System.Object]
CLR thread 0xc1 holds the lock on SyncBlock 000000000786e190 OBJ:00000000e0151330[System.Object]
...and is waiting for a Reader lock on ReaderWriterLockSlim 000000008003fbb8
CLR Thread 0x1f is waiting at Sisteplant.Captor.Server.MessagesBroker.MessagesBrokerServiceBase.SendMessage(Sisteplant.Captor.Server.MessagesBroker.Messages.CaptorMessage)(+0x6b IL,+0x1a6 Native)
CLR Thread 0xc1 is waiting at System.Threading.WaitHandle.WaitOne(Int64, Boolean)(+0x19 IL,+0x23 Native)


1 deadlock detected.

Wow!

So managed threads 116 and 78 are deadlocked. We should inspect both call stacks.

0:000> ~116 e !mk
... top secret call with a lot of top secret methods from my company ...

0:000> ~78 e !mk
... this call stack is also top secret ...

This way, we knew thread 116 was writing on a ReaderWriterLockSlim while waiting for a lock owned by thread 78.

Thread 78 was waiting for the thread 116 to complete the writing operation while owning the lock requested by thread 116.

Hmm… some sentence reordering and the problem was gone.

And here it comes the bonus command of today. RWLock structures are very prone to cause synchronization problems… rwlock command from sosex can tell you the status of any rwlock (given its id). When invoked without arguments, rwlock gives a report of all RWLock structures found in the process.

0:000> !rwlock 000000008003fbb8
WriteLockOwnerThread:             0x1f
UpgradableReadLockOwnerThread:    None
ReaderCount:                      0
ReaderThreadIds:                  None
WaitingReaderCount:               125
WaitingReaderThreadIds:           0x2d,0x1b,0x17,0x44,0x57,0x66,0x7a,0x80,0x95,0xaa,0xb0,0xb8,0xc4,0xc6,0xdc,0x9f,0x92,0x90,0x86,0xc0,0xb6,0xa3,0x70,0x45,0x39,0x84,0x9c,0x72,0x67,0x6a,0x65,0x60,0xdf,0xe0,0xe2,0x7e,0xa0,0xcf,0x4d,0x61,0xcd,0xc1,0xab,0x8f,0x64,0x4f,0x4a,0x48,0x1a,0x23,0x6f,0xd9,0xaf,0x62,0xd3,0xd0,0x2e,0x22,0x12,0xbe,0x52,0xae,0x8d,0x3a,0x1c,0x9,0x3d,0x5c,0x78,0x88,0xad,0xdb,0xd7,0xce,0xb4,0xa5,0x9d,0x85,0x74,0x6c,0x43,0xb,0x16,0x20,0x34,0xc,0x24,0x32,0xe7,0xe8,0xf2,0xf3,0xf4,0xf7,0xf9,0xfa,0xfd,0x40,0x7,0xf5,0x5e,0x3b,0x3e,0x49,0x5a,0xa8,0x14,0x9b,0x69,0x19,0x2c,0x13,0xd2,0x99,0x56,0x4c,0x11,0x54,0x98,0xb9,0xbb,0xc8,0x53,0xe3,0x42
WaitingWriterCount:               125
WaitingWriterThreadIds:           None
WaitingUpgradableReaderCount:     0
WaitingUpgradableReaderThreadIds: None
WaitingWriterUpgradeCount:        0
WaitingWriterUpgradeThreadIds:    None

As you can see, the ReaderWriterLockSlim owned by thread 116 was a very big bottleneck for the application, so, we finally ended up removing this ReaderWriterLockSlim… But this is another good history for another post.

Stay tuned! Next time I will show you my cheat sheet for WinDbg.