The Curious Case of the Core Data Crash • The Breakroom

Detective Heber hot on the trail of the Curious Case of the Core Data Crash

For several years, Twitterrific has been plagued by Core Data crash reports that I haven’t been able to figure out. It was so bad that nearly all of Twitterrific’s crash reports pointed at this framework. I knew I was doing something wrong, but didn’t know how to fix a problem I couldn’t reproduce. Lots of investigation on Twitter and Google never turned anything up.

Worse, it got to the point where I mostly ignored this seemingly unsolvable problem.

Recently, there was a breakthrough. Three people used the new TestFlight feedback mechanism to report a crash on iOS. They also noted the app was in the background at the time. I had long suspected that was the case with this bug, but always assumed I was missing something important.

Rather than look at the crash reports inside Xcode’s Organizer as I usually do, I downloaded the report attached directly to the feedback report and took a look at the file. The stack trace matched the Core Data crashes I had seen many times in Organizer. No surprises there.

I was about to close the log in defeat when I noticed that the Termination Reason was something unfamiliar. Xcode’s Organizer does not report the cause of termination. After all, one would assume the reason was BECAUSE IT CRASHED, DUH!

But here’s the thing: that assumption is totally wrong. iOS routinely terminates apps for all sorts of reasons, including using too much memory or taking too long to do background processing. I don’t know why this information isn’t included in Xcode’s Organizer, but it’s a critical piece of a debugging puzzle.

Naturally, the termination reason in the raw crash log isn’t some nice human-readable message. It’s a hex code and a cryptic string that maybe suggests which subsystem is responsible for it. Why make crash reports easy for the developers, right?

In this Core Data crash report the termination reason was: Namespace RUNNINGBOARD, Code 0xdead10cc. This nugget was something new I could search on the web for. Many of the results were just pasted crash reports with little attention paid to the significance of the termination reason.

But one result led to Apple’s Technical Note TN2151 (Understanding and Analyzing Application Crash Reports), with the code listed at the end under Other Exception Types:

The exception code 0xdead10cc indicates that an application has been terminated by the OS because it held on to a file lock or sqlite database lock during suspension.
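
For anyone who wants to check their own raw logs for this, here is a minimal Swift sketch that pulls the code out of a Termination Reason line and looks it up in a small table. The table paraphrases the exception codes described in TN2151. The helper name `terminationCode(inCrashLog:)` is made up for illustration, and the parsing assumes the line looks like the one quoted above, which may not hold for every crash log format.

```swift
import Foundation

/// Exception codes described in TN2151, keyed by numeric value.
/// The descriptions are paraphrased, not quoted.
let knownTerminationCodes: [UInt32: String] = [
    0x8badf00d: "watchdog timeout",
    0xdead10cc: "held a file lock or SQLite database lock during suspension",
    0xbad22222: "VoIP app resumed too frequently",
    0xc00010ff: "killed in response to a thermal event",
    0xdeadfa11: "force quit by the user",
]

/// Hypothetical helper: returns the hex code from the first
/// "Termination Reason" line in a raw crash log, or nil if none is found.
func terminationCode(inCrashLog log: String) -> UInt32? {
    for line in log.split(separator: "\n") {
        guard line.contains("Termination Reason"),
              let prefixRange = line.range(of: "0x") else { continue }
        // Take the run of hex digits that follows "0x".
        let hex = line[prefixRange.upperBound...].prefix(while: { $0.isHexDigit })
        if let code = UInt32(hex, radix: 16) { return code }
    }
    return nil
}

// Usage with the line from this crash report.
let line = "Termination Reason: Namespace RUNNINGBOARD, Code 0xdead10cc"
if let code = terminationCode(inCrashLog: line) {
    print(knownTerminationCodes[code] ?? "unknown code")
}
```

The point is only that the termination reason is machine-readable, so it is cheap to scan a folder of downloaded logs for it even though Organizer does not surface it.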

Holy crap! I’ve been searching for this for ages and here, finally, was a clue! I started looking around the code in search of a trigger for this exception. I discovered that Twitterrific was sometimes closed while it was doing a network download and iOS left it running in the background long enough for the network request to finish. But not long enough for the database update to finish.

Due to incorrectly placed task markers, the background task would be marked done before it had finished saving the database. When things happened in the wrong sequence, iOS would unceremoniously kill the process — making it look like it had crashed somewhere inside of Core Data.

I spent several days reworking a bunch of old code to ensure that task markers were started and ended in the proper sequence, and to ensure that the task wasn’t marked as complete until after the entire process was done.
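
The ordering that matters can be sketched in a few lines of Swift. This is not Twitterrific’s actual code, which isn’t shown here. `BackgroundTaskHost`, `RecordingHost`, and `refresh` are hypothetical names standing in for UIKit’s `UIApplication.beginBackgroundTask(withName:expirationHandler:)` and `UIApplication.endBackgroundTask(_:)`, so the sequence can be run and inspected outside an app.

```swift
import Foundation

/// Hypothetical stand-in for the two UIKit background task calls.
protocol BackgroundTaskHost {
    func beginTask(named name: String) -> Int
    func endTask(_ identifier: Int)
}

/// Records the order of events so the sequence can be checked.
final class RecordingHost: BackgroundTaskHost {
    private(set) var events: [String] = []
    private var nextIdentifier = 1

    func beginTask(named name: String) -> Int {
        defer { nextIdentifier += 1 }
        events.append("begin \(name)")
        return nextIdentifier
    }

    func endTask(_ identifier: Int) { events.append("end") }

    func note(_ event: String) { events.append(event) }
}

/// Runs the download and the database save inside ONE task marker.
/// The marker is closed only after the save has returned.
func refresh(host: BackgroundTaskHost, download: () -> Void, save: () -> Void) {
    let task = host.beginTask(named: "refresh")
    defer { host.endTask(task) }  // runs last, after save() has finished
    download()
    save()
}

let host = RecordingHost()
refresh(host: host,
        download: { host.note("download") },
        save: { host.note("save") })
print(host.events)  // ["begin refresh", "download", "save", "end"]
```

The bug described above corresponds to "end" landing before "save" in that list. In a real app the download and save are usually asynchronous, so the end call would belong in the completion handler of the save rather than in a `defer`.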

Lo and behold, this appears to have worked!

Prior to the fix, one recent build had over 90 crash reports from TestFlight in the span of just four days. Three days after releasing a new build to TestFlight, there were only 9 crash reports, and they were all caused by a simple bug introduced during refactoring.

So there you have it — another exciting tale of me doing things wrong for years. Remember kids, I’m a professional!