**INTRODUCTORY COMMENTS**

This, the 6th article in an ongoing buggy series that probes Robert Pape's recently published book --- *Dying to Win: The Strategic Logic of Suicide Terrorism* --- focuses with doggedly single-minded determination on Pape's effort to test his nationalist theory of suicide terrorism statistically with logistic regression.

**Pape's Logit Modeling Is a Disaster from Start to Finish**

It's in chapter 6 of *Dying to Win*, which starts on p. 79 and ends on p. 101, that Pape explains his nationalist theory of suicide terrorism in detail, culminating in what he claims are its very likely causal pathways. These causal pathways are then diagrammed on p. 96, followed immediately by Pape's claims about how they will be tested by his logit modeling.

The diagram is set out below, and needless to add, the preferred causal pathways --- running left to right --- are those that Pape's logit models are meant to test, run on a data-set that he himself creates from scratch. In particular, the data-set that he codes, organizes, and classifies comprises 58 cases of military occupation . . . all carried out by democratic governments, whether of alien populations or of national or ethnic minorities on their own territory. This data-set, in turn, becomes the sample selection on which he runs his logit models to test his nationalist theory's alleged causal pathways. Note that the sample selection is equivalent to the total universe or population of all such relevant cases of military occupation . . . or so Pape claims, and misleadingly so as we'll see.

**Pape's Model of Suicide-terrorism**

1) Solid arrows represent the theory proposed in this book.

2) The dashed arrow --- running from rebellion to nationalism --- represents a causal path that sometimes influences the production of national identity, but that plays little role in determining when suicide-terrorism campaigns occur.

3) The dotted arrow represents a causal path that al-Qaeda and perhaps other terrorist organizations hoped would occur, but that has not done so.

No need to say that it's the causal pathways postulated by Pape's theory running from left to right that his logit modeling will test.

**But Note**

As for the "relevant" total of military occupations, Pape is either deliberately misleading us or fouling up once more . . . or maybe a combination of both. Or so we'll see.

In particular, like his nationalist theory of suicide terrorism, Pape's statistical testing of it turns out to be wrong and flawed from start to finish. The big trouble starts with a noticeably error-riddled data-set --- created, coded, and classified by none other than Pape himself --- that serves as the sample selection for his various logit models. No surprise there. All of Pape's most important data-sets, starting with the initial one set out in tabular form on p. 15, are markedly erroneous and unreliable . . . something we'll see again later today.

That's bad enough, this severely handicapped and inaccurate data-set.

**AND IT GETS WORSE, MUCH WORSE**

Pape's statistical troubles mount and expand with the ways he has specified or constructed what appear to be four or five different logit models. It leads him, too, to some flagrant misinterpretations of his 2nd and 4th logit models' results --- the only ones he briefly refers to at all.

His trouble then swells and multiplies when he reports and misinterprets on several counts the results of his 2nd logit model on p. 99 in the form of a 2x2 classification table --- which will be reproduced here in a few moments. The results there are truly mediocre, not that Pape seems to realize this, and he also seems to think that they have something to do with "prediction" of specific suicide terrorist attempts in the future --- another big error. By that point, his statistical woes proceed at full gallop. His 4th logit model, referred to very briefly on p. 99, looks like a desperate, spur-of-the-moment attempt to salvage the 2nd model's results from disaster even while a publishing deadline hung over his head . . . most likely, as we'll see later on, when the galley-proofs were in his hands and one of the 16 research assistants he used for his book --- or possibly one of the 20 "expert" readers the book also acknowledges --- suddenly awakened from a self-imposed brain-coma and saw what a snafu Pape's 2nd logit model amounted to.

**Nor Is That All**

Oddly, too, the final to-the-rescue 4th estimator or predictor in that last logit model --- "timely concessions" to nationalist rebels by governments that prevent the rebels from turning to suicide terrorism --- isn't referred to substantively after p. 99 for another 140 pages of text. Yes, that's no exaggeration.

Only on p. 239, just a few pages from the end of the book's text, does Pape refer to the role of concessions once more as a way to deal with suicide terrorist groups like al Qaeda . . . and this, believe it or not, is in a chapter that criticizes the current War on Terrorism and urges on the government in Washington a generous Pape-based set of policy changes that fit his nationalist theory and its wondrous statistical testing. And that's it. A 4th logit model is supposed to salvage a disastrously reported 2nd logit model's results, and its formidable theoretical role is mentioned in exactly one paragraph 140 pages later, with nothing else to show for it.

To compound all this statistical disarray, Pape's reports of his statistical work are irresponsibly stingy, so much so that they have no parallel in prof bug's memory . . . at any rate after decades of reading probably hundreds of social science statistical studies.

**HOW THE BUGGY ARGUMENT WILL UNFOLD**

All these and the swarms of other flaws that riddle the work of Pape's nationalist theory of suicide terrorism and in particular his statistical "testing" of it are dealt with in depth here today. The twists and turns of our buggy argument fall into five parts.

**Part One Is . . .**

something unusual on this buggy site: less devoted to rigorous argument and more an *introductory* survey of Pape's statistical claims and the problems that surround them and his work . . . all followed by a little fable about a Statistical Wonder, Professor Bernard de Stapler of the University of New Orleans, who pays a visit one day to our current National Security Adviser and urges that he be given a grant to help our country in the War on Terrorism. How? He will lead a team of expert researchers to undertake a lengthy study of the non-occurrence rate, with point-estimations and confidence-intervals included, of spitball terrorism in the United States. That little fable, which deals of course with a wholly imagined funny-figure guy, is then followed by a more serious look at how a teeny-bopper gum-chewer could either match the predictive success of Pape's 2nd logit model or outperform it.

**Part Two Will . . .**

start examining the defects in the data-set that Pape used as a sample selection (equivalent to the population of events) for his logit modeling, but focus mainly on some weird, out-of-the-blue derriere-covering that Pape resorts to on p. 101 of *Dying to Win*.

**Part Three Focuses . . .**

more fixedly on Pape's highly tainted data-set, showing how it's no more reliable than any of the other major data-sets in his book, while managing to do what these other sets do: whitewash and conceal the over-towering dominance of Islamist extremism in the rash of suicide terrorism since 1980.

**Part Four Will . . .**

be largely, though not entirely, a summary of Pape's claims as to what his logit models will do for his nationalist theory of suicide terrorism, the specified four models that we presume he tested, and the sole report of his statistical work --- for all the huffing and puffing of his claims --- that appears in a 2x2 classification scheme on p. 99 of *Dying to Win* . . . discussed, astonishingly, in only two brief paragraphs there, plus a couple of even more cursory paragraphs tucked away in fn. 43 on p. 294.

By that point, prof bug will take it for granted that the readers of today's buggy article either have a sound familiarity with logistic regression or, at least, have read through the previous two articles that set out the basics of both linear and non-linear logistic regression. That said, all new technical terms or concepts like interaction terms and the nature of 2x2 classification tables --- which further have to be analyzed into "prediction" tables, classification tables, and selection tables for accurate statistical testing --- will be clarified and usually illustrated with one or more concrete examples.

Keep in mind finally that logistic regression and logit modeling (or analysis) are interchangeable terms.

**And Part Five Finally . . .**

will delve into the remarkable number of problems that torpedo Pape's technical statistical work, a story of ship-wrecked regressions that never make it to shore. (PS tacked on here: it turned out on December 17th, 2005 that the current buggy article would be better off stopping at the end of Part Four and delving into Pape's statistical woes at length in the next buggy article . . . largely done anyway.)

**One More Point**

Something else to keep in mind too, even if it means repeating what was said a minute or two ago: never, in decades of reading scholarly work that uses various forms of statistical techniques and models --- including probably hundreds of articles by economists full of linear regression equations of sometimes 4th dimension aspirations --- has prof bug seen such a skimpy, tightfisted report of statistical work as marks Pape's summary of his in Chapter 6 of *Dying to Win*.

The stingy elusiveness is all the more head-spinning when you consider Pape's puffed-up claims for what the logit modeling will do for his theory.

The reason can't be the publisher's demand that Pape tone down the use of technical statistical work in order to reach more readers. Why in that case go to the trouble to have three lengthy appendixes at the end of the book for reporting data-laden matter? Why not have a couple of pages in a fourth appendix to report in detail how he specified his logit models, how they were or were not nested in one another, how model performance was tested for, how the individual parameter or coefficient-values were tested, how he interpreted his interaction term in his presumably 2nd and 4th logit models, what the sensitivity and specificity of the "observed" behavior of his dependent variable were in frequency terms, what the positive error rate was, and . . . well, the list of omissions is much longer.

The actual reasons for the oddball stinginess of it all are matters that only Pape himself would know.

**PART ONE:**

PAPE'S CLAIMS FOR HIS STATISTICAL WORK: A PRELIMINARY SURVEY, NOTHING MORE AT THIS POINT

**Yes, It's Just A Foretaste of What You'll Get in Greater Detail in Part Four Today**

Despite this mountain of self-created troubles, Pape, all innocence presumably, sets out on pp. 96 and 97 of *Dying to Win* a set of claims as to what his logit modeling will do --- bolstered as it is by his data-set that's created with "focused comparative" case-studies. Specifically, the logit modeling will "enhance [our] confidence" that his nationalist theory of suicide terrorism has properly identified and tested successfully "the causal dynamics" that "determine" why suicide terrorist groups emerge and launch deadly attacks against civilians. "To test my theory," Pape says,

"I employ a methodology that combines the features of focused comparison and statistical-correlative analysis using the universe --- [read: total number] --- of foreign occupations, 1980-2003. Correlative analysis of this universe enhances confidence that my theory can *predict future events* by showing the patterns predicted by the theory occur over a large class of cases [58 in all]. Detailed analysis of historical cases **enhances confidence** that the correlations found in the larger universe are not spurious: the theory accurately identifies *the causal dynamics that determine outcomes [of suicidal terrorism]*" (p. 96; italics and bold type added).

Later, on p. 99 --- after the scantest report by a scholar of his statistical "testing" that prof bug has ever seen --- Pape, still the statistical ingénue, reinforces these claims. The results reported in his 2x2 classification table --- reproduced here in a few moments --- show, he says, that the nationalist theory of suicide terrorism *"predicts"* that suicide terrorism would occur in tandem with only one of the combinations of independent variables: "that is, when there is both a religious difference and rebellion. The theory correctly *'predicts' 49 of 58 cases*, a result that is statistically significant at the highest common benchmark of .001, meaning that it could be achieved by chance less than once in 1000 times."

**Prof Bug's Judgment About Pape's Claims?**

As you might guess, they're all wrong! . . . and for reasons that will be set out in Part Four today. For what it's worth, Pape's badly interpreted 2x2 classification table on p. 99, with its mediocre results, looks like this:

| | Religious Difference | No Religious Difference |
|---|---|---|
| Rebellion | 7/14 | |
| No Rebellion | 1/15 | 0/20 |

Just note in passing, nothing more right now, the cruddy outcome of Pape's rate of "predictive" success in the top left-hand cell --- 7 cases of suicide terrorism actually observed in his 2nd logit model compared to the baseline logit model's "prediction" that there would be 14 such cases. That's a 50% rate of success. It's equivalent to you or me knowing nothing about suicide terrorism and flipping a coin a few dozen times while always calling "heads".
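For readers who want to check the arithmetic, here is a minimal sketch in Python. It rests on two assumptions that are prof bug's reading rather than anything Pape spells out: that each "x/y" entry means x suicide-terrorism campaigns among y occupations, and that the theory "predicts" a campaign only in the top left-hand cell. The blank cell is backed out from the totals quoted earlier (58 cases, 49 "correct"); that figure is an inference, not a number taken from the book.

```python
from fractions import Fraction

TOTAL_CASES = 58        # occupations in Pape's data-set, per the text
CLAIMED_CORRECT = 49    # "49 of 58 cases", per the quotation from p. 99

# Cells as printed above: (suicide-terrorism campaigns, occupations).
# ASSUMPTION: each "x/y" entry means x campaigns among y occupations.
known = {
    ("rebellion", "religious difference"): (7, 14),
    ("no rebellion", "religious difference"): (1, 15),
    ("no rebellion", "no religious difference"): (0, 20),
}
PREDICTED_POSITIVE = ("rebellion", "religious difference")


def correct_in_cell(cell, campaigns, cases):
    """Correct calls if the theory says 'campaign' only in PREDICTED_POSITIVE."""
    return campaigns if cell == PREDICTED_POSITIVE else cases - campaigns


# Hit rate in the one cell where a campaign is "predicted".
hits, cases = known[PREDICTED_POSITIVE]
top_left_rate = Fraction(hits, cases)

# INFERENCE, not a figure from the book: back out the blank cell from the totals.
missing_cases = TOTAL_CASES - sum(n for _, n in known.values())
known_correct = sum(correct_in_cell(c, k, n) for c, (k, n) in known.items())
missing_campaigns = missing_cases - (CLAIMED_CORRECT - known_correct)

# Naive rule: always say "no campaign", whatever the cell.
total_campaigns = sum(k for k, _ in known.values()) + missing_campaigns
naive_correct = TOTAL_CASES - total_campaigns

print(f"top-left hit rate: {top_left_rate} ({float(top_left_rate):.0%})")
print(f"inferred blank cell: {missing_campaigns}/{missing_cases}")
print(f"always-'no' rule: {naive_correct} of {TOTAL_CASES} correct")
```

Under those assumptions, a rule that never predicts suicide terrorism at all gets as many cases right as the figure Pape reports for his theory, which is the point of the gum-chewer comparison promised for Part One.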

**Note Too The Bottom Right-Hand Cell**

As it happens, a zero-count in a contingency table --- even in the reorganized fashion that Pape reports it --- is statistically disastrous and is called a "zero-cell defect" in logistic jargon. The presence of such a zero-count is enough, according to one prominent logistic specialist, J.S. Cramer, "to play havoc with [any logit model's] estimation routines." If, for instance, you had access to the original contingency table produced by Pape's statistical software --- usually labeled "prediction success" or "percentage of successful predictions" or something like that --- what, exactly, would be the rate of successful prediction if you calculated 0/20 or, oppositely, the NPV or Negative Predictive Value that would follow from dividing 20 by 0?

*(These are conjectures, you understand: Pape does not provide us with the original 2x2 table that shows the frequency in each cell of "predicted" vs. "observed" outcomes . . . the predictive rate estimated originally with a baseline or null logit model using only the constant or intercept term on the right-hand side of the logit equation and the "observed rate" estimated by the final or fitted logit model that a researcher has used and tested statistically. Still, somewhere in that original table there was a zero rate of observed success.)*
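Why a zero count is so troublesome can be seen in a few lines of Python. This is only an illustrative sketch, not a reconstruction of whatever software Pape used: the helper `empirical_logit` is a hypothetical name, and the three cells are read, as an assumption, as campaigns out of occupations. A logit model works on log-odds, and the log-odds of an observed proportion of exactly zero has no finite value, so there is nothing finite for an estimation routine to converge to in that cell.

```python
import math


def empirical_logit(events, total):
    """ln(p / (1 - p)) for the observed proportion p = events / total."""
    non_events = total - events
    if events == 0:
        return -math.inf   # p = 0: the log-odds has no finite value
    if non_events == 0:
        return math.inf    # p = 1: same problem at the other extreme
    return math.log(events / non_events)


# Cells from the reported table, read as (campaigns, occupations).
cells = {"7/14": (7, 14), "1/15": (1, 15), "0/20": (0, 20)}
for label, (events, total) in cells.items():
    print(label, "->", empirical_logit(events, total))

# The reverse ratio fares no better: dividing by a zero count is undefined.
try:
    20 / 0
except ZeroDivisionError as exc:
    print("20/0 ->", exc)
```

The 7/14 cell gives a log-odds of zero (even odds), the 1/15 cell a finite negative value, and the 0/20 cell negative infinity.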

Nor is that all, Pape's ingénue innocence here notwithstanding.

Look now at the two other cells in the reported Pape table. See how there's a very low if non-zero frequency rate in each? Well, as Cramer adds, in such instances "estimation will not break down, but the quality of the estimates will be unfavorably affected . . . ." (See Cramer, *Logit Models from Economics and Other Fields* [Cambridge, 2003], p. 46.)

**OTHER PROBLEMS GALORE**

Mind you, these aren't the only flaws that mar the major reported statistical outcomes of Pape's logit modeling in his entire book. What follows is just a foretaste of what we'll be doing in Part Three or Four today.

**(i.) His 2x2 classification model is something, for instance, that he . . .**

calls a prediction model: not so, not even for those who take the "predictive" success of a logistic regression fairly seriously. Come to that, as you'll see, it's not even a classification model --- rather, a selection model, and each of these three different kinds of reported "observations" in a logistic regression requires a different test for statistical significance. It's about 95% likely that Pape didn't use the proper statistical test here: φ_p, or phi-p (see Scott Menard, *Applied Logistic Regression* (Sage University Paper; 2nd ed., 2001), pp. 28-36).

**(ii.) Then, too, among the numerous other reasons why "predictive" success, which is ---**

really a measure of classification-consistency (even for a selection table if the cell-rates are properly tested for) --- deserves skepticism is simply that it takes the natural-logged estimates of the logit function, which are strictly quantitative and run (hypothetically) between negative and positive infinity, and reduces their range to a strict dichotomy of either 0 or 1 for "predictive" purposes. In doing so, the default cut-off rate is 0.5. That means, you understand, that if 6 or 7 of Pape's small sample-set of 58 cases showed up with a probability estimate of 0.46 or 0.47 or 0.48 or 0.499, just below the "predictive" cut-off rate, they were treated as equivalent to 0.001.
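A small Python sketch shows how much information that dichotomizing throws away. The fitted probabilities below are hypothetical values chosen for illustration, not Pape's estimates (which he never reports), and the 0.5 cut-off is assumed to be the conventional default that the 0.46 to 0.499 examples above imply.

```python
import math

CUTOFF = 0.5  # ASSUMPTION: the conventional default cut-off


def inverse_logit(log_odds):
    """Map a log-odds value on (-inf, +inf) to a probability on (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def classify(prob, cutoff=CUTOFF):
    """Collapse a fitted probability to a 0/1 'prediction'."""
    return 1 if prob >= cutoff else 0


# Hypothetical fitted probabilities, for illustration only.
fitted = [0.001, 0.46, 0.47, 0.48, 0.499, 0.501, 0.95]
labels = [classify(p) for p in fitted]

for p, label in zip(fitted, labels):
    print(f"p = {p:<5} -> 'predicted' class {label}")

# A log-odds of zero sits exactly on the cut-off.
print("inverse_logit(0) =", inverse_logit(0.0))

# Moving the cut-off changes the "predictions" without changing the model.
for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, [classify(p, cutoff) for p in fitted])
```

A case estimated at 0.499 and one estimated at 0.001 land in the same class, while 0.499 and 0.501 land in opposite ones, which is why a single table at a single cut-off says so little about how well the underlying model fits.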

Here, by way of illustration, is a real report by Professor Karl Wuensch of a "classification output" --- note his proper reference to what Pape calls a "prediction table" --- that applies differing cut-off rates (four, to be exact) in order to demonstrate to readers what the reported occurrences are for a logit model and a large sample selection that he and two colleagues had used for a psychological experiment. The fitted logit model has several estimators or independent variables, and what they are and how the sample selection was contrived aren't germane here. Simply note the different outcomes, and how they are reported for various cut-off levels of prediction, followed by Wuensch's own comments of the sort that any respectable statistical specialist should emulate: