**Prologue Comments**

It was in the 6th article that the buggy series began probing the numerous flaws and mishaps that ripple and swell throughout Pape's statistical work.

The buggy analysis there moved at a fast, fairly top-skimming pace, exactly as prof bug observed it would. Today's article is different. It slows down and intensifies the scrutiny of Pape's statistical ship-wreck, which reflects, as it sinks bow-first into some dark deep miasma, the teeming blunders, glitches, and fallacies that swarm throughout *Dying to Win* and wear away its theoretical understructure and guidance-systems.

Those flaws aren't confined to theoretical generalizations and statistical work. They also haunt almost all of Pape's major data-sets, starting with the very first table on p. 15 and on through subsequent chapters, including the one concocted for use in logit modeling in Chapter 6.

And what does this ship-wreck leave in its tow? Well, whether intentionally or not, this clear impression: both Pape's theory and the statistical work add up to a whitewashing of the dominant role of Islamist extremism in the rash of suicide terrorism that has erupted on the world scene since 1980.

**One Final Point**

1. Today's buggy argument begins with Part Four. That's purposeful, a clear indicator that it's a direct follow-up of the previous article in this series on Robert Pape. It sets out in detail what were presumably Pape's logit-models, including the final . . . well, final "two" fitted models that p. 99 and fn. 43 on p. 294 refer briefly to.

2. Part Five probes the various technical problems that beset Pape's data-set of 58 cases.

The previous buggy article, recall, examined the biases, glitches, and havoc-making howlers that riddled the data-set in substantive ways --- above all, its total exclusion of a good 10 Islamist terrorist groups that carried out suicidal attacks between 1980 and 2003's end. Not only did this exclusion further conceal the overwhelmingly dominant role of Islamist groups in those suicidal attacks --- a total of 17 of the 19 such terrorist groups --- but it allowed Pape to focus his logit modeling on only military occupations of nationalist minorities or alien populations by democratic countries. In this sloppy lopsided manner, without ever owning up to it, Pape solved much of his regression-work by definition . . . not that the statistical results that finally emerged didn't founder anyway . . . or so we will see in greater detail today.

The argument in Part Five switches focus and probes the technical troubles of Pape's wholly self-created data-set: not the ways he selectively included only cases that fitted his theory, or coded, classified, and organized the 58 cases he claims to have found, but rather whether its small size is suitable for logistic regression . . . at any rate with standard maximum likelihood estimation, the usual way software packages calculate the coefficients of variables in logistic regression. Several indicators converge to show that Pape's sample size is simply inadequate and will produce completely unreliable "effects" for the coefficients and equally unreliable outcomes for the binary dependent variable (the frequency rates at which suicide terrorism was observed to occur or not occur in the total 58 cases).

But note.

The inadequate size of Pape's self-made sample set need not have ruled out fairly accurate estimates of coefficient values, provided he had recognized his troubles with its size and resorted to what are called "exact methods" of logistic estimation. There's a 99% likelihood that he didn't. Pape's knowledge of logistic regression and logit analysis, you see, seems confined to basics and a strictly cookbook approach to statistical work --- unless, of course, he has been deliberately conning us with disastrous logit results on p. 99 and in fn. 43 on p. 294; and most likely he never realized that his sample set was too small for proper maximum likelihood estimation and that he needed to fall back on "exact methods" for small samples.

3. Part Six --- which will appear in the next buggy article (the 8th in this series) --- will delve deeply then into more of the technical howlers that ripple through Pape's statistical work, focusing especially on his astonishingly naive and totally misleading interpretations of what his reported logit modeling amounted to.

**...........................................................**

**Remember What Was Said in the Previous Buggy Article**

As the warning sounded in its introductory part noted, we're reduced to some speculation in the analysis of Pape's logit modeling here.

No help for it. Pape's astonishingly stingy reports of his logit modeling --- how many models were specified, how they were tested against one another, what the estimated coefficients of his variables were in each, what tests of their statistical significance he actually used, and what, if any, goodness-of-fit test for overall model performance he used --- leave us no choice. Fortunately, if we assume certain things and infer some other things from his briefly reported models on p. 99 --- presumably his 2nd and 4th specified logit models --- we can be relatively confident about most of what follows.

In any case, something certain will emerge by the end of today's buggy argument: in a word, what a disastrous hash of a wrong, badly misconceived theory Pape's logit modeling turns out to be . . . a statistical fiasco from start to finish.

The fiasco starts with a lopsidedly wrong data-set that Pape uses as a sample selection --- equal to its population --- for his logistic regressions. The disaster continues with erroneous interpretations of his reported logit models' success as "predictive models"; it very likely envelops his interaction term, a key independent variable in at least the two models he reports on briefly; it infests his likely estimated coefficients quite apart from the misinterpreted interaction term --- a near-certainty, as we'll see, with the existence of what's called a "zero-cell defect" in his reported classification results (these play havoc with logit estimations); and it shoots up and multiplies in . . . well, no need to run ahead of our argument here. Parts Two and Three will be the place to delve into these troubles.

**First Things First Though**

Not surprisingly, in Pape's analysis, those world-views and strategic calculations turn out to have little or nothing to do with Islamist fundamentalism --- itself, Pape assures us at the start of chapter 7, peaceful and mainly concerned to fight off western cultural and economic imperialism --- and almost everything to do with nationalist resistance to such neo-colonialism . . . the latter, please note, a term that Pape seems himself to subscribe to. At any rate, he doesn't contradict it, and he even insists that what counts anyway is how bin Laden and his terrorist associates see the world, not how average Americans or others do.

On Pape's view, then, bin Laden and al Qaeda are animated overwhelmingly by motives and strategic calculations that are fully in line with Pape's nationalist theory of suicide terrorism. Presumably, too, so are its loose affiliates world-wide as well as its imitator terrorists in the Muslim world.

Not that Pape himself uses the term "Muslim world." It's a no-no for him --- and for reasons you're thoroughly familiar with by now.

**Why Pape Has To Use Logit Modeling**

He has to --- or some other, if less popular non-linear regression technique --- because his outcome or dependent variable --- whether suicide terrorism occurs or not --- is a binary qualitative variable that is inherently non-linear. The alternative outcomes --- if suicide terrorism occurs (Y = 1) or not (Y = 0) --- can't be regressed with linear equations for reasons that should be familiar to all of you.
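One of those familiar reasons can be sketched in a few lines. The data below are hypothetical and use a continuous predictor purely for illustration (they are not Pape's cases): an ordinary straight-line fit to a 0/1 outcome ends up "predicting" probabilities below 0 and above 1.

```python
# HYPOTHETICAL data for illustration only; not taken from Pape's data-set.
xs = [0, 1, 2, 3, 4, 5, 6, 7]          # some predictor
ys = [0, 0, 0, 0, 1, 1, 1, 1]          # binary outcome (1 = event occurs)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least-squares slope and intercept for y = intercept + slope * x
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Fitted "probabilities" from the straight line
fitted = [intercept + slope * x for x in xs]
for x, f in zip(xs, fitted):
    print(f"x = {x}: fitted value = {f:+.3f}")
```

The lowest fitted value falls below 0 and the highest rises above 1, which no probability can do; the logit transformation is one way around that.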

As for Pape's independent variables on which the dependent variable is regressed --- once the non-linear equations are transformed by the logit model into linear ones using logged odds to the base *e* --- they are some combination of nationalist rebellion and religious differences between a democratic military occupier and the occupied people.

So far, so good. But wait!

Look back at Pape's diagram of his theory's causal influences and pathways. What has happened to "occupation" as a causal influence in his logit models? In particular, why doesn't it figure as an independent estimating variable (or covariate or predictor) in the models whose equations will be set out in a moment?

Well, as we noted in the preliminary remarks earlier today, we're dealing with another slippery technique of Pape's. Note that in the diagram, the role of "occupation" isn't confined there to democratic military occupiers; it refers in general to any occupation. By definition, however, the data-set that Pape has assembled, coded, and classified to run his logit models on is limited to occupations by democratic countries. The result? It's two-fold: 1) Instead of hundreds of occupations by democratic and non-democratic governments between 1980 and 2003 of territorially based ethnic minorities that resented central government control, Pape by definition limits the disputes to 58 cases, all involving democratic countries where, it turns out, 9 suicide terrorist groups emerged. And 2) Pape's limited data-set then compounds the problems by ignoring at least 9 or 10 Islamist-inspired suicide terrorist groups that carried out attacks against Muslim autocratic countries.

**And Now Our Buggy Conjectures About Pape's Logit Modeling**

**Despite the Surprisingly Spare Reporting, We Will Assume Pape Followed Proper Modeling Procedures**

With these big provisos in mind, we can assume that he had specified 4 or 5 models. Strictly speaking, given the need to test each logit model for its overall model performance as he added or experimented with different independent variables --- or estimators or predictors (in logistic regression terminology) or covariates (ditto) --- he would likely have opted for 5 models, starting with a baseline one in which all independent variables are set to 0, so that he could look at the performance of the model using only the constant or intercept term when run on his data-set of 58 cases.

Even so, we'll list the base model with the constant or intercept term only and label it 1.1 and the theory-inspired first model as 1.2. As you can see immediately below, the intercept-only model is estimated with the covariates or independent variables set to zero. (Technically speaking, the two other independent variables that Pape's logit models use should be included and set to zero too.)

**Pape's First Two Logit Models**

$$Y = \ln\left[\frac{p}{1-p}\right] = a + b_{0}X + b_{0}Z \qquad (1.1)$$

$$Y = \ln\left[\frac{p}{1-p}\right] = a + b_{1}X + b_{2}Z \qquad (1.2)$$

where:

* ln[p/(1-p)] = the logit = the logged odds that suicide terrorism (its value equal to "1") will occur if someone were to pick a random case from Pape's sample of 58 cases and observe whether in fact it occurs or not.

* a = the constant or intercept, the value Y takes when the X and Z independent variables all equal zero

* $b_{1}X$ = nationalist rebellion

* $b_{2}Z$ = religious differences between the occupier and the occupied people
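For the intercept-only model (1.1), the constant can be worked out by hand, because with no covariates the maximum likelihood estimate of *a* is simply the logged odds of the observed proportion. The sketch below uses only the two counts this article cites (58 cases, 9 with suicide terrorism); it is a back-of-the-envelope check, not a reconstruction of anything Pape reports.

```python
import math

# Counts as described in this article: 58 cases, 9 with suicide terrorism (Y = 1).
n_cases = 58
n_events = 9

p = n_events / n_cases     # observed proportion with Y = 1
odds = p / (1 - p)         # the same thing as 9 / 49
a = math.log(odds)         # intercept of the constant-only logit model

print(f"p = {p:.4f}, odds = {odds:.4f}, a = {a:.4f}")
```

The intercept comes out at roughly -1.69, a negative logit that simply restates how lopsided the outcome is toward non-occurrence.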

As you've probably already inferred, the two independent variables --- nationalist rebellion and religious differences --- are qualitative binary variables too, called "categorical" variables . . . or estimators or covariates or "predictors" in logistic jargon.

That means that they too are treated like dummy variables, exactly like the dependent or outcome variable --- suicide terrorism is observed to either occur or not occur when Pape applies his logit models to his self-created data-set. There's nothing wrong with dummy variables used as independent variables. They appear all the time in linear regression too. It's the dependent binary qualitative variable that requires the use of non-linear regression like logistic.

The use of the independent dummy estimators, though, does require --- as you'll see momentarily --- a larger sample size than otherwise, even if Pape didn't realize this himself. (This is especially true when the distribution of the categorical variables in a logit model leans heavily to one value --- which as we'll see is true for Pape's reported 2nd model. In fact, there are only 9 occurrences of suicide terrorism in his relatively small sample size --- equal to the population of "democratic" military occupations of 58 total cases --- and a classified output of the sort that appears on p. 99 is lopsided toward its non-occurrence. Worse, there's a zero-cell in the reported classification table, which plays havoc as we'll also see with all the estimators' coefficients.)
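Why a zero cell plays havoc with maximum likelihood can be sketched with a toy example. The counts below are hypothetical, invented for illustration, and are not Pape's table: one dummy predictor X, with no events at all when X = 0. Holding the fit for the X = 1 group at its best value, the log-likelihood keeps creeping upward as the intercept heads toward minus infinity, so no finite estimate exists.

```python
import math

def sigmoid(t: float) -> float:
    """Inverse logit: logged odds -> probability."""
    return 1.0 / (1.0 + math.exp(-t))

# HYPOTHETICAL counts for illustration only; not Pape's classification table.
n0, events0 = 10, 0    # cases with X = 0: zero events (the "zero cell")
n1, events1 = 10, 5    # cases with X = 1: 5 events

def log_likelihood(a: float, b: float) -> float:
    """Log-likelihood of the model logit(p) = a + b*X for the counts above."""
    p0 = sigmoid(a)        # P(Y = 1 | X = 0)
    p1 = sigmoid(a + b)    # P(Y = 1 | X = 1)
    ll0 = events0 * math.log(p0) + (n0 - events0) * math.log1p(-p0)
    ll1 = events1 * math.log(p1) + (n1 - events1) * math.log1p(-p1)
    return ll0 + ll1

# Keep a + b = 0 (the best fit for the X = 1 group, where p1 = 5/10)
# and push the intercept a further and further down.
values = [log_likelihood(a, -a) for a in (-2.0, -5.0, -10.0, -20.0)]
for a, ll in zip((-2.0, -5.0, -10.0, -20.0), values):
    print(f"a = {a:6.1f}, b = {-a:5.1f}, log-likelihood = {ll:.6f}")
```

Each step down in *a* (and matching step up in *b*) improves the fit a little more, which is why software run on such data tends to report enormous coefficients and standard errors, or fails to converge.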

Note that all the subsequent independent variables --- one more for the next two logit models Pape constructed --- are also categorical and hence have to be treated as dummy variables too.

Note, too, that once you've grasped the logit identity's role as it appears in the two specified models above --- ln[p/(1-p)] --- we'll take it for granted that you know logit analysis has transformed probabilities into odds and logged them to the base *e* . . . the natural logarithm, so we'll just drop it from the next equations. What that logit transformation does is remove both the floor and ceiling of probability estimates --- 0, 1 --- and in effect stretch out the S curve of the original non-linear cdf to look linear.
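The transformation just described, probability to odds to logged odds and back again, can be sketched as follows. The function names are illustrative choices, not anything from Pape or from a particular statistics package.

```python
import math

def logit(p: float) -> float:
    """Probability (strictly between 0 and 1) -> logged odds, base e."""
    return math.log(p / (1.0 - p))

def inverse_logit(y: float) -> float:
    """Logged odds -> probability; this traces the S-shaped logistic curve."""
    return 1.0 / (1.0 + math.exp(-y))

for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    y = logit(p)
    print(
        f"p = {p:.2f}  odds = {p / (1 - p):8.3f}  "
        f"logit = {y:+.3f}  back to p = {inverse_logit(y):.2f}"
    )
```

Probabilities are boxed in between 0 and 1, but their logits run from large negative to large positive values, which is what lets the right-hand side of the model be an ordinary linear expression.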

**Pape's Reported 2nd Logit Model**

Pape's astonishingly bare-boned report on this 2nd model appears on p. 99 and in fn. 43 on p. 294 in *Dying to Win*. In particular, the reported "predictive success" he claims on p. 99, as we'll see soon, appears in a 2x2 classification table that, typically, is misinterpreted by Pape on several counts: the classified results perform miserably; it isn't a prediction model in any dictionary sense of the term; it isn't even a prediction model by the standards of those logistic-regression theorists who take "predictive efficiency" seriously (like Scott Menard) --- rather, it's not even a classification table but what's called a "selection table" --- and . . . well, we'll get to the tangled confusion soon enough in Part Five. Note here that if Pape didn't distinguish between these three kinds of tables, his logistic software --- whichever program he ran --- will not have properly tested the observed vs. predicted outcomes for statistical significance. He would have needed to apply the proper statistic and calculate the significance himself. (See Scott Menard, *Applied Logistic Regression* (Sage University Paper, 2nd ed., 2001), pp. 28-33.)

Here, meanwhile, is the second model, with a 3rd estimator added to model 1.2's: an interaction term for nationalist rebellion and religious difference working in tandem on the behavior of the outcome variable --- suicide terrorism's occurrence or not:

$$Y = a + b_{1}X + b_{2}Z + b_{3}XZ \qquad (2.0)$$

where:

* $b_{3}XZ$ = the interaction term just mentioned
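How the interaction term behaves with two dummy variables can be sketched with made-up numbers. The coefficient values below are hypothetical, chosen only to make the arithmetic easy to follow; Pape does not report his estimates, and nothing here should be read as his results.

```python
import math

def predicted_probability(x: int, z: int, a: float, b1: float, b2: float, b3: float) -> float:
    """P(Y = 1) under model 2.0: logit = a + b1*X + b2*Z + b3*X*Z."""
    y = a + b1 * x + b2 * z + b3 * x * z
    return 1.0 / (1.0 + math.exp(-y))

# HYPOTHETICAL coefficients for illustration only; not Pape's estimates.
a, b1, b2, b3 = -3.0, 0.5, 0.5, 2.0

for x in (0, 1):          # X = nationalist rebellion (dummy)
    for z in (0, 1):      # Z = religious difference (dummy)
        p = predicted_probability(x, z, a, b1, b2, b3)
        print(f"X = {x}, Z = {z}, XZ = {x * z}: logit = {a + b1*x + b2*z + b3*x*z:+.1f}, p = {p:.3f}")
```

Because X and Z are both dummies, the product XZ equals 1 only in the single cell where both are present, so the coefficient on the interaction shifts the logit for that cell alone. With so few cases in any one cell, that is exactly the kind of term a small sample struggles to estimate.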

**Pape's Barely Referred to 4th Logit Model:**

$$Y = a + b_{1}X + b_{2}Z + b_{3}XZ + b_{4}C \qquad (4.0)$$

where:

* $b_{4}C$ = concession made by democratic occupying countries to armed nationalist rebels (including terrorist groups) that, according to Pape, stopped them from resorting to suicide terrorism.

As you'll see, Pape invokes this 4th independent variable on p. 99 totally out of the blue with head-spinning abruptness: it doesn't appear at all in the diagrammed causal-pathways that appear on p. 96 of *Dying to Win*, and it is mentioned in passing for half a paragraph, followed by the itty-bitty fonts in the weird table on p. 100. After which, it isn't mentioned again for 139 pages in the rest of his book --- on p. 239, the text itself ending on p. 250.

As we noted earlier and will clarify in Part Six, its sudden invocation looks like a desperate, last-second rescue-job by Pape to save his reported 2nd logit model's results from disastrous mediocrity. (There's also a 3rd logit model Pape performed, mentioned for a sentence or two in fn 43 on p. 294: it substituted "linguistic difference" for religious difference, but Pape reported that the logit model couldn't be estimated properly because there wasn't enough variation across the 58 cases of military occupation for logistic regression's estimating procedure --- maximum likelihood estimation --- to be carried out and successfully distinguish the effects of the model's independent variables on the behavior of the outcome variable, suicide terrorism's occurrence or not.)

As for the *3rd logit model* of Pape's, don't worry. Prof bug hasn't forgotten about it. It uses a different estimator in place of "religious difference" --- specifically, "linguistic difference" --- and as we'll see soon enough, Pape rejects it as not producing enough variation across the data-set to generate a statistically sound model.