This, the 6th article in an ongoing buggy series that probes Robert Pape's recently published book --- Dying to Win: The Strategic Logic of Suicide Terrorism --- focuses with doggedly single-minded determination on Pape's effort to test his nationalist theory of suicide terrorism statistically with logistic regression.
Pape's Logit Modeling Is a Disaster from Start to Finish
It's in chapter 6 of Dying to Win, which starts on p. 79 and ends on p. 101, that Pape explains his nationalist theory of suicide terrorism in detail, culminating in what he claims are its very likely causal pathways. These causal pathways are then diagrammed on p. 96, followed immediately by Pape claiming how they will then be tested by his logit modeling.
The diagram is set out below, and needless to add, the preferred causal pathways --- running left to right --- are those that Pape's logit models are meant to test, run on a data-set that he himself creates from scratch. In particular, the data-set that he codes, organizes, and classifies adds up to 58 cases of military occupation . . . all carried out by democratic governments of either alien populations or of national or ethnic minorities on their own territory. This data-set, in turn, becomes the sample selection on which he runs his logit models to test his nationalist theory's alleged causal pathways. Note that the sample selection is equivalent to the total universe or population of all such relevant cases of military occupation . . . or so Pape claims, and misleadingly so as we'll see.
Pape's Model of Suicide-terrorism
1) Solid arrows represent the theory proposed in this book.
2) The dashed arrow --- running from rebellion to nationalism --- represents a causal path
that sometimes influences the production of national identity;
but that plays little role in determining when suicide-terrorism campaigns occur.
3) The dotted arrow represents a causal path that al-Qaeda and perhaps other terrorist organizations
hoped will occur, but that has not done so.
No need to say that it's the causal pathways postulated by Pape's theory running from left to right that his logit modeling will test.
As for the "relevant" total of military occupations, Pape is either deliberately misleading us or fouling up once more . . . or maybe a combination of both. Or so we'll see.
In particular, like his nationalist theory of suicide terrorism, Pape's statistical testing of it turns out to be wrong and flawed from start to finish. The big trouble starts with a noticeably error-riddled data-set --- created, coded, and classified by none other than Pape himself --- that serves as the sample selection for his various logit models. No surprise there. All of Pape's most important data-sets, starting with the initial one set out in tabular form on p. 15, are markedly erroneous and unreliable . . . something we'll see again later today.
That's bad enough, this severely handicapped and inaccurate data-set.
AND IT GETS WORSE, MUCH WORSE
Pape's statistical troubles mount and expand with the ways he has specified or constructed apparently four or five different logit models. They lead him, too, to some flagrant misinterpretations of his 2nd and 4th logit models' results --- the only ones he briefly refers to at all.
His trouble then swells and multiplies when he reports and misinterprets on several counts the results of his 2nd logit model on p. 99 in the form of a 2x2 classification table --- which will be reproduced here in a few moments. The results there are truly mediocre, not that Pape seems to realize this, and he also seems to think that they have something to do with "prediction" of specific suicide terrorist attempts in the future --- another big error. By that point, his statistical woes proceed at full gallop. His 4th logit model, referred to very briefly on p. 99, looks like a desperate, spur-of-the-moment attempt to salvage the 2nd model's results from disaster even while a publishing deadline hung over his head . . . most likely, as we'll see later on, when the galley-proofs were in his hands and one of the 16 research assistants he used for his book --- or possibly one of the 20 "expert" readers the book also acknowledges --- suddenly awakened from a self-imposed brain-coma and saw what a snafu Pape's 2nd logit model amounted to.
Nor Is That All
Oddly, too, the final to-the-rescue 4th estimator or predictor in that last logit model --- "timely concessions" to nationalist rebels by governments that prevent the rebels from turning to suicide terrorism --- isn't referred to substantively after p. 99 for another 140 pages of text. Yes, that's no exaggeration.
Only on p. 239, just a few pages from the end of the book's text, does Pape refer to the role of concessions once more as a way to deal with suicide terrorist groups like al Qaeda . . . and this, believe it or not, is in a chapter that criticizes the current War on Terrorism and urges on the government in Washington a generous Pape-based set of policy changes that fit his nationalist theory and its wondrous statistical testing. And that's it. A 4th logit model is supposed to salvage a disastrously reported 2nd logit model's results, and its formidable theoretical role is mentioned in exactly one paragraph 140 pages later, with nothing else to show for it.
To compound all this statistical disarray, Pape's reports of his statistical work are irresponsibly stingy, so much so that they have no parallel in prof bug's memory . . . at any rate after decades of reading probably hundreds of social science statistical studies.
HOW THE BUGGY ARGUMENT WILL UNFOLD
All these and the swarms of other flaws that riddle the work of Pape's nationalist theory of suicide terrorism and in particular his statistical "testing" of it are dealt with in depth here today. The twists and turns of our buggy argument fall into five parts.
Part One Is . . .
something unusual on this buggy site: less devoted to rigorous argument and more an introductory
survey of Pape's statistical claims and the problems that surround them and his work . . . all followed by a little fable about a Statistical Wonder, Professor Bernard de Stapler of the University of New Orleans, who pays a visit one day to our current National Security Adviser and urges that he be given a grant to help our country in the War on Terrorism. How? He will lead a team of expert researchers to undertake a lengthy study of the non-occurrence rate, with point-estimations and confidence-intervals included, of spitball terrorism in the United States. That little fable, which deals of course with a wholly imagined funny-figure guy, is then followed by a more serious look at how a teeny-bopper gum-chewer could either match the predictive success of Pape's 2nd logit model or outperform it.
Part Two Will . . .
start examining the defects that mar the data-set Pape uses as a sample selection (equivalent to the population of events) for his logit modeling, but focus mainly on some weird, out-of-the-blue derriere-covering that Pape resorts to on p. 101 of Dying to Win.
Part Three Focuses . . .
more fixedly on Pape's highly tainted data-set, showing how it's no more reliable than any of the other major data-sets in his book, while managing to do what these other sets do: whitewash and conceal the over-towering dominance of Islamist extremism in the rash of suicide terrorism since 1980.
Part Four Will . . .
be largely, though not entirely, a summary of Pape's claims as to what his logit models will do for his nationalist theory of suicide terrorism, the specified four models that we presume he tested, and the sole report of his statistical work --- for all the huffing and puffing of his claims --- that appears in a 2x2 classification scheme on p. 99 of Dying to Win . . . discussed, astonishingly, in only two brief paragraphs there, plus a couple of even more cursory paragraphs tucked away in fn. 43 on p. 294.
By that point, prof bug will take it for granted that the readers of today's buggy article either have a sound familiarity with logistic regression or, at least, have read through the previous two articles that set out the basics of both linear and non-linear logistic regression. That said, all new technical terms or concepts like interaction terms and the nature of 2x2 classification tables --- which further have to be analyzed into "prediction" tables, classification tables, and selection tables for accurate statistical testing --- will be clarified and usually illustrated with one or more concrete examples.
Keep in mind finally that logistic regression and logit modeling (or analysis) are interchangeable terms.
And Part Five Finally . . .
will delve into the remarkable number of problems that torpedo Pape's technical statistical work, a story of ship-wrecked regressions that never make it to shore. (PS tacked on here: it turned out on December 17th, 2005 that the current buggy article would be better off stopping at the end of Part Four and delving into Pape's statistical woes at length in the next buggy article . . . largely done anyway.)
One More Point
Something else to keep in mind too, even if it means repeating what was said a minute or two ago: never, in decades of reading scholarly work that uses various forms of statistical techniques and models --- including probably hundreds of articles by economists full of linear regression equations of sometimes 4th dimension aspirations --- has prof bug seen such a skimpy, tightfisted report of statistical work as marks Pape's summary of his in Chapter 6 of Dying to Win.
The stingy elusiveness is all the more head-spinning when you consider Pape's puffed-up claims for what the logit modeling will do for his theory.
The reason can't be the publisher's demand that Pape tone down the use of technical statistical work in order to reach more readers. Why in that case go to the trouble to have three lengthy appendixes at the end of the book for reporting data-laden matter? Why not have a couple of pages in a fourth appendix to report in detail how he specified his logit models, how they were or were not nested in one another, how model performance was tested for, how the individual parameter or coefficient-values were tested, how he interpreted his interaction term in his presumably 2nd and 4th logit models, what the sensitivity and specificity of the "observed" behavior of his dependent variable were in frequency terms, what the positive error rate was, and . . . well, the list of omissions is much longer.
The actual reasons for the oddball stinginess of it all are matters that only Pape himself would know.
PAPE'S CLAIMS FOR HIS STATISTICAL WORK: A PRELIMINARY SURVEY, NOTHING MORE AT THIS POINT
Yes, It's Just A Foretaste of What You'll Get in Greater Detail in Part Four Today
Despite this mountain of self-created troubles, Pape, all innocence presumably, sets out on pp. 96 and 97 of Dying to Win a set of claims as to what his logit modeling will do --- bolstered as it is by his data-set that's created with "focused comparative" case-studies. Specifically, the logit modeling will "enhance [our] confidence" that his nationalist theory of suicide terrorism has properly identified and tested successfully "the causal dynamics" that "determine" why suicide terrorist groups emerge and launch deadly attacks against civilians. "To test my theory," Pape says,
"I employ a methodology that combines the features of focused comparison and statistical-correlative analysis using the universe --- [read: total number] --- of foreign occupations, 1980-2003. Correlative analysis of this universe enhances confidence that my theory can predict future events by showing the patterns predicted by the theory occur over a large class of cases [58 in all]. Detailed analysis of historical cases enhances confidence that the correlations found in the larger universe are not spurious: the theory accurately identifies the causal dynamics that determine outcomes [of suicidal terrorism]" (p. 96; italics and bold-type added).
Later, on p. 99 --- after the scantest report by a scholar of his statistical "testing" that prof bug has ever seen --- Pape, still the statistical ingénue, reinforces these claims. The results reported in his 2x2 classification table --- reproduced here in a few moments --- show that the nationalist theory of suicide terrorism "predicts" that suicide terrorism would occur in tandem with only one of the combinations of independent variables: "that is, when there is both a religious difference and rebellion. The theory correctly "predicts" 49 of 58 cases, a result that is statistically significant at the highest common benchmark of .001, meaning that it could be achieved by chance less than once in 1000 times."
Prof Bug's Judgment About Pape's Claims?
As you might guess, they're all wrong! . . . and for reasons that will be set out in Part Four today. For what it's worth, Pape's badly interpreted 2x2 classification table on p. 99 with its mediocre results looks like this:
Suicide Terrorism and Democratic Occupation, 1980-2003
| ||Religious Difference ||No Religious Difference |
|Rebellion ||7/14 || |
|No Rebellion ||1/15 ||0/20 |
Just note in passing, nothing more right now, the cruddy outcome of Pape's rate of "predictive" success in the top left-hand cell --- 7 cases of suicide terrorism actually observed in his 2nd logit model compared to the baseline logit model's "prediction" that there would be 14 such cases. That's a 50% rate of success. It's equivalent to you or me knowing nothing about suicide terrorism and flipping a coin a few dozen times while always calling "heads".
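The arithmetic behind that comparison can be sketched in a few lines of Python. Treat it as a rough check, nothing more: it takes at face value the figures cited in this article (58 cases, 9 of them involving suicide terrorism, Pape's 49 of 58 "correct predictions", and the 7/14 in the top left-hand cell). The always-say-no rule is a hypothetical benchmark introduced here purely for illustration, not anything Pape reports.

```python
from fractions import Fraction

# Figures as cited in this article; taking them at face value is an assumption.
TOTAL_CASES = 58         # occupations in Pape's data-set
SUICIDE_TERRORISM = 9    # cases coded as involving suicide terrorism
PAPE_CORRECT = 49        # cases Pape says his theory "predicts" correctly


def success_rate(hits: int, total: int) -> Fraction:
    """Share of cases called correctly, kept exact as a fraction."""
    return Fraction(hits, total)


# Top left-hand cell of the table on p. 99: 7 out of 14.
top_left_cell = success_rate(7, 14)

# Hypothetical benchmark: a guesser who always says "no suicide terrorism"
# is right on every case in which none occurred.
always_no = success_rate(TOTAL_CASES - SUICIDE_TERRORISM, TOTAL_CASES)

pape_rate = success_rate(PAPE_CORRECT, TOTAL_CASES)

print(f"top-left cell:        {float(top_left_cell):.3f}")
print(f"Pape's reported rate: {float(pape_rate):.3f}")
print(f"always-say-no rule:   {float(always_no):.3f}")
```

On these figures the always-say-no rule also lands on 49 of 58, which is the sense in which a know-nothing guesser can match the reported success rate.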
Note Too The Bottom Right-Hand Cell
As it happens, a zero-count in a contingency table --- even in the reorganized fashion that Pape reports it --- is statistically disastrous and is called a "zero-cell defect" in logistic jargon. The presence of such a zero-count is enough, according to one prominent logistic specialist, J.S. Cramer, "to play havoc with [any logit model's] estimation routines." If, for instance, you had access to the original contingency table produced by Pape's statistical software --- usually labeled "prediction success" or "percentage of successful predictions" or something like that --- what, exactly, would be the rate of successful prediction if you calculated 0/20 or, oppositely, the NPV or Negative Predictive Value that would follow from dividing 20/0?
(These are conjectures, you understand: Pape does not provide us with the original 2x2 table that shows the frequency in each cell of "predicted" vs. "observed" outcomes . . . the predictive rate estimated originally with a baseline or null logit model using only the constant or intercept term on the right-hand side of the logit equation and the "observed rate" estimated by the final or fitted logit model that a researcher has used and tested statistically. Still, somewhere in that original table there was a zero rate of observed success.)
Nor is that all, Pape's ingénue innocence here notwithstanding.
Look now at the two other cells in the reported Pape table. See how there's a very low if non-zero frequency rate in each? Well, as Cramer adds, in such instances "estimation will not break down, but the quality of the estimates will be unfavorably affected . . . ." (See Cramer, Logit Models from Economics and Other Fields [Cambridge, 2003], p. 46.)
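One way to see why a zero count causes such grief is to compute the empirical log-odds cell by cell. The sketch below assumes each reported cell a/b can be read as a events out of b cases (an assumption about the table's layout, since the original software output isn't available), and it uses only the three cells reproduced in the table above. The helper name `empirical_log_odds` is made up for this illustration.

```python
import math


def empirical_log_odds(events: int, cases: int) -> float:
    """Log of events/non-events for one cell; infinite when a count is zero."""
    non_events = cases - events
    if events == 0:
        return float("-inf")  # log(0): the quantity to be estimated has no finite value
    if non_events == 0:
        return float("inf")
    return math.log(events / non_events)


# The three cells reproduced above, read as (events, cases) -- an assumption.
cells = {
    "Rebellion / Religious Difference": (7, 14),
    "No Rebellion / Religious Difference": (1, 15),
    "No Rebellion / No Religious Difference": (0, 20),
}

log_odds = {label: empirical_log_odds(*counts) for label, counts in cells.items()}
for label, value in log_odds.items():
    print(f"{label}: {value:.3f}")
```

The zero cell has no finite log-odds at all, while the 1/15 cell rests on a single event, which is roughly what Cramer's two warnings amount to.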
OTHER PROBLEMS GALORE
Mind you, these aren't the only flaws that mar the major reported statistical outcomes of Pape's logit modeling in his entire book. What follows is just a foretaste of what we'll be doing in Part Three or Four today.
(i.) His 2x2 classification model is something, for instance, that he . . .
calls a prediction model: not so, not even for those who take the "predictive" success of a logistic regression fairly seriously. Come to that, as you'll see, it's not even a classification model --- rather, a selection model, and each of these three different kinds of reported "observations" in a logistic regression requires a different test for statistical significance. It's about 95% likely that Pape didn't use the proper statistical test here: φp, or phi-p. (See Scott Menard, Applied Logistic Regression (Sage University Paper; 2nd ed, 2001), pp. 28-36.)
(ii.) Then, too, among the numerous other reasons to distrust "predictive" success ---
which is really a measure of classification-consistency (even for a selection table if the cell-rates are properly tested for) --- one is simply that it takes the natural-logged estimations of the logit identity-function, which are strictly quantitative and run between (hypothetically) negative and positive infinity, and reduces the range to a strict dichotomy of either 0 or 1 for "predictive" purposes. In doing so, the default cut-off rate is .5. That means, you understand, that if 6 or 7 of Pape's small sample-set of 58 cases showed up with a probability estimate of 0.46 or 0.47 or 0.48 or 0.499 on one side of the "predictive" cut-off rate, they were treated as equivalent to 0.001.
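The loss of information at the cut-off can be shown with a minimal sketch. It assumes the usual decision rule (predict the event when the estimated probability is at or above the cut-off, .5 by default); the function names are illustrative and the probabilities are the hypothetical ones mentioned above, not estimates from Pape's model.

```python
import math


def logistic(logit: float) -> float:
    """Map a logit (any real number) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


def classify(probability: float, cutoff: float = 0.5) -> int:
    """Dichotomize: 1 if the probability reaches the cut-off, otherwise 0."""
    return 1 if probability >= cutoff else 0


# Hypothetical near-misses just under the default cut-off.
near_misses = [0.46, 0.47, 0.48, 0.499]
labels = [classify(p) for p in near_misses]

print("near-misses classified as:", labels)
print("0.001 classified as:", classify(0.001))
print("0.501 classified as:", classify(0.501))
print("a logit of 0 corresponds to probability", logistic(0.0))
```

Every near-miss receives the same 0 as a probability of 0.001, while a case at 0.501 flips to 1; the table of "predictions" cannot tell the difference.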
Here, by way of illustration, is a real report by Professor Karl Wuensch of a "classification output" --- note his proper reference to what Pape calls a "prediction table" --- that applies differing cut-off rates (four, to be exact) in order to demonstrate to readers what the reported occurrences are for a logit model and a large sample selection that he and two colleagues had used for a psychological experiment. The fitted logit model has several estimators or independent variables, and what they are and how the sample selection was contrived aren't germane here. Simply note the different outcomes, and how they are reported for various cut-off levels of prediction, followed by Wuensch's own comments of the sort that any respectable statistical specialist should emulate:
"The CTABLE command gives us extensive classification table output. To use our model to predict which binary outcome is obtained ("approval or not" ), we need a decision rule of the form: If the probability of occurrence of the predicted event is P or higher, we predict that the event will occur; if less than P, we predict it will not. Some programs just use .5 as the P, but SAS lets you pick any value you want, or, if you don't give it a value, it shows you the statistics for many different values. There are three ways you can calculate your "success rate" in classifying observations. You could just count up the number of correct classifications and divide by the total number of predictions. This is the "Correct" percentage given by SAS.
| || PREDICT THE OCCURRENCE OF THE EVENT IF P > |
| || .02 || .20 || .40 || .50 |
| Correct || 128/315 = .406 || (120+72)/315 =.610 || (94+130)/315 = .711 || (71+147)/315 =.692 |
| Sensitivity || 128/128 = 1.00 || 120/128 =.938 || 94/128 = .734 || 71/128 =.555 |
| Specificity || 0/187 = 0.00 || 72/187 =.385 || 130/187 = .695 || 147/187 =.786 |
| False Positives || 187/(187+128) = .594 || 115/(120+115) =.489 || 57/(94+57) =.377 || 40/(71+40) =.360 |
| False Negatives || none || 8/(72+8) =.10 || 34/(130+34) =.207 || 57/(147+57) =.279 |
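The rows of this table can be recomputed from four counts. The sketch below does so for the P > .40 column, using the denominators shown in the table itself: 128 observed events, 187 observed non-events, and error rates taken as shares of the predictions made. The function and variable names are illustrative, not Wuensch's or SAS's.

```python
def classification_rates(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Rates as laid out in the table, from the four cells of a 2x2 table."""
    total = tp + fn + tn + fp
    return {
        "correct": (tp + tn) / total,      # all correct calls / all cases
        "sensitivity": tp / (tp + fn),     # events correctly called
        "specificity": tn / (tn + fp),     # non-events correctly called
        "false_positive": fp / (tp + fp),  # wrong among predicted events
        "false_negative": fn / (tn + fn),  # wrong among predicted non-events
    }


# P > .40 column: 94 of 128 events and 130 of 187 non-events called correctly.
rates_40 = classification_rates(tp=94, fn=128 - 94, tn=130, fp=187 - 130)

for name, value in rates_40.items():
    print(f"{name:>15}: {value:.3f}")
```

Swapping in the counts from another column shows the trade-off directly: a lower cut-off buys sensitivity at the cost of specificity, and a higher one does the reverse.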
"You could find the P(correct | event did occur), that is, the percentage of occurrences correctly predicted, known as the Sensitivity. You could find the P(correct | event did not occur), that is, the percentage of non-occurrences correctly predicted, known as Specificity. Focusing on errors in prediction, you could compute the False Positive rate, the P(incorrect | occurrence was predicted), the percentage of predicted occurrences which are incorrect, or the False Negative rate, P(incorrect | nonoccurrence was predicted), the percentage of predicted non-occurrences which are incorrect. Lower P values will be associated with greater sensitivity and fewer false negatives, but less specificity and more false positives. Higher P values will be associated with greater specificity and fewer false positives, but lower sensitivity and more false negatives . . .
"I reported the percentages for P = .4, which gave nearly equal values of sensitivity (73%) and specificity (70%). P = .42 would be even more nearly equal, but lowers the overall success rate a bit. If you wish to evaluate the omnibus effect of a categorical (k > 2) predictor, you have to delete all of its dummy variables and see if the model performs significantly worse [something, please note, Pape can't do because all his independent variables are dummy variables: prof bug]. Look at the results of my second invocation of PROC LOGISTIC. With the scenario dummy variables out, the 2 LOG L increased from 338.06 to 346.503, an increase of 8.443 on 4 df (one df for each dummy variable). From SAS's PROBCHI function I obtained the p, .0766, not quite statistically significant. I chose not to report this test, as the typical reader would not appreciate such a p."
(iii.) The report of predictive success produced by
all logistic regression software --- even one that the University of Chicago political science department might have contrived on its own --- turns out to be especially unreliable for small sample-sets . . . and doubly so if the data in those sets haven't been jackknifed, bootstrapped, or internally validated by use of split-sampling, quite apart from all other problems that bedevil the use of such classified-results to test for model-performance.
Since Part Three today will delve deeply into the size of Pape's small sample-set, showing among other things that it's too small for the number of dummy variables he's using (as well as the number of cells needed for reporting "classified outcomes"), there's no need to pursue this topic ... though you'll find a sidebar set of technical observations at the end of Part One here that will shed a little more light on what a logit model's "predictive efficiency" (or success) really means. Those who want to see how medical specialists have become aware of how overly optimistic and hence unreliable the use of "predictive models" can be will find a concise analysis here.
All of these matters, please note, are returned to and clarified with examples in Part Three today.
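To get a feel for how shaky a success rate is when it rests on only 58 cases, here is a minimal bootstrap sketch. It is a simplification: it resamples only the correct/incorrect outcomes implied by the reported 49 of 58, and it does not refit any model on each resample as a proper internal validation would. The function name, seed, and number of resamples are arbitrary choices made for this illustration.

```python
import random


def bootstrap_accuracy_interval(correct, total, resamples=2000, seed=0, level=0.95):
    """Percentile bootstrap interval for a success rate of correct/total."""
    rng = random.Random(seed)
    outcomes = [1] * correct + [0] * (total - correct)
    rates = []
    for _ in range(resamples):
        sample = rng.choices(outcomes, k=total)  # resample cases with replacement
        rates.append(sum(sample) / total)
    rates.sort()
    lower_index = int((1 - level) / 2 * resamples)
    upper_index = int((1 + level) / 2 * resamples) - 1
    return rates[lower_index], rates[upper_index]


low, high = bootstrap_accuracy_interval(correct=49, total=58)
print(f"reported rate: {49 / 58:.3f}")
print(f"bootstrap 95% interval: {low:.3f} to {high:.3f}")
```

Even this crude version shows an interval many percentage points wide, and it leaves out the extra optimism that comes from scoring a model on the same cases used to fit it.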
(iv.) On top of that, Pape --- still the statistical naïf --- also seemingly thinks that . . .
the "predictive success" in the bottom cell on the right side shows how formidably his nationalist theory of suicide terrorism has been statistically tested . . . quite apart, you understand, from flagrant ignorance about the zero-cell defect there.
And what are that cell's observed vs. predicted results supposed to show? On Pape's view, they explain why suicide terrorism doesn't occur! Huh? (Keep in mind that the predicting is being done by the null or baseline logit model run on Pape's data-set with no estimating variables, only the constant or intercept term. The "observed" frequency that explains "predictive success" is estimated by Pape's fitted 2nd logit model with three independent variables as estimators . . . all matters that Part Four will deal with in depth today.)
Apparently Pape, who touts his model's predictive success when it comes (among other things) to the observed rate of non-occurrences of suicide terrorism achieved by his 2nd logit model, thinks this outcome is substantively significant. It isn't.
In no way, to take just one example, is it equivalent to a clinical or medical study that looks at the frequency of coronary heart disease (CHD) among American men over the age of 65, likewise analyzed with a logit model. The binary outcome variable would be coded by logistic software as y = 1 (suffers from CHD) and y = 0 (free of CHD).
Obviously, in this use of logit modeling, physicians, patients, pharmaceutical companies, sports equipment firms, and the like would all be extremely interested in knowing the fitted logit model's estimated effects (influences) that help account for both the alleged causes of CHD (y = 1) and the reasons why men without the disease "don't have it" (y = 0) . . . diet, exercise, lack of mental stress, genetic heritage, and so on. The binary outcomes are equally revealing. Virtually all clinical studies in the health sciences are like that.
By contrast, suicide terrorism is extremely rare in the world, and hence --- except in a country like Iraq, where (in Pape-speak) the assembled altruistic nationalist freedom-fighters specialize in Kabooming kiddies, commuters, and themselves into oblivion --- more or less equivalent to the incidence of volcanic eruptions around the world between 1980 and 2003.
(v.) Viewed from this angle, then, it's easy
to see that the reasons why suicide terrorism doesn't occur often in the world --- even in Pape's defective, self-serving data-set of 58 cases, mind you, it appears only 9 total times --- are likely to be legion, yes even when there is some sort of armed rebellion going on in the world. (Between 1945 and 2005, there have been over 220 civil wars alone, and hundreds more cases where governments sent security forces into areas of ethnic unrest . . . as recently occurred, say, in the French riots of November 2005).
These reasons, just tossing a few out for your consideration that have tumbled from prof bug's mind, include things like internalized cultural restraints among nationalist or ethnic rebels; and effective security forces that thwart intended suicide attacks --- the Israelis since 2001 have thwarted several hundred such attacks; and the failure of jihadist leaders to recruit anyone for the missions (they themselves, you note, don't send their own family members off on such suicidal ventures); and total lack of interest even by leaders in suicide tactics (the IRA say); and of course psychic sanity, or lack of gullibility, or effective parental supervision of restless, alienated youth who might volunteer on a whim . . . not to forget dysfunctional socialization processes, widespread in the Muslim world, that don't convince men that they are entitled to dominate women or that the instant after they've Kaboomed several commuters or cafe-clients into oblivion they themselves will be swept swiftly on the wings of Angels right to Paradise and into the waiting arms of their 72 (hopefully not too demanding) sex-starved virgins, and so on.
Don't giggle. This is serious stuff. Very very serious.
Though, come to think of it, what would we say if we heard that a scholar had proposed to the National Science Foundation that he or she would like a several million-dollar grant to study the complex reasons why there haven't been any volcanic eruptions in downtown Phoenix or in the Sahara desert? A grant-request, according to the scholar in question, that should be all the more generously funded because he or she, aware of their own limits, will have to hire 26 research assistants to comb dark, downtown Phoenix buildings for lava-remains over several years and to crawl on the ground every other hour and listen to possible volcanic rumbles with a state-of-the-art stethoscope while, at the same time, a good 30 "expert" scholar-friends will be doing the same in the middle of the Sahara desert?
Get the point? OK, then:
Suppose one day, to illustrate Pape's misuse of non-occurrences in his classified output, our National Security Adviser. . . .
has agreed to see a sociological scholar from the University of New Orleans, Professor Bernard de Stapler, so many credentials after his name that not even our top officials can refuse the temptation to meet him, who shows up at his office and proposes --- as an aid to fighting the War on Terrorism --- that the Federal government fund a huge research project of his. For a modest $6.2 million annually over a ten-year period, he will lead a team of 26 research assistants and 30 expert scholarly advisers that will help him sort out and statistically test the reasons why suicidal spitball attacks didn't occur in the interval between 1980 and 2004 in such countries where military occupations occurred --- note that prof bug is now looking at Pape's third appendix, on pp. 286 and 287 --- as Bolivia, the Dominican Republic, Ecuador (twice!), Sardinia, Peru (twice!), Slovakia, Canada, Ghana (thrice!), Catalonia, and yes, the USA.
(No, prof bug isn't kidding. He hasn't suddenly lost his marbles. The last non-occurrence in this list involving the USA is right there in tiny print on p. 267 of Dying to Win: case no. 58 in Pape's data-set of democratic military occupations, "Native Americans vs. United States". Coded: 0 for years of violence, 0 for rebellion, No for suicidal terrorist assaults, "traditional vs. Christian" for religious difference, and "native languages and English" for linguistic differences. Did you know that such a military occupation had occurred? Do you remember it being reported in the media? Anyway, if Professor Pape can code and test things this way, so could our wholly imagined walking and talking statistical wonder, Professor Bernard de Stapler from the University of New Orleans, conjure up a data-set of his own for testing his theory of the occurrence and non-occurrence of spitball attacks in the War on Terror?)
Back in the White House, our arch New Orleans scholar is ready to toss in his ace-card as a clincher for getting his lucrative grant. Heaving his heavy briefcase onto the Security Adviser's desk, he rifles through it --- tossing suddenly a rancid banana peel into the wastepaper basket behind the Adviser's head ("Ha, still got the old basketball touch", he mumbles out loud) --- and spreads out a stack of print-outs full of tables and figures.
"Look here, my friend," Prof de Stapler says as he points to a spot on the 3rd page next to a large spidery coffee stain, "a preliminary pre-model correlation analysis of my likely data --- yes, just created the set on the flight in from New Orleans on the back of a napkin, then ran it on my SPSS program using my trusty Dell Notebook --- is set out here. Something of a pre-regression exploration of the data, ha! ha! You know, nothing too ambitious for the moment; certainly nothing properly funded, ha! ha!"
As the scholar prattles on in a thick Louisiana accent, a wild-eyed glee begins to form on his face. "Here! Yes, lookie here at the Pearson-Correlation significance test results --- all two-tailed, mind you, Mr. Adviser, none of this blocked junk-stuff statistics at just one-tail that the weaklings go in for these days with chi-square usage! Horsefeathers! that's what their hitchy-koo mamma's baby-stuff statistics amounts to, the little "bastards!" Oops, pardon my French, Sir; I've forgotten my Deep-South manners, but really when you think of the sloppy cookbook methods they're taught in grad school these days, you can't but have your blood start boiling, can you?" Our statistical wonder pauses, giggling slyly to no one in particular. Then, with a gulp of air, he races on in boyish pleasure. "And yeah, my friend, here's a preliminary bootstrapping report of 16,000 N-repetitions with their confidence-intervals for my data-set set out, while the table that follows --- lookie carefully, Mr. Adviser, what a promising result! --- it documents the results of several competing test statistics . . . you know, like MANOVA with Pillai's Trace, Hotelling's Trace, Wilks' lambda (Boy, I don't mind telling you that the initial Wilks' test-value gave me the Wilky-willies briefly, Ha! Ha! --- pardon my little pun, Sir --- until I bootstrapped another 13,333 times and adjusted the significance level from .05 to .45!), and . . . oh yes! not to forget that damn Roy's Largest Characteristic Root that I niftily calculated just as we were landing at the Washington airport, boy talk about bumpy arrivals, willya! and sure enough, would you believe it, just as I was ready to save the file, the plane bumped especially hard and I ended up spilling coffee on myself and some imbecile who was snoring very loudly, the jerk, in the next seat. Hell! --- oops, pardon my French again, Sir --- you'd think from the way he got mad that I'd gone and spilled the stuff on purpose. And sure, some of the hot java trickled down to his crotch and forced him to wake up with a jolt, but I ask you, Sir, in all honesty --- no crossing your heart and hope to die stuff while your other hand plays hanky-panky with the truth behind your back --- would you react by throwing a punch at a peace-loving scholar like me just because your pants looked like you got through playing with yourself and you said you had to meet with the President in 10 minutes?
"I mean, hells-bell, you'd think . . . . "
"Uh, very good, Professor de Stapler," the alarmed Security Adviser says weakly. His finger reaches under the desk and presses the buzzer. Very hard. Any second now, armed Marine Guards will rush in and evict the madman from the White House.
While he waits, he makes a bad mistake.
Increasingly vexed, a little flame of anger building in his brain, he asks the full-tilt loony opposite him exactly how many non-occurrences of spitball assaults he estimates have taken place in the United States during the 10 minutes that have elapsed since he entered the adviser's office.
The prof nods excitedly. He's too caught up in a burst of barreling enthusiasm to notice the Adviser's edgy sarcasm. Most professors are like that. In love with their own voices. Indifferent to others. All others idiots anyway. "A great to-the-point question, sir!" he says with bursting brio. "I love your stress on being exact, a true budding statistician, that is what you are!" And then, with a charge of renewed hyperkinetic energy, he flips open his Dell Notebook, waits a few seconds until his SPSS program loads up all the templates for calculating linear regressions, then hurriedly types in a sequence of complex equations.
"Gotta use linear estimations for this sucker, you understand. Some quadratic stuff that will have to be dealt with too no doubt, probably by use of a new Kalman filter-model I've been toying around with lately . . . you know, a 3-stage pre-estimation with the use of an imposed kxk transition matrix to ferret out any random baddies roaming around the data-set, ha! ha! Trust me, though, won't take long with this Dell here."
A few more seconds pass by, and then Prof de Stapler looks up and smiles devilishly at the astonished Security Adviser.
"I told you, didn't I, the sucker took only a half minute to estimate. Let's see now, according to these preliminary regression estimates, it appears that the reply to your right-on-target question is . . . hmmm, well let's just say that the calculated confidence intervals for the likely number are 13.3 trillion on the upside and 971 on the downside at a significance level of .05. And so give or take a few trillion in one direction and a few hundred in the other, I'd reckon at this stage of early investigation --- very much in the pre-grant period, you understand: no research assistants, no expert scholarly aid, no pretty secretaries to sit on your lap and take dictation, hee! hee! --- yeah, I'd reckon we're talking about a point-estimate of 3.17 trillion non-occurrences in all that have materialized in roughly the last 10 minutes."
He halts a second, his face flushed with wild-eyed excitement. Surely, surely, he's making the right impression. Suddenly he winks at the Security Adviser. It's for the little reference to pretty secretaries, some pleasant lap-sitting, all the guy-stuff that the Adviser will no doubt have appreciated.
"You follow me, my friend?" he quickly asks a second later and then glances at his wristwatch. "A good 3.17 trillion non-occurrences of spitball assaults have already occurred in exactly the last 11 minutes and 31 seconds, give or take a few trillion in one direction and a few hundred in the other. Of course" --- here the prof chortles several seconds to no one in particular, followed by some loud snorts and choking sounds --- "the calculations were carried out with assumptions of asymptotic normality of the NSL estimator, and frankly you can never be sure about asymmetrical asymptotic estimations in such cases, can you? Ha! Ha! Not to worry though. No Sir! Lookie here" --- he points suddenly to a smudged corner of the Dell monitor-screen --- "as you can see clearly, Mr. Adviser, I've just corrected for these dubious assumptions by hurriedly applying quasi-Newton methods for minimizing smooth criterion functions to the data, and so I'm happy to report that we now have a more precise point-interval. Still in the pre-grant stage, of course; but more precise all the same, and yep, getting promisingly robust if we apply the Engle-Granger representation theorem to correct for autoregressive distributed lag model-of-order."
He stops all at once and pulls out a bulk medicine bottle from his jacket pocket. "Forgive me, my friend," he says as he wiggles the bottle high for the National Security Adviser to see. "Doc's orders. Gotta take big swigs of this stuff every hour or so to get through the day, night too for that matter."
All at once, his glaring look now outrightly murderous, the National Security Adviser finds his nose overwhelmed by the wafting odor of Kentucky bourbon. Probably Old Turkey. Very rich stuff, his own favorite. God, he could use a snort himself; maybe even the whole bottle. Should he ask the loony for --- No! can't do that. It'd only encourage the . . .
"And so, as it turns out, Mr. Adviser," Professor de Stapler speeds on again as he wipes his mouth with his jacket sleeve and tucks the medicine bottle back in a pocket, giggling good-naturedly in the process, "the number of non-occurrences of spitball attacks by terrorists in the last 11 minutes and 31 seconds clear across our Great Country --- I mean, is it great or what? And take it from me, Sir: so too is New Orleans despite having a murder-rate 12 times higher than New York City; big big deal, damn media! you'd think that --- oh oh, better not get me started on their case, my friend, just note that the new estimated number no longer adds up to a puny 3.17 trillion cases as my trusty old Dell SPSS program initially calculated. No siree! Rather --- are you ready, Sir? hold onto your chair, this stuff could floor you! --- it amounts to a cool 5.16 trillion cases."
His face lit up with manic, wild-eyed energy, the professor stops a second and jabs excitedly at the calculated point-estimate on his monitor screen.
"Just think of it, Mr. Adviser! A good 5.16 trillion non-occurrences of spitball terrorism have materialized in the last few minutes as we've been sitting here chewing the fat, and our citizenry and government act as if nothing important has happened. Can you believe it? Are the implications not frightening to you? I mean, innocence can kill, can't it? And that's not all; uh, uh, because . . . well, lookie at this result, it shows I'm tickled pink to say that we can be confident now that the margin of error around our spanking-new point-estimate has shrunk to just 13.29 trillion on the upside and 611.13 on the downside. That's at the 0.05 level of significance you understand, and besides, I --- "
More and more vexed and crackling with anger, the National Security Adviser struggles to resist the impulse to leap across his desk and strangle the madman opposite who has just finished laughing out deliriously at a bad joke and is now alternating between more wild prattling and the rapid click-click entry of new equations into his Notebook.
"Where the hell are the Marines when you need them?" the Security Adviser asks himself while his finger keeps pushing furiously at the button. "Where, dammit? Where?" his inner voice mutters out loud, not that Professor de Stapler notices it . . . no, how could he? He's caught up totally in scrutinizing the software report for a second-order autocorrelated error, all the while maintaining a comic hum to urge on his clicking fingers while he decides on the spur of the moment to offset the identified normalization process-problem by deft use of the EGLS estimator and the even more clever application of the LI/ML method as a correction for its excessive error-terms that will then ensue. Always does ensue in his experience, doesn't it . . . that damn EGLS, you'd think the software makers would know what they're doing ---.
Alas, there are no Marine Guards on duty. The National Security Adviser just remembers that.
They've all been summoned suddenly to report for duty in Iraq or New Orleans, the latter with added pay for hazardous duty, while the Secret Service agents have all flown off with the President to meet with Jacques Chirac, Sean Penn, Barbara Streisand, the new Enron CEO, and the President's good chum, King Abdullah of Saudi Arabia, at his Texas ranch for a rollicking weekend of bicycling and backslapping. Good sport that Abdullah, he's promised to bring 83 ravishing beauties from his personal harem for the visit . . . some of them originally from Texas, it seems. Hmmm, how did Texas beauties aged 16 --- if Abdullah's personal secretary is to be believed --- end up in Saudi Arabia? The National Security Adviser abruptly remembers all this. He himself arranged the meetings last week, didn't he?
Jeez, what a tangle! What a mental relapse! Must need a rest, get away somewhere. Man, do I need to get away! Too damn busy to think clearly these days, listening to lunatics day in, day out, from Congress, France, New York, and worst of all from New Orleans right now. Talk about trouble! Maybe hanging out with some of the harem-denizens now cavorting somewhere in Texas would put him right again. Or maybe, if need be, with the penguins marching in Antarctica. Anywhere but here. Yeah, even at 75 degrees below zero, the marching would beat being locked up with the damn lunatic!
Seconds pass by, and then the National Security Adviser scowls with wild-eyed intensity. Damn! Damn! his inner voice screams. Gotta get rid of this gibbering maniac right now, doesn't matter how!
Slowly, with the craft and stealth we always associate with our National Security Advisers, he stretches out his one good hand for the letter opener on the desktop. His gaze is homicidal now. It glares with rippling force at the stomach of Professor de Stapler just beyond the desktop edge to where the elated, self-absorbed statistician has leaned forward, the better to race his fingers up and down on the Dell keyboard.
And so of course the professor doesn't see the glare in the Adviser's eyes or the letter opener inching toward his gut. How could he? His mind is lost in a wonderland of more preliminary analysis of his data-set and the need for a new Notebook that can do 1 million iterations of piecewise pseudo-maximum likelihood estimation and the use of the Cochrane-Orcutt procedure for testing his new theory of 112 causal pathways that fully account for why spitball assaults don't occur at a frequency rate of more than 5.16 trillion times every 11 minutes and 31 seconds across the United States at the 0.05 level of statistical significance.
All innocence then, nothing but statistics and daydreams of lavish grant money churning in his mind, our New Orleans scholar doesn't notice that the point of the letter opener has now edged to just five inches from his bulging stomach and is tilted at a sharp upward angle for penetration.
But then who could blame Professor de Stapler for his lack of attention?
I ask you, who among us wouldn't be distracted by our jostling joyous thoughts on such topics as the relative merits of running the heteroskedasticity-consistent covariance matrix estimator compared to the more robust but harder-to-interpret Gauss-Newton test for heteroskedasticity robustness on 13 of the cases in his data-set that have been scribbled out illegibly on the front side of the United Airlines napkin . . . itself likewise stained with coffee splashes and what appears to the glaring eyes of the National Security Adviser to look very much like the hardened innards of a Twinkie Bar?
Would you be less inattentive? Would prof bug?
Come on, Answer Honestly! Would Any of Us?
"And lookie here, Mr. Adviser," the professor says finally with a new burst of harum-scarum eagerness as he flicks away some Twinkie crumbs and holds up an equation that he's just scribbled down on a corner of the United Airlines napkin. "See what I've already achieved! Yes sir, you won't be wasting taxpayers' money funding my grant-request when you take note of how ---"
The silver blade has by now been drawn backward a half foot off the desk for a better lunge. The right hand of the Adviser is holding its handle very tightly. All the while his mind's weighing the pros and cons of a few months in prison for justified manslaughter of pedantic bores as opposed to sitting still and waiting until the imploding jabber runs itself into the ground and the squirrelly loony-bin type leaves.
His mind comes to a decision. In the end, he does what all of us would in his position.
"--- Yes, I can assure you, Mr. Adviser," our totally elated scholar says with a burst of renewed eagerness, "that when you take note of how with just a little more work and, yes to be honest, the help of 26 research assistants and 30 expert readers (all obtainable, please understand, for just 6.7 million bucks each year for the next decade, expenses and all: I mean, is that a bargain or what?), my pioneer study will be able to predict with an 85% rate of success all the non-occurrences of spitball assaults in more specific places like Dishman, Washington and the southeast corner of the Dakota Badlands in any time interval you want, yes sir, no way you'd be wasting hard-earned tax-payer dough on . . . Oh my God! No! No! What are you doing? Have you gone crazy? Help! Help! I've been stabbed in the stomach by a madman!"
TIME NOW FOR OUR LITTLE FABLE TO END AND FOR US TO GET REAL AGAIN
There's a moral here, believe it or not. A good three in fact, and please be attentive to each and every one.
First, our story is wholly fictional: Professor Bernard de Stapler is a funny if totally imagined guy at the University of New Orleans, and Professor Robert Pape is a serious, widely respected scholar at the University of Chicago. That's how it should be. And as a serious, respected scholar, Prof. Pape deserves to be treated seriously, period. Except for a misuse of statistical work, there's no similarity between him and the fictive Professor de Stapler at all.
Secondly, as far as prof bug knows, Professor Pape has never, ever, published a study of terrorist spitball attacks and run regressions to estimate their frequency rate and their causes . . . either in Continental USA or with Alaska and Hawaii tossed into the data-set and whether at the 0.05 confidence level or 0.45 level.
And thirdly, for all that, a gum-chewing teeny-bopper who's totally ignorant of anything political about the world other than one fact --- suicide terrorism is very rare --- could totally match Pape's statistical results in one way and come close to matching them another way.
For a start, he could bet his entire gum-allowance for the week that he'd be able to match Pape's logit model's "positive predictive value" (PPV in logistic jargon) by flipping a coin 100 times and calling heads always. That would equal Pape's reported success-rate in his upper left-hand cell. (Pape's actual PPV might even be lower: the 2x2 classification has been reorganized by Pape in such a way that you have to conjecture what the actual PPV is, the same being true about "false positives" and "the sensitivity" of prediction . . . all matters discussed at length in Part Four today.)
Note, though, that this particular betting strategy would require the teeny-wonder to have some prior knowledge: in particular, he'd have to be informed of the predicted outcomes that would have been estimated at the outset of Pape's logit modeling by the null or baseline logit model . . . the initial one with only the intercept or constant term.
Is this unfair? No, far from it. That's how any logistic regression generates its predictions for every logit model with independent variables added: it sorts out the data in a preliminary manner by a complex process of iteration and "predicts" group membership --- in Pape's data-set, which of the 58 cases that the fitted logit model will be processing one at a time will likely fall into the group-category of Y = 1 (suicide terrorism occurs) or the opposite group of Y = 0. These are the "no regressions" predictions that could be assumed to be due to "random chance." After the regression is run and all 58 cases have been processed, the resulting classification table reported by logistic software will produce a 2x2 table that compares the "observed" frequency rate of suicide terrorism --- whether it occurred or not for each case --- with the original "no regressions predicted rate." And so once our teeny-wonder knows this, he can achieve a probability rate of positive predictions (or the PPV) equal to Pape's 2nd logit model by simply doing his coin-flipping trick.
Enter the second betting strategy.
Here the teeny-bopper could actually outdo Pape's logit-modeling, believe it or not, by knowing only one substantive thing about the world: aside from some Islamist countries like Iraq under terrorist attack daily, suicide terrorism is rare in the world. That's it. Knowing just this little fact and absolutely nothing else, our teeny-hero could now risk betting his tequila-allowance for the week by predicting that suicide terrorism would not occur 100% of the time when Pape's 2nd logit model is run on his data-set. Yep, he need simply call "none of the time" in advance for each and every one of the 58 cases in Pape's data-set, and you know what would happen?
He would achieve an overall rate of predictive success of 85%! That's for both positive and negative successful predictions (PPV and NPV) combined . . . overall percentage correct!
Our teeny-wonder would do this, to clarify briefly, by being right 100% of the time for 49 of the 58 cases in Pape's data-set. Remember, in 49 cases of the total, suicide terrorism just wouldn't occur. He would, oppositely, be wrong only 9 times. That is, his predictive rate fails only 9/58 times, or 15.5%. Subtract that 15.5% from 100%, and voila: an overall success rate of 84.5%, or roughly 85%. What a miraculous statistician-in-the-making our teeny-bopper turns out to be.
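The teeny-wonder's arithmetic is easy to check. What follows is only a minimal sketch in Python: the function name is hypothetical, and the only inputs are the two counts quoted above (58 cases in all, 9 of them with suicide terrorism).

```python
def always_no_accuracy(total_cases: int, positive_cases: int) -> float:
    """Share of cases classified correctly when every case is called 'no'."""
    negative_cases = total_cases - positive_cases
    return negative_cases / total_cases


total_cases = 58      # cases in Pape's data-set, as quoted in the text
positive_cases = 9    # cases where suicide terrorism occurred

accuracy = always_no_accuracy(total_cases, positive_cases)
error_rate = positive_cases / total_cases

print(f"correct: {accuracy:.1%}, wrong: {error_rate:.1%}")
# prints "correct: 84.5%, wrong: 15.5%"
```

The 84.5% is what gets rounded to 85% in the discussion here. Nothing in the calculation uses any explanatory variable: the whole score comes from the lopsided split between the 49 "no" cases and the 9 "yes" cases.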
Compare that with Pape's observed rate of "predictive success". We don't know it exactly, mind you, because Pape would have had to report the original 2x2 classification table that would look like this:
| Observed | Predicted: Suicide Terrorism Yes | Predicted: Suicide Terrorism No | Percentage Correct |
| Suicide Terrorism Yes | | | |
| Suicide Terrorism No | | | |
| Overall Percentage | | | |
So not getting the original classification table --- which presumably used a cut-off rate of .50 --- we are left conjecturing, but not wholly. Look back at Pape's reorganized "predictive success", and you see a mediocre success rate in the upper left-hand cell of 50%, but that's not the PPV: the 7 successful cases have to be divided by either a misclassified 29 cases (the column, 14 + 15) or by a misclassified 23 cases (the row, 14 + 9). If the column reflects the proper misclassification (yes/no), then Pape's PPV is 7/29 or a flagrantly miserable success rate of 24%. If it's the row cases that reflect the misclassification, the success rate climbs to a slightly less mediocre 30%.
The misclassified "no" cases are then either . . . well, you see, here's the problem mentioned before. There is a zero-cell defect in the bottom right cell, but we'll ignore it, and estimate that the NPV is either 20/29 (69%) or 20/35 --- or 57%. Averaging the successful rates for PPV and NPV, Pape's reported 2nd logit model achieves a staggeringly cruddy rate of somewhere around 40-50%! Random chance did better in Pape's null or intercept-only model without the estimators, and a teeny-bopper calling heads over the 58 cases would have done better too.
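For readers who want to redo the conjectured rates, here is a small Python sketch. It assumes nothing beyond the counts quoted in the two paragraphs above; the dictionary names and labels are made up for illustration, and since the original classification table was never reported, every figure remains a candidate value, not a confirmed one.

```python
# Counts taken from the reorganized table as discussed in the text.
correct_yes = 7    # correctly classified "yes" cases (upper left-hand cell)
correct_no = 20    # correctly classified "no" cases

# Two candidate denominators for each rate, depending on how the
# reorganized table is read (column total versus row total).
ppv_candidates = {
    "column total 14 + 15": correct_yes / (14 + 15),
    "row total 14 + 9": correct_yes / (14 + 9),
}
npv_candidates = {
    "denominator 29": correct_no / 29,
    "denominator 35": correct_no / 35,
}

for name, rate in ppv_candidates.items():
    print(f"PPV candidate ({name}): {rate:.0%}")
for name, rate in npv_candidates.items():
    print(f"NPV candidate ({name}): {rate:.0%}")
```

Whichever reading is right, both PPV candidates fall well below one half, and both NPV candidates fall well below the teeny-wonder's overall score.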
As it is, though, our teeny wonder did even better!
Knowing only that suicide terrorism is rare in the world, he achieved a staggeringly impressive prediction rate of 85%.
Hurrah! Hurrah, Teeny-Wonder!
And who knows? Rewarded munificently by prof bug for his fantastic crystal-ball work, our teeny-wonder --- who could of course be a female (many of prof bug's best graduate students in statistics have been women) --- might be motivated enough to work hard in high school and beyond and so eventually get a job as a logistic-regression expert in the political science department at the University of Chicago.
Come to think of it, he might be able by then to warn Professor Pape about the dangers of a zero-cell defect in logistic regression and about pretending that a barely reported logit model has done wonders when we're not even interested in the "non-occurrence" of suicide terrorism. Right now, there's none going on outside prof bug's study, and he's not expecting any for a long long time.
Back to the logit-crashing Professor Pape.
He might learn from our teeny-wonder something else that's important: lopsided outcomes in a 2x2 classification table of "predictive success" for a logistic regression are precisely the reason why so many statistical theorists are skeptical that it is a reliable indicator of logit modeling performance. As we'll see, even health-specialists are wary now that their "predictor" models have been producing unreliably optimistic results, and some urge careful internal validation techniques --- jackknifing, cross-validation, split-samples, and bootstrapping of the de Stapler sort --- to tone down the results. But why wait until Part Three to grasp this? You already know it. Witness our little fable starring Professor Bernard de Stapler and his ingenious breakthroughs in predicting the number of non-occurring spitball terrorist attacks that had materialized in the United States during the span of just 11 minutes and 31 seconds of rewarding conversation with our slasher-National Security Adviser.
(Professor de Stapler, you might be happy to know, was rushed to the hospital after the attack, swiftly knitted up, and soon recovered, making himself quickly into a terror for the young nurses attending him before he was dismissed. Armed with a new grant request --- sent this time to the Department of Defense with assurances from Vice President Cheney that he, for one, sees the de Stapler project as especially promising --- he has returned to the University of New Orleans and is currently very busy fine-tuning his statistical study of non-occurring spitball terrorism by altering his linear regression equation with the help of a self-designed Kalman filter and the uncanny use of a k x k transition matrix on any k-dimensional random variables wandering around in his data-set.
(He's a very happy man, our funny-guy prof. Come on, who wouldn't be in his shoes? Already, four of his loveliest female students have volunteered to become his personal traveling secretary when his $60 million grant gets the final approval of Donald Rumsfeld, and he has recently emailed to the envious prof bug that he has carefully tested each and every one for extensive lap-sitting. Then, too, the confidence intervals around his point-estimates have substantially improved since his visit to Washington D.C. One day soon, he expects that they will narrow from 13 trillion cases at a 0.05 confidence level to about one half that interval.
(As for our former National Security Adviser . . . well, the last prof bug has heard, he was sentenced to six months psychiatric therapy in a straitjacket and then six months more without it, though with a chain-and-ball wrapped around his feet. Once he's certified sane, his future will markedly improve. Among other things, he has already been publicly awarded a full governmental fellowship to earn a Ph.D. in the University of Chicago Political Science Department, provided he reports weekly to his parole officer and shows, with unambiguous test-results, that he has been making steady progress in the use and interpretation of statistical techniques.)
For now, though, we've said enough. The rest of Pape's statistical ship-wreck will be dealt with in Part Four below, and at length.
A Sidebar Technical Clarification: How Logistic Regression Software Calculates Predictive Efficiency and the Relevance to Pape's Logit Modeling
(i.) A Puzzle Needs to be Solved Here
Namely? Well, when students start learning logistic regression and they see how the software --- SPSS, SAS, Stata, and so on --- produces a 2x2 classification table, they're usually puzzled how the "predicted" columns are estimated in order to compare those predicted outcomes of group membership with the actually "observed" results of classified group membership set out along the rows of the table.
To clarify, return to the hypothetical 2x2 table for Pape's 2nd logit model set out above . . . the one that we don't see on p. 99, the table there a reorganized one by Pape himself.
In that hypothetical table, the "predicted" data in the columnar cells are the estimated group membership for Y = 1 (suicide terrorism is predicted to occur) or Y = 0 for the opposite. In a fitted logit model --- say Pape's 2nd model --- the "predicted" group membership is estimated with only the intercept, and then again in different steps as the estimating or independent variables are added: nationalist rebellion, religious differences, and an interaction term for both . . . these variables themselves, recall, being dummy or binary variables.
In Pape's 2nd logit model, for instance, this would mean, say, that before case 21 or 22 is actually processed on the estimating variables, the logit model will predict in advance whether each case will show up on the dependent variable as Y = 1 or Y = 0. It does so by using a default cut-off value of 0.50 and predicts that case 21, for instance, will have a probability associated with suicide terrorism of 0.51 and classify it as belonging to all cases associated with this behavior that it predicts as having a probability of 0.50 or higher. It then compares this prediction with the actual observed result (suicide terrorism turns up or not). Overall, the resulting predictive efficiency or success --- which will also generate the 2x2 classification of group membership for the 58 cases in his data-set --- needs to be compared to the actual observed classification of the 58 cases.
(ii.) Agreed, Not the Clearest Explanation So Far. A Clearer One Requires Us to Get Slightly Technical.
Specifically, to stay with Pape's model, any logistic regression software package will predict the "expected group membership" of, for instance, case 21 and case 22 by using the default cut-off value of 0.50 --- unless the researcher follows good advice and tries different cut-off points --- and generate predicted probabilities for each of these and the other cases in his data-set that totals 58 in all.
By a complex process of iteration, the software might predict, for instance, that case 21 will have a probability of 0.50 or higher of scoring 1 on the behavior of the dependent variable (suicide terrorism occurring) and classify that case in the category of "yes or suicide terrorism occurs". That produces predicted group membership for case 21. Alternatively, by continuing its iterative procedures, the logistic software might then predict that case 22's probability of scoring 1 on the Y variable is less than 0.50 and classify it in the opposite category where all predicted cases of "no suicide terrorism" are grouped.
Enter the actual observed behavior of Y, the dependent variable, when cases 21 and 22 are actually calculated on Pape's data-set. These will give you the "observed" results. The resulting predicted scores (1 or 0) for those and all other cases are then cross-classified in the 2x2 table with the observed outcomes for actual group membership. The 4 cells in each 2x2 classified table allow us to compare the "predicted" cases of group membership with the "actual" or "observed" results of group membership in percentage terms.
And it is these percentage terms that are reported as "percentage correct", "sensitivity" of prediction or classification, "specificity", percentage of false-positives and false-negatives . . . or oppositely, the PPV or positive predictive value and the NPV or negative predictive value.
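To make the cross-classification concrete, here is a hedged Python sketch of how a 2x2 table and its percentage terms can be built from predicted probabilities and a cut-off. The eight observed outcomes and probabilities are invented purely for illustration (they are not Pape's cases), and the function names are hypothetical rather than taken from SPSS, SAS, or Stata.

```python
def classification_table(observed, probabilities, cutoff=0.5):
    """Cross-classify observed 0/1 outcomes against predictions at a cut-off."""
    table = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for y, p in zip(observed, probabilities):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            table["tp"] += 1   # observed yes, predicted yes
        elif y == 1:
            table["fn"] += 1   # observed yes, predicted no
        elif predicted == 1:
            table["fp"] += 1   # observed no, predicted yes
        else:
            table["tn"] += 1   # observed no, predicted no
    return table


def summary_rates(table):
    """Percentage terms usually reported alongside a classification table."""
    tp, fn, fp, tn = table["tp"], table["fn"], table["fp"], table["tn"]

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "percentage_correct": ratio(tp + tn, tp + fn + fp + tn),
        "sensitivity": ratio(tp, tp + fn),   # observed yes, correctly called
        "specificity": ratio(tn, tn + fp),   # observed no, correctly called
        "ppv": ratio(tp, tp + fp),           # predicted yes that were yes
        "npv": ratio(tn, tn + fn),           # predicted no that were no
    }


# Invented illustration data: 8 cases, 3 observed "yes".
observed = [1, 1, 0, 0, 0, 1, 0, 0]
probabilities = [0.51, 0.30, 0.49, 0.10, 0.70, 0.90, 0.20, 0.05]

table = classification_table(observed, probabilities)
rates = summary_rates(table)
print(table)
print(rates)
```

Note that sensitivity and specificity are computed along the observed rows, while PPV and NPV are computed along the predicted columns. That is why a reorganized table that blurs rows and columns leaves the reader guessing which rate is which.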
(iii.) Interpreting the 2x2 Classification Table
"Since there are four cells in such a 2x2 table, a "highly accurate model would show that most cases fall in the cells defined by 0 on the observed and 0 on the predicted group membership, and by 1 on the observed and 1 on the predicted group membership. Relatively few cases would fall into the cells defined by a mismatch of observed and predicted group membership. A simple summary measure equals the percentage of all cases in the correctly predicted cells.
"A perfect model would correctly predict group membership for 100% of the cases; a failed model would do no better than chance by correctly predicting 50% of the cases. The percentage of correctly predicted cases from 50 to 100 provides a crude measure of predictive accuracy." (The quote is from Fred C. Pampel's excellent little book, Logistic Regression: A Primer (Sage, 2000), pp. 50-51.)
(v.) Note Next How
logistic regression theorists, Pampel too for that matter, tend to downplay the importance of classification results --- which Pape likes to call "predictions".
For one thing, there is widespread disagreement as to how to test for the overall predictive success in the 2x2 classification table. Scott Menard, as we noted earlier, discusses at least four different measures of testing the strength of the relationship for the tabular data. He argues that a prediction table differs from a classification table, and both differ from a selection table, and in his view, therefore, a different statistical measure is needed for each . . . even as he notes that there is still disagreement on these measures. Even the more complete description of classification accuracy --- the use of the ROC or Receiver Operating Characteristic curve, which we won't delve into here --- is subject to disagreement among logistic regression specialists.
For another thing, the cut-off value of 0.50 for classifying cases as positive (Y =1) or negative (Y=0) can be very misleading. For instance, one of Pape's cases might be predicted as equal to 1.0 (100% success) when its actual probability is 0.51, and the next case be predicted as 0 (total non-success) even though its actual probability is estimated to be 0.49. Particularly in a small sample-size like Pape's, a few cases of this sort can produce huge distortions of classified group-membership. At a minimum, then, as Wuensch's table set out earlier showed, it's very useful for a researcher to apply different cut-off values and compare the resulting classified tabular data from one cut-point to another.
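The sensitivity of the table to the cut-off can be seen with a toy example. The Python sketch below uses eight invented cases (again, not Pape's data) that include one probability of 0.51 and one of 0.49, and it recomputes the share of correctly classified cases at three arbitrary cut-off values; the function name is hypothetical.

```python
def percent_correct(observed, probabilities, cutoff):
    """Share of cases whose 0/1 classification at this cut-off matches reality."""
    hits = sum(
        1
        for y, p in zip(observed, probabilities)
        if (p >= cutoff) == (y == 1)
    )
    return hits / len(observed)


# Invented illustration data: 8 cases, 3 observed "yes".
observed = [1, 1, 0, 0, 0, 1, 0, 0]
probabilities = [0.51, 0.35, 0.49, 0.10, 0.45, 0.90, 0.20, 0.05]

results = {
    cutoff: percent_correct(observed, probabilities, cutoff)
    for cutoff in (0.3, 0.5, 0.7)
}
for cutoff, share in results.items():
    print(f"cut-off {cutoff:.1f}: {share:.1%} correct")
```

With so few cases, moving the cut-off reclassifies one or two borderline cases and shifts the percentage correct by a large step each time, which is the distortion the paragraph above warns about for a small sample.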
For a third thing, as our teeny-wonder showed, an observer or any logit model for that matter can do better than 50% by simply predicting that all cases will fall into the category that subsumes the largest number of cases. The greater the gap between the distribution of 1 and 0 cases in the data-set, the greater the lopsided nature of the resulting estimated and correctly "predicted" cases when it comes to classifying their membership. In Pape's data-set of 58 cases, there are only 9 cases in all where suicide terrorist groups emerged, and obviously that will produce a lopsided classification heavily favoring the "0" outcomes on the dependent variable. Our teeny-wonder achieved such marvelous results, remember, by always calling "0" or no suicide terrorism for all 58 cases --- an astonishing 85% rate of predictive success. Pape's logit model, by contrast, never comes close to that success.
Finally, even those like Scott Menard who take predictive efficiency fairly seriously agree that a 2x2 classification table is not a good measure for overall logit model performance or efficiency. In fact, it's quite common for a much more accurate test of model performance --- say, Hosmer and Lemeshow's goodness-of-fit test --- to come up with impressive results while the classification results are mediocre. And for that matter, vice versa. That's why Hosmer and Lemeshow, for instance, say explicitly on pp. 159 and 160 that the only real interest of a 2x2 classified output of the sort that Pape boasts about on p. 97 of Dying to Win is when the researcher is explicitly interested in "classification . . . as a stated goal of analysis. Otherwise, it should only supplement more rigorous methods of assessing [model] fit" or performance. Fred Pampel, whom we quoted earlier, agrees, and the same is true of J.S. Cramer, the author of Logit Models From Economics and Other Fields.
There are scarcely any logistic regression specialists who disagree on this point. To drive it home, neither the Cramer book nor Hosmer and Lemeshow's Applied Logistic Regression (2nd edition) even uses the terms "predictive success" or "prediction model". For them, the 2x2 classification is that --- classification, and nothing else.
iv.) A Query Rears Up at the End Here: Is Pape Aware of . . .
. . . all these complexities and drawbacks to reporting his 2nd logit model's "predictive success" in a 2x2 table on p. 99?
Most likely not. He might not even care. He doesn't seem to prof bug to be an introspective person, just the opposite. What seems to count for him is to contrive data-sets that whitewash the towering near-monopoly role of Islamic groups in suicide terrorism after 1980, to then further his laundry-job by mirror-illusion presentations of the data, and --- so it also seems --- by trumpeting the alleged success of his 2nd logit model that is as big an illusion as the data-set it's run on. Or so it seems to prof bug.
PAPE'S DATA-SET SUBSTANTIVELY VIEWED: A FIRST CUT
To repeat what we said earlier: yes, the numerous technical problems with Pape's data-set --- its small size, the lack of independence in its observations or cases, his almost certain failure to run internal validation techniques like bootstrapping or jackknifing or split-sampling to remove biased estimates of the dependent variable --- are important, just as are the various technical problems with his logit modeling that we'll set out later today.
Still, what counts the most in examining a statistical researcher's work --- whether linear regression, non-linear regression, or what have you --- is the substantive quality of the data-set or sample selection that he or she uses. And on this critical matter, Pape's logit modeling fails blatantly, just as, in the process, it manages to further whitewash and even conceal the over-towering dominance of Islamist groups in suicide terrorism between 1980 and 2000. There were, as it turns out, 19 suicide terrorist groups in all that were active in this period, and not just 9 as Pape claims.
And of these 19, 17 were Islamic groups --- 89% in all. Or so we'll see in Part Two today.
Does Any of This Matter? Prof Pape Covers His Behind
For all the claims and other hoopla that Pape uses to pomade his statistical work and how it "enhances [our] confidence" in his theory's "causal" pathways, he sets out some weirdo wiggle-room excuse-making on the very last page of Chapter 6. It's a typical case of wanting to have your cake and eat it too --- or, in less polite society, of covering your ass with evasive escape-clause fine-print in advance. And the paragraph where the disclaimers appear --- set out with "on the one hand" but "on the other hand" hemming-and-hawing --- is a wonder to behold.
Well, after claiming wrongly that his statistical tests show that at least 95% of the time between 1980 and 2003 suicide-terrorism has confirmed his nationalist theory of its alleged causes --- a boast, by the way, that he repeated on national news last month --- Pape, possibly informed at the last minute, as we've just speculated, of exactly how mediocre his reported 2nd logit model's results were, tells us on p. 101 that
"although [his statistical] findings give us confidence that future cases [of suicide terrorism] are likely to follow a similar pattern, we should not overread the evidence [italics added]"
Why be so cautious all of a sudden? Well, you see --- so Pape tells us in the next sentence --- because suicide terrorism is a "relatively recent phenomenon" . . . a head-spinning statement, by the way, that is blatantly contradicted by his own earlier analysis on pp. 11-12 and again on 33-35, showing that suicide terrorism flourished in the ancient world and on into the Middle Ages and past. Never mind. No matter. As it happens, contradictions galore mark much of Pape's work. So, since suicide terrorism is now said to be "relatively recent" . . .
"we cannot rule out the possibility that future terrorist organizations would succeed in carrying out suicide campaigns even when the religious differences between the foreign rulers and occupied community is narrower than in past cases."
This Is An Extraordinary Wiggle-Room Clause, No?
Consider its bizarre out-of-the-blue character.
Nothing, but nothing, Pape has said throughout chapter 6's analysis --- or for that matter in earlier chapters --- prepares us for this hedging proviso on p. 101. More bizarrely still, after conjuring up a flagrantly flawed data-set that ignores 10 suicide terrorist groups of Islamist nature that attacked civilians or governmental officials galore in dictatorial Muslim countries between 1980 and 2003 --- a larger number of such groups, remember, than figures in Pape's erroneous, whitewashing data-set itself --- Pape ignores several pages of statistical work that he had just finished extolling on p. 99 and now tells us that "future terrorist organizations" might not adhere to his tested nationalist theory's causal pathways after all.
The bizarre derriere-covering is all the more intriguing if you consider when it was likely added to Pape's manuscript.
Start with the years of suicide terrorist activity that his book deals with --- 1980 through the end of 2003. That doesn't mean, though, that Pape's book was ready on January 1st, 2004. Most likely, he wrote the final draft of his book's manuscript in 2004 and maybe even into early 2005, finishing just before its final publishing-run. And what was going on in 2004 and 2005, specifically in Iraq, while Professor Pape was expounding his nationalist theory in his manuscript? Well, fanatical jihadists --- or, in Pape-language, "nationalist altruists" --- were blowing up around 20 to 25 Iraqi Muslims daily in order to assert (to use Pape-speak again) their legitimate nationalist urges toward self-determination. And not just in Iraq, mind you: but also in Indonesia, Turkey, Egypt, Morocco, Algeria, Tunisia, Saudi Arabia, Afghanistan, and Malaysia.
And so, faced with these rampant attacks entirely at odds with his entire nationalist theory of suicide terrorism aimed at democratic military occupiers, what can we speculate that Pape finally did?
In All Probability
In all probability, this: very late in the game --- remember, most likely during the galley-proof stage of his book --- he suddenly got cold feet and decided to add a flurried touch or two of reality, however tentatively and hurriedly, to his manuscript's argument. After all, how could anyone ignore all the explosions going off around the Muslim world courtesy of jihadist suicidal zealots acting in their raw, imploding ways, rather than thanks to Pape's nationalist freedom-fighting altruists Kabooming democratic oppressors? Apparently, in the galley-proof stage, not even Professor Pape himself any longer. No! No! It was too late for a total flight from reality. And hence we are left with his have-it-both-ways rear end-covering set out in haste on p. 101.
Then, too, something else might have pushed him to add some rushed second thoughts there. Namely? Well, when you get down to it, could Pape be really sure --- prof bug means, really really sure --- that Dying to Win might not be read by someone who actually knew a decent thing or two about the Middle East, Islam, international relations, terrorism and yes, even about logistic regression, and found his book and its lengthy and contorted argument little more than a spun-out apologia for stark bursting jihadist terrorism?
Sure It's Speculative, But Then . . .
. . . You tell prof bug what you think motivated Pape here.
Remember, even as Pape swiftly and unexpectedly inserts this self-denying disclaimer of theoretical and statistical modesty on p. 101 of Dying to Win, it figures absolutely nowhere else in his book . . . starting, as it happens, on p. 102 with his extensive excuse-making for al Qaeda in Chapter 7. Not only does the disclaimer not appear anywhere in that chapter's argument, it has disappeared totally from sight in Chapters 7 through 12 and on into the boonies of the appendixes.
The self-denying proviso hasn't just made a single brief appearance on p. 101 of Pape's book, only to disappear entirely from the rest of the book into the shadowy barrens beyond human eyesight. It's also totally vanished, as it happens, from Pape's media appearances since his book's publication earlier in 2005.
No modest self-denial evident there in these Telly and other media interviews; just the opposite . . . the very opposite. Materializing on national television just last month, Pape repeated the nonsensical claim that his statistically tested nationalist theory was so sound that it could successfully predict a good 95% or more of future suicide terrorist kaboomings. A transient case of errant forgetfulness on Pape's part as to what he said on p. 101 of Dying to Win? Not so. No way. Earlier in July, he also failed to mention any modest qualifiers to an Australian journalist who was interviewing him fawningly at the time, and apparently this has also been the case for several other Telly jaunts for Professor Pape that prof bug hasn't had the opportunity to watch. Still, five-will-get-ten that the having-it-both-ways modesty-act on p. 101 of Dying to Win didn't show up in those interviews either.
Anyway . . . just in case any of you have a better conjecture to offer by way of explaining the sudden appearance and then vanishing act of Pape's surprising modesty on p. 101, please be sure to post a comment at the end of this article on your epiphany.
Prof bug himself will go on speculating that Pape's slippery cover-your-behind on that page seems a flurried, last-second bit of afterthought stuff, and nothing more. . . and in that respect, looking exactly like the salvation estimating-variable that also emerged out of nowhere in Pape's 4th logit model mentioned for a few lines, and no longer, on p. 99. Consider their similarity up close. Both of these tardy, sprung-out-of-nowhere