Sunday, February 06, 2005

Temporal Trends (?) in Appellate Review of Daubert Decisions

In this post, we will: (1) present a chart tracking cumulative federal appellate decisions affirming or reversing rulings on expert evidence since 1/1/2000; (2) explain how the chart was compiled; (3) speculate on what it might mean (perhaps nothing); and (4) invite further (i.e., more serious and systematic) research.

(1) The chart



Here is a PDF version of the same chart.

(2) Explanation of the chart

When we launched our parent site, Daubert on the Web, in early 2002, we began compiling cumulative statistics on affirmance rates for a sample of cases intended to comprise all federal appellate dispositions since January 1, 2000, published or unpublished, addressing the admissibility of expert testimony. In particular, we have tracked the rate at which federal appellate decisions:
  • affirm the trial courts' evidentiary rulings in general (the overall affirmance rate, represented in the chart by the yellow line in the middle, labeled "O" in the legend);
  • affirm lower court decisions admitting expert testimony (the admissibility affirmance rate, represented by the dark blue line at the top of the chart, denoted "A" in the legend); and
  • affirm the trial courts' rulings excluding expert evidence (the exclusion affirmance rate, represented by the pinkish line at the bottom, labeled "E" in the legend).

What the chart appears to show is that the likelihood of appellate affirmance is increasing over time for evidentiary decisions in all three categories.

The methods used to allocate appellate decisions to the listed categories are explained here. Briefly, affirmance rates can be analogized to batting averages. If a district court decision on admissibility is affirmed, we have a "hit." If it is reversed, we have an "out." If the game is called on account of rain during the district court's "at bat" -- e.g., if the appellate court does not reach the lower court's evidentiary decision -- then the decision isn't counted at all. It follows that if the district court's decision is never appealed, it is not in the sample. Note that outcomes are counted expert by expert, not case by case. This matters because one case may involve several experts. Note also that the sample tabulates affirmances or reversals of evidentiary rulings, not judgments. Thus if a trial court's decision admitting expert testimony is held to be erroneous, but harmless error, the outcome is counted as a reversal, not an affirmance.
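For readers who find the batting-average rules easier to follow as code, here is a minimal sketch in Python. The record layout, the field names (`ruling`, `outcome`), and the five sample records are hypothetical, invented purely for illustration; they are not drawn from the actual tabulation.

```python
# Hypothetical records: one entry per expert whose ruling came up on appeal
# (outcomes are counted expert by expert, not case by case).
rulings = [
    {"ruling": "admit", "outcome": "affirmed"},
    {"ruling": "admit", "outcome": "reversed"},       # error held harmless still counts as a reversal
    {"ruling": "exclude", "outcome": "affirmed"},
    {"ruling": "exclude", "outcome": "not_reached"},  # "rained out": not counted at all
    {"ruling": "admit", "outcome": "affirmed"},
]


def affirmance_rate(records, ruling=None):
    """Return hits / (hits + outs), or None if nothing was counted.

    Pass ruling="admit" or ruling="exclude" to restrict the tally to one
    category; leave it as None for the overall rate.
    """
    counted = [
        r for r in records
        if r["outcome"] in ("affirmed", "reversed")
        and (ruling is None or r["ruling"] == ruling)
    ]
    if not counted:
        return None
    hits = sum(1 for r in counted if r["outcome"] == "affirmed")
    return hits / len(counted)


print("overall:", affirmance_rate(rulings))
print("admissibility:", affirmance_rate(rulings, "admit"))
print("exclusion:", affirmance_rate(rulings, "exclude"))
```

Rulings the appellate court never reached drop out of both the numerator and the denominator, which is the "rain-out" rule described above.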

These figures are maintained for each individual circuit, and also for all circuits taken as a whole. In retrospect, it would have been wise to tabulate subtotals over regular temporal intervals, so that there would be figures, e.g., for each year. Unfortunately, we lacked the foresight to do that. Thus the data points in the chart represent not the affirmance rate prevailing at any fixed point in time, but rather the cumulative affirmance rates as of the relevant date, for all decisions since January 1, 2000. That should mean that the magnitudes of any true temporal trends are understated in the graph, in the following sense. If, e.g., the cumulative overall affirmance rate has climbed from .843 (311/369) as of 4/9/03 to .872 (539/618) as of 2/5/05, or by about 3 percentage points, then the affirmance rate for the interval between those two dates must be higher than the terminal cumulative value of .872. In point of fact, for the period between those two dates, the overall affirmance rate is .916 (228/249), or about 7 percentage points higher than in the earlier period.
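The arithmetic can be checked by differencing the two cumulative snapshots. The sketch below uses only the counts quoted above; the function and variable names are ours, chosen for illustration.

```python
def interval_rate(earlier, later):
    """Recover the affirmance rate for the period between two cumulative
    snapshots, each given as (affirmances, total_rulings_counted)."""
    affirmed = later[0] - earlier[0]
    total = later[1] - earlier[1]
    return affirmed, total, affirmed / total


snapshot_2003 = (311, 369)  # cumulative overall figures as of 4/9/03
snapshot_2005 = (539, 618)  # cumulative overall figures as of 2/5/05

affirmed, total, rate = interval_rate(snapshot_2003, snapshot_2005)

print(round(snapshot_2003[0] / snapshot_2003[1], 3))  # cumulative rate, earlier date
print(round(snapshot_2005[0] / snapshot_2005[1], 3))  # cumulative rate, later date
print(affirmed, total, round(rate, 3))                # rate for the interval alone
```

Because the later cumulative figure is a weighted average of the earlier period and the interval, the interval rate necessarily lies further from the earlier rate than the later cumulative rate does.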

Because the site is updated every few days to reflect new decisions, and because we are lazy and have a hard disk of only finite size, we do not store backups for every temporal instantiation of the site, and so we do not have daily figures for such cumulative totals. The data points in the graph were obtained by recourse to the Wayback Machine. The graph was charted on an Excel spreadsheet.

(3) Interpretation of the figures

(a) Caveats. Before trying to interpret any temporal trends, we should discuss some of the reasons to doubt their reality. There is a significant element of judgment involved in determinations that an appellate opinion reverses or affirms a district court's evidentiary decision. Some district court decisions exclude testimony only in part; some appellate rulings affirm or reverse only in part. Nor is it always clear whether to count an appellate decision as even pronouncing on lower court evidentiary rulings. One example is presented by the occasional decision involving a petitioner for habeas corpus who claims that his defense counsel's failure to raise a Daubert challenge in the underlying criminal proceedings constituted ineffective assistance. It might seem obvious that such challenges do not necessarily embroil the courts in de novo consideration of whatever Daubert issues might have been posed in the underlying trial, but some district court and appellate decisions have seemed to approach the issue in those terms, perhaps on the theory that if the possibility of a valid Daubert challenge could be ruled out, then no constitutional issues need be reached.

Having done the counting ourselves, we will represent that such uncertain cases constitute a very small fraction of the sample. Although we are not statisticians, we doubt they are of sufficient number that vagaries in their categorization would be responsible for trends of the dimensions suggested by the graph. Nevertheless, it should be noted that one individual has done all the categorizing for this sample, under rules that have not been tested for inter-rater reliability, and it is possible that any apparent trends in the data merely reflect the unconscious evolution of his own tacit methods for resolving ambiguities.

What cannot reasonably be contested is the representativeness of the sample, which includes all instances of federal appellate review during the relevant time period. (Actually, it is possible we have missed a decision or two, but we will represent that if we have, their number would be very small, and we can think of no reason why there would be any tendency for missed decisions to fall into one category more often than another.) The sample, by now, is also of reasonably substantial dimensions (appellate review of evidentiary decisions on 618 experts). To repeat, we're not statisticians. But the sample seems large enough that major trends are not likely the product of random variation.
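One rough way to probe the "random variation" question is a textbook two-proportion z-test, applied to the overall figures quoted earlier in the post (311/369 through 4/9/03, and 228/249 for the period from then to 2/5/05). This is only a sketch, and the choice of test is our assumption, not part of the original tabulation. In particular, the test treats every ruling as independent, which is doubtful here, since outcomes are counted expert by expert and one case may involve several experts. The result should be read as suggestive at most.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Assumes independent observations, which expert-by-expert counting
    within the same case probably violates.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Earlier period (through 4/9/03) versus later interval (to 2/5/05).
z, p_value = two_proportion_z(311, 369, 228, 249)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

A proper analysis would cluster rulings by case and would not rely on a single, arbitrarily chosen split date.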

(b) Hypotheses. The decisions in the sample include both criminal and civil cases, and involve a wide variety of fields of expertise. We have not attempted to evaluate whether the trends toward affirmance are an artifact of the mix of cases, but for example, if criminal cases were found to represent an increasing fraction of the overall total, then that by itself would be one very convincing explanation for a substantial increase in admissibility affirmance rates (since forensic testimony is admitted, and its admissibility is upheld, at a very high rate in criminal cases). It would be a far less plausible explanation, however, for the similar upward trend in exclusion affirmance rates.

We also have not attempted to analyze (because we doubt that these raw data would permit us to analyze) how the trends in affirmance rates may relate to any trends in admissibility rates. One interesting feature of the graph, however, is that the gap appears to be narrowing, over time, between the rates of affirmance for decisions admitting expert evidence and decisions excluding it. Decisions reversing the exclusion of expert evidence have consistently been more numerous, in relative terms, than decisions reversing the admission of such evidence. That trend has been evident since we first started compiling the data, not only for federal appellate decisions as a whole, but also at the individual circuit level. Nevertheless, affirmance rates in these two categories have apparently begun to converge.

One possible explanation for the trends might be that appellate courts have grown more lax in their oversight of Daubert rulings over the past five years. It is inherently difficult to measure something like the prevalent level of stringency in appellate review, and so we may be left with practitioners' intuitions. For ourselves, we have read each decision in the sample, and that process has not left us with the impression that appellate review has been growing more lax.

One other possible explanation might be that as the corpus of appellate decisions under Daubert and Kumho Tire has grown, district courts have enjoyed more guidance on the application of the principles embodied in those decisions, and are therefore better able to judge the outer bounds of their judicial discretion. That hypothesis might suggest that Supreme Court opinions articulating abstract factors pertaining to admissibility are less useful, in guiding specific rulings, than the now substantial body of intermediate appellate decisions mapping recurrent fact patterns into outcomes.

(4) Further research

To urge "further research" is actually something of a misnomer. These data were not originally collected or structured with an eye toward testing any hypothesis. We have simply charted some data that happen, fortuitously, to be at our disposal, at a level of rigor that not only fails to rival that prevailing in the natural sciences, but which is also uninformed by any statistical analysis or techniques going beyond mere tabulation. We have decided to post the data because for all the caveats and uncertainties, we are left with the firm impression, when all is said and done, that they probably do reflect something real about the judicial world, though we are unable to say what.

Deeper analysis would involve a more systematic protocol for data collection, some inquiry into the "mix" of decisions in the corpus, a finer temporal mesh, and perhaps some comparative analysis between circuits, or between federal and state decisions. Such a project would be ambitious. We hope someone undertakes it.


2 Comments:

Anonymous writes ...

Actually, there will be a bias: missed decisions are much more likely to be affirmances. Affirmances are less likely to make reference to the Daubert decision than reversals are, and "Appellants' other arguments do not merit consideration"-type statements won't trigger the Bat-Daubert-Detector at all.

This may or may not be of great significance, but whatever significance it has will be magnified in inter-circuit comparisons, both because of the regional differences in opinion-writing and the smaller sample sizes.

As I've mentioned before, I'm skeptical that batting averages prove anything. I think this is especially true over time, because the universe of Daubert decisions meriting appellate review will change over time as the caselaw develops.

Best,
Ted Frank

11:25 PM

pn writes ...

Ted Frank's excellent point is valid in the abstract, but I am skeptical that it is actually a significant factor in empirical reality. One test, I suppose, would be to define some sample of district court decisions involving the admissibility of expert testimony, and then to survey the appellate dispositions in those cases, to see how frequently the circuit opinions fail to mention expert evidentiary issues explicitly, and then to investigate the briefing, to see how often the issues were raised on appeal in the first place.

Pending such a test, my own sense, from reading no small number of cursorily reasoned unpublished opinions, in which legal issues of all stripes can admittedly receive breathtakingly short shrift, is that in cases implicating expert evidence at all, even the most desultory of appellate analyses will commonly pause to make at least some mention of Daubert, or Kumho Tire, or Rule 702, or some other term of art that would normally trigger the great Daubert Detector here in the Bat Cave (always assuming that Alfred is not asleep at the switch).

But I could be wrong, and appellate opinions of the variety in question -- "We reject the remainder of appellant's arguments as without merit, including various arguments about expert evidence of which we make no explicit mention here" -- could turn out to be legion. (Call these the "kitchen-sink clause" opinions, or KSC opinions for short.) In that event, the data for non-KSC opinions would still show the temporal trends they do, and those trends would still remain to be explained. It is hard to see what role KSC opinions would play in such an explanation. It is counterintuitive to suppose that KSC opinions would be growing any less common in Daubert cases as time marches on. If anything, we might intuitively expect them to be growing more common, as the evidentiary law grows more settled -- in which event the trend toward affirmance would actually be more pronounced than the data from the non-KSC opinions suggest. Of course, our intuitions about this should receive no weight, if the data negate them. But we know of no data that do, or which address the issue at all. Counterintuitive hypotheses unsupported by data are perfectly fine things, not only for purposes of cocktail party chatter, but also because a healthy skepticism in the face of statistical "information" is to be applauded. All the same, at some point it would do more, by way of advancing the cause of human knowledge, to go forth and gather the data to test the hypotheses (which is precisely what one is proposing, but which one cannot do all by oneself).

As Ted Frank meanwhile suggests, one potential explanation for the temporal trend toward affirmance might be that the mix of decisions up for review is changing over time. But to suggest this hypothesis scarcely seems a reason to reject further empirical analysis. Quite the contrary, that hypothesis seems to me to be worth testing empirically, and to be very capable of empirical investigation. It might indeed turn out that any significant statistical variation in affirmance rates over time is entirely attributable (say) to changes in the relative proportions of testimony on given topics or from given fields. That would be an informative thing to know. It would likewise be informative to learn (if inquiry did indeed reveal) that the increased corpus of federal appellate guidance on the admissibility of expert evidence under Daubert and its progeny had no measurable effect on the propensity of district courts to commit reversible error.

1:50 AM  


Fed. R. Evid. 702: If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise, if (1) the testimony is based upon sufficient facts or data, (2) the testimony is the product of reliable principles and methods, and (3) the witness has applied the principles and methods reliably to the facts of the case.