As mentioned in my last post, I want to continue the discussion of plausibility with the recent evidence suggesting benefit from an AIDS vaccine. This was a randomized, placebo-controlled trial in 16,000 adults, that, at least in the initial press releases, showed a statistically significant 31% decrease in HIV infection in vaccinated individuals. How confident should we be in the results?
The analysis that showed the statistically significant benefit (p = 0.04) was a reasonable one. There were 56 people who developed HIV in the vaccine arm, and 76 in the placebo arm, but seven patients were found to be infected with HIV at the time the vaccine was administered, and this analysis excluded those patients. Before we come back to that decision, we should keep in mind what that p value means -- what is it that has a 4% chance of having happened?
Although this causes untold confusion for physicians at all levels of training, that p value says that (excluding any problems with the design or performance of the trial), if vaccine were no better than placebo we would expect to see a difference as large or larger than the one seen in this trial only 4 in 100 times. This is distinctly different from saying that there is a 96% chance that this result is correct, which is how many people wrongly interpret such a p value.
One reason this is so is that many values of truth closer to the null hypothesis (but still where vaccine is superior to placebo) are possible and consistent with the result.
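To make the definition of a p value concrete, here is a minimal sketch in Python. It is not the trial's own analysis. It takes the 56 and 76 infections quoted above (before the seven exclusions) and assumes the two arms were the same size, which the post does not state. Under that assumption, if vaccine were no better than placebo, each infection would be equally likely to land in either arm, so the split behaves like a run of fair coin flips. The function name is made up for illustration.

```python
from math import comb


def two_sided_binomial_p(k_low: int, n: int) -> float:
    """Exact two-sided p value for seeing k_low or fewer of n events in one arm.

    Assumption (not from the trial report): equal-sized arms, so under the
    null each event falls in either arm with probability 0.5.
    """
    # Probability of a split at least this lopsided in one direction.
    lower_tail = sum(comb(n, k) for k in range(k_low + 1)) / 2 ** n
    # The null distribution is symmetric, so doubling covers both directions.
    return min(1.0, 2 * lower_tail)


vaccine_infections = 56
placebo_infections = 76
total_infections = vaccine_infections + placebo_infections

p_value = two_sided_binomial_p(vaccine_infections, total_infections)
print(f"two-sided p = {p_value:.3f}")
```

The number printed is the chance of a split at least this uneven if the vaccine did nothing. It is not the chance that the vaccine works, and one minus it is not the chance that the result is correct.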
A more important issue of interpretation in many cases, including this one, though, is that we are not starting from a position where all results are equally likely. We had a lot of reason to be suspicious prior to this trial that the vaccine would fail to protect against HIV. Although this way of looking at results is often given important-sounding terms like "Bayesian reasoning", really all we're saying is that if something was unlikely to begin with, we need a higher level of statistical "proof" than if something was likely; if something is extremely likely to begin with, we need a much lower level of "proof".
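A rough way to see how the starting belief changes what a "significant" result tells us is to run Bayes' rule on a simplified yes/no version of the problem. The sketch below is illustrative only: the prior probabilities, the 80% power, and the function name are all assumptions chosen for the example, not figures from the vaccine trial, and it treats "significant at 0.05" as the only evidence rather than using the size of the effect.

```python
def prob_real_effect(prior: float, power: float, alpha: float) -> float:
    """Probability the effect is real, given a statistically significant result.

    prior: belief before the trial that the treatment works (assumed)
    power: chance the trial reaches significance if it does work (assumed)
    alpha: chance the trial reaches significance if it does not work
    """
    true_positive = prior * power
    false_positive = (1 - prior) * alpha
    return true_positive / (true_positive + false_positive)


# Hypothetical priors: a long shot, a coin flip, and a near certainty.
for prior in (0.05, 0.5, 0.9):
    posterior = prob_real_effect(prior, power=0.8, alpha=0.05)
    print(f"prior {prior:.2f} -> probability effect is real {posterior:.3f}")
```

With the same p value threshold, the long shot remains much more doubtful after a positive trial than the near certainty does, which is the whole point of asking how likely the result was to begin with.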
Imagine a small randomized trial of statins in a population where they haven't been clearly studied (say, hypertensive smokers without known CHD). If the trial found a statistically non-significant relative risk of CV events in the treated group of 0.78 (22% relative risk reduction, p = 0.40), it would not be the case that there is a 40% likelihood that this result was due to chance. We have lots of prior evidence suggesting that just such a RR is likely, and our belief that this reduction in risk was due to chance should be much smaller than suggested by the p value in a single trial.
In the case of an HIV vaccine, the failure of multiple prior HIV vaccines gives us a baseline doubt about any new vaccine. Added to that is the apparent inability of even actual HIV infection to prevent subsequent infection: patients with HIV who have a normal CD4 count and are on HAART that is adequately controlling their infection can become superinfected with a strain of HIV that is resistant to the HAART regimen they are receiving. This raises real concerns about how to create a vaccine to prevent infection.
So, the modestly positive result found in the trial must be weighed against our prior belief that such a vaccine would fail. Had the vaccine been dramatically protective, giving us much stronger evidence of efficacy, our prior doubts would be more likely to give way in the face of high quality evidence of benefit.
And then, into the mix must enter the reality of the multiple analyses performed by the researchers. Although I said the analysis that gave this result was reasonable, it was not the analysis they apparently first planned to perform. The study specified an intention to treat analysis (that did not find a statistically significant benefit at the 0.05 level), and a per protocol analysis (that also did not find a statistically significant benefit).
While the actual analysis the investigators decided to make primary would be completely appropriate had it been specified up front, it now suffers under the concern of showing marginal significance after three bites at the statistical apple; these three bites have to adversely affect our belief in the importance of that p value. And it's not so obvious why they would have reported this result rather than excluding those seven patients from the per protocol analysis and making that the primary analysis; there might have been yet a fourth analysis that could have been reported had it shown that all-important p value below 0.05.
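The cost of extra bites at the apple can be put in rough numbers. The sketch below assumes the analyses are independent, which is not true here: all three were run on nearly the same patients, so their results are strongly correlated and the real inflation is smaller than this figure. Treat it as an upper-end illustration rather than a correction to apply to the trial. The function names are made up for the example.

```python
def chance_of_any_false_positive(alpha: float, n_analyses: int) -> float:
    """Chance that at least one of n independent analyses reaches p < alpha
    when the treatment truly does nothing."""
    return 1 - (1 - alpha) ** n_analyses


def bonferroni_threshold(alpha: float, n_analyses: int) -> float:
    """Per-analysis threshold that keeps the overall false positive rate
    at or below alpha (a conservative correction)."""
    return alpha / n_analyses


for n in (1, 3, 4):
    inflated = chance_of_any_false_positive(0.05, n)
    threshold = bonferroni_threshold(0.05, n)
    print(f"{n} analyses: any p < 0.05 by chance {inflated:.3f}, "
          f"Bonferroni threshold {threshold:.4f}")
```

Under the independence assumption, three analyses raise the chance of at least one spurious p below 0.05 from 5% to a bit over 14%, and a p value of 0.04 would not clear the stricter per-analysis threshold.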
The authors of the study tell us that the choice of analysis was made prior to unblinding of the results, but that probably isn't adequate. It had been used in interim analyses (and the final analysis) on the blinded data, but likely so had the intention to treat and per protocol analyses. If this analysis were the only one that achieved a p value below 0.05, it could have been chosen as the preferred analysis for that reason alone. The investigators wouldn't have known prior to unblinding whether the trial would show benefit or harm with vaccine, but either result would be more interesting than a statistically non-significant outcome. The vaccine manufacturer would be far better off with a statistical benefit, but probably no worse off if the vaccine showed harm than if it showed no effect.
I've been struck since the beginning of my medical training that biostatistics is an area of mathematics where people can have remarkably different opinions about what is or is not reasonable. I don't usually think of "opinion" and "mathematics" falling in the same sentence, but perhaps that just shows my lack of expertise in theoretical math.
In this case, while the investigators may have felt they were reasonable picking the most "appropriate" analysis, we who interpret the study have to worry about those three statistical bites. Combined with the prior likelihood that the vaccine would fail, I think we need to remain very wary that despite an apparent 31% vaccine efficacy in a trial involving 16,000 patients, the vaccine may in reality be no better than placebo.
I'm not sure I would trust an HIV vaccine until it was 1000000% sure it wouldn't give me the disease. The flu? If the vaccine gives me the flu, the worst I get is a miserable few days. But HIV...that's something that can kill.
Posted by: Robert | Dec 02, 2009 at 02:14 PM
Robert:
Among the many potential problems that might be associated with an HIV vaccine, getting HIV from the vaccine itself is not one of them. No vaccine being tested in humans contains all of the components HIV needs in order to work, and no vaccine being tested in humans contains whole virus at all. (The vaccine could theoretically increase your susceptibility to infection, but only if you were also exposed some other way, like having sex without condoms with someone who had HIV, or sharing needles with someone who had HIV.)
More importantly,
http://www.cdc.gov/Flu/keyfacts.htm
http://www.kff.org/hivaids/upload/3029-071.pdf
Though the numbers are from different years, this year, it is likely true that more people will die of flu than will die of AIDS in the United States. At any rate, it's the same order of magnitude.
David:
If we think the rate of superinfection is not that high--and if it were very high, then more gay HIV+ men in San Francisco should be failing their raltegravir-enfuvirtide-maraviroc-hydroxyurea regimens, and more sexually active HIV+ people in general should be failing their regimens more often, I think--then the likelihood of a successful HIV vaccine, generically speaking, remains uncertain and difficult to determine.
That said, I don't disagree with your overall argument, because that likelihood seems lower all the time for other reasons.
Posted by: Joe Wright | Dec 02, 2009 at 10:16 PM
Good post. I think you make too much of an issue of the multiple analyses, however. The data are the data are the data. Why should our inference depend on what we intended to do before the experiment? Sure, multiple analyses were undertaken, obviously not independent, but they all led to approximately the same inference: the vaccine was a little more effective than placebo. All of the analyses provide just about the same confidence interval for the difference between vaccine and placebo. The only reason this becomes an issue is the accursed p-value.
Suppose, for whatever reason, I believe the analysis with the p-value of 4%. Then, if you ask me how effective the vaccine is, I would answer that it has about 31% lower risk of HIV infection (if I understood the study correctly). On the other hand, if I believed one of the "non-significant" analyses and you asked me how effective the vaccine is, should I say zero? That doesn't make any sense. Why is it that, if a p-value is a little larger than 5%, we let our estimate of treatment effect snap back to zero?
Further, it is not at all uncommon for us to believe, and act upon, analyses that are unplanned and multiple. It especially happens when we consider the safety of treatments. We stand on our heads, in talking about effectiveness, to define primary and secondary objectives, pre-hoc and post-hoc analyses, etc., but when it comes to safety we let data exploration rule the day. We are willing to believe the difference in effectiveness between vaccine and placebo only if we see a statistically significant difference in the outcome we defined in advance, using the analysis we defined in advance. But if we see more vaccine receivers than placebo receivers experiencing hiccups we take that as gospel and an effect of the vaccine, even though nobody had a thought about hiccups beforehand.
The logic we bring to bear on RCTs needs re-examination, and I respect you and the others who are trying to do so. Thanks.
Posted by: Tom Spradlin | Dec 05, 2009 at 10:57 AM
Tom Spradlin's point is a fair one. The issue to my thinking, though, is not that these multiple analyses somehow unfairly skew our view of reality, but rather that the act of bobbing for p values (sticking with my bites at the apple metaphor) is relevant because of the importance that journals and reviewers place on the holy p = 0.05.
In the HIV paper, the various analyses achieved p values of 0.08, 0.16, and 0.04, all with efficacy point estimates in the 25-30% range. Had the paper reported a p value of 0.16, the journals (and the media reporting on the results) would have said the study showed no benefit with the vaccine. By getting that 0.04 p value out there, the spin was completely different.
For me, I was pointing out why I'm dubious that the vaccine worked. Had the authors been stuck with a p value of 0.16, I wouldn't even have needed to point out my skepticism, since everyone would be saying the trial was negative. That this, too, is an inaccurate interpretation of results and p values is, of course, correct.
Posted by: David Rind | Dec 05, 2009 at 03:45 PM