In this post, Peter Sasieni argues that it is time to be more specific about our definitions of specificity and false positive tests. How will we define these terms in the future when screening tests can tell us that we have a cancer that is so small that doctors are unable to find it?
The specificity of a medical test is defined as the proportion of people tested without the disease who will have a negative result. But how should we define “without disease”? And for some tests how should we define a “negative” result?
Whilst considering these questions, it is also important to consider whether the test is being used for screening or triage and whether one is talking about test specificity or episode specificity. The latter is particularly important if one uses the language of a false positive.
Test specificity or episode specificity
Cervical screening in England uses cytology as the initial test. The test is not simply positive or negative. It can be “normal” (negative), weakly positive (low-grade), or strongly positive (high-grade). The sample can also be labelled as “inadequate” for providing a valid result. Women with normal results are recalled for screening after an interval of three or five years (depending on age). Those with a strongly positive result are referred to a gynaecology clinic for an examination called colposcopy. When the sample is weakly positive, a second (triage) test is done. If that test is positive, the woman is referred to colposcopy. If it is negative, she is recalled at the routine interval. Women who have a weakly positive initial test and a negative triage test had a positive test in terms of cytology but a negative test in terms of the combined test. How should they be counted when it comes to talking about specificity? The initial test was positive, but the episode was negative.
Studies of triage tests tend to report specificity in the range of 30-75%. These figures sound terrible, but the episode specificity can only be improved by reclassifying some people who are initially “weakly positive” as negative. Since most false positives are not random events but arise because of the presence of a condition that mimics the one that we are trying to detect, it is not surprising that a second test applied to those (weakly) positive on the first test will have a high proportion of false positives. Suppose the first test is positive in 10% of those screened: 1% being true positives and 9% being false positives. Suppose the second test is positive in 4.0% of those screened (that is, 40% of those positive on the first test): true positive in 0.96% and false positive in 3.04%. Then in the triage setting the specificity of the second test is 66%, because 5.96 of the 9 percentage points of first-test false positives become negative. But look what the triage test has done: in a screening population of 10,000, it has reduced the number needing referral (and invasive or expensive testing) from 1,000 to 400, while finding 96 people with disease (but missing 4 cases that would have been picked up if all 1,000 had been investigated fully). If the initial test had a specificity of 90.9%, the episode specificity of the combined test is 96.9%.
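The arithmetic in this example can be checked with a short Python sketch. It is only an illustration of the worked numbers above: the function and variable names are made up for this purpose, and it assumes (as the example implicitly does) that everyone with disease is positive on the first test, so that 9,900 of the 10,000 screened are disease free.

```python
def triage_summary(screened, first_tp, first_fp, second_tp, second_fp):
    """Return specificities (as fractions) for a two-step screening episode.

    All arguments are counts of people. Assumption for illustration:
    nobody with disease is missed by the first test, so the number of
    disease-free people is screened - first_tp.
    """
    disease_free = screened - first_tp
    return {
        # Disease-free people who are negative on the first test alone.
        "first_test_specificity": (disease_free - first_fp) / disease_free,
        # Among first-test false positives, those the triage test clears.
        "triage_specificity": (first_fp - second_fp) / first_fp,
        # Disease-free people who end the episode without a referral.
        "episode_specificity": (disease_free - second_fp) / disease_free,
        "referrals_without_triage": first_tp + first_fp,
        "referrals_with_triage": second_tp + second_fp,
    }


# Numbers from the example: 10,000 screened, 1% true and 9% false positives
# on the first test, 0.96% true and 3.04% false positives after triage.
result = triage_summary(
    screened=10_000, first_tp=100, first_fp=900, second_tp=96, second_fp=304
)

for name, value in result.items():
    if name.endswith("specificity"):
        print(f"{name}: {value:.1%}")
    else:
        print(f"{name}: {value}")
```

The same triage test that looks poor in isolation (about 66% specificity) raises the specificity of the whole episode from about 91% to about 97%.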
Who is disease free?
Consider the issue of using human papillomavirus (HPV) testing in cervical screening. Most HPV DNA tests are extremely specific for HPV DNA. If the sample does not contain HPV DNA the test will be negative. In terms of analytic specificity there are only two problems: contamination of the sample with traces of HPV DNA that are not from the patient; and cross-reactivity between different types of HPV. The contamination issue is general to all testing. The cross-reactivity comes from having a generic test that claims to detect, for example, 13 HPV types but which in practice may also give a positive signal for other HPV types. But the issue is that we are not screening for HPV DNA. We are screening to detect cervical pre-cancer and to diagnose cervical cancer early when it can be easily and successfully treated. Most women with HPV DNA will not have pre-cancer or cancer. The clinical specificity might be defined as the proportion of women without pre-cancer or cancer who test negative. But that will depend heavily on how common HPV infections are in the population being considered. In the UK, HPV positivity of cervical samples varies enormously with age. In young girls (before they become sexually active) HPV infection is extremely rare. In women in their early 20s, up to a third will have HPV DNA in a cervical sample. The prevalence of infection falls rapidly with age. About 10% of women aged 35-50 will be HPV positive but only about 6% of those aged over 50. Data from one study in England suggest that in fact between 4% and 6% of women will test positive on a generic HPV test without having one of the 13 HPV types that are most strongly associated with cervical cancer. Thus, the specificity is over 99.5% in young girls, falls to perhaps 67% at age 20-23 and then increases back towards 94-97.5% depending on the test used. There is also a twenty-fold variation in HPV prevalence between countries.
The clinical specificity of HPV tests is thus largely dependent on the population being tested rather than on the test being used.
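A rough Python sketch shows how the same test yields different clinical specificities in different age groups. It rests on a simplifying assumption that is not stated in the text: pre-cancer and cancer are rare enough that nearly all HPV-positive women are disease free, so clinical specificity is approximately one minus the proportion testing HPV positive. The age labels and function name are illustrative, and the prevalence values are the approximate figures quoted above.

```python
def approx_clinical_specificity(hpv_positive_fraction):
    """Approximate clinical specificity as 1 - HPV positivity.

    Assumption for illustration: disease (pre-cancer or cancer) is rare,
    so almost every HPV-positive woman counts as a false positive.
    """
    return 1 - hpv_positive_fraction


# Approximate HPV positivity by age group, as quoted in the text.
hpv_positivity_by_age = {
    "early 20s": 1 / 3,   # "up to a third"
    "35-50": 0.10,
    "over 50": 0.06,
}

specificity_by_age = {
    age: approx_clinical_specificity(fraction)
    for age, fraction in hpv_positivity_by_age.items()
}

for age, specificity in specificity_by_age.items():
    print(f"{age}: about {specificity:.0%}")
```

Under this approximation nothing about the test changes between rows; only the population does.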
The issue of who is disease free is relevant to other types of screening too. Initially, faecal occult blood tests, which detect blood in the stools, were proposed as a way to screen for bowel cancer. There are of course lots of conditions that may lead to blood in the stool apart from cancer. For example, adenomas, which are small growths that may develop into cancer, can also bleed. Newer tests are better at detecting adenomas. If one defines being disease free as “not having bowel cancer” then the new test will have worse specificity than the old test. But by detecting (and then removing) adenomas the new test prevents bowel cancer far more often than the old test.
Although it is only a theoretical issue, there is also the possibility of having a test for cancer that is so sensitive and specific to cancer that it will test positive in an individual with a microscopic cancer that is too small to identify. The test would tell you that you have cancer, but doctors would be unable to find it. Should such a result be considered a false positive (because in terms of all other tests the patient does not have cancer), or should we hail the advances in molecular diagnostics that enable us to identify someone with a microscopic cancer? The answer depends on the consequences of the positive test. On the negative side, it could lead to huge anxiety or to a whole series of invasive and expensive tests (that still fail to pinpoint where the cancer is). But on the positive side, it could lead to simple treatments (medicines like aspirin, for instance) or lifestyle changes (losing weight, exercising) that enable the body to destroy the cancer before it grows into one that causes problems. Alternatively, it might lead to 6-monthly surveillance so that the cancer is diagnosed when it becomes the size of a pea and can be easily removed surgically before it has spread.
The idea of having a blood test that can detect cancers that are too small to be seen may sound like science fiction, but at least two such tests are being studied: CancerSEEK tests for a panel of proteins and mutations that provide a cancer signature in blood; and GRAIL are developing very sophisticated assays to look for tiny fragments of cancer in the blood.
The views expressed are those of the author. Posting of the blog does not signify that the Cancer Prevention Group endorse those views or opinions.