Screening iconoclasm series: Don’t be overly sensitive about sensitivity


In this post, Peter Sasieni argues that sensitivity is over-rated. It is the first of a series of posts on “Screening Iconoclasm” in which we will re-examine the orthodoxy of the principles of medical screening and suggest that the time has come for a radical rethink.

Don’t be overly sensitive about sensitivity

The first thing I am asked when talking to a company about a new cancer screening test they are trying to develop is “how sensitive does it need to be?” Sensitivity of a test designed to detect cancer is the proportion of people with cancer who would test positive. My stock answer is that I can’t give them an answer! It depends on too many things. Maybe after an hour or two we could agree on a target figure.

One of the criteria for medical screening is that you need to have a suitable test. The classic guidelines published by the World Health Organisation in 1968 discuss the “validity of the test” without referring to sensitivity. Similarly, the UK National Screening Committee’s criteria for appraising a screening programme do not provide any specifics regarding the required performance of a screening test. Both avoid putting an exact figure on what they mean by sensitive, but that is what people want to know. Does a test need to have a sensitivity of 90%? Is 80% good enough? Surely if the sensitivity is only 50%, so that the test misses as many cases as it picks up, the test must be useless?

Historically, the best cancer screening tests had poor sensitivity

Of course, all things being equal one would choose the more sensitive screening test. But we virtually never have such a choice. There may only be one test available. The more sensitive test may be more expensive; or it may be more invasive; or it may have more false positives.

The first thing to note is that two of the most widely used and effective cancer screening tests have poor sensitivity.

The Papanicolaou (Pap) test, also known as the smear test, was widely used for cervical screening in several countries in North America, northern and western Europe and elsewhere. In many of those countries it is credited with reducing cervical cancer by between 50% and 80%. Over the years some 300,000 women may have been prevented from developing cervical cancer thanks to the Pap test. Such a test must surely have an extremely high sensitivity? Oddly enough it doesn’t.

When used in screening, the Pap test is trying to detect pre-cancer. It is estimated that it detects between about 30% and 80% of pre-cancer, depending on its quality. In most studies from North America the sensitivity is estimated to be about 50%. And yet cervical cancer rates have fallen substantially across North America, and that is almost certainly thanks to screening.

Test sensitivity versus programme sensitivity

One reason cervical screening in North America was so successful despite using a test with poor sensitivity was that women were recommended to be screened annually. Fortunately, cervical pre-cancer only progresses to full cancer over many years. Thus, even if a test misses it one year, it may pick it up a year or two later, and that may still be before the disease has turned into cancer. Since it is very easy to treat pre-cancer of the cervix, so long as one finds it eventually (and before it has progressed to cancer), it is very easy to prevent the cancer. This is what is meant by programme sensitivity, as opposed to the sensitivity of a single test. There are several definitions of programme sensitivity. Some people wish to consider what proportion of women are actually screened; others don’t. But everyone agrees that for cervical screening it should capture the probability of detecting pre-cancer before it turns into cancer.

So, for cervical screening, one can’t say how sensitive the test needs to be to work – the answer depends on how often you plan to screen.
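
As a rough illustration of how test sensitivity and screening frequency combine, here is a minimal sketch. It assumes that the pre-cancer is present at every screening round and that each round misses it independently of the others. Neither assumption comes from the post, and misses are likely to be correlated in practice, so the figures are illustrative rather than estimates. The function name `programme_sensitivity` is made up for this example.

```python
def programme_sensitivity(test_sensitivity: float, rounds: int) -> float:
    """Chance that at least one of `rounds` screens is positive.

    Assumes (hypothetically) that the pre-cancer is present at every round
    and that each screen misses it independently of the others.
    """
    if not 0.0 <= test_sensitivity <= 1.0:
        raise ValueError("test_sensitivity must be between 0 and 1")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    # Probability that every round misses, subtracted from 1.
    return 1.0 - (1.0 - test_sensitivity) ** rounds


# A test with 50% sensitivity, used once or repeated over several rounds.
for rounds in (1, 2, 3, 5):
    print(rounds, round(programme_sensitivity(0.5, rounds), 3))
```

Under these assumptions, a test that misses half of pre-cancers at a single visit would pick up about seven in eight within three rounds, which is why the interval between screens matters as much as the sensitivity of the test itself.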

Did people realise how poorly sensitive these tests were before introducing them?

Probably not! Because initially the tests were looking for different things!

The first paper by Papanicolaou (published in the proceedings of the Third Race Betterment Conference!) about his technique suggested that it could be used to diagnose cancer without the need to take a biopsy. For finding cancer (as opposed to pre-cancer) the sensitivity is between 80% and 95% (depending on the quality of the test).

Most screening tests are more sensitive to advanced cancer than they are to early cancer. But there is little point in having a screening test for advanced cancer. Screening aims to find something early when it is easier to treat and the patient is more likely to be cured. One could have a test that had 100% sensitivity for advanced pancreatic cancer, but unless it was also quite sensitive to early stage cancer, it would be useless as a screening test because by the time the cancer is advanced it is very difficult to cure.

Sensitivity to what?

Another reason why sensitivity on its own is meaningless is that it depends on what you are looking for.

Bowel scope is an examination that looks at the part of the bowel that is closest to the anus. It probably detects over 95% of cancers in that part of the bowel (the distal colon and rectum). We could say it has 95% sensitivity for distal colo-rectal cancer. Or we could say that it has 65% sensitivity for bowel cancer. The latter doesn’t sound nearly as good, but it is the same thing: even without finding any cancers in the other part of the bowel, a test that detects 95% of distal cancers will detect about 65% of all bowel cancers.
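
The arithmetic behind the two figures can be sketched as a weighted average. The post does not state what share of bowel cancers are distal, so the sketch below backs that share out from the 95% and 65% figures. The result is an implied value, not a quoted statistic, and the sensitivity for cancers elsewhere in the bowel is set to zero as in the text. All names are hypothetical.

```python
def overall_sensitivity(share_by_site: dict, sensitivity_by_site: dict) -> float:
    """Per-site sensitivity averaged with weights equal to each site's share of cancers."""
    return sum(share_by_site[site] * sensitivity_by_site[site] for site in share_by_site)


DISTAL_SENSITIVITY = 0.95  # figure from the post
QUOTED_OVERALL = 0.65      # figure from the post

# Share of all bowel cancers that would have to be distal for both figures to hold
# (an implied value, assuming no cancers are found in the rest of the bowel).
implied_distal_share = QUOTED_OVERALL / DISTAL_SENSITIVITY

shares = {"distal": implied_distal_share, "proximal": 1.0 - implied_distal_share}
sensitivities = {"distal": DISTAL_SENSITIVITY, "proximal": 0.0}

print(round(implied_distal_share, 3))
print(round(overall_sensitivity(shares, sensitivities), 3))
```

The two headline numbers describe the same test; they differ only in which cancers are counted in the denominator.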

Which figure sounds better has a lot to do with our perception of risk. The psychologist and Nobel Prize-winning behavioural economist Daniel Kahneman realised that there is a non-linear relationship between chance (probability) and how people behave. In prospect theory, he and Amos Tversky suggested that:

  • People tend to overreact to small probability events, but underreact to large probabilities
  • The value people give to possible outcomes is s-shaped because a gain (loss) of £1,000 (to someone now) feels better (hurts more) than the additional value of a gain (loss) of £101,000 compared with £100,000
  • The value people give to possible outcomes is asymmetrical because losses hurt more than gains feel good.

Most people are willing to “pay” more for a 100% chance of avoiding one bad thing than they are for a 50% chance of avoiding two bad things. We value certainty over uncertainty.

A hypothetical example concerns using a urine test to look for bladder cancer. If you test 1000 people, 50 may test positive and 5 may turn out to have bladder cancer. The sensitivity of the test for bladder cancer might be 70%. In addition, of those who test positive, 2 may have kidney cancer. The sensitivity for kidney cancer might be just 30%. Overall, the sensitivity for bladder and kidney cancer combined is about 50%, so one might think it sounds better to ignore the kidney cancers and say that the test has 70% sensitivity for bladder cancer. Assuming the kidney cancers are not over-diagnosed, one only stands to gain by also diagnosing them. The test that picks up 50% of bladder-and-kidney cancer is better than the test that picks up 70% of bladder cancer, but because we don’t mention that the latter test picks up 0% of kidney cancer, it sounds better.
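
The “about 50%” figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers in the hypothetical example and assumes that the number of cancers actually present can be estimated as cancers detected divided by sensitivity. The helper name `cancers_present` is invented for illustration.

```python
def cancers_present(detected: float, sensitivity: float) -> float:
    """Estimated number of cancers in the screened group, detected or not."""
    return detected / sensitivity


# Figures from the hypothetical urine-test example (per 1000 people tested).
bladder_detected, bladder_sensitivity = 5, 0.70
kidney_detected, kidney_sensitivity = 2, 0.30

bladder_present = cancers_present(bladder_detected, bladder_sensitivity)
kidney_present = cancers_present(kidney_detected, kidney_sensitivity)

# Combined sensitivity: all cancers found divided by all cancers present.
combined_sensitivity = (bladder_detected + kidney_detected) / (bladder_present + kidney_present)

print(round(bladder_present, 1), round(kidney_present, 1))
print(round(combined_sensitivity, 2))
```

Counting kidney cancer adds two detected cancers to the numerator but nearly seven to the denominator, so the headline sensitivity falls even though more cancers are found.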

If we were developing diagnostic tests, we would prefer to have two tests: one for bladder cancer and one for kidney cancer, each with a sensitivity of, say, 90%. But in a screening setting one might rather have a single test with a combined sensitivity of 60% for both kidney and bladder cancer (because I am assuming that the number of people who would have a false positive result on at least one of the two tests would be greater than the number of false positives on the single test).

Sensitivity is an important concept for screening tests, but a theory based on dichotomous test results (positive or negative) and dichotomous health states (diseased or healthy) oversimplifies reality. As technological advances make the prospect of a pan-cancer screening test more likely, it is incumbent on researchers to consider new paradigms for quantifying the benefits and harms of such tests.

The views expressed are those of the author. Posting of the blog does not signify that the Cancer Prevention Group endorse those views or opinions.
