#OpenDataSavesLives - Session 32 - Cancer Data

By Tazmin Chiles, Data & Innovation Consultant/Delivery Manager, Open Innovations

Our latest #OpenDataSavesLives session on the 20th April saw us joined by three fantastic speakers and a great turnout across clinical practitioners, policy professionals and some fellow data geeks like us - all interested in collaborating around the use of cancer data in the open.

We were very lucky to be joined by Professor Sir Mike Richards, the first National Cancer Director and former Chief Inspector of Hospitals. More recently, Mike has written a major report on diagnostics which has led to the introduction of Community Diagnostic Centres across England. He has also recently been appointed as Chair of the UK National Screening Committee. Alongside Mike, we heard from Dr David Tighe, Deputy Clinical Lead and Consultant Oral & Maxillofacial Surgeon, and Martelie Isaacs, Principal Cancer Business and Data Analyst at the Surrey and Sussex Cancer Alliance.

Key points from our speakers

Professor Sir Mike Richards

Setting the context on the state of cancer data in the UK, Mike told us that we’ve come a long way, but very slowly, and that we still have a long way to go. We’ve had cancer registration for more than 50 years, recording incidence and new patients, and this was subsequently linked to death data from the ONS. At the time, timely data was not considered essential, as insights could be gained from trends over time. Historically, up to 20% of cancer patients were only picked up after death. From around 2000 onwards, however, it was recognised that cancer data is essential to drive improvements in service delivery, and this served as a catalyst for improving the timeliness and completeness of cancer data.

In light of this, a paper-based National Cancer Patient Experience Survey was set up that enabled some comparison of performance and outcomes between trusts. Chemotherapy data was the slowest to arrive: electronic prescribing was promised by 2006, but it was more than a decade later when this was delivered. More than a matter of data quality, Mike alluded to safety issues around paper-based prescriptions, which were a concern during this time. He went on to describe the laborious processes required to initially extract radiotherapy data from dispersed machines into a single dataset.

The biggest step forward for cancer data was linking datasets, which allowed health professionals to see for the first time how patients were coming to a diagnosis. This revealed that almost 25% of patients were coming as emergencies and 90+% through symptoms, compared with only 6% of patients being diagnosed through screening. From the work with the International Cancer Benchmarking Partnership, it became clear that the UK was performing poorly, and that 10,000+ lives could be saved if our services were improved.

Now the goal is to have 75% of patients diagnosed at Stage 1 or 2 by 2028, and there’s a lot we need to do to get there. Firstly, the full impact of the Covid-19 pandemic at a national level is still unclear due to a lack of data available during the first wave. There is now real-time data on Covid-19 infections and admissions of a kind that we simply do not have for cancer. Cancer is a more complicated picture, but if we can do this for Covid-19, then why not for cancer?

We also do not currently know which hospitals are failing to meet performance standards, or at which diagnostic steps, which is essential if we are to drive improvements and minimise regional health inequalities. We are not currently linking GP and hospital records, so we don’t know what happens to patients before they come to hospital and after they leave. This information is needed to have a hope of answering critical questions such as ‘what are the survival rates of patients who develop recurrent breast cancer?’. Clearly we need earlier and faster diagnosis, but to do that we need data, and that data needs to be as close to real-time as possible.

Dr David Tighe

David has been working in collaboration with the University of Kent Computer Science Department, and a lot of this work is now embedded in the national quality improvement programme, QOMS (Quality Outcomes in Maxillofacial Surgery).

Firstly, how do we define quality of care?

David described some of the challenges with defining good metrics for assessing the quality of patient outcomes, which are essential for developing a quality improvement programme. It is important to make sure they are:

  • Actionable
  • Reproducible, over time and geography
  • Linked to quality improvement, so that as care improves, the metric also improves
  • Focused on areas where variation between hospitals is expected, rather than areas where performance is expected to be similar
  • Modelled to reflect the complexity and comorbidities in the cohort that each hospital treats

The team have developed models to represent some of the key performance metrics they selected. The figure below describes these models in more detail. David explained that at the outset of the work in 2016 the team were using traditional medical statistics, but over time this has evolved and their datasets are now also exposed to machine learning platforms. Today, the team maintains and analyses datasets to calculate these metrics, scouring unstructured data locked away in histopathology reports from three hospitals. The models now take into account the complexities and comorbidities within the populations the hospitals treat, which have a considerable influence on how a patient responds to treatment.

The Quality Outcomes in Maxillofacial Surgery programme aims to improve quality metrics for measuring patient outcomes.
Credit: Open Innovations

David went on to explain that, when comparing hospital performance, these models paint a very different picture from standard performance metrics. The difference appears once adjustments are made for variables that standard metrics previously did not take into account, such as patient age, population metrics, tumour dimensions, depth, level of invasion, and whether the tumour has the capacity to progress along nerves.
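To illustrate why adjustment can change the picture, here is a minimal sketch of one common approach, an observed-to-expected (O/E) ratio. It is not the QOMS team's actual model: the records, the hospital labels and the per-patient predicted risks are all made up for illustration, and in practice the predicted risk would come from a model fitted on variables like those listed above.

```python
from collections import defaultdict

# Hypothetical records: (hospital, predicted_risk, had_adverse_outcome).
# predicted_risk stands in for the output of a case-mix model; the values
# here are invented purely to show the arithmetic.
RECORDS = [
    ("A", 0.10, 0), ("A", 0.20, 0), ("A", 0.30, 1), ("A", 0.40, 1),
    ("B", 0.05, 0), ("B", 0.05, 1), ("B", 0.10, 0), ("B", 0.10, 0),
]


def compare_hospitals(records):
    """Return the crude outcome rate and the O/E ratio for each hospital."""
    observed = defaultdict(int)    # adverse outcomes actually seen
    expected = defaultdict(float)  # sum of predicted risks
    patients = defaultdict(int)    # number of patients treated

    for hospital, risk, outcome in records:
        observed[hospital] += outcome
        expected[hospital] += risk
        patients[hospital] += 1

    return {
        hospital: {
            "crude_rate": observed[hospital] / patients[hospital],
            "oe_ratio": observed[hospital] / expected[hospital],
        }
        for hospital in patients
    }


if __name__ == "__main__":
    for hospital, metrics in compare_hospitals(RECORDS).items():
        print(hospital, metrics)
```

In this toy data, hospital A has the higher crude rate of adverse outcomes, but it also treats higher-risk patients. Once expected risk is taken into account, hospital B has the higher O/E ratio, so the ranking reverses.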

Quality assurance is all about choosing the right metrics. However, these models are not free from human error or risk. David will be working to build this process into clinical governance architecture, which will contribute to improving quality of care and governance and, ultimately, patient outcomes.

When asked about collaboration on this work, David described the challenges he has encountered when trying to persuade surgeons to collect data on the adverse events that happen in their unit. These datasets were built up over many hours of David’s personal time and, even though healthcare professionals do want to take part and improve quality of care, it is often difficult to incentivise individuals and funding bodies to participate in these kinds of grassroots collaborative efforts.

Martelie Isaacs

Martelie has worked closely with cancer data in her work with the Surrey and Sussex Cancer Alliance, one of 21 such organisations across the country that bring together clinical and managerial leaders across regions to improve services, reduce regional inequalities and improve patient outcomes and survivorship.

Part of the work she does is comparing differences in patient experience, examining performance standards across 7 trusts, 15 different cancer types and taking a close look at each unit’s position and the conditions behind it.

It has taken a long time to get back to business as usual due to the impact of the pandemic. A theme that frequently comes up is the sudden nosedive in two-week-wait cancer referrals, which have seen variable recovery across regions and cancer types. The Alliance has also seen an increased likelihood of patients presenting with cancer symptoms through A&E, resulting from patients’ reluctance to see a doctor during the pandemic.

In order for services to improve and recover after the pandemic, we need data. Martelie explained that the data they interact with comes in a number of different formats and is often duplicated from multiple sources. The Cancer Alliance interacts with a number of other organisations, shown on the slide below, which collaborate and share data to improve services. The outcome of such an ecosystem is that there is a huge amount of data to process, which can be time consuming and laborious. She elaborated on some of the challenges she encounters.

The Surrey and Sussex Cancer Alliance work with a number of organisations in the region, collaborating and sharing data to improve services.
Credit: Open Innovations

One of the main factors that determines outcomes is the stage at diagnosis. However, around 22% of cancer data is missing staging information, so we cannot reliably compare outcomes or assess the impact of Covid-19 on stage at diagnosis.

Interoperability is another key challenge and a theme that emerged from many of the discussions over the day. How simple is it to retrieve the data? Often, laborious processes involving multiple systems lead to bottlenecks and impede data quality. Martelie explained that we need to go back to basics, ensuring that we have sufficient resources to record data with sufficient granularity and consistency, and improving interoperability between systems so that those resources are freed up to engage with patients.

Data sharing agreements and information governance are another consistent impediment to collaborating with data. Preserving patient privacy is absolutely essential, so we may see further uptake and adoption of analytics platforms such as trusted research environments (TREs) and OpenSAFELY in the future.

Finally, linking datasets is another key focus. For example, 80% of lung cancer patients have smoked at some point in their lives. Linking together GP and cancer datasets would allow relationships like this to be discovered quickly and acted on promptly.
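As a rough sketch of what linkage means in practice, the example below joins a cancer dataset to a GP dataset on a shared pseudonymised patient identifier. Everything in it is hypothetical: the field names, the identifiers and the records are invented, and real linkage would take place inside an approved, governed environment rather than in a standalone script.

```python
# Hypothetical GP extract, keyed by a pseudonymised patient identifier.
GP_RECORDS = {
    "p1": {"ever_smoked": True},
    "p2": {"ever_smoked": False},
    "p3": {"ever_smoked": True},
}

# Hypothetical cancer registrations carrying the same pseudonymised identifier.
CANCER_RECORDS = [
    {"patient_id": "p1", "site": "lung"},
    {"patient_id": "p2", "site": "lung"},
    {"patient_id": "p4", "site": "lung"},  # no matching GP record
]


def link_records(cancer_records, gp_records):
    """Attach GP fields to each cancer record; report records that cannot be linked."""
    linked, unlinked = [], []
    for record in cancer_records:
        gp = gp_records.get(record["patient_id"])
        if gp is None:
            unlinked.append(record)
        else:
            linked.append({**record, **gp})
    return linked, unlinked


if __name__ == "__main__":
    linked, unlinked = link_records(CANCER_RECORDS, GP_RECORDS)
    smokers = sum(1 for record in linked if record["ever_smoked"])
    print(f"Linked {len(linked)} records, {smokers} with a smoking history")
    print(f"Could not link {len(unlinked)} records")
```

Keeping the unlinked records visible matters: a question like smoking history can only be answered for the patients who appear in both datasets, so the size of the gap is part of the result.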


A key message from the day’s discussions was that information governance is a major limitation on using and sharing data widely and securely, which in turn holds back collaboration and progress in improving cancer outcomes. The team at Open Data Saves Lives closed with a final note: use the web. We need to cut across all of these different ways of collaborating, skirting around legacy processes by publishing our work on the web. In light of the recent Goldacre Review, there is a call to arms around introducing reproducible analytical pipelines (RAPs): open and automated analytical processes that increase trust and transparency whilst preserving privacy.

We will be delving into some of the interesting points raised in the Goldacre Review over the course of the year, so watch this space for future updates. As always, if you were interested by this session then get in touch and tell us what you think. Our next event will be on the theme of Open Data in Health on the 8th June. The event is free and open to all, and we hope to see you there.

Thank you again to our sponsors, TPP and NHS SCW, and to the Health Foundation for supporting this event series.