#OpenDataSavesLives: AI in Health & 2022 Wrap up!

By Tazmin Chiles, Data & Innovation Consultant/Delivery Manager, Open Innovations

Our last online session again saw a huge turnout, with 100+ people registered for the event! We’re thrilled to see this - in fact, the response we’ve seen to the event series throughout the year has been phenomenal. And as 2022 comes to a close, this also rounds off my first year at Open Innovations! It’s a great time to reflect on how #OpenDataSavesLives has evolved throughout the year.

We were asked if we could host a session on the theme of AI, so this time we were happy to welcome three great speakers who are working on some nice applications of AI models to improve health at a few different levels.

David Howell, Quantum Analytica

Quantum Analytica logo
David Howell, Director of Quantum Analytica
Credit: Quantum Analytica

David Howell was first up, sharing his work with Kent & Medway ICS developing an AI model for early detection of lung cancer, which has some of the worst rates for early diagnosis. His team at Quantum Analytica have developed and trained a model that considers a number of personal metrics - demographics, family medical history, drug interactions, and a combination of behavioural, environmental and mental health risk factors - to give each patient a risk score that predicts the likelihood of a lung cancer diagnosis.
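To give a feel for what "a risk score for each patient" can mean in practice, here is a minimal sketch. It is not Quantum Analytica's model, whose internals weren't covered in the session: it assumes a simple logistic form, and the factor names, weights and baseline are all invented for illustration.

```python
import math

# Hypothetical weights, invented purely for illustration. A real model
# would learn these from patient data and use many more inputs.
WEIGHTS = {
    "age_over_60": 0.9,
    "family_history": 0.7,
    "current_smoker": 1.4,
    "high_air_pollution_area": 0.4,
}
BIAS = -4.0  # assumed baseline log-odds for a patient with no risk factors


def risk_score(patient: dict) -> float:
    """Return a score between 0 and 1 from yes/no risk factors."""
    # Add the weight of every factor that is present for this patient.
    log_odds = BIAS + sum(
        weight for name, weight in WEIGHTS.items() if patient.get(name)
    )
    # The logistic function squashes the log-odds into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


low = risk_score({"age_over_60": False})
high = risk_score(
    {"age_over_60": True, "current_smoker": True, "family_history": True}
)
print(f"fewer risk factors: {low:.3f}, more risk factors: {high:.3f}")
```

The point is only that several different kinds of factor are combined into one number that can be ranked and acted on; how the factors are chosen and weighted is where the real work lies.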

This is really exciting in the context of all the work we’ve shared and progressed around health inequalities over the last year. As well as reducing the burden on the NHS (i.e. reducing the number of appointments needed before diagnosis), models like this may also help to clarify how social factors affect a person’s access to services and outcomes after treatment.

Professor James Teo, KCL

Next we heard from Professor James Teo from King’s College London. We talked about the importance (and difficulty) of laying the groundwork of good data infrastructure. A lot of that difficulty stems from legacy systems that initially replaced paper processes - a nightmare when trying to clean, sort and process data.


Hierarchy of needs diagram highlighting importance of infrastructure.
Good IT infrastructure is essential for embedding analytics and making pipelines scalable and translatable between systems.
Credit: Open Innovations

He said ‘we are using human beings like robots, but we should be using robots like robots’.

We shouldn’t be leaving the task of ensuring data is in good shape to healthcare professionals - this isn’t the best use of their time. 

So a few years ago, his team built CogStack, which is essentially a data lake of unstructured data. All of the unstructured inputs contained within electronic health records are processed to make them searchable and give the language meaning. The project is open source; you can view the repo here.
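As a rough illustration of what "making unstructured text searchable" involves, here is a minimal sketch of a keyword index over free-text notes. This is not how CogStack is implemented. The notes are made up, and the function names (`tokenize`, `build_index`, `search`) are hypothetical.

```python
import re
from collections import defaultdict

# Made-up example notes; real clinical records are far messier.
NOTES = {
    "note-1": "Patient reports persistent cough and chest pain.",
    "note-2": "No cough. Migraine with aura, started new medication.",
    "note-3": "Chest X-ray booked following persistent cough.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(notes: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of note ids that contain it."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for token in tokenize(text):
            index[token].add(note_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the ids of notes containing every word in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


INDEX = build_index(NOTES)
print(search(INDEX, "persistent cough"))
print(search(INDEX, "cough"))
```

Searching for "cough" also returns the note that says "No cough", because a plain keyword index has no idea about negation. That gap is why the second half of the job, giving the language meaning, matters so much for clinical text.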

They’re now taking this a step further, developing MedGPT by training an OpenAI model on this data and using it to autocomplete a patient’s future interaction with the health service. This just shows the power of using these models to improve efficiency and reduce load on the health service. The demo will be released sometime this month so keep an eye out for this.


Slide showing an overview of the AIDE platform
The AI Centre have developed a number of platforms to support the crucial infrastructure needed for analysis of health data.
Credit: Open Innovations

AIDE (Artificial Intelligence Deployment Engine) essentially acts as an ‘app store’ for AI models. It provides infrastructure for the health service to train, deploy and integrate their models. You can read more about it.

They’re also interested in equity - so they created an open natural language dashboard for people to search and track which areas of the world are contributing most to open journals and publications in AI. Take a look at this dashboard here.

It is clear that James and his team have contributed a great deal of work here. If you’re interested in any of this then get involved. You can reach James directly on Twitter - @jthteo.

Parashkev Nachev, UCL

Finally we heard from Parashkev. Much of the inefficiency in the cost and effectiveness of treatment could be reduced by treating individuals rather than entire populations. Migraine, for instance, has a huge number of contributing factors that are different for everyone. The current approach is trial and error.

Parashkev and his team have also been using NLP to categorise patient phenotypes from their clinical records. Then they trained a causal model based on many treatment outcomes over time and used it to identify the best treatments for these predicted subpopulations. Amazingly, this ‘machine prescription’ approach performed better, saving on average 3 months of pain compared to existing approaches. You can read more about what Parashkev and the team have been doing in their research paper here.

Reflections from 2022

We’ve covered a lot of ground this year at #ODSL, from our innovation session in March building our mental health data explorer prototype, to cancer data and reproducible analytical pipelines. The aim of these events is to convene, share knowledge and spark up some opportunities for collaboration. We think we’re achieving this goal - for example, out of the cancer session we held almost six months ago came a number of interesting questions around inequalities in cancer treatment that we’ll be coming back to in another session in the year ahead. Our session in October on health inequalities was really well attended, and conversations were taken offline to see how the fantastic work by Ruby Nicholls and her team over in the East of England can be replicated across East Kent using purely open working methods.

We’re glad to see some of these emergent themes starting to take shape. Set up originally as a way to share and use data, find collaborators and make progress rapidly in the midst of the pandemic, #OpenDataSavesLives has since evolved into a space for analysts, healthcare professionals, data fanatics and people across the public and private sector to convene, seek advice and connect with like-minded people.

We’ve worked closely with NHS Digital and others at the Open Data Task & Finish Group to improve the use of open data in the public sector. In light of the Goldacre Review earlier this year, we held a successful in-person event at the space here in Leeds on all things reproducible analytical pipelines. #NotaRAPBattle

Safe to say we’re going from strength to strength. And we want to hear your ideas! What would you like to see next year? Get in touch with us at hello@opendatasaveslives.org.