#OpenDataSavesLives - Session 34: Open Source in Health Analytics
Our event this time saw a fantastic turnout with attendees across sectors. Dr. Marc Farr chaired the session and started by thanking our sponsors and supporters for enabling #OpenDataSavesLives to evolve from an informal collaborative network, set up to provide support during the Covid-19 pandemic, to the fantastic event series it is today.
This time, we built on the rich discussion from previous sessions to further explore the use of open source and open innovation in health analytics, with three great speakers joining us to talk about their work. Ellen Coughlan, Programme Manager at the Health Foundation, started by sharing how we can encourage more people to work in this space. Chris Beeley, Senior Data Scientist at Nottinghamshire Healthcare NHS Foundation Trust, shared some reflections on the Goldacre Review. Finally, Dr. Osman Bhatti gave a GP’s view, as well as some insights into information governance and building trust.
Ellen Coughlan:
Ellen is the programme manager on the Data Analytics team at the Health Foundation, which funds and supports projects, such as Open Data Saves Lives, that improve the use of data in health and social care.
In her experience as a former analyst at a general practice, and now in her work at the Health Foundation, Ellen has encountered cultural reluctance among analysts to share data sources and methodology, driven by fear of exposure as well as governance and licensing concerns. Access to crucial datasets has often been limited and has required the use of expensive proprietary software. This closed manner of working and lack of accountability often meant long delays in getting queries answered, high costs and frustration for analysts.
The Data Strategy seeks to address some of these issues, and going forward we should all be holding the DHSC to account for implementing this. We have work to do in building a culture that reduces anxieties around open practices, by sharing success stories, blogs and encouraging decision-makers to promote the value of working in the open. We shouldn’t find ourselves in situations where methodologies are hidden in the health sector, as those developing, procuring and using tools should have a sound understanding of how effective, equitable and safe those tools and services are for their patients. This may require investment in data literacy at the senior level to ensure those decision-makers have the skills and knowledge to facilitate the safe adoption of open working practices.
The community also has an important role to play in creating an open working culture. Initiatives such as NHS-R, a community that promotes the learning, application and exploitation of R within the NHS, and of course Open Data Saves Lives, are fantastic platforms for sharing experiences, collaborating and removing blockers to getting things done at pace.
Chris Beeley:
Chris is a Data Scientist and co-chair of the NHS-R technical advisory group and shared with us some insights on open code practices. He started with a couple of highlights from the recent Goldacre Review.
Trusted Research Environments (TREs) are a means for analysts to work on data without having to download pseudonymised or anonymised data onto their laptops. Most data analysis is currently done by downloading large amounts of data, which the Review argues is neither efficient nor safe; we should move away from storing any kind of patient data on local machines. Code-based TREs make code reusable, so that others can easily replicate analyses, and auditable, so that it can be checked that analysts are using the data safely.
Another important theme from the Review was the transition toward Reproducible Analytical Pipelines (RAPs): analytical processes that do not require human intervention and run entirely automatically. RAPs are therefore developed entirely through code. They have been extensively adopted across the civil service, and NHS analytics processes are undergoing reform to incorporate open software engineering principles into existing pipelines.
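To make the idea concrete, here is a minimal sketch of what an analysis written entirely as code might look like. It is illustrative only: the column names (`practice`, `attendances`), the sample records and the function names are assumptions made up for this example, not anything shown in the session or taken from the Review.

```python
import csv
import io

# Hypothetical input data, inlined so the sketch is self-contained.
# A real pipeline would read from a governed data source instead.
RAW_CSV = """practice,attendances
A,120
B,
A,80
C,45
"""


def load(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Drop rows with a missing attendance count and convert counts to int."""
    return [
        {"practice": r["practice"], "attendances": int(r["attendances"])}
        for r in rows
        if r["attendances"].strip()
    ]


def summarise(rows):
    """Total attendances per practice, sorted by practice for stable output."""
    totals = {}
    for r in rows:
        totals[r["practice"]] = totals.get(r["practice"], 0) + r["attendances"]
    return dict(sorted(totals.items()))


def run_pipeline(text):
    """Run every step in order, with no manual intervention between steps."""
    return summarise(clean(load(text)))


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Because every step is a function and the whole run is a single call, anyone with the same input can rerun the analysis and get the same result, and each step can be inspected or tested on its own.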
The government’s data strategy, along with the Goldacre Review, outlines that going forward, all NHS-procured code for data curation and analysis will be shared openly, except where there are legitimate reasons for not doing so. The benefits of using code for data science are clear (shown in the slide below). Sharing analytic code is the only way to have truly reusable and inspectable analytics. We have a duty to publish what we do, not just for reasons of fairness and transparency but also to save vast amounts of public money that would otherwise be spent on inefficient manual processes and non-reproducible analysis. The civil service has saved millions of pounds by adopting these practices, and can prove it too, so these changes are here to stay.

[Slide: the benefits of using code for data science. Credit: Open Innovations]
There is a massive skills deficit in the NHS. Not all NHS analysts are experienced programmers, and a large portion of analysts’ time is spent on manual processing and data preparation. Many private companies sell easy alternatives to the labour required to transition teams to a code-first approach, but outsourcing leaves organisations tied into paying for an expensive product indefinitely, with a workforce that has not learned anything. As Chris said, if you teach a person data science, they can generate insight from their data for a lifetime. Code-first analytics is cheaper and better in the long run, but to make the transition we need a massive programme of training through experiential learning, with good leadership and mentoring. Echoing Ellen’s point, this means that analytical literacy in senior roles, along with analyst-led community programmes such as NHS-R, is essential to support a smooth transition.
Dr. Osman Bhatti:
Osman is a GP and CCIO in North East London and shared a primary care perspective on data sharing.
Confidential patient data is collected for the purposes of direct care. Patients and clinicians all expect the necessary information to be available at the point of care, such as details on GP and hospital visits. It is therefore essential that patients can trust that this data will remain confidential and that they understand how it will be handled. If that trust is eroded then the system falls apart and we are starting to see cases of this happening now.
COPI notices have changed things since March 2020. There were over 3000 requests for data up to July 2022. Generally speaking, these requests are most often made by academic institutions, CCGs and local authorities; however, the classification of certain applicant organisations is unclear. In 80% of cases, the data was being used for commercial purposes, not for the purposes of direct care. More detail is needed to understand exactly how patient data is being used, especially given that around 20% of the total requests contained sensitive, identifiable patient information. More concerningly, of the 46000+ data releases in which patients had opted out of their data being used in research, the opt-out was honoured in only 8433 cases. This raises the question of how much trust is damaged when, in most cases, opt-out requests are not being respected.
A lot of work needs to be done to address these concerns and simplify the process. Currently, there are two ways for a patient to opt out of sharing their data: a Type 1 Opt-Out stops patient data from leaving the practice, whereas the National Data Opt-Out prevents the data from being shared and released to other organisations. These processes are in need of reform, and people need to understand how their data is being used: not just for the purposes of direct care, but also, in sensitive and identifiable form, for commercial benefit, with no record or transparency around exactly how it is being used.
The Goldacre Review has laid out several steps in the right direction to start resolving some of these issues with trust and transparency. TREs will go a long way in reducing some of the risks associated with requests to handle patient-identifiable data. Locally-owned TREs especially will build the most trust, both for clinicians and patients passing data into the TRE and for other organisations accessing the data in a more transparent manner. To date, 114 projects have been approved using OpenSAFELY, and this number is expected to rise as more organisations start to implement the recommendations laid out in the Review and the new government data strategy.
Conclusion
This session brought some excellent discussion with a flurry of responses in the chat. If you have any thoughts to share on this topic, then do get in touch with us, or join in the conversation over on Twitter. You can watch the recording of the session here.
In case you missed it, our last session discussed the use and potential of Open Data in Health. In October, we’ll be hosting another session on Health Inequalities, currently a key area of focus for the NHS. This will be a great session, so keep an eye out for our updates.