
Three weeks and one day before last year’s General Election, the then Shadow – now current – Secretary of State for Science, Innovation and Technology, Peter Kyle, revealed to founders, business leaders, investors and policymakers gathered at London Tech Week that Labour would create a National Data Library (NDL). In that very first announcement, he said:
“We can have better health, better education, better jobs and better growth if we can unlock the power of data. For example, data can inform vaccine development as happened during Covid-19, data on the structure and distribution of earnings in the UK, on the factors that influence productivity and prosperity, on decarbonisation and green growth. But too often, researchers and innovators are blocked when they try to find the data they need, getting access is too cumbersome and it simply takes too long. Labour will create a new National Data Library to bring together existing data programmes and give it clear ministerial accountability. The Library will provide improved, secure access to good quality public data for the accredited researchers and innovators in pursuit of better policymaking and innovations that improve everybody’s lives. We will do this securely, safely, and always for public benefit.”
Since becoming a commitment in Labour’s manifesto, the NDL has taken root in the new Government’s policy framework and has been referenced at least 29 times in Parliament and in five government policy papers. And while official implementation plans remain pending, what the NDL has already achieved is bringing many minds – including at least 18 dedicated reports and blogs – to the task of envisioning how to maximise its potential and what it means to treat data as a strategic national asset in this day and age. This has, perhaps unexpectedly, helped create an open public forum where ideas are being challenged, refined, and expanded.
One of the contributions to the public debate was a paper we recently published jointly with the Tony Blair Institute for Global Change (TBI), Governing in the Age of AI: Building Britain’s National Data Library. Even if you are not an avid follower of the NDL’s progress or a data infrastructure aficionado, it’s a worthwhile read for anyone interested in how to get an innovative government initiative off the ground and create an enabling environment for it to scale.
What challenges can the NDL solve?
Vast amounts of data are collected across government departments and public services, but a lot of it remains fragmented, difficult to access, and, for that reason, underutilised. This creates a missed opportunity for policymakers, researchers, and innovators who are unable to unlock the full value this data could provide. They have to navigate siloed systems, inconsistent approval processes, and months-long waits – often to access data that should be readily available.
Expectations for the NDL to fix this are high. It was referenced as a vehicle for economic growth in the Industrial Strategy Green Paper, and was cited by the DSIT Permanent Secretary as “one of the core priorities for the department in the coming months and years.” The AI Opportunities Action Plan positions the NDL as a strategic enabler, powering the development of AI models and supporting applications across a wide range of domains.
You can read in our paper how we think the NDL can achieve that. In this blog, I’d like to focus specifically on how the NDL, if successful, could benefit researchers and innovators in academia and the private sector, and highlight just a few of our 42 recommendations on how to take this opportunity from concept to reality. Keep reading to find out how you can also feed into our continued joint effort with the TBI to ensure the NDL is as useful as it can possibly be.
How could the NDL transform research and innovation?
For academia, the NDL promises to unlock research pathways currently blocked by data fragmentation. Researchers would no longer face the choice between limiting their questions to what’s easily accessible or spending months navigating multiple approval processes. Instead, they could pursue questions that cross boundaries – connecting longitudinal health records with socioeconomic factors, integrating environmental data with demographic patterns, or analysing educational outcomes alongside employment trajectories. A streamlined infrastructure would put research questions, not bureaucracy, at the centre to enable deeper exploration of complex challenges.
Industry would find in the NDL a structural advantage too, enhancing commercial innovation. By providing seamless but secure access to high-quality, linked data, the NDL could help accelerate product development and diffusion of innovation. For example, pharmaceutical innovators could streamline clinical trials by accessing linked NHS, social care and genetic data, reducing delays in patient recruitment and monitoring while bringing precision treatments to market faster. Construction firms could combine planning, environmental, and utilities data to avoid costly mistakes during development, reducing delays and accelerating housing delivery.
In the AI space, we’re already seeing glimpses of this potential with the ‘content store’ for education data, which is enabling AI tools to provide feedback on student work – exactly the kind of innovation the NDL could scale across multiple sectors. Solutions tailored to address uniquely British challenges and opportunities could emerge in healthcare, energy, urban planning and public services if AI developers could easily fine-tune models on rich, integrated, representative datasets that reflect our distinctive national systems. By enabling innovators to build AI systems that embody British social values and institutional knowledge, the NDL would also help transform public sector data into an instrument of soft power. It would offer a reliable alternative to systems developed under different political philosophies, projecting our democratic traditions and governance approaches far beyond our borders.
How can we make this a reality?
To begin, we recommend establishing a single Data Access Committee with delegated authority from departmental committees, eliminating the duplication and inconsistency that currently creates unnecessary delays. The goal would be to radically speed up access, processing all data requests within two weeks of submission through a system that adapts the approval process based on who is requesting the data and what they’re requesting – with streamlined access for trusted researchers working with less sensitive information, and appropriate additional safeguards for more sensitive data or new users.
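To make the idea of adaptive triage concrete, here is a minimal sketch of how a single Data Access Committee might route requests by requester status and data sensitivity. The tier names, fields and routing rules are illustrative assumptions for this blog, not part of any published NDL design.

```python
from dataclasses import dataclass

@dataclass
class DataRequest:
    requester_accredited: bool   # e.g. an already-accredited researcher
    data_sensitivity: str        # "low", "medium" or "high" (assumed tiers)

def approval_route(req: DataRequest) -> str:
    """Return which review track a request would follow under this sketch."""
    if req.requester_accredited and req.data_sensitivity == "low":
        return "streamlined"          # fast-track: trusted user, low-risk data
    if req.data_sensitivity == "high" or not req.requester_accredited:
        return "enhanced-safeguards"  # extra checks: sensitive data or new users
    return "standard"                 # default track within the two-week target

print(approval_route(DataRequest(True, "low")))    # streamlined
print(approval_route(DataRequest(False, "high")))  # enhanced-safeguards
```

The point of the sketch is simply that one committee with one rulebook can still apply proportionate scrutiny: the process adapts to risk rather than forcing every request through the slowest path.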
We also propose creating a network of National Data Librarians embedded within government departments and other relevant public bodies. Among other functions, they would act as bridges between data controllers and external users – improving data quality, enhancing usability, and aligning datasets with the needs of researchers and innovators. For academic and industry users, they would work to ensure data is prepared and structured to support real-world applications.
Another recommendation is a Reader Pass system – a digital library membership card that defines user permissions based on credentials, compliance and past use. This would enable trusted users to access appropriate data without repeatedly navigating approval processes, while maintaining strong governance through enforceable terms of use and a Data Offenders Register for those who violate trust. Alongside this, a comprehensive metadata catalogue would function like a library index, allowing users to efficiently discover and understand available datasets through a simple search interface.
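As a rough illustration of how a Reader Pass might encode tiered permissions, consider the sketch below. The pass levels, dataset tiers and the Data Offenders Register check are all invented for illustration; the actual scheme has not been defined.

```python
# Hypothetical clearance levels for pass holders and sensitivity tiers for
# datasets. Both mappings are assumptions made for this example only.
PASS_CLEARANCE = {"basic": 1, "accredited": 2, "enhanced": 3}
DATASET_TIER = {"open-statistics": 1, "linked-admin": 2, "health-records": 3}

def can_access(pass_level: str, dataset: str,
               on_offenders_register: bool = False) -> bool:
    """A pass grants access to datasets at or below its clearance level,
    unless the holder appears on the Data Offenders Register."""
    if on_offenders_register:
        return False
    return PASS_CLEARANCE[pass_level] >= DATASET_TIER[dataset]

print(can_access("accredited", "linked-admin"))   # True: clearance matches tier
print(can_access("basic", "health-records"))      # False: insufficient clearance
```

The design choice being illustrated is that access decisions become a fast lookup against an established credential, rather than a fresh approval process for every request, while revocation via the register preserves accountability.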
Looking further ahead, we envision Data Biomes – collaborative environments where government, academia and industry tackle complex challenges together. Unlike traditional data-sharing initiatives, these would be actively curated spaces focused on specific national priorities. Each biome would be chaired by relevant National Data Librarians who would identify key challenges, publish specific problem statements, and provide sandbox environments where researchers and innovators could develop and test solutions using relevant datasets. This could also incentivise data production around high-value use cases, ensuring new datasets are created with specific applications in mind rather than as abstract assets without immediate utility. When exceptional solutions emerge, a streamlined process would allow rapid adoption in government.
Imagining things
It takes a leap of imagination to look beyond current constraints and envision what’s not easily accessible or doesn’t exist yet. But this forward-looking mindset drives innovation.
The NDL shouldn’t just be a case of the same ambitions under a new name. Looking beyond incremental improvements to existing systems, we should be asking “what if?” and gradually removing the systemic barriers that stand in the way. Just imagine:
What if agritech companies could build AI solutions specifically calibrated to British farming conditions, integrating soil quality, local climate variations, and crop yield data to maximise sustainable food production across our varied growing regions?
What questions could researchers answer if education data was connected with health, employment, and social care data to understand the complex factors affecting life outcomes?
What if it were possible to spot warning signs of housing instability by linking welfare, housing market, and employment data, enabling help to reach people before they lose their homes, not after?
What if the NDL was able to partner with UKRI and ARIA to create new high-value datasets beyond any single organisation’s reach?
What connections between datasets might reveal patterns we’ve never seen or thought of?
These and many other possibilities are not distant mirages. They are opportunities waiting for the infrastructure to make them reality. If done right, the NDL is a chance to remove the barriers that have kept them just out of reach, transforming how researchers and innovators access, combine, and derive value from data.
The NDL’s practical usefulness will depend on how well it meets actual user needs rather than assumptions about them. With the NDL still in its scoping phase, this formative period is critical for setting this potentially transformative infrastructure on the right track, rather than leaving users to work around its limitations later. To help identify long-term opportunities and quick wins that DSIT could prioritise, the next phase of our joint effort with the TBI is gathering insights directly from future users in academia and industry. Their views on specific data needs, current bottlenecks, high-priority datasets, and essential support services can help inform the NDL’s development so that it delivers real value.