Aug 22, 2022
In this episode Miles is joined by Professor Luciano Floridi of Oxford University; Simon Whitworth of the UK Statistics Authority; and Pete Stokes from the ONS to talk about data ethics and public trust in official statistics.
Hello, I'm Miles Fletcher, and in this episode of Statistically Speaking we're exploring data ethics and public trust in official statistics. In 2007, 15 years ago to the very day we are recording this, the UK Parliament gave the Office for National Statistics the objective of promoting and safeguarding the production and publication of official statistics that serve the public good. But what does, or should, the “public good” mean? How does the ONS seek to deliver it in practice? Why should the public trust us to act in their interests at a time of exponential growth in data of all kinds? And where are the lines to be drawn between individual privacy and anonymity on the one hand, and, on the other, the potential of data science to improve public services and government policies, to achieve better health outcomes, even to save lives?
Joining me to discuss these topics today are Simon Whitworth, Head of Data Ethics at the UK Statistics Authority; Pete Stokes, Director of the Integrated Data Programme here at the ONS; and Luciano Floridi, Professor of Philosophy and Ethics of Information and Director of the Digital Ethics Lab at the Oxford Internet Institute.
Professor, let's start with this big concept, and with you. What do you think Parliament meant when it said that the ONS should serve the public good in this context?
It might have meant many things, and
I suspect that a couple of them must have been in their minds.
First of all, we know that data or information, depending on the
vocabulary, has an enormous value if you know how to use it. And collecting it and using it properly matters for the future of the country:
to implement the right policies, to avoid potential mistakes and to
see things in advance - knowledge is power, information is power.
So, this might have been one of the things that they probably meant
by “public good”. The other meaning, it might be a little bit more
specific...It's when we use the data appropriately, ethically, to
make sure that some sector or some part of the population is not
left behind, to learn who needs more help, to know what help and
when to deliver it, and to whom. So, it's not just a matter of the
whole nation doing better, or at least avoiding problems, but also
specific sectors of the population being helped, and to make sure
that the burden and the advantages are equally distributed among
everybody. That's normally what we mean by public good and
certainly, that analysis is there to serve it.
So there's that dilemma between using the power of data to actually achieve positive outcomes and, on the other hand, government being seen as overbearing, or Orwellian, spying on people through the use of their data?
That would be the risk that sometimes comes under the term “paternalism”: that knowing a lot about your citizens might lead to the temptation of manipulating their lives, their choices, their preferences. I wouldn't over-emphasise this though. The kind of legislation that we have and the constraints, the rules, the double checking, make sure that the advantage is always in view and can more easily be squeezed out of the data that we accumulate, while the potential abuses and mistakes, the inevitable temptation to do the wrong thing, are kept in check. So yes, the State, with its political power, might misuse data, and so we need to be careful, but I wouldn't list that as my primary worry. My primary worry, perhaps, would be under-using the data that we have, or making mistakes inadvertently.
Do you think then, perhaps as a country, the UK has been too cautious in this area in the past?
I don't think it has been too
cautious, either intellectually or strategically. There's been a
lot of talking about doing the right thing. I think it's been
slightly cautious, or insufficiently radical, in implementing
policies that have been around for some time. But we now have seen
several governments stating the importance of data analysis,
statistical approaches to evidence, and so on. But I think that
there is more ambition in words than in deeds, so I would like to
see more implementation, more action and fewer statements. Then the ambition will be matched by actions on the ground.
One of the reasons perhaps there
might have been caution in the past is of course concern about how
the public would react to that use of data. What do we know of
public attitudes now, in 2022, to how government bodies utilise data?
I think the impression is that,
depending on whom you ask, whether it is the younger population or
slightly older people my age, people who lived in the 50s
versus my students, they have different attitudes. We're
getting used to the fact that our data are going to be used. The
question is no longer are they going to be used, but more like, how
and who is using them? For what purposes? Am I in charge? Can I do
something if something goes wrong? And I would add also, in terms
of attitude, one particular feature which I don't see sufficiently
stressed, is who is going to help me if something goes wrong?
Because the whole discussion, or discourse, should look more at how
we make people empowered, so that they can check, they have
control, they can go do this, do that. Well, who has the time, the
ability, the skills, and indeed the will, to do that? It's much
easier to say, look, there will be someone, for example the
government, who will protect your rights, who you can
approach, and they will do the right thing for you. Now we're
getting more used to that. And so, I believe that the attitude is
slightly changing towards a more positive outlook. As long as everything is in place, we are seeing an increasingly positive attitude towards public use of public data.
Pete, your role is to make this
happen. In practice, to make sure that government bodies, including
the ONS, are making ethical use of data and serving the public
good. Just before we get into that though, explain if you would,
what sort of data is being gathered now, and for what purposes?
So we've got a good track record of supporting research use of survey data, collected largely in the ONS, but in other government departments as well. But over the last few years, there's been an acceleration and a real will to make use of data that have been collected for other purposes. We make a lot of use now of administrative data; these are data that are collected by government not for an analytical purpose but for an operational purpose. For example, data that are collected by HMRC from people when they're collecting tax, or by the Department for Work and Pensions when they're collecting benefits, or by local authorities when they're collecting council tax - all of those administrative data are collected and stored. There's an increasing case to make those data available for analysis, which we're looking to support. And then the other new area is what's often called “faster data”; these are data that are typically readily available, usually in the public domain, where you don't get as deep an insight as you'd get from a survey or administrative data, but you can get a really quick answer. And a good example of that from within the ONS is that we calculate inflation. As a matter of routine, we collect prices from lots of organisations, but you can do some of that more quickly if you can pull data that are readily available on the internet to give you those quicker indicators, faster information of where prices are rising quickly and where they're dropping quickly. There's a place for all of these depending on the type of analysis that you want to do.
This is another area where this
ethical dilemma might arise though isn't it, because when you sit
down with someone and they've agreed to take part in the survey,
they know what they're going in for. But when it comes to other
forms of information, perhaps tax information that you've mentioned
already, some people might think, why do they want to know that?
When people give their data to HMRC or to DWP as part of the process of receiving a service, like paying tax for example, I think people generally understand what they need to give that department for their specific purpose. When we then want to use those data for a different purpose, there is a larger onus on us to make sure that we are protecting those data, that we're protecting the individual, and that those data are only being used ethically, specifically in the public interest. So, it's important that we absolutely protect the anonymity of the individuals, that we are clear about how their data are used, and that we are not using the data of those data subjects as individuals, but instead as part of a large dataset to look for trends and patterns within those data. And finally, that the analyses that are then undertaken with them are explicitly and demonstrably in the public interest, that they serve the public good of all parts of society.
And that's how you make the ethical
side of this work in practice, by showing that it can be used to
produce faster and more accurate statistics than we could possibly
get from doing a sample survey?
Yes, exactly, and sample surveys are very, very powerful when you want to know about a specific subject, but they're still relatively small. The largest sample survey that the ONS does is the Labour Force Survey, which collects data from around 90,000 people every quarter. Administrative datasets have got data from millions of people, which enables you to draw your insights not just at a national level, on national patterns, but on smaller geographic areas: administrative data give you the power to do that analysis when surveys simply don't. But any and all use of data must go through a strict governance process to ensure that the confidentiality of the data subjects is preserved, and that the use will not only be clearly and demonstrably in the public interest, but will also be ethically sound and stand up to scrutiny in that way as well.
And who gets to see this stuff?
The data are seen by the accredited researchers that apply to use them. So, a researcher applies to use the data, they demonstrate their research competence and their trustworthiness, and they're accredited. They can then use those data in a
secure locked-down environment, and they do their analysis. When they
complete their analysis, those can then be published.
Everybody in the country can see the results of those analyses. If
you've taken part in a social survey, or you've contributed some
data to one of the administrative sources that we make available,
you can then see the results of all the analyses that are done
with those data.
But when you say it's data, this is where the whole process of anonymisation is important, isn't it? Because if I'm an accredited researcher, am I able to see names and addresses, or people's sensitive personal information?
No, absolutely not. And the researchers only get to see the data that they need for their analysis. And because we have this principle that the data are being used as an aggregated dataset, you don't need to see people's names or people's addresses. You need to know where people live geographically, in a small or broad area, but not the specific address. You need to know someone's demographic characteristics, but you don't need to know their name, so you can't see their name in the data. And that principle of pseudonymisation, or the de-identification of data before they're used, is really important. When the analyses are completed and the outputs are produced, those are then reviewed by an expert team at the ONS, and so the data are managed by us to ensure that they are fully protected, wholly non-disclosive, and that it's impossible to identify a member of the public from the published outputs.
Historically, government departments
didn't have perhaps the best record in sharing data around other
bodies for the public benefit in this way. But all that changed,
didn't it? A few years back with a new piece of legislation which
liberalised, to an extent, what the ONS is able to do.
So, the Digital Economy Act, passed
in 2017, effectively put on a standard footing the ability of other
departments to make their data available for researchers in the
same way that the ONS had already been able to do since the Statistics and Registration Service Act 2007. It gave us parity, which then gave
other departments the ability to make their data available and
allow us to help them to do so, to take the expertise that the ONS
has in terms of managing these data securely, managing access to
them appropriately, accrediting the researchers, checking all the
outputs and so on, to give the benefit of our expertise to the rest
of government, in order that the data that they hold, which have arguably been underutilised previously, could then be fully used
for analyses to develop policies or deliver services, to improve
understanding of the population or cohorts of the population or
geographic areas of the country, or even sectors of industry or
segments of businesses, for example, in a way that hasn't
previously been possible, and clearly benefits the country as a whole.
So the aim here is to make full use of a previously untapped reservoir, a vast reservoir, an ocean you might even say, of public data. But who decides what data gets brought in in this way?
We work closely with the departments that control the data, but ultimately, those departments decide what use can be made of their data. So, it is for HMRC, DWP, the Department for Education, it’s for them to decide which data they choose to make available through the Secure Research Service (SRS) or the Integrated Data Service (IDS) that we run in ONS. When they're supportive and recognise the analytical value of their data, we then manage the service where researchers apply to use those data. Those applications are then assessed by ONS first and foremost, we then discuss those requests and the use cases with the data owning departments and say, do you agree this would be a sensible use of your data?
There is then an independent accreditation panel, which reports to the UK Statistics Authority Board, that assesses whether the request to use the data is in the public interest, that it serves the public good.
The ethics of the proposal are also
assessed by an independent ethics advisory committee, whether it's
the national statistician's data ethics advisory committee or
another. There's a lot of people involved in the process to make
sure that any and every use of data is in the public interest.
From what we know from the evidence
available, certainly according to the latest Public Confidence in Official Statistics survey - that's a big biannual survey run by
the UK Statistics Authority (UKSA) - I guess for that, and other
reasons, public trust remains high. The Survey said 89% of people
that gave a view trusted ONS, and 90% agreed that personal
information provided to us would be kept confidential. But is there
a chance that we could lose some of that trust now, given that
there is much greater use, and much greater sharing, of admin data?
It should be said that it doesn't give people the chance to opt out.
I think one of the reasons that trust
has remained high is because of the robust controls we have around
the use of data. Because of the comprehensive set of controls and
the framework that we put around use of data that protects
confidentiality, that ensures that all uses are in the public
interest. And another important component of it is that all use of
data that we support is transparent by default. So, for any analyst wanting to use data that are held by the ONS, or by another department that we support, we publish the details of who those
analysts are, which data they're using, what they're using them
for, and then we require them to publish the outputs as well. And
that transparency helps maintain public trust because if someone
wants to know what their data is being used for, they can go to our
website or directly to the analyst, and they can see the results
tangibly for themselves. Now, they might not always agree that
every use case is explicitly in the public interest, but they can
see the thought process. They can see how the independent panel has
reached that conclusion, and that helps us to retain the trust.
There's a second half of your question around whether there is a
risk of that changing. There is always a risk but we are very alive
to that, which is why, as we build the Integrated Data Service and look to make more and more government data available, we don't take for granted the trust we've already got, and we continue to work with the public, and with privacy groups, to make
sure that as we build the new service and make more data available,
we don't cross a line inadvertently, and we don't allow data to be
used in a way that isn't publicly acceptable. We don't allow data
to be combined in a way that would stretch that comfort. And this
is that kind of proactive approach that we're trying to take, that
we believe will help us retain public trust, despite making more
and more data available.
Professor Floridi, we gave you those
survey results there, with people apparently having confidence in
the system as it stands, but I guess it just takes a couple of
negative episodes to change sentiment rapidly. What examples have
we seen of that, and how have institutions responded?
I think the typical examples are when data are lost, for example, inadvertently because of a breach and there is nobody at fault, but maybe someone introduced the wrong piece of software. It could be a USB, someone may be disgruntled, or someone else has found a way of entering the database - then the public gets very concerned immediately. The other case is when there is the impression, which I think is largely unjustified, but the impression remains, that the data in question are being used unjustly to favour maybe some businesses, or perhaps support some policies rather than others. And I agree with you, unfortunately, as in all cases, reputation is something very hard to build and can be easily lost. It's a bit unfair, but as always in life, building is very difficult but breaking down and destroying is very easy. I think that one important point here to consider is that there is a bit of a record as we move through the years. The work that we're talking about, as we heard, 2017 is only a few years ago, but as we build confidence and a good historical record, mistakes will happen, but they will be viewed as mistakes. In other words, there will be glitches and there will be forgiveness from the public built into the mechanism, because after say 10 or 15 years of good service, if something were to go wrong once or twice, I think the public will be able to understand that yes, things may go wrong, but they will go better next time and the problem will be repaired. So, I would like to see this fragility if you like, this brittle nature of trust, being counterbalanced by a reinforced sense of long-term good service that you know delivers, and delivers more and more and better and better, well then you can also build a little bit of tolerance for the occasional mistakes that are inevitable, as in everything human, they will occur once or twice.
Okay, well, touching wood, or what is in effect my desk, I can say that I don't think the ONS has had an episode such as you describe, but of course, that all depends on the system holding up. And that seems a good point to bring in Simon Whitworth from the UK Statistics Authority, as kind of the overseeing body of all this.
Simon, how does the authority go
about its work? One comment you see quite commonly on social media
when these topics are discussed, is while I might trust the body I
give my data to, I don't trust them not to go off and sell it, and
there have been episodes of data being sold off in that way. I
think it's important to state isn't it, that the ONS certainly
never sells data for private gain. But if you could talk about some
of the other safeguards that the authority seeks to build into the system.
The big one is around the ethical use
of data. The authority, and Pete referred to this previously, back in 2017 established something called the National Statistician's Data Ethics Advisory Committee, and that's an independent committee
of experts in research, ethics and data law. And we take uses of
data to that committee for their independent consideration. And
what's more, we're transparent about the advice that that committee
provides. So, what we have done, what we've made publicly
available, is a number of ethical principles which guide our work.
And that committee provides independent guidance on a particular use
of data, be they linking administrative data, doing new surveys,
using survey data, whatever they may be, they consider projects
from across this statistical system against those ethical
principles and provide independent advice and guidance to ensure
that we keep within those ethical principles. So that's one thing
we do, but there's also a big programme of work that comes from
something that we've set up called the UK Statistics Authority
Centre for Applied Data Ethics, and what that centre is trying to
do is to really empower analysts and data users to do that work in
ethically appropriate ways, to do their work in ways that are
consistent with those ethical principles. And that centres around
trying to promote a culture of ethics by design, throughout the
lifecycle of different uses of data, be they the collection of data
or the uses of administrative data. We've provided lots of guidance
pieces recently, which are available on our website, around
particular uses of data - geospatial data, uses of machine learning
- we've provided guidance on public good, and we're providing
training to support all of those guidance pieces. And the aim there
is, as I say, to empower analysts from across the analytical
system, to be able to think about ethics in their work and identify
ethical risks and then mitigate those ethical risks.
You mentioned the Ethics Committee, which is probably not a well-known body - independent experts, though, you say; these are not civil servants but academics and experts in the field. When do they typically caution researchers and statisticians, and when do they send people back to think again?
It's not so much around what people
do, it's about making sure how we do it is in line with those
ethical principles. So, for example, they may want better
articulations of the public good and consideration of potential
harms. Public good for one section of society might equal public
harm to another section of society. It's very often navigating that
and asking for consideration of what can be done to mitigate those
potential public harms and therefore increase the public good of a
piece of research. The other thing I would say is being
transparent. Peter alluded to this earlier, being transparent
around data usage and taking on board wherever possible, the views
of the public throughout the research process. Encouraging
researchers as they're developing the research, speaking to the
public about what they're doing, being clear and being transparent
about that and taking on board feedback that they receive from the
public whose data they're using. I would say that they're the two
biggest areas where NSDEC provides comments and really useful
and valuable feedback to the analytical community.
Everyone can go online and see the
work of the committee, to get the papers and minutes and so forth.
And this is all happening openly and in an accountable way?
Yes, absolutely. We publish minutes of the meetings and outcomes from those meetings on the UK Statistics Authority’s website. We also make a range of presentations over the course of the year around the work of the committee and the infrastructure that supports it, because we have developed a self-assessment tool which allows analysts, at the research design phase, to consider those ethical principles, and different components of the ethical principles, against what they're trying to do. That's proved to be extremely popular as a useful framework to enable analysts to think through some of these issues, and, I suppose, move ethics from theory to something a bit more applied. In terms of its work last year, over 300 projects from across the analytical community, both within government and academia, used that ethics self-assessment tool, and the guidance and training that sits behind it is again available on our website.
I'm conscious of sounding just a little bit sceptical, and putting you through your paces to explain how the accountability and ethical oversight works, but can you think of some examples where there's been ethical scrutiny, and research outcomes having satisfied that process, have gone on to produce some really valuable benefits?
The ONS has done a number of surveys with victims of child sex abuse to inform various inquiries and various government policies. Those have some very sensitive ethical issues that require real thinking and careful handling. The benefits of that research have been hugely important in showing the extent of child sex abuse that perhaps previously was unreported, and in providing statistics to both policymakers and charities around experiences of child sex abuse. In terms of administrative data, yes, there are numerous big data linkage projects that have come to the ONS and been considered by the ONS - in particular, linkages that follow people over time. Linkages done over time provide tremendous analytical value, but of course need careful handling to ensure that access to those data is provided in an ethically appropriate way, and that we're being transparent. So those are the two that come to mind: big things that we are thinking about in an ethically appropriate way. And being able to do them in an ethically appropriate way has really allowed us to unleash the analytical value of those particular methods, but in a way that takes the public with us and generates that public trust.
Pete, you are part of the
organisation that in fact runs an award scheme to recognise some of
the outstanding examples of the secure use of data.
We do, and it's another part of promoting the public benefit that comes from use of data. Every year we invite the analysts who use the Secure Research Service (SRS), or other similar services around the country, to put themselves forward for research excellence awards. So that we can genuinely showcase the best projects from across the country, but then also pick up these real examples of where people have made fantastic use of data, and innovative use of data, really demonstrating the public good. We've got the latest of those award ceremonies in October this year, and it's an open event so anybody who is interested in seeing the results of that, the use of data in that way, they would be very welcome to attend.
Give us a couple of examples of recent winners, what they've delivered.
One of the first award winners was looking at the efficacy of testing for men who may or may not have been suffering from prostate cancer. It analysed, if a person was given this test, what the likelihood of its accuracy was, and therefore whether they should start treatment, and the research was able to demonstrate that, given that efficacy, it wasn't appropriate to treat everyone who got a positive test, because there was a risk of doing more harm than good - which is really valuable. But this year we'll be seeing really good uses of data in response to the pandemic. For example, tying this back to the ethics, when you talk about the use made of data during the pandemic in retrospect, it's clearly ethical, it's clearly in the public interest. But at the start of the pandemic we had to link together data from the NHS on who was suffering from COVID, which was really good in terms of the basic details of who had COVID, how seriously and, sadly, whether they died, but it missed a lot of other detail that helps us to understand why.
We then linked those data with data from the 2011 Census, where you can get data on people's ethnic group, on their occupation, on their living conditions, on the type and size of the family they live with, which enabled much richer insights. Most importantly, it enabled government to target its policy at those groups who were reluctant to get the vaccination, and to understand whether people were suffering from COVID due to their ethnicity, or whether it was actually more likely to be linked to the type of occupation they did. Really, really valuable insights came from being able to link these data together, which now sounds sensible, but at the time it did raise those serious ethical questions: can we take these two big datasets that people didn't imagine we could link together, and keep the analyses ethically sound and in the public interest? That's what we were able to do.
That's certainly a powerful example. But before we pat ourselves on the back too much for that survey I mentioned, some of the research we've been doing at the ONS does suggest that there is nevertheless a hardcore cohort of sceptics on all of this - particularly, it is suggested, among the older age groups, the over-55s in particular. I mentioned the social media reaction you see as well. Kind of ironic, you might think, given the amount of data that big social media platforms and other private organisations hold on people.
Professor, do you think there's a
paradox at work there? People are apparently inclined not to trust
public bodies, accountable public bodies, but will trust the big
social media and internet giants? Or is it just a question of
knowledge, do you think?
I think it might be partly knowledge, the better you know the system, who is doing what, and also the ability to differentiate between the different organisations and how they operate, under what kind of constraints, how reliable they are, etc, versus for example, commercial uses, advertisement driven, etc.
The more you know - and it happens, almost inevitably, to track the younger you are - the more you might view, with a different degree of trust but also almost indifference, the fact that the data are being collected and what kind of data are being collected. I think the statistics that you were mentioning seem to show an overlapping pattern. A less young population, a less knowledgeable population, is also the population that is less used to social media, sharing, using data daily, etc. And it is also almost inevitably a little bit more sceptical when it comes to giving the data for public good, or knowing that something is going to be done by, for example, cross-referencing different databases.
On the other side, you find the
slightly younger, the more socially active, the kids who have been
growing with social media - and they are not even on Facebook these
days anymore, as my students remind me, Facebook is for people like
me - so let's get this right: now, when it comes to TikTok, they know that they are being monitored, they know
that the data is going to be used all over the place. There is a
mix of inevitability, a sense of who cares, but also a sense of,
that's okay. I mean data is the air you breathe, the energy you
must have, it's like electricity. We don't get worried every time
we turn on the electricity in the house because we might die if
someone has unreliably connected the wires, we just turn it on and
trust that everything is going to be okay. So, I think that as we
move on with our population becoming more and more well acquainted
with technology, and who does work with the data and what rules are
in place, as we heard before, from Simon and Pete, I mean, there
are plenty of frameworks and robust ways of double checking that
nothing goes wrong, and if something goes wrong, it gets rectified
as quickly as possible. But the more we have that, I think the less
the sceptics will have a real chance of being any more than people
who subscribe to the flat earth theory. But we need to consider
that the point you made is relevant. A bit of extra education is needed on the digital divide, which we mentioned implicitly in our conversation today. Who is benefiting from what? And on which side of the digital innovation are these people placed? I think that needs to be addressed precisely now, to avoid scepticism which might not be well grounded.
I hope through this interesting discussion we've managed to go some way to explaining how it's all done, and why it's so very important. Simon Whitworth, Pete Stokes, Professor Luciano Floridi, thank you very much indeed for taking part in Statistically Speaking today.
I'm Miles Fletcher, and thanks for listening. You can subscribe to new episodes of this podcast on Spotify, Apple Podcasts and all the other major podcast platforms. You can comment or ask us a question on Twitter at @ONSFocus. Our producer at the ONS is Julia Short. Until next time, goodbye.