Aug 22, 2022
In this episode Miles is joined by Professor Luciano Floridi of Oxford University; Simon Whitworth of the UK Statistics Authority; and Pete Stokes from the ONS to talk about data ethics and public trust in official statistics.
TRANSCRIPT
MILES FLETCHER
Hello, I'm Miles Fletcher, and in this episode of Statistically Speaking we're exploring data ethics and public trust in official statistics. In 2007, 15 years ago to the very day we are recording this, the UK Parliament gave the Office for National Statistics the objective of promoting and safeguarding the production and publication of official statistics that serve the public good. But what does, or should, the “public good” mean? How does the ONS seek to deliver it in practice? Why should the public trust us to act in their interests at a time of exponential growth in data of all kinds? And where are the lines to be drawn between individual privacy and anonymity on the one hand, and, on the other, the potential of data science to improve public services and government policies, to achieve better health outcomes, even to save lives?
Joining me to discuss these topics today are Simon Whitworth, Head of Data Ethics at the UK Statistics Authority; Pete Stokes, Director of the Integrated Data Programme here at the ONS; and Luciano Floridi, Professor of Philosophy and Ethics of Information and Director of the Digital Ethics Lab at the Oxford Internet Institute.
Professor, let's start this big concept with you. What do you think Parliament meant when it said that the ONS should serve the public good in this context?
LUCIANO FLORIDI
It might have meant many things, and
I suspect that a couple of them must have been in their minds.
First of all, we know that data or information, depending on the
vocabulary, has an enormous value if you know how to use it. And,
collecting it and using it properly for the future of the country,
to implement the right policies, to avoid potential mistakes and to
see things in advance - knowledge is power, information is power.
So, this might have been one of the things that they probably meant
by “public good”. The other meaning, it might be a little bit more
specific...It's when we use the data appropriately, ethically, to
make sure that some sector or some part of the population is not
left behind, to learn who needs more help, to know what help and
when to deliver it, and to whom. So, it's not just a matter of the
whole nation doing better, or at least avoiding problems, but also
specific sectors of the population being helped, and to make sure
that the burden and the advantages are equally distributed among
everybody. That's normally what we mean by public good and
certainly, that analysis is there to serve it.
MF
So there's that dilemma between using the power of data to actually achieve positive outcomes and, on the other hand, government being seen as overbearing, or Orwellian, spying on people through the use of data.
LF
That would be the risk that sometimes
comes under the term “paternalism”, that knowing a lot about your
citizens might lead to the temptation of manipulating their lives,
their choices, their preferences. I wouldn't over-emphasise this
though. The kind of legislation that we have and the constraints,
the rules, the double checking, make sure that the advantage is
always in view and can more easily be squeezed out of the data that
we accumulate, and sometimes the potential abuses and mistakes, the
inevitable temptation to do the wrong thing, are kept in check. So
yes, the State might use the government’s political power, might
misuse data, and so we need to be careful, but I wouldn't list that
as my primary worry. My primary worry perhaps, would be under-using
the data that we have, or making mistakes
inadvertently.
MF
Do you think then, perhaps as a
country, the UK has been too cautious in this area in the
past?
LF
I don't think it has been too
cautious, either intellectually or strategically. There's been a
lot of talking about doing the right thing. I think it's been
slightly cautious, or insufficiently radical, in implementing
policies that have been around for some time. But we now have seen
several governments stating the importance of that analysis,
statistical approaches to evidence, and so on. But I think that
there is more ambition in words than in deeds, so I would like to
see more implementations, more action and less statements. Then the
ambition will be matched by the actions on the
ground.
MF
One of the reasons perhaps there
might have been caution in the past is of course concern about how
the public would react to that use of data. What do we know of
public attitudes now in 2022, to how government bodies utilise
data?
LF
I think the impression is that,
depending on whom you ask, whether it is the younger population or
slightly older people my age, people who lived in the 50s
versus my students, they have different attitudes. We're
getting used to the fact that our data are going to be used. The
question is no longer are they going to be used, but more like, how
and who is using them? For what purposes? Am I in charge? Can I do
something if something goes wrong? And I would add also, in terms
of attitude, one particular feature which I don't see sufficiently
stressed, is who is going to help me if something goes wrong?
Because the whole discussion, or discourse, should look more at how
we make people empowered, so that they can check, they have
control, they can go do this, do that. Well, who has the time, the
ability, the skills, and indeed the will, to do that? It's much
easier to say, look, there will be someone, for example the
government, who will protect your rights, who you can
approach, and they will do the right thing for you. Now we're
getting more used to that. And so, I believe that the attitude is
slightly changing towards a more positive outlook; as long as
everything is in place, we are seeing an increasingly positive
attitude towards public use of public data.
MF
Pete, your role is to make this
happen. In practice, to make sure that government bodies, including
the ONS, are making ethical use of data and serving the public
good. Just before we get into that though, explain if you would,
what sort of data is being gathered now, and for what
purposes?
PETE STOKES
So we've got a good track record of
supporting research use of survey data, that we collect largely in
ONS, but on other government departments as well. But over the last
few years, there's been an acceleration and a real will to make use
of data that have been collected for other purposes. We make a lot
of use now of administrative data, these are data that are
collected by government not for an analytical purpose but for an
operational purpose. For example, data that are collected by HMRC
from people when they're collecting tax, or from the Department of
Work and Pensions when they're collecting benefits, or from local
authorities when they're collecting council tax - all of those
administrative data are collected and stored. There's an increasing
case to make those data available for analysis which we're looking
to support. And then the other new area is what's often called
“faster data”. These are data that are typically readily available, usually in the public domain, where you don't get as deep an insight as you'd get from a survey or administrative data, but you can get a really quick answer. And a good example of that from
within the ONS is that we calculate inflation. As a matter of
routine, we collect prices from lots of organisations, but you can
more quickly do some of that if you can pull some data that are
readily available on the internet to give you those quicker
indicators, faster information of where prices are rising quickly
where they're dropping quickly. There's a place for all of these
depending on the type of analysis that you want to
do.
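The idea of a faster price indicator built from readily available web prices can be sketched in a few lines. This is purely illustrative: the item names and prices are invented, and the ONS's actual inflation methodology is far more sophisticated than this unweighted average of price ratios.

```python
# Illustrative sketch only: a naive "faster indicator" of price change
# computed from web-collected prices. Items and prices are invented;
# this is not the ONS's index methodology.

def price_relative(prices_before, prices_after):
    """Average price change across items observed in both periods."""
    common = set(prices_before) & set(prices_after)
    if not common:
        raise ValueError("no items observed in both periods")
    # Mean of per-item price ratios (a simple Carli-style average).
    return sum(prices_after[i] / prices_before[i] for i in common) / len(common)

last_week = {"bread": 1.00, "milk": 0.90, "coffee": 3.50}
this_week = {"bread": 1.05, "milk": 0.90, "coffee": 3.64}

indicator = price_relative(last_week, this_week)
print(f"prices moved by {(indicator - 1) * 100:.1f}% on average")  # 3.0%
```

The appeal of this kind of indicator is speed rather than depth: it can be recomputed daily from scraped prices, whereas a survey-based measure takes much longer to collect.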
MF
This is another area where this
ethical dilemma might arise though isn't it, because when you sit
down with someone and they've agreed to take part in the survey,
they know what they're going in for. But when it comes to other
forms of information, perhaps tax information that you've mentioned
already, some people might think, why do they want to know
that?
PS
When people give their data to HMRC or
to DWP as part of the process of receiving a service,
like paying tax for example, I think people generally
understand what they need to give that department for their
specific purpose. When we then want to use this data for a
different purpose, there is a larger onus on us to make sure that
we are protecting those data, we're protecting the individual and
that those data are only being used ethically and in areas of
trust, specifically in the public interest. So, it's important that
we absolutely protect the anonymity of the individuals, that we
make sure where their data are used, and that we are not using the
data of those data subjects as individuals, but instead as part of
a large data-set to look for trends and patterns within those data.
And finally, that the analyses that are then undertaken with them
are explicitly and demonstrably in the public interest, that they
serve the public good of all parts of society.
MF
And that's how you make the ethical
side of this work in practice, by showing that it can be used to
produce faster and more accurate statistics than we could possibly
get from doing a sample survey?
PS
Yes, exactly, and sample surveys are
very, very powerful when you want to know about a specific subject,
but they're still relatively small. The largest sample survey that
the ONS does is the Labour Force Survey, which collects data from
around 90,000 people every quarter. Administrative datasets have
got data from millions of people, which enables you to draw your
insights not just at a national level and national patterns, but if
you want to do some analysis on smaller geographic areas,
administrative data gives you the power to do that when surveys
simply don't. But any and all use of data must go through a strict governance process to ensure that the confidentiality of the data subjects is preserved, and that the use will be not only clearly and demonstrably in the public interest, but also ethically sound, and will stand up to scrutiny in that way as well.
MF
And who gets to see this stuff?
PS
The data are seen by the accredited
researchers that apply to use it. So, a researcher applies to use
the data, they're accredited, and they demonstrate their research
competence and their trustworthiness. They can use those data in a
secure lockdown environment, and they do their analysis. When they
complete their analysis, those can then be published.
Everybody in the country can see the results of those analyses. If
you've taken part in a social survey, or you've contributed some
data to one of the administrative sources that we make available,
you can then see all the results of all the analysis that are done
with those data.
MF
But when you say it's data, this is where the whole process of anonymisation is important, isn't it? Because if I'm an accredited researcher, am I going to be able to see names and addresses, or people's sensitive personal information?
PS
No, absolutely not. And the researchers
only get to see the data that they need for their analysis. And
because we have this principle, that the data are being used as an
aggregated dataset, you don't need to see people's names or
people's addresses. You need to know where people live
geographically, in a small or broad area, but not the specific
address. You need to know someone's demographic characteristics,
but you don't need to know their name, so you can't see their name
in the data. And that principle of pseudonymisation, or the
de-identification of data, before they're used, is really important.
When the analyses are completed and the outputs are produced, those
are then reviewed by an expert team at ONS, and so the data are
managed by us to ensure that they are fully protected, wholly
non-disclosive, and that it's impossible to identify a member of
the public from the published outputs.
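The de-identification principle Pete describes can be illustrated with a toy sketch. The field names, the keyed-hash pseudonym, and the coarsening rules below are all assumptions made for illustration, not a description of ONS practice: the point is simply that direct identifiers are removed before analysts see the data, while the attributes needed for analysis are kept at a coarse level.

```python
# Toy illustration of pseudonymisation before analysis: direct
# identifiers are dropped, the record key becomes a keyed hash, and
# fine-grained geography and age are coarsened. All names and rules
# here are illustrative assumptions, not ONS practice.
import hashlib
import hmac

SECRET_KEY = b"held-separately-from-analysts"  # hypothetical key

def pseudonymise(record):
    pseudo = hmac.new(SECRET_KEY, record["name"].encode(), hashlib.sha256)
    return {
        "id": pseudo.hexdigest()[:12],          # stable pseudonym, not the name
        "area": record["postcode"].split()[0],  # coarse area, not full address
        "age_band": (record["age"] // 10) * 10, # banded age, not exact age
    }

record = {"name": "Ada Lovelace", "postcode": "SW1A 2AA", "age": 36}
safe = pseudonymise(record)
assert "name" not in safe and "postcode" not in safe
print(safe["area"], safe["age_band"])  # SW1A 30
```

Because the pseudonym is stable, records about the same person can still be grouped or linked for aggregate analysis, but the analyst never sees the name or full address.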
MF
Historically, government departments
didn't have perhaps the best record in sharing data around other
bodies for the public benefit in this way. But all that changed,
didn't it? A few years back with a new piece of legislation which
liberalised, to an extent, what the ONS is able to
do.
PS
So, the Digital Economy Act, passed
in 2017, effectively put on a standard footing the ability of other
departments to make their data available for researchers in the
same way that ONS had already been able to do since the Statistics and Registration Service Act 2007. It gave us parity, which then gave
other departments the ability to make their data available and
allow us to help them to do so, to take the expertise that the ONS
has in terms of managing these data securely, managing access to
them appropriately, accrediting the researchers, checking all the
outputs and so on, to give the benefit of our expertise to the rest
of government. In order that the data that they hold, that has
previously been underutilised arguably, could then be fully used
for analyses to develop policies or deliver services, to improve
understanding of the population or cohorts of the population or
geographic areas of the country, or even sectors of industry or
segments of businesses, for example, in a way that hasn't
previously been possible, and clearly benefits the country
overall.
MF
So the aim here is to make full use of a
previously untapped reservoir, a vast reservoir, an ocean you might
even say, of public data. But who decides what data gets brought in
in this way?
PS
We work closely with the departments that control the data, but ultimately, those departments decide what use can be made of their data. So, it is for HMRC, DWP, the Department for Education, it’s for them to decide which data they choose to make available through the Secure Research Service (SRS) or the Integrated Data Service (IDS) that we run in ONS. When they're supportive and recognise the analytical value of their data, we then manage the service where researchers apply to use those data. Those applications are then assessed by ONS first and foremost, we then discuss those requests and the use cases with the data owning departments and say, do you agree this would be a sensible use of your data?
MF
Is there an independent accreditation panel that reports to the UK Statistics Authority Board, and assesses whether a request to use the data is in the public interest, that it serves the public good?
PS
The ethics of the proposal are also
assessed by an independent ethics advisory committee, whether it's
the National Statistician's Data Ethics Advisory Committee or
another. There's a lot of people involved in the process to make
sure that any and every use of data is in the public
interest.
MF
From what we know from the evidence
available, certainly according to the latest public confidence and
official statistics survey - that's a big biennial survey run by
the UK Statistics Authority (UKSA) - I guess for that, and other
reasons, public trust remains high. The Survey said 89% of people
that gave a view trusted ONS, and 90% agreed that personal
information provided to us would be kept confidential. But is there
a chance that we could lose some of that trust now, given that
there is much greater use, and much greater sharing, of admin data?
It should be said that it doesn't give people the chance to opt
out.
PS
I think one of the reasons that trust
has remained high is because of the robust controls we have around
the use of data. Because of the comprehensive set of controls and
the framework that we put around use of data that protects
confidentiality, that ensures that all uses are in the public
interest. And another important component of it is that all use of
data that we support is transparent by default. So, any analyst
wanting to use data that are held by ONS, or from another
department that we support, we publish the details of who those
analysts are, which data they're using, what they're using them
for, and then we require them to publish the outputs as well. And
that transparency helps maintain public trust because if someone
wants to know what their data is being used for, they can go to our
website or directly to the analyst, and they can see the results
tangibly for themselves. Now, they might not always agree that
every use case is explicitly in the public interest, but they can
see the thought process. They can see how the independent panel has
reached that conclusion, and that helps us to retain the trust.
There's a second half of your question around whether there is a
risk of that changing. There is always a risk but we are very alive
to that, which is why as we built the Integrated Data Service, and
we look to make more and more government data available, that we
don't take for granted the trust we've already got, and that we
continue to work with the public, and with privacy groups, to make
sure that as we build the new service and make more data available,
we don't cross a line inadvertently, and we don't allow data to be
used in a way that isn't publicly acceptable. We don't allow data
to be combined in a way that would stretch that comfort. And this
is that kind of proactive approach that we're trying to take, that
we believe will help us retain public trust, despite making more
and more data available.
MF
Professor Floridi, we gave you those
survey results there, with people apparently having confidence in
the system as it stands, but I guess it just takes a couple of
negative episodes to change sentiment rapidly. What examples have
we seen of that, and how have institutions
responded?
LF
I think the typical examples are when
data are lost, for example, inadvertently because of a breach and
there is nobody at fault, but maybe someone introduced the wrong
piece of software. It could be a USB, someone may be disgruntled,
or someone else has found a way of entering the database - then the
public gets very concerned immediately. The other case is when
there is the impression, which I think is largely unjustified, but
the impression remains, that the data in question are being used
unjustly to favour maybe some businesses, or perhaps support some
policies rather than others. And I agree with you, unfortunately,
as in all cases, reputation is something very hard to build and can
be easily lost. It's a bit unfair, but as always in life, building
is very difficult but breaking down and destroying is very easy. I
think that one important point here to consider is that there is a
bit of a record as we move through the years. The work that we're
talking about, as we heard, 2017 is only a few years ago, but as we
build confidence and a good historical record, mistakes will
happen, but they will be viewed as mistakes. In other words,
there will be glitches and there will be forgiveness from the
public built into the mechanism, because after say 10 or 15 years
of good service, if something were to go wrong once or twice, I
think the public will be able to understand that yes, things may go
wrong, but they will go better next time and the problem will be
repaired. So, I would like to see this fragility if you like, this
brittle nature of trust, being counterbalanced by a reinforced
sense of long-term good service that you know delivers, and
delivers more and more and better and better, well then you can
also build a little bit of tolerance for the occasional mistakes
that are inevitable, as in everything human, they will occur once
or twice.
MF
Okay, well, touching my mic, which would in effect be my desk, I can say that I don't think the ONS has had an episode such as you describe, but of course, that all
depends on the system holding up. And that seems a good point to
bring in Simon Whitworth from the UK Statistics Authority, as kind
of the overseeing body of all this.
Simon, how does the authority go
about its work? One comment you see quite commonly on social media
when these topics are discussed, is while I might trust the body I
give my data to, I don't trust them not to go off and sell it, and
there have been episodes of data being sold off in that way. I
think it's important to state isn't it, that the ONS certainly
never sells data for private gain. But if you could talk about some
of the other safeguards that the authority seeks to build into the
system.
SIMON WHITWORTH
The big one is around the ethical use
of data. The Authority - and Pete referred to this previously - back in 2017 established something called the National Statistician's Data Ethics Advisory Committee, and that's an independent committee
of experts in research, ethics and data law. And we take uses of
data to that committee for their independent consideration. And
what's more, we're transparent about the advice that that committee
provides. So, what we have done, what we've made publicly
available, is a number of ethical principles which guide our work.
And that committee provides independent guidance on a particular use
of data, be they linking administrative data, doing new surveys,
using survey data, whatever they may be, they consider projects
from across this statistical system against those ethical
principles and provide independent advice and guidance to ensure
that we keep within those ethical principles. So that's one thing
we do, but there's also a big programme of work that comes from
something that we've set up called the UK Statistics Authority
Centre for Applied Data Ethics, and what that centre is trying to
do is to really empower analysts and data users to do that work in
ethically appropriate ways, to do their work in ways that are
consistent with those ethical principles. And that centres around
trying to promote a culture of ethics by design, throughout the
lifecycle of different uses of data, be they the collection of data
or the uses of administrative data. We've provided lots of guidance
pieces recently, which are available on our website, around
particular uses of data - geospatial data, uses of machine learning
- we've provided guidance on public good, and we're providing
training to support all of those guidance pieces. And the aim there
is, as I say, to empower analysts from across the analytical
system, to be able to think about ethics in their work and identify
ethical risks and then mitigate those ethical risks.
MF
You mentioned the Ethics Committee,
which is probably not a well-known body, independent experts though
you say, these are not civil servants. These are academics and
experts in the field. Typically, when do they caution researchers
and statisticians, when do they send people back to think again,
typically?
SW
It's not so much around what people
do, it's about making sure how we do it is in line with those
ethical principles. So, for example, they may want better
articulations of the public good and consideration of potential
harms. Public good for one section of society might equal public
harm to another section of society. It's very often navigating that
and asking for consideration of what can be done to mitigate those
potential public harms and therefore increase the public good of a
piece of research. The other thing I would say is being
transparent. Peter alluded to this earlier, being transparent
around data usage and taking on board wherever possible, the views
of the public throughout the research process. Encouraging
researchers as they're developing the research, speaking to the
public about what they're doing, being clear and being transparent
about that and taking on board feedback that they receive from the
public whose data they're using. I would say that they're the two
biggest areas where the committee provides comments and really useful
and valuable feedback to the analytical community.
MF
Everyone can go online and see the
work of the committee, to get the papers and minutes and so forth.
And this is all happening openly and in a comfortable
way?
SW
Yes, absolutely. We publish minutes of
the meetings and outcomes from those meetings on the UK Statistics
Authority’s website. We also make a range of presentations over the
course of the year around the work of the committee and the
infrastructure that supports it, because we have
developed a self-assessment tool which allows analysts at the
research design phase to consider those ethical principles, and
different components of the ethical principles, against what
they're trying to do. And that's proved to be extremely popular as
a useful framework to enable analysts to think through some of
these issues, and I suppose move ethics from theory to something a
bit more applied. In terms of their work last year, over 300
projects from across the analytical community, both within
government and academia, used that ethics self-assessment tool, and
the guidance and training that sits behind it is again available on
our website.
MF
I'm conscious of sounding just a little
bit sceptical, and putting you through your paces to explain how
the accountability and ethical oversight works, but can you think
of some examples where there's been ethical scrutiny, and research
outcomes having satisfied that process, have gone on to produce
some really valuable benefits?
SW
ONS has done a number of surveys with
victims of child sex abuse to inform various inquiries and various
government policies. They have some very sensitive ethical issues
that require real thinking about and careful handling. You know,
the benefits of that research have been hugely important in showing
the extent of child sex abuse that perhaps previously was
unreported and providing statistics to both policymakers and
charities around experiences of child sex abuse. In terms of
administrative data, yes, there are numerous big data linkage
projects that have come to ONS and have been considered by ONS, in
particular, linkage surveys that follow people over time. Linkages
done over time provide tremendous analytical value, but of course
need some careful handling to ensure that access to that data is
provided in an ethically appropriate way, and that we're being
transparent. So those are the two I think of, big things we are
thinking about in an ethically appropriate way. And being able to
do them in an ethically appropriate way has really allowed us to
unleash the analytical value of those particular methods, but in a
way that takes the public with us and generates that public
trust.
MF
Pete, you are part of the
organisation that in fact runs an award scheme to recognise some of
the outstanding examples of the secure use of
data?
PS
We do, and it's another part of
promoting the public benefit that comes from use of data. Every
year we invite the analysts who use the Secure Research Service
(SRS), or other similar services around the country, to put
themselves forward for research excellence awards. So that we can
genuinely showcase the best projects from across the country, but
then also pick up these real examples of where people have made
fantastic use of data, and innovative use of data, really
demonstrating the public good. We've got the latest of those award
ceremonies in October this year, and it's an open event so anybody
who is interested in seeing the results of that, the use of data in
that way, they would be very welcome to attend.
MF
Give us a couple of examples of recent
winners, what they've delivered.
PS
One of the first award winners was
looking at the efficacy of testing that was done for men who may or
may not have been suffering from prostate cancer. It analysed, when a person was given this test, what the likelihood of its accuracy was, and therefore whether they should start treatment. The research was able to demonstrate that, given the efficacy, it wasn't appropriate to treat everyone who got a positive test, because there was a risk of doing more harm than good - which is really valuable.
But this year, we'll be seeing really good uses of data in response
to the pandemic, for example, tying this back to the ethics, when
you talk about the use of data made during the pandemic in
retrospect, it's clearly ethical, it's clearly in the public
interest. But, at the start of the pandemic, we had to link
together data from the NHS on who was suffering from COVID which
was really good in terms of the basic details of who had COVID and
how seriously and sadly, whether they died, but it missed a lot of
other detail that helps us to understand why.
We then linked those data with data from the 2011 Census, where you can get data on people's ethnic group, on their occupation, on their living conditions, on the type and size of the family they live with, which enabled much richer insights, but most importantly enabled government to target its policy at those groups who were reluctant to get the vaccination, and to understand whether people were suffering from COVID due to their ethnicity, or whether it was actually more likely to be linked to the type of occupation they did. Really, really valuable insights that came from being able to link these data together, which now sounds sensible, but at the time did have those serious ethical questions. Could we take these two big datasets that people didn't imagine we could link together, and keep the analyses ethically sound and in the public interest? That's what we were able to do.
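The link-then-aggregate pattern Pete describes can be sketched very simply: two de-identified datasets share a pseudonymous key, and only grouped counts, never individual rows, come out of the analysis. The field names, keys and values below are invented for illustration and are not the real NHS or Census data structures.

```python
# Illustrative sketch of linking two de-identified datasets on a
# shared pseudonymous key and publishing only aggregate counts.
# All fields and values are invented; not the real data structures.
from collections import Counter

health = [  # e.g. de-identified health records
    {"pid": "a1", "outcome": "hospitalised"},
    {"pid": "b2", "outcome": "mild"},
    {"pid": "c3", "outcome": "hospitalised"},
]
census = [  # e.g. de-identified census attributes
    {"pid": "a1", "occupation": "transport"},
    {"pid": "b2", "occupation": "office"},
    {"pid": "c3", "occupation": "transport"},
]

# Index the census attributes by pseudonymous ID.
attrs = {row["pid"]: row["occupation"] for row in census}

# Link on the key and count outcomes per occupation group; only these
# aggregate counts, not the linked rows, would ever be released.
by_group = Counter(
    (attrs[r["pid"]], r["outcome"]) for r in health if r["pid"] in attrs
)
print(by_group[("transport", "hospitalised")])  # 2
```

In a real setting the released counts would also pass disclosure checks (for example, suppressing very small cells) before publication, as described earlier in the episode.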
MF
That's certainly a powerful example. But before we pat ourselves on the back too much for that survey I mentioned, some of the research we've been doing at the ONS does suggest that there is nevertheless a hardcore cohort of sceptics on all of this. Particularly, it is suggested, among the older age groups, the over 55’s in particular. I mentioned the social media reaction you see as well. Kind of ironic you might think, given the amount of data that big social media platforms and other private organisations hold on people.
Professor, do you think there's a
paradox at work there? People are apparently inclined not to trust
public bodies, accountable public bodies, but will trust the big
social media and internet giants? Or is it just a question of
knowledge, do you think?
LF
I think it might be partly knowledge,
the better you know the system, who is doing what, and also the
ability to differentiate between the different organisations and
how they operate, under what kind of constraints, how reliable they
are, etc, versus for example, commercial uses, advertisement
driven, etc.
The more you know, and it happens to be almost inevitably the younger you are, the more you might be able to see with a different kind of degree of trust, but also almost indifference, toward the fact that the data are being collected and what kind of data are being collected. I think the statistics that you were mentioning seem to be having an overlapping feature. A less young population, a less knowledgeable population, is also the population that is less used to social media, sharing, using data daily, etc. And is also almost inevitably a little bit more sceptical when it comes to giving the data for public good, or knowing that something is going to be done by, for example, cross referencing different databases.
On the other side, you find the
slightly younger, the more socially active, the kids who have been
growing with social media - and they are not even on Facebook these
days anymore, as my students remind me, Facebook is for people like
me - so let's get things right now, when it comes to
TikTok, they know that they are being monitored, they know
that the data is going to be used all over the place. There is a
mix of inevitability, a sense of who cares, but also a sense of,
that's okay. I mean data is the air you breathe, the energy you
must have, it's like electricity. We don't get worried every time
we turn the electricity on in the house, because we might die if
someone has unreliably connected the wires, we just turn it on and
trust that everything is going to be okay. So, I think that as we
move on with our population becoming more and more well acquainted
with technology, and who does work with the data and what rules are
in place, as we heard before, from Simon and Pete, I mean, there
are plenty of frameworks and robust ways of double checking that
nothing goes wrong, and if something goes wrong, it gets rectified
as quickly as possible. But the more we have that, I think the less
the sceptics will have a real chance of being any more than people
who subscribe to the flat earth theory. But we need to consider
that the point you made is relevant. A bit of extra education is needed on the digital divide, which we mentioned implicitly in our conversation today. Who is benefiting from what? And on
which side of the digital innovation are these people placed? I
think that needs to be addressed precisely now, to avoid scepticism
which might not be well grounded.
MF
I hope through this interesting discussion we've managed to go some way to explaining how it's all done, and why it's so very important. Simon Whitworth, Pete Stokes, Professor Luciano Floridi, thank you very much indeed for taking part in Statistically Speaking today.
I'm Miles Fletcher and thanks for listening. You can subscribe to new episodes of this podcast on Spotify, Apple Podcasts and all the other major podcast platforms. You can comment or ask us a question on Twitter at @ONSFocus. Our producer at the ONS is Julia Short. Until next time, goodbye.