In data we trust?
The last year or two have seen the launch of several new data dashboards in the sector, but do you trust them and what are the pitfalls? asks Sarah Thelwall of financial benchmarking company, MyCake.
Interactive data publication and interrogation tools offer exploration and analysis opportunities way beyond the capacity of a static report of findings from research and analysis. But with them come added responsibilities – particularly if the data and the publisher are to be entrusted with what is often sensitive or confidential information.
Over the course of nearly two decades of such work we’ve built up a set of principles that ensure our organisation – MyCake – can be trusted to produce analysis that is as accurate as the data allows, to know where the data may create bias, and to be honest about where it has limits.
This article takes those principles and flips them to a user’s perspective, so you can use them to evaluate any report, data tool or interactive dashboard, and to think about whether such new tools meet your needs and whether the data they contain can be trusted.
Since 2007, MyCake has been analysing financial data on UK non-profits. We started with a client base of arts and culture organisations, looking at income diversification beyond grants. That has since expanded to all types of UK non-profit, with data on hundreds of thousands of organisations.
We think it’s the biggest data set of its kind and, more importantly, it is intended to be accessible and relevant to non-data and non-finance sector professionals.
The basics of how data is structured
When you look at the contents page of a report or a dashboard, can you see the headline versus the granularity? If you go into the detail to find how the data is reported, can you see:
- The publication year(s) of the data
- The number of organisations in the dataset
- The changes in number between years – and therefore whether you can compare between them
- Totals or averages
- If averages, can you split the data into bands (by income, say), and
- Can you see how one set of data or dashboard relates to others?
All this tells you whether the data is coherent and complete, and whether the publishers have worked out how its different parts relate to each other. Ideally, data publishers will have worked out the relationships between the various metrics and will articulate them to the reader.
If what you see is a series of disparate pieces of data with varying definitions and timeframes, it will be very hard to know what the data is telling you. If you only see a series of totals with no granularity, it will be tricky to see yourself in the data. We describe such unconnected data as ‘data orphans’ – a frustrating experience.
All this helps you to know how much energy to invest in engaging with the data. You should be aiming for: “This is great, if I go here I can find the detail.”
What about the quality of the data?
Thinking about what data has been harnessed to produce a piece of analysis, how do you assess the quality of the raw data? To what extent are the publishers describing the data along with any known data gaps or inconsistencies? And how much clarity is there about how any metric or calculation has been applied to it?
These questions matter because no dataset is perfect. If there are imperfections the publishers are not aware of, have not communicated, or have not adjusted for in the way their data tools are built, then it’s difficult for users to decide what weight they can place on any conclusions they reach from the data.
If the answers aren’t clear in the way the data is laid out, can you question the publishers? Transparency about data quality and opportunities to ask questions speak to a desire on the part of the publisher to build trust and be accountable.
Trust that data will be used responsibly
When we answer surveys, supply data to a funder or make a contribution to research, we do so in good faith, assuming it will be used responsibly. While such contributions are not always rewarded with the publication of results in a format useful to those who submitted the data, it is certainly true that data dashboards can deliver in ways that static reports cannot.
Can you see yourself/your organisation in the data?
It is reasonable to expect any data output that includes your organisation to be structured in such a way that you can identify your peer group within it. In practical terms, this means data is structured in bands, such as turnover ranges, so smaller organisations can simply look at results for organisations of their own size (or geographic area, artform etc).
It’s almost impossible for a single organisation to usefully compare itself to a whole sector; even if a median is published, it will cover a highly heterogeneous group. So data structured into similar groups makes it much easier for a single organisation to see itself in the data, as well as view findings for the sector as a whole.
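To make that concrete, here is a minimal sketch of how such banding might work if you were to do it yourself in code. The turnover thresholds, column names and figures are invented for illustration and are not MyCake’s actual bands or data:

```python
import pandas as pd

# Illustrative dataset: one row per organisation (names and figures are invented)
orgs = pd.DataFrame({
    "organisation": ["A", "B", "C", "D", "E", "F"],
    "turnover": [45_000, 120_000, 310_000, 980_000, 2_400_000, 7_500_000],
    "earned_income_pct": [22, 35, 41, 58, 63, 70],
})

# Band organisations by turnover so each one can compare itself with its peers
# rather than with the sector as a whole. The band edges are illustrative only.
bands = [0, 100_000, 500_000, 1_000_000, 5_000_000, float("inf")]
labels = ["<£100k", "£100k-£500k", "£500k-£1m", "£1m-£5m", ">£5m"]
orgs["turnover_band"] = pd.cut(orgs["turnover"], bins=bands, labels=labels)

# Median earned-income share per band: a single organisation reads the row for
# its own band instead of a sector-wide average covering a heterogeneous group.
print(orgs.groupby("turnover_band", observed=True)["earned_income_pct"].median())
```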
Trust that the metrics enable headlines and detail
Having said that granularity is great, it’s unhelpful to be confronted with every possible data point in a vast array. It’s better for publishers to take a ‘least data for most utility’ approach. In that case, rather than throwing it all at you, they’ve thought about a hierarchy of usefulness and a route into the detail.
This will mean some reduction and aggregation. For example, you don’t need to see every type of ticket that can be purchased from every organisation in the dataset. More useful are a few broad bands of ticket type, with the publishers matching the raw data into those bands. That improves read-across between similar activities.
There is little value in having so much granularity you can no longer see the wood for the trees. You want data publishers to have thought about what will be useful, not to simply report the variety they find. You’re looking for synthesis as well as analysis.
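As a small, hypothetical illustration of that ‘least data for most utility’ matching, the sketch below collapses invented raw ticket types into a few broad bands before anything is reported; none of the categories or figures come from a real dataset:

```python
# Hypothetical raw ticket types mapped into a handful of broad, comparable bands.
ticket_band = {
    "adult_standard": "Full price",
    "adult_premium": "Full price",
    "senior": "Concession",
    "student": "Concession",
    "under_16": "Concession",
    "member_preview": "Member/free",
    "comp": "Member/free",
}

# Raw sales records as (ticket type, number sold) – again, invented figures.
sales = [("adult_standard", 1200), ("student", 340), ("comp", 55), ("adult_premium", 410)]

# Roll the detail up into the broad bands before anything is published.
totals = {}
for raw_type, count in sales:
    band = ticket_band.get(raw_type, "Other")  # anything unmapped falls into 'Other'
    totals[band] = totals.get(band, 0) + count

print(totals)  # {'Full price': 1610, 'Concession': 340, 'Member/free': 55}
```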
Trust in confidentiality and security, particularly if data is transformed
Just because granularity exists doesn’t mean it all has to be shown to a wider audience. Some data has legal protection – personal data under GDPR, for example – but detailed organisational data is often already in the public domain and, while not personal, may still be sensitive or commercially valuable.
Dashboards should have built-in restrictions so they don’t show a median where there are too few data points, thereby preventing drilling down to a point where individual results can be identified. It’s worth checking any dashboard that includes your data to see whether such restrictions have been applied. If not, check the terms on which you supplied the data.
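One simple way such a restriction can be built is to refuse to publish a summary statistic below a minimum group size. The sketch below assumes a threshold of five data points, which is purely illustrative; real publishers set their own disclosure rules:

```python
import pandas as pd

MIN_GROUP_SIZE = 5  # illustrative threshold; publishers choose their own rules

def safe_median(values: pd.Series):
    """Return a median only when there are enough data points to avoid
    identifying individual organisations; otherwise suppress the figure."""
    if values.dropna().shape[0] < MIN_GROUP_SIZE:
        return None  # displayed as a blank or 'suppressed' cell, not a number
    return values.median()

# A band containing only three organisations is suppressed rather than published.
print(safe_median(pd.Series([120_000, 95_000, 210_000])))   # None
print(safe_median(pd.Series([40, 55, 60, 62, 70, 71])))      # 61.0
```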
Trust that data transformation delivers the greatest value to users
Because at MyCake we transform data for a different purpose from that for which it was collected, we need to be careful about how conclusions about individual organisations are drawn.
For example, we don’t produce rankings based on the proportion of annual turnover derived from NPO funding. We could, if the data and usage protocols permitted it, but it wouldn’t serve the sector well. This is not where the value lies.
We could set up an NPO benchmark to compare any one organisation to any other, but the learnings come from comparing your organisation with trends among peer groups rather than from direct comparisons with other individual organisations.
Trust data limitations are understood and their impact accommodated
Finally, any dataset will have limitations. With much of the data MyCake handles, these involve gaps or inconsistency of definitions. If we look at contributed income and scan a clutch of annual reports, we find that some report membership income as earned whereas others treat it as donations.
This difference in definition may be appropriate, in that the benefit received by the member will vary between programmes. That’s why the way we aggregate data and the structure of the generic profit and loss sheet matter. As far as possible, we want to maintain the data’s granularity while also having ways to compare an organisation with the sector as a whole without, definitionally, comparing apples with pears.
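As a hypothetical sketch of how a generic structure can reconcile those differing definitions, the example below maps invented account headings onto a single common line before comparison; the real mapping behind any benchmark is considerably more detailed:

```python
# Each organisation's own account heading (hypothetical here) is mapped onto a
# common generic line so that membership income is compared like-for-like,
# however it was originally classified.
generic_line = {
    "membership income": "membership",         # treated as earned income by some
    "membership subscriptions": "membership",  # treated as donations by others
    "friends scheme": "membership",
    "box office": "ticket sales",
    "individual giving": "donations",
}

raw_entries = [
    ("Org A", "membership income", 18_000),
    ("Org B", "membership subscriptions", 9_500),
    ("Org C", "friends scheme", 4_200),
]

by_line = {}
for org, heading, amount in raw_entries:
    line = generic_line.get(heading, "unclassified")
    by_line[line] = by_line.get(line, 0) + amount

print(by_line)  # {'membership': 31700}
```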
When data publishers provide analysis or instructions on how to use a data tool, you need them to acknowledge any assumptions made and to spell out the details of the calculations and metrics they create. You want clarity about whether a whole cohort is represented or whether you are drilling down into smaller subsectors. And it needs to be visually clear, to avoid misinterpretation.
In summary
Any data publisher needs to have thought about how to put data out into the world, and about what users need from the structure of the data for it to be as useful and actionable as possible.
Users need to see consideration for privacy and confidentiality, both to meet legal requirements and to accommodate any transformation of the data from the original reason it was collected towards new uses and metrics.
You’re looking for honesty about the limitations of the data. You’re looking not to be overwhelmed by the volume, but for good accessibility, a sense of what the priorities are and a hierarchy of value and usefulness.
If you can’t get what you need, you need to ask if there is a route to talking to the publishers. This is especially important if you’ve provided data to the work being published.