22 June 2022

Writing a “Modeler’s Manifesto” for More Transparent, Ethical Data Science

Based in part on Eitzel 2021 [1]

“What am I doing, writing a manifesto?” I thought to myself (or you might be thinking). “Aren’t modelers supposed to come up with answers, and be objective, and sure of themselves? And…aren’t manifestos kind of political?”

But what I’ve realized is that modeling is also political. We do our best to make models good representations of the things we are trying to understand, but when you start reading news articles about Black people not being recognized as humans by facial recognition software (never mind the question of whether anyone wants their face recognized in the first place!) and professors publishing popular books titled Weapons of Math Destruction [2], you start to get the feeling that something is seriously wrong. And it is. But how do we fix it? Well! Now we come to the need for a Modeler’s Manifesto.

That said, where does a modeler who’s already out there doing data science learn about better ways to do modeling? We’re seeing more ethics training in data science undergraduate programs, which is great, but I was already done with my PhD when I realized I needed this kind of training. I’d discovered that the modeling methods I had been taught could have problematic consequences, but not what to do about them.

So I taught myself about how I (and we) make knowledge, and in my case, models. I read a lot of dense but illuminating literature from different academic disciplines like anthropology, feminist studies, science and technology studies, and geography, among others. I talked with my colleagues at UC Santa Cruz, where I was a postdoc, to work through all these ideas. I also started looking into what other modelers were saying about how to do a better job of creating models that were more ethical and, in addition, more transparent. (I’d been hearing more and more public alarm about difficulties in confirming the results of published research, so transparency became an important goal, along with ethics and justice.) I found that many of these sources were converging on some of the same principles and strategies. So, after a deep dive into all that material, I developed a list of practices that I thought could help improve modeling. Voila: a Modeler’s Manifesto.

Now, I realize that as a well-resourced white person (and a child of the 1980s) I have some particular biases, and I want to be transparent about them. I tend to think we should try to fix injustices, to try to “make the world a better place,” and that is way more complicated in practice than I ever dreamed when I was a child. For starters, my assumption that we could solve justice and environmental problems might have been a little too glib, and as someone with privilege I might need to take a backseat on whatever team I’m on. But I decided that I should still do what I could to help and be ready to step back when appropriate. And as a resident of the San Francisco Bay Area who grew up watching Star Trek: The Next Generation (my family’s alternative to church), I was pretty convinced that we could find ways for technology to help us, as long as we were careful about it. So here I am, offering the manifesto I assembled – hoping that the future of data science can actually benefit us in the face of the challenges of the 21st century.

The suggestions in my Modeler’s Manifesto point towards more collaborative modeling that gives more context to modeling processes and can result in more just models and outcomes. But I have also thought about whether I’ve actually followed these suggestions myself, whether they made a difference in my own modeling outcomes, and which items on that list were most important to various kinds of modeling I’ve done.

Some of the models I’ve made over the years were on topics like physics and seismology, and though they did not have obvious justice dimensions that I was aware of, I believe they would have benefited from more transparency; I wish I had written “data biographies” for those earlier projects. I was also a less experienced researcher then, so I wasn’t as confident that I had applied the rules of the methods correctly. So, even in retrospect, the manifesto is a useful checklist.

As a statistical ecologist, I became more comfortable with representing uncertainty in models and with drawing conclusions from them. But I also became more conservative in those conclusions, keeping my statements broad enough to include the variety of results I generated by trying slightly different analyses. This came in handy when I discovered an error in my data analysis in two papers that I had already published! Because I kept my conclusions conservative, I was able to correct the error without retracting the papers (to my immense relief). But I still hadn’t truly engaged with justice or inclusion in my modeling.
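To make that habit concrete, here is a minimal, hypothetical sketch in Python – made-up data and variable names, not the actual analyses from those papers – of running the same simple analysis under slightly different but defensible choices and comparing the results before deciding how broadly to phrase a conclusion.

```python
# A minimal, hypothetical sketch (simulated data, not the actual analyses
# from those papers) of the habit described above: run the same simple
# analysis under slightly different but defensible choices, then phrase
# the written conclusion broadly enough to cover the whole range.
import numpy as np

rng = np.random.default_rng(42)
years = np.arange(2000, 2020)
# Simulated survey counts with a modest upward trend plus noise.
counts = 50 + 1.5 * (years - 2000) + rng.normal(0, 8, size=years.size)

def trend_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    return np.polyfit(x, y, deg=1)[0]

# Slightly different analysis choices a reasonable person might make.
variants = {
    "all years": (years, counts),
    "drop first 3 years": (years[3:], counts[3:]),
    "drop last 3 years": (years[:-3], counts[:-3]),
}

for name, (x, y) in variants.items():
    print(f"{name:>20s}: slope = {trend_slope(x, y):+.2f} counts/year")

# If the sign (and rough size) of the slope agrees across variants, a
# conservative statement like "counts increased over the period" holds
# no matter which reasonable choice was made.
```

The same idea scales up to fuller sensitivity analyses, where the “variants” might be different priors, covariate sets, or data-cleaning rules.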

On the other hand, when I was able to do community-based modeling – where I was largely in the backseat and the community determined the questions, had their own data, and decided how to apply the model we had made – I was able to apply all of the manifesto practices. This was particularly true of my postdoc project with the Muonde Trust in rural Zimbabwe, a team of farmer-researchers who often engage with outsiders for training and support in answering their own questions.

We created a model representing their land-use trade-offs between agriculture, grazing, and preserving native forests, and the Muonde research team was able to show the model to local leaders, motivating changes in land-use policies [3]. Community members who saw the model realized that they could reduce deforestation if people were allowed to plant crops on fields that had been abandoned by absentee owners. The Muonde team immediately began piloting the idea: in the first four years after we made the model, more than 40 new households settled on unused fields rather than cutting down forest to make new ones.

So, I discovered that different manifesto suggestions applied to different kinds of modeling, different projects, and different contexts for my modeling work. But I found that all the items on the list were worth at least thinking about for every project. And from reviewing my own work, I found that the future of data science can be closer to what I hoped if I choose to work with communities on the questions they need answered. Really engaged, inclusive work can be slower, more complex, and harder to fund and publish, but in my mind those disadvantages are worth it if I can find an appropriate way to help “make the world a better place.” Even if it is much more complicated than I expected as a child. And isn’t life always more complicated than we think at first? That shouldn’t stop us from trying anyway.

Modeler’s Manifesto: The future of data science could be more __ if modelers __.

  1. The future of data science could be more trustworthy if modelers were more candid about the rules and limitations of the specific methods they use. For example, correlation does not imply causation, and a statistically significant result might not be biologically or socially significant if the impact (the “effect size”) is small (see the sketch after this list).
  2. The future of data science could be more transparent if modelers wrote detailed stories (“data biographies”) about how their models were made. That way, I can look into why they reached the conclusions they did and whether I agree with them. And if I want to reuse their data, I have a better understanding of how to do it correctly.
  3. The future of data science could be more thorough if modelers recognized and included the qualitative aspects of their work along with the quantitative ones. I think about qualitative patterns from my experience with something before I make a model that tries to measure it. Or: my simulation model uses both descriptions and numbers.
  4. The future of data science could be more robust if modelers triangulated between different datasets, analyses, and ways of knowing. We can zero in on a better understanding if we look at how different studies done by different groups agree or disagree.
  5. The future of data science could be more open if modelers treated uncertainty as an opportunity rather than a failure. Talking about what we don’t know could be a way to bring different people together to talk about an important issue, and it could point us to areas we should look into further.
  6. The future of data science could be more relevant if modelers sought out interdisciplinary projects and teammates. Many pressing contemporary problems require backgrounds in more than just statistics or computer science or even ecology or engineering.
  7. The future of data science could be more just if modelers paid attention to how power and privilege impact their modeling. Who gets to decide how we do the modeling? Whose knowledge counts more than others’? Who gets compensated fairly for their work and who doesn’t? How can we change these inequities?
  8. The future of data science could be more ethical if modelers paid attention to the impacts and implications of their work. While you’re still designing your model, think about how it might get used and by whom and on whom. Don’t wait to assess impacts “downstream.”
  9. The future of data science could be more inclusive if modelers worked directly with the communities that the model will be used on/for. The community could drive the questions to be answered, helping to ensure better outcomes, and they could contribute knowledge about the system of interest that the modeler would not otherwise have access to.
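As a companion to item 1, here is a small Python illustration – the data and numbers are simulated purely for the example – of why a tiny p-value is not the same thing as an important effect: with a large enough sample, a negligible difference can still come out “statistically significant.”

```python
# A small illustration for item 1: with a very large sample, a negligible
# difference between groups can still produce a tiny p-value. The data
# are simulated purely for this example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000                                         # very large sample per group
group_a = rng.normal(loc=0.00, scale=1.0, size=n)
group_b = rng.normal(loc=0.02, scale=1.0, size=n)   # tiny true difference

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d: the difference in means measured in pooled standard deviations.
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"p-value   = {p_value:.2g}")    # likely far below 0.05: "significant"
print(f"Cohen's d = {cohens_d:.2f}")   # around 0.02: a negligible effect size
```

Whether an effect of that size matters is a biological or social judgment, not a statistical one – which is exactly the kind of limitation item 1 asks modelers to be candid about.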

M.V. Eitzel is a researcher at the Center for Community and Citizen Science at the University of California, Davis, focused on participatory data science. M.V. creates community-based models from messy natural resources data on topics like forestry, marine protected areas, agro-pastoral management, and dam removal, and studies how and why collaboration with communities gives better modeling results for everyone.

[1] Eitzel, M. V. A modeler’s manifesto: Synthesizing modeling best practices with social science frameworks to support critical approaches to data science. Research Ideas and Outcomes 7, e71553 (2021).

[2] O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. (Crown, 2016).

[3] Eitzel, M. V. et al. Assessing the Potential of Participatory Modeling for Decolonial Restoration of an Agro-Pastoral System in Rural Zimbabwe. Citizen Science: Theory and Practice 6 (2021).