Decolonising Data Science

Engineering and design encompass several disciplines in which we use ingenuity to devise machines and to shape immaterial systems and structures relevant to society. With advances in computational infrastructure, data science has advanced, and in part replaced, several engineering processes. Every education programme at a technical university now incorporates some form of data literacy in its teaching to address disciplinary challenges. But the challenges of our time are not disciplinary: climate change, inequalities, ecological damage, and food scarcity are deeply interconnected and can be traced back to the capitalist and colonial modes of extraction that have shaped our livelihoods for centuries.

To manage the complexity of these challenges, data science methods are typically taught linearly, as a sequence of collection, transformation, visualisation, and interpretation. Education material is dominated by western perspectives and largely developed by able-bodied cisgender men. Coloured by personal experiences, heuristics, and biases that are impossible to overcome (Delbosc, 2022; Takacs, 2003), curricula have centred singular thinking in how we collect data (APIs, sensors, microtasks), clean data (normalisation, outlier detection), map data (proprietary, algorithmic, without communities), model data (over- or under-representation), interpret and evaluate results (power asymmetries, algorithmic decision-making, utilitarian goals), and share or cite evidence (inaccessible journals).

Open Educational Resources (OERs) in data science are something of a misnomer. Despite being free to own, share, and modify, these materials reinforce existing systems and structures of power. More broadly, lectures and tutorials typically draw their examples of data sources and maps only from the US, the UK, or other European countries, and very rarely illustrate alternatives. Where readings are part of a course to promote critical thinking, the material shared is primarily by older white male scientists. Through such dominant forms of pedagogy and western perspectives, efforts in open education remain only partially open. There is still no space for alternative social realities, lived experiences, datasets, methodologies, map-building practices, and frameworks (Franklin et al., 2022).

The consequences of teaching in this manner are far-reaching. Those who present and those who are represented (educated, urban, young adults) get to shape futures for themselves, while all other identities and issues are pushed to the margins of society (e.g. a future with autonomous car fleets disregards rural, less tech-savvy people with multiple children (Delbosc, 2022)). Vulnerable communities are unable to conduct or participate in data-based analysis, let alone be represented in the evidence that is generated or influence the decisions based on such "evidence". For example, Dr AE Boyd highlights how the #metoo analysis sidelined the experiences of those who do not identify as cisgender white women (Boyd, 2021). Knowingly or naively, practising and teaching such linear and purportedly open methodologies effectively erases everyone who does not conform to the status quo and harms communities in the process.

When data science is taught at scale in universities, a complex dynamic emerges. The political economy of education and labour, shaped and delivered by universities globally, has laid out a production line of linear and siloed thinking. Data science graduates now work in all industries. BigTech in particular combines these methodologies with powerful map-based visualisations to control narratives. Many organisations employ mapwashing, in which disingenuous uses of maps undermine participatory planning processes (Mattern, 2020). These epistemologies are colonial not only in their content but also in their operational logistics. Universities continue to rely on merit-based examinations, completely neglecting a person's vulnerability or positionality and how those conditions affect their capability to learn. When colonial forms of education are combined with nationally funded Artificial Intelligence research programmes, the result normalises data extraction and unequal forms of participation in analysis, labour, and society, further perpetuating harm to vulnerable communities.

Goal
Our goal is to decolonise data science education, and subsequently to support communities in developing a truly open perspective on building data-based knowledge and communities.

Method
Through our research interest in understanding inequalities in urban spaces, we came across the framework of intersectional methodologies. Using this framework, albeit without formal training or a systematic process, we designed a master's-level course in data science. The course, Introduction to Geographic Data Science, is an OER in development (https://cusp.tbm.tudelft.nl/courses/epa1316/). Through reflexive thinking, we improved our practice of teaching data science methodologies, highlighting several aspects of intersectionality to the student body. The course reframes the linear process of data science as an iterative, cyclic one in which students are encouraged to move back and forth between steps. They are urged to engage with each step critically and with agency, seeing themselves as sources of knowledge. In doing so, we realised that our course had begun to decolonise education practices.

However, decolonising geographic data science education requires recentring geographical knowledge from other parts of the world, especially from non-western and Indigenous communities. Further, it is imperative to include these communities actively in mapping social issues and, at the very least, to highlight how data can be produced outside BigTech practices of extraction. In this project, we aim to formalise our learning processes, design a framework for decolonising data science education, and implement a continually decolonising process for developing quantitative teaching methods.

This project is a CUSP initiative (Trivik Verma and Juliana Gonçalves), funded by the Open Science Programme at TU Delft.