Technology is not neutral.
The use of Artificial Intelligence (AI) tools¹ to describe, analyze, visualize, or aid discovery of information from Smithsonian collections, libraries, archives, and research data reflects the biases and positionality of the people and systems that built each tool, and of those who collected, cataloged, and described the data used for its training. These tools may hold great value for the Smithsonian, but the ways in which they were planned and created can limit their applicability and reliability.
We seek to begin only those AI projects² that implement tools and algorithms respectful of the individuals and communities represented by the information in our museum, library, and archival collections. We aim to proactively identify and document biases and methodologies when building and implementing such tools, and to make that documentation available to the audiences who will interact with the resulting products. We recognize that technology evolves over time and that our efforts must evolve with it to keep our ethical framework relevant and robust. We encourage any person, community, or stakeholder involved with or affected by these tools and algorithms to provide feedback and raise concerns.
We acknowledge the opportunities that AI tools present for cultural heritage organizations:
As digitization of museum, library, and archival collections has become more prevalent, there is a growing need for tools that make the digitized data available to our audiences.
AI tools can be used to make museum, library, and archival collections more discoverable to the public by efficiently extracting, summarizing, and visualizing vast amounts of data.
AI tools can help us become more representative of our audiences by surfacing the histories of marginalized people and groups.
We urge anyone contemplating an AI project to consider:
Is it the appropriate technology to solve the problem?
The development of AI tools often requires specialized computational hardware, the production of which relies on the mining of rare earth metals and the operation of which can have a large carbon footprint. What is the environmental impact of choosing this technology or tool? (A back-of-envelope way to estimate that impact is sketched after this list.)
There are no unbiased methodologies, datasets, collections, algorithms, or tools. Therefore, what are the biases in the methodologies, datasets, collections, algorithms, or tools you wish to use?
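One way to make the environmental question concrete is the kind of back-of-envelope estimate encouraged by the Green AI literature (Schwartz et al. 2019): multiply hardware power draw by run time, a datacenter overhead factor, and the local grid's carbon intensity. The sketch below is a minimal, hypothetical illustration; every figure in it is a placeholder assumption, not a measurement of any Smithsonian system.

```python
# Hypothetical back-of-envelope estimate of training emissions.
# energy (kWh) = power draw (kW) x hours x datacenter overhead (PUE);
# emissions (kg CO2e) = energy x grid carbon intensity (kg CO2e per kWh).
# All figures are illustrative assumptions, not measurements.

gpu_power_kw = 0.3      # one GPU drawing ~300 W under load (assumed)
num_gpus = 4            # size of the hypothetical training job (assumed)
training_hours = 72     # three days of training (assumed)
pue = 1.5               # datacenter power usage effectiveness (assumed)
grid_intensity = 0.4    # kg CO2e per kWh; varies widely by region (assumed)

energy_kwh = gpu_power_kw * num_gpus * training_hours * pue
emissions_kg_co2e = energy_kwh * grid_intensity

print(f"Estimated energy use: {energy_kwh:.0f} kWh")            # ~130 kWh
print(f"Estimated emissions: {emissions_kg_co2e:.0f} kg CO2e")  # ~52 kg
```

Even a rough estimate like this lets a project team compare candidate approaches, for example a large model trained from scratch versus a smaller fine-tuned one, before committing to either.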
We strive to promote the following actions when implementing AI tools:
Documentation of the biases in any methodologies, datasets, collections, algorithms, or tools.
Documentation of transparent data statements that outline the intent of methodologies, datasets, collections, algorithms, or tools (a minimal sketch of such a statement follows this list).
Creation of positionality statements by the creators of the datasets or algorithms behind AI tools.
Documentation of potential risks and regular updating of these risks as technology changes.
Solicitation and inclusion of feedback from relevant members of the community.
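As a concrete illustration of the documentation the items above call for, the sketch below records a data statement and its bias and risk notes as a simple structured record, loosely following the data statement elements proposed by Bender and Friedman (2018). It is a hypothetical example; the field names and the dataset it describes are assumptions for illustration, not a prescribed Smithsonian format.

```python
# Hypothetical structured record combining a data statement with bias
# and risk documentation, loosely following Bender & Friedman (2018).
# Field names and the example dataset are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DataStatement:
    dataset_name: str
    curation_rationale: str        # why these records were selected
    source_collections: list[str]  # collections the data was drawn from
    known_biases: list[str]        # documented gaps and skews
    annotator_positionality: str   # who described or labeled the data
    potential_risks: list[str]     # risks to revisit as technology changes
    last_reviewed: str             # date of the most recent risk review

example = DataStatement(
    dataset_name="herbarium-specimen-labels (hypothetical)",
    curation_rationale="OCR transcriptions of digitized specimen labels, "
                       "selected to test automated cataloging.",
    source_collections=["Example Herbarium Collection"],
    known_biases=[
        "Collecting localities skew toward well-funded expeditions; "
        "many regions and communities are underrepresented.",
        "Historical label text may contain harmful or outdated terms.",
    ],
    annotator_positionality="Transcribed by volunteers; demographic "
                            "information was not collected.",
    potential_risks=["Propagating harmful historical terminology into "
                     "public-facing search results."],
    last_reviewed="2022-05-01",
)
```

Publishing a record like this alongside each AI project gives audiences something concrete to respond to, and fields such as potential_risks and last_reviewed make the "regular updating" item auditable.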
We strive to recognize the following when implementing AI tools:
Everyone at the Smithsonian is a stakeholder in data collection, creation, dissemination, and analysis.
If any community or individual is harmed by the use of a technology, that is one harm too many.
We strive to promote the following when partnering with outside organizations on AI tools or projects:
We should seek projects and partnerships that adhere to our institutional values.
We should not enter into contracts or collaborations with industry or other partners for the use of tools with unspecified or undisclosed methods and biases.
We should require potential partners who create AI and machine learning tools to explicitly evaluate and state whether the datasets or data descriptions used in these tools were collected without consent, or contain offensive or racist descriptions, before we agree to use these tools.
Version drafted in Spring 2022 by members of the Smithsonian AI & ML community of practice. Comments and suggestions welcomed at SI-DataScience@si.edu.
References
Bender, E. M., and Friedman, B. 2018. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. https://aclanthology.org/Q18-1041.
Denton, E., Hanna, A., Amironesei, R., Smart, A., Nicole, H., and Scheuerman, M. K. 2020. Bringing the People Back In: Contesting Benchmark Machine Learning Datasets. Proceedings of the ICML Workshop on Participatory Approaches to Machine Learning. https://arxiv.org/pdf/2007.07399.pdf.
Murphy, O., and Villaespesa, E. 2020. AI: A Museum Planning Toolkit. https://themuseumsainetwork.files.wordpress.com/2020/02/20190317_museums-and-ai-toolkit_rl_web.pdf.
Schwartz, R., Dodge, J., Smith, N. A., and Etzioni, O. 2019. Green AI. https://doi.org/10.48550/arxiv.1907.10597.
Stanford Special Collections and University Archives. Statement on Potentially Harmful Language in Cataloging and Archival Description. https://library.stanford.edu/spc/using-our-collections/stanford-special-collections-and-university-archives-statement-potentially.
Footnotes
1. The term “AI tools” includes a variety of technologies that seek to create decision-making software. Some examples include facial and speech recognition, machine learning-based optical character recognition, language translation, natural language processing, image recognition, object detection and segmentation, and data clustering. Common commercial examples include virtual assistants such as Siri or Alexa, website search and recommendation algorithms, and the tagging and identification of people in images on social media platforms.
2. The term “AI project” refers to an intentional effort to utilize or create an AI tool in research or in an existing workflow.