Program

Panels

14:00–14:15 CET (8:00–8:15 EDT): Introduction
Chaired by Organizers
14:15–14:30 CET (8:15–8:30 EDT): Keynote 1
Chaired by Nuria Oliver

"Living with AI"

by Joonhwan Lee, Seoul National University, South Korea.

14:30–14:35 CET (8:30–8:35 EDT): Discussion
Chaired by Nuria Oliver

14:35–14:50 CET (8:35–8:50 EDT): Panel 1: Explainable AI (XAI)
Chaired by Nuria Oliver

"Closing the Creator-Consumer Gap in XAI: A Call for Participatory XAI Design with End-users"

by Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, Andrés Monroy-Hernández

Abstract: Despite the proliferation of explainable AI (XAI) methods, little is understood about end-users' explainability needs and perceptions of existing XAI approaches. To address this gap, we interviewed 20 end-users of a real-world AI application, and found that end-users' AI and domain background play a critical role in shaping their XAI needs and perceptions. Further, end-users surfaced gaps in current XAI research and offered valuable suggestions. In this position paper, we reflect on our findings and make the case for participatory XAI, especially involving end-users in the XAI design process, towards developing "explanations (XAI) that serve the needs of diverse end-users."

"Rethinking Explainability as a Dialogue: A Practitioner's Perspective"

by Himabindu Lakkaraju, Dylan Slack, Yuxin Chen, Chenhao Tan, Sameer Singh

Abstract: While there is considerable interest in explainability within the machine learning community, the utility of explanations for domain experts in high-stakes fields, such as doctors and policy makers, is currently not well understood. To help fill this gap, we carry out a study where we interview doctors, healthcare professionals, and policymakers about where explanations fall short and how they could be improved going forward. Our findings indicate that decision-makers are often unsatisfied with current techniques that provide one-off explanations, like feature importances and rule-lists, and would strongly prefer interactive explanations. Further, they agree explanations in the form of open-ended natural language dialogues would achieve these goals. As a way to move forward, we outline a set of principles researchers should follow when designing interactive explanations, demonstrate how natural language dialogues satisfy these principles, and encourage the community to pursue research in explainability through natural language dialogues.

"A Thematic Comparison of Human and AI Explanations of Sexism Assessment"

by Sharon Ferguson, Paula Akemi Aoyagui, Rohan Alexander, Anastasia Kuzminykh

Abstract: Recent developments in Artificial Intelligence (AI) show much promise for algorithmic decision-making. In many scenarios, specifically those open-to-interpretation, scholars suggest that the collaboration between humans and AI may result in maximum complementary performance. To enable this collaboration, AI must be able to explain the rationale behind decisions in a way that is understandable to humans, though it is not yet clear what this means. In this work, we discuss the first step towards a criteria for understandable, natural-language AI explanations by comparing the thematic content of human and AI explanations. We highlight results that point to both promising potential as well as concerning challenges.

14:50–14:55 CET (8:50–8:55 EDT): Discussion
Chaired by Nuria Oliver

14:55–15:10 CET (8:55–9:10 EDT): Keynote 2
Chaired by Q. Vera Liao

"Human-Centered Co-Creative AI: From Inspirational to Responsible AI"

by Mary Lou Maher, University of North Carolina Charlotte, US.

15:10–15:15 CET (9:10–9:15 EDT): Discussion
Chaired by Q. Vera Liao

15:15–15:30 CET (9:15–9:30 EDT): Panel 2: Large Models
Chaired by Q. Vera Liao

"User and Technical Perspectives of Controllable Code Generation"

by Stephanie Houde, Vignesh Radhakrishna, Praneeth Reddy, Juie Darwade, Haoran Hu, Kalpesh Krishna, Mayank Agarwal, Kartik Talamadupula, Justin D. Weisz

Abstract: Large language models (LLM) such as OpenAI Codex are increasingly being applied to software engineering tasks to generate code for a variety of purposes including code authoring and code translation. Human-centered research in this domain has revealed that software engineers would like the ability to influence or control the properties of generated code so as to optimize output for their particular code base and application needs. In this work, we explore user requirements for controllable code generation and show that human-written code is more optimal than standard beam-search code outputs from a large language model.

"Towards an Understanding of Human-AI Interaction in Prompt-Based Co-Creative Systems"

by Atefeh Mahdavi Goloujeh, Anne Sullivan, Brian Magerko

Abstract: Co-creative AI (Artificial Intelligence) has witnessed unprecedented growth in text-to-image generative systems. In this paper, we posit that a better understanding of the drivers of user interaction with prompt-based co-creative AI tools will significantly improve how they are designed for, used by, and explained to current and future users. Much remains unknown about how users understand, engage with, and evaluate such systems. To fill this gap, we propose a framework for understanding human-AI interaction in prompt-based creative tools informed by semi-structured interviews of 19 users.

"Towards End-User Prompt Engineering: Lessons From an LLM-based Chatbot Design Tool"

by J.D. Zamfirescu-Pereira, Richmond Wong, Bjorn Hartmann, Qian Yang

Abstract: A large body of prior work has examined the capabilities of pre-trained language models ("LLMs") such as GPT-3; in contrast, relatively little work has explored how humans are able to make use of those capabilities. Using natural language to steer LLM outputs ("prompting") is emerging as an important design technique, but prompt-based systems comply inconsistently, and users face challenges systematically understanding how a prompt change might impact subsequent LLM outputs. The apparent ease of instruction via prompts has led to an explosion of interest in tools that enable end-users to engage with computational systems using natural language prompts. To explore how these non-expert users approach "end-user prompt engineering," we conduct a design probe with a prototype LLM-based chatbot design tool that encourages iterative development of prompting strategies, and report briefly on findings here.

15:30–15:35 CET (9:30–9:35 EDT): Discussion
Chaired by Q. Vera Liao

15:35–15:45 CET (9:35–9:45 EDT): Short break
15:45–16:00 CET (9:45–10:00 EDT): Panel 3: Creativity + Collaboration
Chaired by Dmitry

"The Need for Explainability in AI-Based Creativity Support Tools"

by Antonios Liapis, Jichen Zhu

Abstract: A long lineage of computer-assisted design tools has established interaction paradigms that give full control to the designer over the software. Introduction of Artificial Intelligence (AI) to this creative process leads to a more co-creative paradigm, with AI taking a more proactive role. Recent generative approaches based on deep learning have strong potential as asset creators and co-creators; however, current algorithms are opaque and burden the designer with making sense of the output. In order for deep learning to become a colleague that designers can trust and work with, better explainability, controllability, and interactivity are necessary. We highlight current and potential ways in which explainability can inform human users in creative tasks and call for involving end-users in the development of both interfaces and underlying algorithms.

"Quantitatively Assessing Explainability in Collaborative Computational Co-Creativity"

by Michael Paul Clemens, Rogelio Enrique Cardona-Rivera, Courtney Rogers

Abstract: While explainable computational creativity (XCC) seeks to create and sustain computational models of creativity that foster a collaboratively creative process through explainability, there remains no way of quantitatively measuring these models. We believe that assessing collaborations between computational agents and artists will afford designers more confidence in modeling and creating these agents. Although many creative frameworks assist in delineating the creative process, we suggest using The Four P's to explore how creative agents might best co-create with an artist for their respective creative contributions. Through this research, we propose a framework to assist designers of co-creative agents in assessing explainability within their computational models. As a community within both HCI and AI, we believe a workshop will assist in the direction of this research effort to capture the appropriate qualities of this framework to maximize effectiveness and utility.

"Embodied Socio-cognitive Reframing of Computational Co-Creativity"

by Manoj Deshpande, Brian Magerko

Abstract: In this paper, we argue that current definitions of computational co-creativity do not fully capture the embodied and intersubjective nature of human co-creation. We lean on theories like human-machine reconfiguration, embodiment, participatory sense-making, a sociocultural perspective of creativity, and improvisation to reframe computational co-creativity. We argue that for an effective co-creative experience with an AI partner, the agent must have creative agency and sense-making capability, and must facilitate improvisational interaction.

16:00–16:05 CET (10:00–10:05 EDT): Discussion
Chaired by Dmitry

16:05–16:20 CET (10:05–10:20 EDT): Keynote 3
Chaired by Plamen Angelov

"Independent Community Rooted AI Research"

by Timnit Gebru, DAIR, US.

16:20–16:25 CET (10:20–10:25 EDT): Discussion

16:25–16:40 CET (10:25–10:40 EDT): Panel 4: Values + Participation
Chaired by Michael Muller

"Beyond Safety: Toward a Value-Sensitive Approach to the Design of AI Systems"

by Alexander J. Fiannaca, Cynthia L. Bennett, Shaun Kane, Meredith Ringel Morris

Abstract: As modern, pre-trained ML models have proliferated in recent years, many researchers and practitioners have made significant efforts to prevent AI systems from causing harm. This focus on safety is critical, but a singular focus on safety can come at the exclusion of considering other important stakeholder values and the interactions between those values in the AI systems we build. In this position paper, we propose that the AI community should incorporate ideas from the Value-Sensitive Design framework from the Human-Computer Interaction community to ensure the needs and values of all stakeholders are reflected in the systems we build. We share observations and reflections from our experiences working on AI-supported accessibility technologies and with members of various disability communities to illustrate the tensions that sometimes arise between safety and other values.

"Participation Interfaces for Human-Centered AI"

by Sean McGregor

Abstract: Emerging artificial intelligence (AI) applications often balance the preferences and impacts among diverse and contentious stakeholder groups. Accommodating these stakeholder groups during system design, development, and deployment requires tools for the elicitation of disparate system interests and collaboration interfaces supporting negotiation balancing interests. This paper introduces interactive visual "participation interfaces" for Markov Decision Processes (MDPs) and collaborative ranking problems as examples restoring a human-centered locus of control.

"Expansive Participatory AI: Supporting Dreaming within Inequitable Institutions"

by Shiran Dudy, Michael Alan Chang

Abstract: Participatory Artificial Intelligence (PAI) has recently gained interest among researchers as a means to inform the design of technology through a collective's lived experience. PAI holds a greater promise than that of providing useful input to developers: it can contribute to the process of democratizing the design of technology, setting the focus on what should be designed. However, in the process of PAI there exist institutional power dynamics that hinder the realization of expansive dreams and aspirations of the relevant stakeholders. In this work we propose co-design principles for AI that address institutional power dynamics, focusing on Participatory AI with youth.

16:40–16:45 CET (10:40–10:45 EDT): Discussion
Chaired by Michael Muller

16:45–17:15 CET (10:45–11:15 EDT): Meal break
17:15–17:30 CET (11:15–11:30 EDT): Keynote 4
Chaired by David Piorkowski

"Designing AI Systems for Digital Well-Being"

by Asia Biega, Max Planck Institute for Security and Privacy (MPI-SP), Germany.

17:30–17:35 CET (11:30–11:35 EDT): Discussion
Chaired by David Piorkowski

17:35–17:50 CET (11:35–11:50 EDT): Panel 5: Social Good + Human Wellbeing
Chaired by David Piorkowski

"Statelessness in Asylum Data -- A Human-Centered Perspective on Outliers"

by Kristin Kaltenhauser, Naja Müller

Abstract: Refugees around the world are increasingly subject to data-driven decision-making when applying for asylum. In a Danish context, the amount of data and documentation that is constructed about an asylum-seeker from various sources and used in the decision-making process has significantly increased in recent years. We interviewed caseworkers across the immigration services in Denmark and used this qualitative data to meaningfully engage with a public data set of Danish asylum decision summaries which is used as a collaboration tool between organisations. We present initial findings from a study of statelessness to broaden the understanding of how strengthening human engagement with data is critical when designing algorithmic systems to support public and legal decision-making, such as in the asylum domain. We found that cases of stateless asylum seekers constitute outliers in the data set of Danish asylum decision summaries, because they require alternative data practices to constitute the identity of the applicant. Our preliminary findings suggest that 1) we need to pay attention to new forms of data used to construct the asylum seeker in bureaucracies, for example when it comes to stateless people, and 2) develop research strategies where stakeholders such as the Danish immigration services are invited to reflect and develop their practice with an understanding of data that goes beyond the idea of a plain natural resource.

"Another Horizon for Human-Centered AI: An Inspiration to Live Well"

by Julian Posada

Abstract: This presentation reflects on the role of ideology in computing and the need for a new horizon in human-centered AI. It discusses how certain ideologies, from the "Californian ideology" centered on techno-libertarianism to "longtermism," focused on the moral responsibility for the far future, justify and perpetuate social exploitation. Instead, the presentation will invite us to look for existing philosophies emerging from traditionally marginalized peoples as a "new horizon." The examples brought to the discussion are "living well" perspectives from indigenous Andean and afro-Colombian roots. Instead of being individualistic and centered on future outcomes, they stress the importance of human dignity and the relationship between community and land. The presentation concludes that by focusing on these essential aspects of human existence, the computing field can find meaningful ways to address current issues of ethical and societal impacts of artificial intelligence and other data-driven technologies.

"A Future for AI Governance Systems beyond Predictions"

by Devansh Saxena, Erina Moon, Shion Guha

Abstract: Algorithmic systems have been extensively adopted in various public sector agencies as a means to generate consistent and evidence-backed decisions for citizens. These AI systems promise to transform how government agencies interact with people related to how they support information processing and decision-making. However, prior works show that AI tools largely fuse onto existing practices and cannot transform public sector work at a deeper organizational level. In this position paper, we argue that in order to yield greater use from AI systems and improve decision-making, we need to understand the discretionary choices workers make as they navigate complex sociotechnical systems.

17:50–17:55 CET (11:50–11:55 EDT): Discussion
Chaired by David Piorkowski

17:55–18:10 CET (11:55–12:10 EDT): Keynote 5
Chaired by Q. Vera Liao

"Building human-centric AI systems: thoughts on user agency, transparency and trust"

by Fernanda Viegas, Google and Harvard University, US.

18:10–18:15 CET (12:10–12:15 EDT): Discussion
Chaired by Q. Vera Liao

18:15–18:30 CET (12:15–12:30 EDT): Panel 6: Users
Chaired by Q. Vera Liao

"(Re)Defining Expertise in Machine Learning Development"

by Mark Diaz, Angela Smith

Abstract: Domain experts are often engaged in the development of machine learning systems in a variety of ways, such as in data collection and evaluation of system performance. At the same time, who counts as an 'expert' and what constitutes 'expertise' is not always explicitly defined. In this project, we conduct a systematic literature review of machine learning research to understand 1) the bases on which expertise is defined and recognized and 2) the roles experts play in ML development. Our goal is to produce a high-level taxonomy to highlight limits and opportunities in how experts are identified and engaged in ML research.

"A Human-Capabilities Orientation for Human-AI Interaction Design"

by Sean Koon

Abstract: Many opportunities and challenges accompany the use of AI in domains with complex human factors and risks. This paper proposes that in such domains the most advanced human-AI interactions will not arise from an emphasis on technical capabilities, but rather from an emphasis on understanding and applying existing human capabilities in new ways. A human-capabilities orientation is explored along with three aims for research and design.

"(De)Noise: Moderating the Inconsistency of Human Decisions"

by Junaid Ali, Nina Grgic-Hlaca, Krishna P. Gummadi, Jennifer Wortman Vaughan

Abstract: Prior work in social psychology has found that people's decisions are often inconsistent. An individual's decisions vary across time, and decisions vary even more across people. Inconsistencies have been identified not only in subjective matters, like matters of taste, but also in settings one might expect to be more objective, such as sentencing, job performance evaluations, and real estate appraisals. In our study, we explore whether algorithmic decision aids can be used to moderate the degree of inconsistency in human decision-making, focusing on bail decision-making as a case study. In a series of human-subject experiments we explore how people react to different cues about their inconsistency, ranging from asking respondents to review their past decisions to providing respondents with algorithmic advice. We find that both (i) asking respondents to review their decisions as a series of pairwise comparisons and (ii) providing respondents with algorithmic advice are effective strategies for influencing human decisions.

18:30–18:35 CET (12:30–12:35 EDT): Discussion
Chaired by Q. Vera Liao

18:35–19:35 CET (12:35–13:35 EDT): Posters

More information available below

19:35–19:45 CET (13:35–13:45 EDT): Short break
19:45–20:00 CET (13:45–14:00 EDT): Panel 7: Critical
Chaired by Q. Vera Liao

"Supporting Qualitative Coding with Machine-in-the-loop"

by Matthew K Hong, Francine Chen, Yan-Ying Chen, Matt Klenk

Abstract: A prevailing assumption underlying machine learning research for qualitative coding is that the goal of machine learning is limited to automating data annotation. Research in machine-assisted qualitative data analysis has primarily considered a narrow interpretation of this goal, where the human role is reduced to providing training data for ML models to apply learned associations to large sets of unstructured text corpora. In this paper, we argue for the need to embrace a machine-in-the-loop approach that prioritizes human-centered needs and suggest how to incorporate MITL computing into the initial data exploration and code identification process.

"Today we talk to the machine" - Unveiling data for providing micro-credit loans using conversational systems

by Heloisa Candello, Emilio Vital Brazil, Rogerio De Paula, Cassia Sanctos, Marcelo Grave, Gabriel Soella, Marina Ito, Adinan Brito Filho

Abstract: In this position paper, we aim to explore the nuances of designing and developing a conversational user interface for small-business owners in vulnerable situations. By unveiling and considering alternative criteria for providing micro-credit loans, conversational systems can help financial institutions deliver micro-credit offerings more effectively and justly. We describe a pilot study with 34 women entrepreneurs who were invited to use a business health assessment chatbot prototype, and analyze the experience through a focus group with a subset of those women. We conclude by discussing some insights on the use of conversational systems in support of small-business entrepreneurs.

"Towards Multi-faceted Human-centered AI"

by Sajjadur Rahman, Hannah Kim, Dan Zhang, Estevam Hruschka, Eser Kandogan

Abstract: Human-centered AI workflows involve stakeholders with multiple roles interacting with each other and automated agents to accomplish diverse tasks. In this paper, we call for a holistic view when designing support mechanisms, such as interaction paradigms, interfaces, and systems, for these multifaceted workflows.

20:00–20:05 CET (14:00–14:05 EDT): Discussion
Chaired by Q. Vera Liao

20:05–20:20 CET (14:05–14:20 EDT): Keynote 6
Chaired by Michael Muller

"Why HCAI Needs the Humanities"

by Lauren Klein, Emory University, US.

20:20–20:25 CET (14:20–14:25 EDT): Discussion
Chaired by Michael Muller

20:25–20:40 CET (14:25–14:40 EDT): Panel 8: Data Work
Chaired by Michael Muller

"Labeling instructions matter in biomedical image analysis. An annotator-centric perspective."

by Tim Rädsch, Annika Reinke, Vivienn Weru, Minu D. Tizabi, Nicholas Schreck, A. Emre Kavur, Bünyamin Pekdemir, Tobias Roß, Annette Kopp-Schneider, Lena Maier-Hein

Abstract: Biomedical image analysis algorithm validation depends on high-quality annotation of reference datasets, for which labeling instructions are key. Despite the importance of these instructions, their optimization remains largely unexplored. Here, we present the first systematic study of labeling instructions and their impact on annotation quality in the field from an annotator-centric perspective. Through comprehensive examination of professional practice by surveying 298 professional annotators and investigating the mandatory BIAS statements of 96 major international biomedical image analysis competition tasks, we uncovered a discrepancy between annotators' needs for labeling instructions and their current quality and availability. Based on an analysis of 14,040 images annotated by 156 annotators from four professional companies and 708 Amazon Mechanical Turk (MTurk) crowdworkers using instructions with different information density levels, we further found that including exemplary images significantly boosts annotation performance compared to text-only descriptions, while solely extending text descriptions does not. Finally, professional annotators consistently outperform MTurk crowdworkers. Our study raises awareness of the need for quality standards in biomedical image analysis labeling instructions.

"Ground(less) Truth: The Problem with Proxy Outcomes in Human-AI Decision-Making"

by Luke Guerdan, Amanda Lee Coston, Steven Wu, Ken Holstein

Abstract: A growing literature on human-AI decision-making investigates strategies for combining human judgment with statistical models to improve the quality and efficiency of decision-making. Existing research in this area typically evaluates proposed improvements to models, interfaces, or workflows by demonstrating improved predictive performance on "ground truth" labels. However, this practice assumes that labels targeted by models adequately reflect the goals and objectives of human decision-makers. In contrast, labels observed in historical data often represent imperfect proxies for the true phenomena of interest to humans. We identify key statistical biases that can impact the validity of labels targeted by predictive models in real-world contexts, and assess the extent to which existing human-AI decision-making studies consider these challenges. Our analysis identifies systematic blind spots and assumptions made by existing studies, and motivates the development of measures of human-AI decision quality beyond accuracy on proxy outcomes.

"Human-centered Proposition for Structuring Data Construction"

by Cheul Young Park, Inha Cha, Juhyun Oh

Abstract: As the saying goes, "Garbage in, Garbage out." Data significantly impacts dataset quality and model performance. However, data is notorious for its human-centric nature, making it subjective and complex. Constructing the data for ML systems requires human interventions, which cannot be easily quantified or structured. Therefore, ML practitioners undergo an iterative process of trials and errors, ad hoc solutions, and heterogeneous methods, which calls for standardization and structured data work. In this work, we suggest human-centric propositions for structured data construction.

20:40–20:45 CET (14:40–14:45 EDT): Discussion
Chaired by Michael Muller
20:45–21:00 CET (14:45–15:00 EDT): Closing
Chaired by Organizers

Posters

18:35–19:35 CET (12:35–13:35 EDT): Posters 1: Explainable AI (XAI)

"Pragmatic AI Explanations"

by Shi Feng, Chenhao Tan

Abstract: We use the Rational Speech Act framework to examine AI explanations as a pragmatic inference process. This reveals fatal flaws in how we currently train and deploy AI explainers. To evolve from level-0 explanations to level-1, we present two proposals for data collection and training: learning from L1 feedback, and learning from S1 supervision.

"Science Communications for Explainable AI (XAI)"

by Simon Hudson, Matija Franklin

Abstract: Artificial intelligence has a communications challenge. To create human-centric AI, it is important that XAI is able to adapt to different users. The SciCom field provides a mixed-methods approach that can provide a better understanding of users' framings so as to improve public engagement with and expectations of AI systems, as well as help AI systems better adapt to their particular user.

"Social Construction of XAI: Do We Need One Definition to Rule Them All?"

by Upol Ehsan, Mark Riedl

Abstract: There is a growing frustration amongst researchers and developers in Explainable AI (XAI) around the lack of consensus around what is meant by 'explainability'. Do we need one definition of explainability to rule them all? In this paper, we argue why a singular definition of XAI is neither feasible nor desirable at this stage of XAI's development. We view XAI through the lenses of Social Construction of Technology (SCOT) to explicate how diverse stakeholders (relevant social groups) have different interpretations (interpretative flexibility) that shape the meaning of XAI. Forcing a standardization (closure) on the pluralistic interpretations too early can stifle innovation and lead to premature conclusions. We share how we can leverage the pluralism to make progress in XAI without having to wait for a definitional consensus.

"Trust Explanations to Do What They Say"

by Neil Natarajan, Reuben Binns, Jun Zhao, Nigel Shadbolt

Abstract: How much are we to trust a decision made by an AI algorithm? Trusting an algorithm without cause may lead to abuse, and mistrusting it may similarly lead to disuse. Trust in an AI is only desirable if it is warranted; thus, calibrating trust is critical to ensuring appropriate use. In the name of calibrating trust appropriately, AI developers should provide contracts specifying use cases in which an algorithm can and cannot be trusted. Automated explanation of AI outputs is often touted as a method by which trust can be built in the algorithm. However, automated explanations arise from algorithms themselves, so trust in these explanations is similarly only desirable if it is warranted. Developers of algorithms explaining AI outputs (xAI algorithms) should provide similar contracts, which should specify use cases in which an explanation can and cannot be trusted.

18:35–19:35 CET (12:35–13:35 EDT): Posters 2: Large Models

"Human-AI Co-Creation of Personas and Characters with Text-Generative Models"

by Toshali Goel, Orit Shaer

Abstract: Natural language generation has been one of the prime focuses of human-AI collaboration in recent years. We are specifically interested in exploring the idea of creativity in human-AI co-creation, most especially in the context of persona generation for the iterative human-centered design process. Collaborating with AIs to generate engaging personas may present opportunities to overcome the shortcomings of personas and how they're currently used in the design process. We aim to study how collaborating with AIs might help designers and researchers to create engaging personas and narrative scenarios for their products, and by extension the implications of human-AI collaborative creative writing on fields like literature with character generation. The implications of such a study could be generalized beyond user-experience design and persona generation. The ability to create engaging personas is not dissimilar from the ability to generate characters as a whole, and the subsequent potential for natural language generation to assist in creative writing and thinking is implicit. In this paper, we will discuss the process and potential merits of iterating with AIs for creative content creation, as well as expand upon experiments we have conducted and the questions we hope to answer in our future research.

"Generation Probabilities are Not Enough: Improving Error Highlighting for AI Code Suggestions"

by Helena Vasconcelos, Gagan Bansal, Adam Fourney, Q.Vera Liao, Jennifer Wortman Vaughan

Abstract: Large-scale generative models are increasingly being used in tooling applications. As one prominent example, code generation models recommend code completions within an IDE to help programmers author software. However, since these models are imperfect, their erroneous recommendations can introduce bugs or even security vulnerabilities into a code base if not overridden by a human user. In order to override such errors, users must first detect them. One method of assisting this detection has been highlighting tokens with low generation probabilities. We also propose another method, predicting the tokens people are likely to edit in a generation. Through a mixed-methods, pre-registered study with N = 30 participants, we find that the edit model highlighting strategy results in significantly faster task completion time, significantly more localized edits, and was strongly preferred by participants.

"Is It Really Useful?: An Observation Study of How Designers Use CLIP-based Image Generation For Moodboards"

by Seungho Baek, Hyerin Im, Uran Oh, Youn-kyung Lim, Takyeon Lee

Abstract: Contrastive neural network model (i.e., CLIP)-based image generation services (e.g., DALL-E 2, MidJourney, Stable Diffusion) have shown that they can produce a huge range of flawless images, consistent with a user-provided image concept in text. While a lot of people have shared successful cases on the Internet, we still have very limited knowledge about whether such tools are helpful for daily design work. We conducted a preliminary observational study to investigate how designers create moodboards using DALL-E 2. The results indicate that novice users would find it hard to find the best prompts for creating and modifying generated images. The goal of this position paper is to propose potential research areas and ideas, such as how to set guidelines for designing interactive image generation services for a specific purpose.

"The Design Space of Pre-Trained Models"

by Meredith Ringel Morris, Carrie J. Cai, Jess Holbrook, Chinmay Kulkarni, Michael Terry

Abstract: Card et al.'s classic paper "The Design Space of Input Devices" established the value of design spaces as a tool for HCI analysis and invention. We posit that developing design spaces for emerging pre-trained, general AI models is necessary for supporting their integration into human-centered systems and practices. We explore what it means to develop an AI model design space by proposing two design spaces relating to pre-trained AI models: the first considers how HCI can impact pre-trained models (i.e., interfaces for models) and the second considers how pre-trained models can impact HCI (i.e., models as an HCI prototyping material).

18:35–19:35 CET (12:35–13:35 EDT): Posters 3: Creativity + Collaboration

"The Challenges and Opportunities in Overcoming Algorithm Aversion in Human-AI Collaboration"

by Lingwei Cheng, Alexandra Chouldechova

Abstract: Algorithm aversion occurs when humans are reluctant to use algorithms despite their superior performance. Prior studies have shown that giving users "outcome control", the ability to appeal or modify the model's predictions, can mitigate this aversion. This can be contrasted with "process control", which entails control over the development of the algorithmic tool. The effectiveness of process control is currently under-explored. To compare how various controls over algorithmic systems affect users' willingness to use the systems, we replicate a prior study on outcome control and conduct a novel experiment investigating process control. We find that involving users in the process does not always result in a higher reliance on the model. We find that process control in the form of choosing the training algorithm mitigates algorithm aversion, but changing inputs does not. Giving users both outcome and process control does not result in further mitigation than either outcome or process control alone. Having conducted the studies on both Amazon Mechanical Turk (MTurk) and Prolific, we also reflect on the challenges of replication for crowdsourcing studies of human-AI interaction.

"Feature-Level Synthesis of Human and ML Insights"

by Isaac Lage, Sonali Parbhoo, Finale Doshi-Velez

Abstract: We argue that synthesizing insights from humans and ML models at the level of features is an important direction to explore to improve human-ML collaboration on decision-making problems. We show through an illustrative example that feature-level synthesis can produce correct predictions in a case where existing methods fail, then lay out directions for future exploration.

"Exploring Human-AI Collaboration for Fair Algorithmic Hiring"

by Hyun Joo Shin, Anqi Liu

Abstract: The current machine learning applications in the hiring process are prone to bias, especially due to poor quality and small quantity of data. The bias in hiring imposes potential societal and legal risks. Thus, it is important to evaluate ML applications' bias in the hiring context. To investigate the algorithmic bias, we use real-world employment data to train models for predicting job candidates' performance and retention. The result shows that ML algorithms make biased decisions toward a certain group of job candidates. This analysis motivates us to resort to an alternative method---AI-assisted hiring decision making. We plan to conduct an experiment with human subjects to evaluate the effectiveness of human-AI collaboration for algorithmic bias mitigation. In our designed study, we will systematically explore the role of human-AI teaming in enhancing the fairness of hiring in practice.

"Understanding the Criticality of Human Adaptation when Designing Human-Centered AI Teammates"

by Christopher Flathmann, Nathan J McNeese

Abstract: Research on human-centered AI teammates has often worked to create AI teammates that adapt around humans, but humans have a remarkable and natural ability to adapt around their environment and teammates. This paper capitalizes on human adaptability by showcasing how humans actively adapt around their AI teammates even when those teammates change. In doing so, the results of a mixed-methods experiment (N = 60) demonstrate that human adaptation is a critical and natural component of human-centered AI teammate design.

"Towards a Human-Centered Approach for Automating Data Science"

by Anamaria Crisan, Lars Kotthoff, Marc Streit, Kai Xu

Abstract: Technology for Automating Data Science (AutoDS) consistently undervalues the role of human labor, resulting in tools that, at best, are ignored and, at worst, can actively mislead or even cause harm. Even if full and frictionless automation were possible, human oversight is still desired and required to review the outputs of AutoDS tooling and integrate them into decision-making processes. We propose a human-centered lens to AutoDS that emphasizes the collaborative relationships between humans and these automated processes and elevates the effects these interactions have on downstream decision-making. Our approach leverages a provenance framework that integrates user-, data-, and model-centric approaches to make AutoDS platforms observable and interrogable by humans.

18:35–19:35 CET (12:35–13:35 EDT): Posters 4: Values + Participation

"Values Shape Optimizers Shape Values"

by Joe Kwon

Abstract: We often construct AI systems that optimize over specified objectives serving as proxies for human values. Consider recommender systems on social media and entertainment streaming platforms, which maximize time-on-application or other user-engagement metrics as a proxy for providing entertaining content to users. The research community has also begun to study how optimizing systems influence human values (e.g., shifting political leanings or predictably inducting users into specific online communities). We are left with an obvious, yet overlooked framework: consideration of values and optimizers as a highly intertwined and interactive system, one that constantly feeds into and transforms the other. This perspective is crucial for engineering safe and beneficial AI systems--ones which preserve diverse values across individuals and communities.

"Tensions Between the Proxies of Human Values in AI"

by Teresa Datta, Daniel Nissani, Max Cembalest, Akash Khanna, Haley Massa, John Dickerson

Abstract: Motivated by mitigating potentially harmful impacts of technologies, the AI community has formulated and accepted mathematical definitions for certain pillars of accountability: e.g. privacy, fairness, and model transparency. Yet, we argue this is fundamentally misguided because these definitions are imperfect, siloed constructions of the human values they hope to proxy, while giving the guise that those values are sufficiently embedded in our technologies. Under popularized techniques, tensions arise when practitioners attempt to achieve each pillar of fairness, privacy, and transparency in isolation or simultaneously. In this position paper, we argue that the AI community needs to consider alternative formulations of these pillars based on the context in which technology is situated. By leaning on sociotechnical systems research, we can formulate more compatible, domain-specific definitions of our human values for building more ethical systems.

"Revisiting Value Alignment Through the Lens of Human-Aware AI"

by Sarath Sreedharan, Subbarao Kambhampati

Abstract: Value alignment has been widely argued to be one of the central safety problems in AI. While the problem itself arises from the way humans interact with the AI systems, most current solutions to value alignment tend to sideline the human or make unrealistic assumptions about possible human interactions. In this position paper, we propose a human-centered formalization of the value alignment problem that generalizes human-AI interaction frameworks that were originally developed for explainable AI. We see how such a human-aware formulation of the problem provides us with novel ways of addressing and understanding the problem.

"Towards Better User Requirements: How to Involve Human Participants in XAI Research"

by Thu Nguyen, Jichen Zhu

Abstract: Human-Centered eXplainable AI (HCXAI) literature identifies the need to address user needs. This paper examines how existing XAI research involves human users in designing and developing XAI systems and identifies limitations in current practices, especially regarding how researchers identify user requirements. Finally, we propose several suggestions on how to derive better user requirements by deeper engagement with user groups.

"Honesty as the Primary Design Guideline of Machine Learning User Interfaces"

by Claudio Pinhanez

Abstract: The outputs of most Machine Learning (ML) systems are often riddled with uncertainties, biased from the training data, sometimes incorrect, and almost always inexplicable. However, in most cases, their user interfaces are oblivious to those shortcomings, creating many undesirable consequences, both practical and ethical. I propose that ML user interfaces should be designed to make clear those issues to the user by exposing uncertainty and bias, instilling distrust, and avoiding imposture. This is captured by the overall concept of Honesty, which I argue should be the most important guide for the design of ML interfaces.

"Towards Companion Recommendation Systems"

by Konstantina Christakopoulou, Yuyan Wang, Ed H. Chi, Minmin Chen

Abstract: Recommendation systems can be seen as one of the first successful paradigms of true human-AI collaboration. That is, the AI identifies what the user might want and provides it to them at the right time; and the user, implicitly or explicitly, gives feedback on whether they value said recommendations. However, making the recommender a true companion of users, amplifying and augmenting their capabilities to be more knowledgeable, healthy, and happy, requires a shift in the way this collaboration happens. In this position paper, we argue for an increased focus on reflecting user values in the design, evaluation, training objectives, and interaction paradigm of state-of-the-art recommendation systems.

18:35–19:35 CET (12:35–13:35 EDT): Posters 5: Social Good + Human Wellbeing

"Indexing AI Risks with Incidents, Issues, and Variants"

by Sean McGregor, Kevin Paeth, Khoa Lam

Abstract: Two years after publicly launching the AI Incident Database (AIID) as a collection of harms or near harms produced by AI in the world, a backlog of "issues" that do not meet its incident ingestion criteria has accumulated in its review queue. Despite not passing the database's current criteria for incidents, these issues advance human understanding of where AI presents the potential for harm. Similar to databases in aviation and computer security, the AIID proposes to adopt a two-tiered system for indexing AI incidents (i.e., a harm or near harm event) and issues (i.e., a risk of a harm event). Further, as some machine learning-based systems will sometimes produce a large number of incidents, the notion of an incident "variant" is introduced. These proposed changes mark the transition of the AIID to a new version in response to lessons learned from editing 1,800+ incident reports and additional reports that fall under the new category of "issue."

"Human-Centered Algorithmic Decision-Making in Higher Education"

by Kelly McConvey, Anastasia Kuzminykh, Shion Guha

Abstract: Algorithms used for decision-making in higher education promise cost-savings to institutions and personalized service for students, but at the same time, raise ethical challenges around surveillance, fairness, and interpretation of data. To address the lack of systematic understanding of how these algorithms are currently designed, we reviewed algorithms proposed by the research community for higher education. We explored the current trends in the use of computational methods, data types, and target outcomes, and analyzed the role of human-centered algorithm design approaches in their development. Our preliminary research suggests that the models are trending towards deep learning and increased use of student personal data and protected attributes, with the target scope expanding towards automated decisions. Despite the associated decrease in interpretability and explainability, current development predominantly fails to incorporate human-centered lenses.

18:35–19:35 CET (12:35–13:35 EDT): Posters 6: Users

"Human-in-the-loop Bias Mitigation in Data Science"

by Romila Pradhan, Tianyi Li

Abstract: With the successful adoption of machine learning (ML) in decision making, there have been growing concerns around the transparency and fairness of ML models leading to significant advances in the field of eXplainable Artificial Intelligence (XAI). Generating explanations using existing techniques in XAI and merely reporting model bias, however, are insufficient to locate and mitigate sources of bias. In line with the data-centric AI movement, we posit that to mitigate bias, we must solve the myriad data errors and biases inherent in the data, and propose a human-machine framework that strengthens human engagement with data to remedy data errors and data biases toward building fair and trustworthy AI systems.

"A View From Somewhere: Human-Centric Face Representations"

by Jerone Theodore Alexander Andrews, Przemyslaw Joniak, Alice Xiang

Abstract: Biases in human-centric computer vision models are often attributed to a lack of sufficient data diversity, with many demographics insufficiently represented. However, auditing datasets for diversity can be difficult, due to an absence of ground-truth labels of relevant features. Few datasets contain self-identified demographic information, inferring demographic information risks introducing additional biases, and collecting and storing data on sensitive attributes can carry legal risks. Moreover, categorical demographic labels do not necessarily capture all the relevant dimensions of human diversity that are important for developing fair and robust models. We propose to implicitly learn a set of continuous face-varying dimensions, without ever asking an annotator to explicitly categorize a person. We uncover the dimensions by learning on a novel dataset of 638,180 human judgments of face similarity (FAX). We demonstrate the utility of our learned embedding space for predicting face similarity judgments, collecting continuous face attribute values, and comparative dataset diversity auditing. Moreover, using a novel conditional framework, we show that an annotator's demographics influence the importance they place on different attributes when judging similarity, underscoring the need for diverse annotator groups to avoid biases.

"Metric Elicitation: Moving from Theory to Practice"

by Safinah Ali, Sohini Upadhyay, Gaurush Hiranandani, Elena Glassman, Oluwasanmi O Koyejo

Abstract: Metric Elicitation (ME) is a framework for eliciting classification metrics that better align with implicit user preferences based on the task and context. The existing ME strategy so far is based on the assumption that users can most easily provide preference feedback over classifier statistics such as confusion matrices. This work examines ME by providing a first-ever implementation of the ME strategy. Specifically, we create a web-based ME interface and conduct a user study that elicits users' preferred metrics in a binary classification setting. We discuss the study findings and present guidelines for future research in this direction.

"Combating Toxicity in Online Games with HCAI"

by Regan Mandryk, Julian Frommel

Abstract: Multiplayer gaming yields social benefits, but can cause harm through toxicity--particularly as directed toward women, players of color, and 2SLGBTQ+ players. Detecting toxicity is challenging, but is necessary for intervention. We present three challenges to automated toxicity detection, and share potential solutions so that researchers can develop HCAI models that detect toxic game communication.

18:35–19:35 CET (12:35–13:35 EDT): Posters 7: Critical

"The Aleph & Other Metaphors for Image Generation"

by Gonzalo Ramos, Rick Barraza, Victor Dibia, Sharon Lo

Abstract: In this position paper, we reflect on fictional stories dealing with the infinite and how they connect with the current, fast-evolving field of image generation models. We draw attention to how some of these literary constructs can serve as powerful metaphors for guiding human-centered design and technical thinking in the space of these emerging technologies and the experiences we build around them. We hope our provocations seed conversations about current and yet to be developed interactions with these emerging models in ways that may amplify human agency.

"Accuracy Is Not All You Need"

by David Piorkowski, Rachel Ostrand, Yara Rizk, Vatche Isahagian, Vinod Muthusamy, Justin D. Weisz

Abstract: Improving the performance of human-AI (artificial intelligence) collaborations tends to be narrowly scoped, with better prediction performance often considered the only metric of improvement. As a result, work on improving the collaboration usually focuses on improving the AI's accuracy. Here, we argue that such a focus is myopic, and instead, practitioners should take a more holistic view of measuring the performance of AI models, and human-AI collaboration more specifically. In particular, we argue that although some use cases merit optimizing for classification accuracy, for others, accuracy is less important and improvement on human-centered metrics should be valued instead.

"The Role of Labor Force Characteristics and Organizational Factors in Human-AI Interaction"

by Lingwei Cheng, Alexandra Chouldechova

Abstract: Algorithmic risk assessment tools are now commonplace in public sector domains such as criminal justice and human services. In this paper we argue that understanding how the deployment of such tools affects decision-making requires consideration of organizational factors and worker characteristics that may influence the take-up of algorithmic recommendations. We discuss some existing evaluations of real-world algorithms and show that labor force characteristics play a significant role in influencing these human-in-the-loop systems. We then discuss our findings from a real-world child abuse hotline screening use case, in which we investigate the role that worker experience plays in algorithm-assisted decision-making. We argue that system designers should consider ways of preserving institutional knowledge when introducing algorithms into settings with high employee turnover.

"Beyond Decision Recommendations: Stop Putting Machine Learning First and Design Human-Centered AI for Decision Support"

by Zana Buçinca, Alexandra Chouldechova, Jennifer Wortman Vaughan, Krzysztof Gajos

Abstract: AI-driven decision-support tools often share a common form: given a decision instance, the decision maker is presented with a prediction from a machine learning model, with or without an explanation. Sometimes, this model predicts a factor thought to be pivotal for the decision, such as a risk score [8]. Other times, the model implicitly or directly predicts a recommended decision, such as whether a patient has a certain disease [7] or which course of treatment to select for a patient [9]. We argue that this “ML-first” paradigm of building decision-support tools around a single machine learning model with readily available data has emerged primarily out of convenience and may be fundamentally limited. We suggest that the community move towards more robust and human-centered ways of supporting decision makers with AI-powered tools.

18:35–19:35 CET (12:35–13:35 EDT): Posters 8: Data Work

"Data Issues Challenging AI Performance in Diagnostic Medicine and Patient Management"

by Mohammad Hossein Jarrahi, Mohammad Haeri, W. Christopher Lenhardt

Abstract: In this short article, we focus on four dimensions of data that create layers of complexity in using data in the context of AI systems developed for medical applications, collectively referred to as diagnostic medicine. These complexities, or 'data dilemmas,' share a core human element, making it clear why a human-centered approach is needed in understanding the relationship between medical data and AI systems.

"High-stakes team based public sector decision making and AI oversight"

by Deborah Morgan, Vincent J. Straub, Youmna Hashem, Jonathan Bright

Abstract: Oversight mechanisms, whereby the functioning and behaviour of AI systems are controlled to ensure that they are tuned to public benefit, are a core aspect of human-centered AI. They are especially important in public sector AI applications, where decisions on core public services such as education, benefits, and child welfare have significant impacts. Much current thinking on oversight mechanisms revolves around the idea of human decision makers being present 'in the loop' of decision making, such that they can insert expert judgment at critical moments and thus rein in the functioning of the machine. While welcome, we believe that the theory of human in the loop oversight has yet to fully engage with the idea that decision making, especially in high-stakes contexts, is often currently made by hierarchical teams rather than one individual. This raises the question of how such hierarchical structures can effectively engage with an AI system that is either supporting or making decisions. In this position paper, we outline some of the key contemporary elements of hierarchical decision making in contemporary public services and show how they relate to current thinking about AI oversight, thus sketching out future research directions for the field.

"Explainable Representations of Human Interaction: Engagement Recognition model with Video Augmentation"

by Yubin Kim, Hae Won Park, Sharifa Alghowinem

Abstract: In this paper, we explore how different video augmentation techniques transition the representation learning of a dyad's joint engagement. We evaluate state-of-the-art action recognition models (TimeSformer, X3D, I3D, and SlowFast) on a parent-child interaction video dataset with a joint engagement recognition task and demonstrate how the performance varies by applying different video augmentation techniques (General Aug, DeepFake, and CutOut). We also introduce a novel metric to objectively measure the quality of learned representations (Grad-CAM) and relate this to social cues (smiling, head angle, and body closeness) by conducting correlation analysis. Furthermore, we hope our method serves as a strong baseline for future human interaction analysis research.

Important Dates

Submission: 2022-09-22 AoE

Notification: 2022-10-20

Camera Ready: To be announced

Workshop: 2022-12-09

Submissions

Submit your work to OpenReview using the link below.