Human Factors in Computing Systems: Understanding the Next Generation of Human-AI Interaction
15 May, 2025
My association with the University of Toronto's Dynamic Graphics Project (DGP) lab began through my involvement with TorCHI, a grassroots organization that brings together user experience research professionals. The organization creates a network for professionals to visit different research labs after work hours, share insights, and get a better understanding of diverse research processes. I became good friends with several team leads and university professors, which led me to the DGP lab, where I had the privilege of meeting with teaching staff to discuss the groundbreaking PhD theses selected for presentation at the ACM CHI conference in Yokohama, Japan.
It was a privilege to be more than just an observer. We had in-depth discussions about their testing methods and research questions, and I worked with them to challenge their assumptions and strengthen their arguments.
The first thesis explores a new mixed reality (MR) system designed to help university students take notes during live lectures. The system addresses the common problem of students getting distracted and missing important details while trying to capture information from lecture slides or the instructor's speech on their devices.
The user's gaze is used for quick repositioning of the cursor, while the pen provides precise and expressive input. This allows users to draw and interact with the virtual spatial panels.
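To make the interaction concrete, here is a minimal sketch of how gaze and pen events might be combined: gaze snaps the cursor to wherever the user is looking while the pen is lifted, and the pen then supplies fine-grained strokes. The class and event names are illustrative assumptions, not the system's actual implementation.

```python
# Hypothetical sketch of combining gaze and pen input for an MR note-taking panel.
# Event and class names are illustrative, not the thesis's actual API.

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

class GazePenCursor:
    """Gaze repositions the cursor coarsely; the pen draws precisely."""

    def __init__(self):
        self.cursor = Point(0.0, 0.0)
        self.pen_down = False
        self.stroke = []  # points of the stroke currently being drawn

    def on_gaze(self, gaze_point: Point):
        # While the pen is lifted, snap the cursor to where the user is looking.
        if not self.pen_down:
            self.cursor = gaze_point

    def on_pen_down(self):
        # Start a stroke at the gaze-positioned cursor, then follow the pen.
        self.pen_down = True
        self.stroke = [self.cursor]

    def on_pen_move(self, pen_delta: Point):
        # Pen motion provides fine-grained, expressive input relative to the cursor.
        if self.pen_down:
            self.cursor = Point(self.cursor.x + pen_delta.x,
                                self.cursor.y + pen_delta.y)
            self.stroke.append(self.cursor)

    def on_pen_up(self):
        # Hand the finished stroke to the virtual spatial panel.
        self.pen_down = False
        return self.stroke
```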
The study surveyed 45 university students and included follow-up interviews with 12 participants.
Identified User Challenges:
Difficulty keeping up with fast-paced lectures.
Cumbersome process of merging notes from multiple devices.
Positive User Feedback:
Received a "good" rating on the System Usability Scale.
Enhanced note-taking efficiency and lecture engagement.
Described as "seamless," "intuitive," and an "all-in-one spot for note-taking".
Valued Features:
Automatic slide capture reduces manual effort.
Easy review of lecture history aids in catching up on missed content.
Content capture and annotation features improve speed and convenience compared to traditional methods.
Challenges and Areas for Improvement:
Adjustment period required for Gaze+Pen interaction due to unfamiliarity.
Dual use of Head-Mounted Display (HMD) and tablet is perceived as more cumbersome compared to using just a tablet.
The second thesis explores how different interventions can influence a user's reliance on Large Language Models (LLMs) when performing a task.
A randomized online experiment with 400 participants was conducted to test three specific interventions: Reliance Disclaimer, Uncertainty Highlighting, and Implicit Answer. The tasks involved LSAT logical reasoning and image-based numerical estimation.
Reduction of Over-Reliance: All three interventions reduced over-reliance on the LLM, but none of them consistently improved appropriate reliance. Appropriate reliance means accepting the AI's answer when it is correct and trusting your own judgment when it is not.
Reliance Disclaimer: This intervention, which adds a persistent warning to verify the LLM's answers, was the most effective. It was the only one to improve appropriate reliance in the LSAT task without significantly increasing the time participants took to process the information.
Uncertainty Highlighting: This method, which visually highlights uncertain tokens in the LLM's output, was found to have the least effect on reliance and also led to significantly lower ratings in user perceptions of the LLM's accuracy and usefulness.
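For intuition, one common way to implement this kind of highlighting is to flag tokens whose generation probability falls below a threshold. The sketch below assumes per-token probabilities are already available (for example, from an API that exposes log-probabilities) and is not necessarily the method used in the study.

```python
# Minimal sketch of uncertainty highlighting: mark tokens whose generation
# probability falls below a threshold. The threshold value and the source of
# the probabilities are assumptions.

def highlight_uncertain(tokens, probs, threshold=0.5, marker=("[", "]")):
    """Wrap low-confidence tokens in markers so a UI can style them."""
    out = []
    for token, p in zip(tokens, probs):
        if p < threshold:
            out.append(f"{marker[0]}{token}{marker[1]}")
        else:
            out.append(token)
    return " ".join(out)

# Example: the answer "B" was generated with low confidence, so it gets flagged.
tokens = ["The", "correct", "answer", "is", "B"]
probs = [0.99, 0.97, 0.98, 0.95, 0.42]
print(highlight_uncertain(tokens, probs))
# -> The correct answer is [B]
```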
Implicit Answer: This intervention, which provides hints or guidance instead of a direct answer, required significantly more time for participants to process. It induced better self-reliance but at the cost of increased cognitive effort.
Poor Confidence Calibration: The study revealed that participants generally exhibited poor confidence calibration, meaning they became more confident after making wrong reliance decisions in certain contexts. This highlights a key challenge in human-LLM collaboration.
This research highlights how tricky it is to design effective reliance interventions: spending more time with an intervention doesn't necessarily translate into better reliance decisions. It underscores the need for thorough, human-centred evaluation of new techniques, to make sure they actually foster appropriate reliance rather than simply pushing users toward over- or under-reliance.
The third thesis revealed a significant "misalignment" between the tools developed by researchers and the tools writers actually use in their work. It argues that researchers should shift their focus to underexplored areas, such as supporting pre-writing activities and using diverse visual representations to aid writers, in order to bridge the gap between tool development and the evolving, unmet needs of creative writers. The analysis drew on three sources:
Academic literature: A systematic review of 67 research papers.
Online questionnaire: Responses from 22 creative writers.
Reddit discussions: An analysis of discussions on the r/Writing subreddit.
Focus on the Wrong Stage: Academic research has primarily focused on the "writing activity" (text generation) while largely overlooking the pre-writing stage. This stage includes activities like gathering inspiration, organizing ideas, and planning. The paper argues that creative writing is a complex process that begins long before a single word is drafted.
The Importance of Visualization: Existing research also overlooks the role of visualization in the writing process. Writers often use visual aids, such as mind maps, mood boards, and timelines, to brainstorm and organize their ideas; these visual representations help with "imagery and coherence".
Need for Holistic Tools: While commercial tools often try to support the entire writing workflow, many writers still use a fragmented approach, combining multiple tools (e.g., OneNote for character profiles, Google Docs for writing, and Grammarly for grammar checks). This creates a need for tools that can more seamlessly integrate these different processes and help writers manage large projects.
AI for Brainstorming and Revision: The paper also discusses the role of AI in creative writing. While some writers find AI helpful for brainstorming ideas or getting unstuck, others feel that AI-generated text "doesn't sound like [them]" and can detract from the creative enjoyment. This suggests a need for more subtle AI interventions, such as recommender systems.
Structural Revision: The study found that while tools like Scrivener can help with moving large chunks of text, there is a lack of research on tools that can support large-scale structural revisions.
The fourth thesis investigates how Large Language Models (LLMs) and humans differ in their argumentation when assessing subjective scenarios, specifically those involving subtle sexism. It argues that while LLMs can be useful in providing users with additional rationales for their decisions, it is crucial to understand the "perspectives" they take.
The study classified the different viewpoints into a finite set of perspectives:
Perpetrator: The person who made the sexist remark.
Victim: The person who was targeted by the sexist remark.
Decision-Maker: An objective third party evaluating the scenario.
The study also categorized the overall stance of each response as "sexist", "not sexist", "depends", or "no stance", and then compared the responses of different LLMs (GPT-3, GPT-3.5, GPT-4, Llama 3.1) and humans when asked to evaluate various subtle sexism scenarios.
Have you ever wondered whether humans and LLMs reason about the same things in the same way? It turns out both tend to draw on the same set of perspectives when arguing about these scenarios: the perpetrator, the victim, and the decision-maker.
While the perspectives are the same, they don't always appear in the same way. Different models and humans tend to use different combinations of these perspectives. For instance, newer models like GPT-4 and Llama 3.1 are more likely to give a multi-perspective response. This is a big change from the single-perspective approach that was more common in older models and even in human responses.
But what's even more exciting is how LLMs are changing over time. Older models like GPT-3 had responses that were pretty similar to human responses. But newer models like GPT-4 and Llama 3.1 are showing a greater tendency to include the victim's perspective. This means they're able to provide more nuanced and multifaceted arguments.
So, what does this mean for us? It suggests that when we're evaluating LLMs, we should go beyond just checking their accuracy. We should also consider their ability to take on different perspectives. For example, if we want to design an AI that can help us with a complex ethical problem, a model that can provide multiple perspectives might be a better choice than one that only gives a single answer.
The fifth thesis addresses the gap in understanding how users think about and interact with the long-term memory features of AI agents like ChatGPT. A preliminary investigation was conducted by interviewing six regular users of AI tools with memory and thematically analyzing 54 discussion threads on Reddit. The goal was to understand users' mental models, their unmet needs, and their current practices for managing agent memory.
Incomplete Mental Models and Desire for Transparency
Users often don't fully grasp how AI agents remember and recall information. Participants wanted to know what the system remembers, how it decides what information is important, and how its memories affect its responses. Some were surprised by how much personal or "mundane" information the agents had collected, which left them concerned and wanting more control over the process.
The Hierarchy of Memory
Users tend to think of an agent's memory as a pyramid, with general knowledge at the top and more specific, personalized information at the bottom.
The study breaks this down into five types of memories:
Factual Memories: General knowledge from the model's training data.
User-Related Memories: Personal details, preferences, and biographical information.
Domain-Related Memories: Information related to a specific subject, like programming or health advice.
Project-Related Memories: Higher-level contextual details for a long-term goal.
Task-Related Memories: Lower-level details for a specific, immediate task.
Users' needs for a system's memory change depending on their current task or project. Many participants wanted memories from different projects to be kept separate or "disjoint" from one another.
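To make the hierarchy concrete, here is a small sketch of how these memory types could be modelled, with project-scoped memories kept disjoint from one another. The five categories mirror the study; the data structure and names are my own illustrative assumptions.

```python
# Illustrative sketch of the memory hierarchy described above. The five memory
# types come from the study; the data structure itself is an assumption.

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class MemoryType(Enum):
    FACTUAL = "factual"          # general knowledge from the model's training data
    USER_RELATED = "user"        # personal details, preferences, biographical info
    DOMAIN_RELATED = "domain"    # e.g. programming or health advice
    PROJECT_RELATED = "project"  # higher-level context for a long-term goal
    TASK_RELATED = "task"        # lower-level details for an immediate task

@dataclass
class Memory:
    type: MemoryType
    content: str
    project: Optional[str] = None  # project-scoped memories stay disjoint

@dataclass
class AgentMemoryStore:
    memories: list[Memory] = field(default_factory=list)

    def recall(self, active_project: Optional[str] = None) -> list[Memory]:
        """Return memories visible in the current context: global memories plus
        only the memories belonging to the active project."""
        return [m for m in self.memories
                if m.project is None or m.project == active_project]
```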
Current User Practices
Because existing tools often lack sufficient memory management features, users have developed their own workarounds:
Creating separate chat threads for different projects or tasks.
Creating custom agents or personas by providing different base instructions.
Manually copying and pasting information into new chats to ensure the agent has the necessary context. This is time-consuming and hard to maintain.
Design Opportunities
Designers have the opportunity to create better agent memory mechanisms. Instead of relying on automatic retrieval methods like vector embeddings or time-based storage, the paper suggests aligning memory management with users' existing mental models.
Here are some cool design ideas that could make a difference:
Organize memories hierarchically based on user-defined tasks and projects.
Provide users with direct control over which groups of memories the agent can access.
Offer features like an on/off toggle or the ability to manually categorize and delete memories.
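As a rough illustration of what such controls might look like in practice, here is a minimal sketch with a per-group on/off toggle, manual categorization, and deletion. All class and method names are hypothetical, not proposals from the thesis.

```python
# Illustrative sketch of user-facing memory controls: per-group on/off toggles,
# manual categorization, and deletion. All names here are hypothetical.

from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    content: str
    group: str                      # user-defined task or project, e.g. "thesis-project"
    category: str = "uncategorized"

@dataclass
class MemoryControls:
    items: list[MemoryItem] = field(default_factory=list)
    enabled_groups: dict[str, bool] = field(default_factory=dict)

    def toggle_group(self, group: str, enabled: bool) -> None:
        # On/off switch for an entire group of memories.
        self.enabled_groups[group] = enabled

    def categorize(self, item: MemoryItem, category: str) -> None:
        # Manual re-categorization by the user.
        item.category = category

    def delete(self, item: MemoryItem) -> None:
        self.items.remove(item)

    def visible_to_agent(self) -> list[MemoryItem]:
        # The agent only sees memories in groups the user has left enabled.
        return [m for m in self.items if self.enabled_groups.get(m.group, True)]
```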