Results
We compared Sensible Agent to a conventional, voice-controlled AR assistant baseline. We measured cognitive load using the NASA Task Load Index (NASA-TLX), overall usability with the System Usability Scale (SUS), user preference on a 7-point Likert scale, and total interaction time.
The most striking finding was the reduction in cognitive workload. On the NASA-TLX mental-demand subscale (0–100), Sensible Agent averaged 21.1 versus 65.0 for the baseline, a statistically significant difference (p < .001). We observed a similar significant reduction in perceived effort (p = .0039), which suggests that the proactive system successfully offloaded the mental work of formulating a query.
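The section does not state which statistical test produced these p-values, so as a hedged illustration only, the sketch below runs an exact two-sided sign test on hypothetical per-participant differences in mental-demand ratings (baseline minus Sensible Agent). The data values are invented for demonstration and are not the study's raw scores.

```python
from math import comb

def sign_test_p(diffs):
    """Exact two-sided sign test for paired differences (zeros dropped)."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)  # count of positive differences
    # Two-sided p-value: probability, under H0 (each sign equally likely),
    # of a split at least as extreme as the one observed.
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(p, 1.0)

# Hypothetical differences for 12 participants (baseline - Sensible Agent);
# all positive, i.e. every participant rated the baseline more demanding.
diffs = [40, 55, 30, 45, 50, 38, 60, 42, 35, 48, 52, 44]
print(round(sign_test_p(diffs), 6))  # → 0.000488, i.e. p < .001
```

With all twelve differences in the same direction, the exact p-value is 2/2^12 ≈ .0005, consistent in magnitude with the p < .001 reported above; a Wilcoxon signed-rank or paired t-test would use the same paired-difference structure.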
Regarding usability, both systems performed well, with no statistically significant difference in SUS scores (p = .11). However, participants expressed a strong, statistically significant preference for Sensible Agent (p = .0074): on the 7-point scale, the mean preference rating was 6.0 for Sensible Agent versus 3.8 for the baseline.
Interaction time, logged from the moment a prompt was triggered to the system's final response to the user's input, favored the baseline (μ = 16.4 s) over Sensible Agent (μ = 28.5 s). This difference is an expected trade-off of Sensible Agent's two-step interaction flow, in which the agent first proposes an action and the user then confirms it. The strong user preference for Sensible Agent suggests this trade-off was acceptable, particularly in social contexts where discretion and minimal user effort were important.