Reflections on “Interacting with AI” conference

Pami Hekanaho
5 min read · Oct 2, 2021

I took a development day to attend a conference on interacting with AI, although I am likely to pay for it next week. It was an academic conference: most presenters were conversation or multimodal interaction analysts, and the AI they talked about was mainly robots or voice UIs.

Some general observations and reflections below:

  • We still draw heavily on the human/natural in robot and voice design, even when it comes to the form factor, although one could argue that the human form, for example, is not necessarily the best for every use case. It is painful to look at care robots dressed as female nurses or (headless and furless) four-legged service robots for the blind. I would still call for a step away from anthropomorphism and a step towards imagining, or even celebrating, the non-human form suited to the use case. In the examples from the conference, the ethnographic method was definitely pushing the robot design towards anthropomorphism, because the robot replaced a living actor (a human or a service dog) and the design was inspired by observations of humans acting in the role the robot was meant to fill. Maybe we should evaluate both the goal of the robot and the research methods, because the end results are so uncanny valley. Or maybe it is still too early.
  • As a result of the above, the quality of the robot ended up being judged by its ability to deal with the social, which is a bit unfair. This was then attributed to the intelligence of its creators, and I detected some dissing of designers and developers for not understanding the complexity of interaction, which I think is doubly unfair: not only are we talking about machines, but about machines developed in constrained environments. For example, when Google Duplex was not producing pauses that felt right to the human participants, it was perceived as a shortcoming of the designer, although one should be able to imagine the multitude of contexts affecting the length of a pause and the exponential complexity this adds to the implementation (machine learning aside). And how important is it to the product’s value? Not very, unless the goal of Duplex is to mimic a human rather than to assist one in relatively straightforward tasks.
  • The conference was hybrid, which created interesting separations in the audience: the Zoom chat was very active but not accessible to the physical participants, and the mics could not pick up questions from the physical room, which meant that the Q&A session after each presentation was not really accessible to online participants. One presentation was a YouTube video of a presenter who was themselves present in the online conference. This seemed very meta, but I guess it was done for the purely pragmatic reason of not being comfortable presenting in English, a discomfort that definitely hindered the delivery of some of the presentations.

And the cool bits:

  • There was a presentation about self-driving cars and their inability to understand the “social road”. Humans yield to each other in traffic, but self-driving cars can neither read nor signal this, which in turn makes it hard for the humans outside the car to decide what to do. There was a delicious situation where the human in the car needed to tell the pedestrians that this is a self-driving car and not to worry, “it can see you”, before the pedestrians could decide whether or not to cross the road. We have been preoccupied with creating trust for the person in the car but not really for the people outside it. If the cars ever catch on big time (which I still doubt), it will be interesting to see what kinds of conventions and affordances we create and adopt for yielding, and subsequently, how that changes human behaviour in traffic. Someone even asked how one would direct road rage at the self-driving machine.
  • A presentation about in-car voice UI was also very interesting, because the presenters showed videos of different people interacting with the VUI and with each other while driving. There was an amazing elderly couple who, in their years together, had established a smooth collaboration: the wife was the VUI expert and the husband handled the other modes. Together they navigated the task: the husband was in charge of pressing the VUI activation button, so the wife would point at the button, the husband would press it, and the wife would utter the voice command.
  • A presentation about people who were born blind using their mobile phones was a revelation. I had never seen people who were born blind using devices. In the tasks, they were trying to scan product information from packages so that the voice UI could read it back to them, but positioning the scanner was very difficult. They positioned and repositioned both the phone and the package without any feedback on whether they were close to a successful read. And curiously, the voice feedback was (I think) the only feedback of success, so in a noisy supermarket they would have to bring the device closer to their ear to hear it, which is when they would lose the position again. I am not sure whether there was a notice and a delay before the product info was read, because otherwise they would also miss some of the info. Maybe some kind of haptic feedback for the scanning (no source → some source → almost reading → reading → bring to your ear) would be useful; I sketch one possible version of this after the list. The researcher’s topic was not this, though: she is interested in the relationship between body, object and space as a multi-sensorial phenomenon, and blind people offer a glimpse into other ways of sensing in a world that is so dominantly visual. Even the way they held the mobile phone was different, because the primary feedback is audio but they still need to be able to touch the display... Mind blown.
  • The final presentation was about virtual and physical assistants collaborating in the care of disabled people. It is remarkable how much voice commands operating a smart home can help the everyday life of a disabled person, but the presenter reminded us that the tech can also justify cost cutting, because it aims at the independence of the disabled person when it could also aim at the interdependence of multiparty assistance (human and virtual). The examples showed that the voice assistants are not (yet?) good at multiparty situations.
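On the haptic feedback idea above: purely as an illustration, and not anything presented at the conference, here is a minimal Android-flavoured sketch of how discrete scanner states could be mapped to distinct vibrations, assuming the scanning component exposes such a confidence state at all. The state names and durations are my own invention.

```kotlin
import android.os.VibrationEffect
import android.os.Vibrator

// Hypothetical scanner states, following the progression in the bullet above:
// no source -> some source -> almost reading -> reading (bring to your ear).
enum class ScanState { NO_SOURCE, SOURCE_FOUND, ALMOST_READING, READ_SUCCESS }

// Map each state to a distinct vibration so the user can feel how close they are
// to a successful read without having to hear anything. Durations are placeholders.
fun hapticFeedback(vibrator: Vibrator, state: ScanState) {
    val effect: VibrationEffect? = when (state) {
        ScanState.NO_SOURCE -> null                                  // nothing detected: stay silent
        ScanState.SOURCE_FOUND -> VibrationEffect.createOneShot(     // single short tick
            30, VibrationEffect.DEFAULT_AMPLITUDE)
        ScanState.ALMOST_READING -> VibrationEffect.createWaveform(  // quick double tick
            longArrayOf(0, 30, 70, 30), -1)
        ScanState.READ_SUCCESS -> VibrationEffect.createOneShot(     // long buzz: bring to your ear
            250, VibrationEffect.DEFAULT_AMPLITUDE)
    }
    effect?.let { vibrator.vibrate(it) }
}
```

Whether a progression like that would actually help is of course something only the users themselves could tell.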

I do think academic analysis of human-tech interaction, e.g. using conversation or multimodal analysis as methods, is very important. For an industry person like me, this level of detail can rarely be achieved, as our tendency is to move fast and jump to conclusions. And because we are constrained and in the weeds, we do not make connections between Wittgenstein and what we do, nor do we always remember that it is indeed people who live with the tech we build. So thank you, academia.
