Tuesday, 7 September 2010

SiMPE 2010, Keynote: Trends and Challenges in Mobile Interaction

I was invited to give a keynote talk at the 5th Workshop on Speech in Mobile and Pervasive Environments, held as a part of ACM MobileHCI 2010 in Lisbon, Portugal. Over the last few years we have looked at speech as an additional modality in the automotive user interface domain; beyond that, my experience with speech interfaces is limited.

My talk, titled "Trends and Challenges in Mobile Interaction", looked at different issues in mobile interaction. In some parts I reflected on modalities and on opportunities for speech interaction.

When characterizing mobile interaction I highlighted the following:
  • Interacting while on the move
  • Interaction is one of the user's tasks (besides others - e.g. walking, standing in a crowd)
  • Environment in which the interaction takes place changes (e.g. on the train with varying light and noise conditions)
  • Interruptions happen frequently (e.g. boarding the bus, crossing the road)
  • Application usage is short (typically seconds to minutes)
My small set of key issues and recommendations for mobile UIs is:
  • Simple and Understandable
  • Perceptive and Context-Aware
  • Unobtrusive, Embedded and Integrated
  • Low Cognitive Load and Peripheral Usage
  • Users want to be in Control (especially on the move)
The presentations and discussion at the workshop were very interesting and I got a number of ideas for multimodal user interfaces - including speech.

One issue that made me think further was the question of natural language speech vs. specific speech commands. A colleague pointed me to Speech Graffiti [1] / the Universal Speech Interface at CMU. I wonder if it would make sense to invent a human-computer interaction language (with a simple grammar and a vocabulary) that we could teach in a course over several weeks (e.g. with a similar effort to learning touch typing on a QWERTY keyboard) or as a foreign language at school, to create a new, effective means of interaction. Could this make us more effective in interacting with information? Or should we try harder to get natural language interaction working? Looking at the way (experienced) people use Google, we can see that people adapt very successfully - probably faster than systems improve…
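To make the appeal of such a constrained language concrete, here is a toy sketch of my own (not the actual Speech Graffiti grammar - the phrase pattern and slot names are invented for illustration): if every utterance follows a fixed "slot is value" pattern, parsing becomes trivial compared with free-form natural language.

```python
def parse_structured(utterance: str) -> dict:
    """Parse comma-separated '<slot> is <value>' phrases into a slot dictionary.

    A hypothetical, regularized command language: because the grammar is
    fixed, no statistical language understanding is needed at all.
    """
    slots = {}
    for phrase in utterance.split(","):
        parts = phrase.strip().split(" is ")
        if len(parts) == 2:  # ignore phrases that do not fit the pattern
            slots[parts[0].strip()] = parts[1].strip()
    return slots

print(parse_structured("origin is boston, destination is lisbon, day is tuesday"))
# {'origin': 'boston', 'destination': 'lisbon', 'day': 'tuesday'}
```

The trade-off is exactly the one discussed above: the system side becomes almost trivially reliable, but the user has to invest the learning effort up front.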

From some of the talks it appears that "push to talk" is a real issue for users and a source of many user-related errors in speech systems. Users do not push at the appropriate time, especially when there are other tasks to do, and hence utterances are cut off at the start and end. I would guess that recording the speech continuously and using the "push to talk" signal only as an indicator of where to search in the audio stream may be a solution.
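A minimal sketch of that idea (my own, simplified: integer frame counters stand in for real audio frames, and the buffer is assumed large enough that it does not wrap between press and release): the microphone writes continuously into a ring buffer, and the button press and release merely mark positions in the stream, so a little pre-roll and post-roll can recover speech that was cut off.

```python
from collections import deque


class ContinuousRecorder:
    """Keep a rolling buffer of audio frames; push-to-talk only marks positions."""

    def __init__(self, pre_roll=5, post_roll=5, maxlen=1000):
        self.frames = deque(maxlen=maxlen)  # rolling history of recent frames
        self.pre_roll = pre_roll            # frames to include before the press
        self.post_roll = post_roll          # frames to include after the release
        self.press_at = None
        self.release_at = None

    def add_frame(self, frame):
        self.frames.append(frame)           # called continuously, button or not

    def press(self):
        self.press_at = len(self.frames)    # remember where the user pressed

    def release(self):
        self.release_at = len(self.frames)  # remember where the user released

    def utterance(self):
        # Widen the marked region on both sides to catch clipped speech.
        start = max(0, self.press_at - self.pre_roll)
        end = min(len(self.frames), self.release_at + self.post_roll)
        return list(self.frames)[start:end]


rec = ContinuousRecorder(pre_roll=2, post_roll=2)
for f in range(10):        # frames 0..9 arrive before the press
    rec.add_frame(f)
rec.press()                # user presses slightly too late
for f in range(10, 15):
    rec.add_frame(f)
rec.release()              # ...and releases slightly too early
for f in range(15, 20):
    rec.add_frame(f)
print(rec.utterance())     # [8, 9, 10, 11, 12, 13, 14, 15, 16]
```

The extracted segment includes two frames before the press and two after the release, so speech clipped by mistimed button handling is recovered from the continuous stream.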

[1] Tomko, S. and Rosenfeld, R. 2004. Speech graffiti vs. natural language: assessing the user experience. In Proceedings of HLT-NAACL 2004: Short Papers (Boston, Massachusetts, May 02 - 07, 2004). Human Language Technology Conference. Association for Computational Linguistics, Morristown, NJ, 73-76. http://www.cs.cmu.edu/~usi/papers/HLT04.pdf