My talk, titled "Trends and Challenges in Mobile Interaction", looked at different issues in mobile interaction. In some parts I reflected on modalities and opportunities for speech interaction.
When characterizing mobile interaction I made the following points:
- Interacting while on the move
- Interaction is one of the user's tasks (besides others - e.g. walking, standing in a crowd)
- Environment in which the interaction takes place changes (e.g. on the train with varying light and noise conditions)
- Interruptions happen frequently (e.g. boarding the bus, crossing the road)
- Application usage is short (typically seconds to minutes)
- Simple and Understandable
- Perceptive and Context-Aware
- Unobtrusive, Embedded and Integrated
- Low Cognitive Load and Peripheral Usage
- Users want to be in Control (especially on the move)
One issue that made me think more was the question of natural language speech vs. specific speech commands. A colleague pointed me to Speech Graffiti / Universal Speech Interface at CMU. I wonder if it would make sense to invent a Human Computer Interaction language (with a simple grammar and a vocabulary) that we could teach in a course over several weeks (e.g. with an effort similar to learning touch typing on a QWERTY keyboard) or as a foreign language at school, to have a new effective means for interaction. Could this make us more effective in interacting with information? Or should we try harder to get natural language interaction working? Looking at the way (experienced) people use Google we can see that people adapt very successfully - probably faster than systems improve…
From some of the talks it seems that "push to talk" is a real issue for users and a cause of many user-related errors in speech systems. Users do not push at the appropriate time, especially when there are other tasks to do, and hence utterances are cut off at the start and end. I would guess that recording the speech continuously and using "push to talk" only as an indicator of where to search in the audio stream may be a solution.
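To make the continuous-recording idea a bit more concrete, here is a minimal sketch of how it might look. Everything in it is an assumption for illustration and is not taken from any of the systems presented: the class name `ContinuousRecorder`, the 20 ms frame size, the 10 second history and the 500 ms padding on either side of the button press are all made up.

```python
from collections import deque


class ContinuousRecorder:
    """Hypothetical sketch: keep recent audio so speech just outside
    the button press is not lost."""

    def __init__(self, frame_ms=20, history_ms=10_000):
        self.frame_ms = frame_ms
        # Ring buffer of (timestamp_ms, frame) pairs; the oldest frames
        # are dropped automatically once the buffer is full.
        self.frames = deque(maxlen=history_ms // frame_ms)

    def add_frame(self, timestamp_ms, frame):
        """Called for every audio frame, whether or not the button is pressed."""
        self.frames.append((timestamp_ms, frame))

    def extract(self, press_ms, release_ms, pre_pad_ms=500, post_pad_ms=500):
        """Return the frames in a window around the button press.

        The press and release times are only hints about where to look.
        The padding values are assumed, not measured. To get the trailing
        padding, call this only after post_pad_ms has passed since release.
        """
        start = press_ms - pre_pad_ms
        end = release_ms + post_pad_ms
        return [frame for t, frame in self.frames if start <= t <= end]


# Usage: feed frames continuously, then cut out a window around the press.
recorder = ContinuousRecorder()
for t in range(0, 5000, 20):
    recorder.add_frame(t, t)  # the timestamp stands in for real audio data
segment = recorder.extract(press_ms=1000, release_ms=2000)
```

A real system would probably not use fixed padding but search the extracted window for the actual start and end of the utterance, for example with a voice activity detector.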
 Tomko, S. and Rosenfeld, R. 2004. Speech graffiti vs. natural language: assessing the user experience. In Proceedings of HLT-NAACL 2004: Short Papers (Boston, Massachusetts, May 02 - 07, 2004). Human Language Technology Conference. Association for Computational Linguistics, Morristown, NJ, 73-76. http://www.cs.cmu.edu/~usi/papers/HLT04.pdf