About the project
This project aims to explore a multimodal large language model (MLLM) framework that enables social robots to interpret interaction contexts from inputs across multiple modalities, such as vision, language and audio, and to interact with users through multiple communication channels, such as speech, gestures and images.
Social robots are rapidly being integrated into many aspects of daily life, from education and healthcare to workplaces and personal settings. These practical applications require robots to collaborate effectively with humans in shared environments, where social interaction is essential. For socially assistive robots, it is crucial to be context-aware, so that they can interact with users and deliver services that align with users' customs and needs, much as a human would.
This project aims to develop a multimodal input-output framework based on large language models and specifically designed for human-robot interaction. The framework will empower robots to perceive multiple social signals collected from the environment and from users during daily interactions, to form a deep understanding of the interaction context, and to respond appropriately to users via multiple communication channels.
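As a purely illustrative sketch, and not part of the project specification, the envisioned perceive-understand-respond cycle could be prototyped along the following lines. All names here (Observation, RobotAction, interpret_context, plan_responses) are hypothetical placeholders standing in for the multimodal model and robot interfaces that the project would actually design.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical containers for one perception-action cycle; the real framework's
# data structures, multimodal model, and robot interfaces are yet to be designed.
@dataclass
class Observation:
    modality: str   # e.g. "vision", "language", "audio"
    content: str    # placeholder: a textual summary of the raw signal


@dataclass
class RobotAction:
    channel: str    # e.g. "speech", "gesture", "image"
    payload: str


def interpret_context(observations: List[Observation]) -> str:
    """Stand-in for the multimodal LLM: fuse observations into a context description."""
    return "; ".join(f"{o.modality}: {o.content}" for o in observations)


def plan_responses(context: str) -> List[RobotAction]:
    """Stand-in for response generation across multiple communication channels."""
    return [
        RobotAction(channel="speech", payload=f"I noticed that {context}."),
        RobotAction(channel="gesture", payload="nod"),
    ]


def interaction_cycle(observations: List[Observation]) -> List[RobotAction]:
    """One perceive -> understand -> respond loop of the envisioned framework."""
    context = interpret_context(observations)
    return plan_responses(context)


if __name__ == "__main__":
    obs = [
        Observation("vision", "the user is waving"),
        Observation("audio", "the user said 'hello'"),
    ]
    for action in interaction_cycle(obs):
        print(action.channel, "->", action.payload)
```

In practice, the textual stand-ins above would be replaced by the project's actual perception pipelines, language model, and robot actuation interfaces.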
You will have the opportunity to collaborate with other researchers in the Agent, Interaction, and Complexity group and the Responsible AI project team, and to contribute to publications in high-impact journals and conferences. You will have access to high-performance computing facilities and multiple robot platforms to support your research.