IN DEVELOPMENT
Multi-modal Control for Robotic Devices

The transition toward Industry 5.0 necessitates robotic systems capable of seamless, intuitive collaboration with humans, moving beyond pre-programmed tools to adaptive partners. However, current Human-Robot Interaction (HRI) frameworks often struggle to interpret natural, multimodal human cues in dynamic environments, relying on rigid control interfaces that lack contextual reasoning. This research aims to bridge the gap between high-level cognitive reasoning and low-level motor control by developing a robust, multimodal HRI framework for a 7-Degree-of-Freedom (DoF) robotic manipulator. The methodology integrates three core components: a Convolutional Neural Network (CNN)-based system (GESTID) for real-time static and dynamic gesture recognition; a Goal-Oriented Action Planning (GOAP) engine for dynamic task sequencing; and a Large Language Model (LLM) interface for interpreting natural language commands. These components are unified via the Robot Operating System (ROS) to control a Franka Emika Panda robot. The study specifically investigates the use of LLMs to perform semantic mapping of ambiguous speech instructions into deterministic robotic actions.
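The GOAP component described above sequences actions by searching for a chain whose effects satisfy a goal. A minimal sketch of the idea, using illustrative facts and action names from a beverage-preparation scenario (not the actual system's action set):

```python
from collections import deque

# Illustrative GOAP action library: each action has boolean preconditions
# and effects over a set of world facts. Names are hypothetical examples.
ACTIONS = {
    "pick_cup":  ({"cup_on_table"}, {"holding_cup"}),
    "fill_cup":  ({"holding_cup"}, {"cup_full"}),
    "serve_cup": ({"cup_full"}, {"served"}),
}

def plan(start, goal):
    """Breadth-first search over world states for a shortest action sequence."""
    start = frozenset(start)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:          # all goal facts hold
            return steps
        for name, (pre, eff) in ACTIONS.items():
            if pre <= state:       # action is applicable
                nxt = frozenset(state | eff)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None                    # goal unreachable
```

For example, `plan({"cup_on_table"}, {"served"})` returns the sequence `["pick_cup", "fill_cup", "serve_cup"]`. In the full framework, the goal itself would come from the gesture or LLM interface rather than being hard-coded.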

Experimental evaluations across complex scenarios, such as beverage preparation, demonstrated high system efficacy. The vision-based gesture recognition module achieved a classification accuracy of 99.1% with low latency, enabling real-time teleoperation. The speech-based interaction, integrated with GOAP, yielded a task success rate of 85.4% under dynamic conditions, while the LLM-driven semantic mapping correctly interpreted natural language instructions with 94.67% accuracy. These findings indicate that fusing visual perception with linguistic reasoning significantly enhances robotic adaptability. This thesis contributes a validated, scalable framework for multimodal HRI, offering tangible implications for industrial automation, healthcare, and assistive technologies where intuitive, hands-free control is paramount.

Dr Almas Baim

Lead Software Architect, Co-Founder

Sajjad Hussain

Multi-modal Robotics Control

Ola Ezekiel

Emotion Recognition and Analysis

Awes Mubarak

Determinism in Large Language Models

Hansen Han

Text-conditioned Human Motion Generation

Nacho Cabrera Martin

Sign Language Recognition for Mobile Devices

Simi Ibraheem

Generative AI for Context-Aware NPC Dialogue

TBC (November 2026)

TBC

Study for your PhD

We welcome applications related to AI and/or Robotics. You can find out more at UoB Computing PhD.

Work with our team

We are always looking to create new networks and support academic and industrial collaborations.

Contact Us

University of Brighton

Advanced Engineering Building

Brighton, United Kingdom, BN2 4AT

Please contact Almas for any queries related to the AI Robotics Lab.