Enhancing Vehicle Interior Action Recognition using Contrastive Self-Supervised Learning with 3D Human Skeleton Representations
Abstract
Over the past few years, mounting alarm over the rising number of fatalities attributed to driver-distraction-related car accidents has highlighted the urgency of developing advanced action recognition systems for the car interior. This master's thesis addresses this pressing need, emphasizing the potential of analyzing human behavior inside the vehicle, in light of the increasing adoption of automation, for better driver adaptation, human-vehicle communication, and safety. We investigate two self-supervised learning approaches, DINO with STTFormer and PSTL with ST-GCN, using 3D human skeleton representations on the NTU RGB+D and Drive&Act datasets. Extensive experiments and evaluations, including linear and k-NN assessments, demonstrate the competitive performance of PSTL with ST-GCN, while revealing challenges posed by the Drive&Act dataset and the complexities of self-supervised learning convergence. This research not only contributes to the advancement of action recognition systems for safer driving and dynamic adaptation but also underscores the value of self-supervised learning for interpreting human activities inside vehicles, facilitating the development of more intuitive and responsive autonomous driving systems.
