Short Bio

Dongqi Cai (蔡东琪) is currently an AI research scientist at Intel Labs. She received her Ph.D. from the Multimedia Communication and Pattern Recognition Lab (MCPRL), Beijing University of Posts and Telecommunications (BUPT), in March 2016, supervised by Prof. Fei Su. From 2017 to 2019 she was a joint Postdoctoral Researcher of Intel and Tsinghua University (THU), advised by Prof. Li Zhang.

Her research interests include Computer Vision, Deep Learning and Machine Learning, specifically deep-learning-based visual recognition, multi-modality visual understanding, and efficient AI application deployment.

Ske2Grid: Skeleton-to-Grid Representation Learning for Action Recognition 

Intl. Conf. on Machine Learning (ICML), 2023

Dongqi Cai, Yangyuxuan Kang, Anbang Yao, Yurong Chen

This paper presents Ske2Grid, a new representation learning framework for improved skeleton-based action recognition. In Ske2Grid, we define a regular convolution operation upon a novel grid representation of the human skeleton, which is a compact image-like grid patch constructed and learned through three novel designs, namely graph-node index transform (GIT), up-sampling transform (UPT) and progressive learning strategy (PLS). We construct networks upon prevailing graph convolution networks and conduct experiments on six mainstream skeleton-based action recognition datasets. Experiments show that our Ske2Grid significantly outperforms existing GCN-based solutions under different benchmark settings, without bells and whistles.

    author = {Cai, Dongqi and Kang, Yangyuxuan and Yao, Anbang and Chen, Yurong},
    title = {Ske2Grid: Skeleton-to-Grid Representation Learning for Action Recognition},
    booktitle = {International Conference on Machine Learning},

Dynamic Normalization and Relay for Video Action Recognition 

Advances in Neural Information Processing Systems (NeurIPS), 2021

Dongqi Cai, Anbang Yao, Yurong Chen

Convolutional Neural Networks (CNNs) have been the dominant model for video action recognition. Due to the huge memory and compute demand, popular action recognition networks need to be trained with small batch sizes, which makes learning discriminative spatial-temporal representations for videos a challenging problem. In this paper, we present Dynamic Normalization and Relay (DNR), an improved normalization design, to augment the spatial-temporal representation learning of any deep action recognition model, adapting to small batch size training settings. DNR introduces two dynamic normalization relay modules to explore the potential of cross-temporal and cross-layer feature distribution dependencies for estimating accurate layer-wise normalization parameters. These two DNR modules are instantiated as a light-weight recurrent structure conditioned on the current input features and on the normalization parameters estimated from the neighboring-frame features at the same layer or from the whole-video-clip features at the preceding layers. Experimental results show that DNR brings large performance improvements to the baselines, achieving over 4.4% absolute margins in top-1 accuracy without training bells and whistles. More experiments on 3D backbones and several recent 2D spatial-temporal networks further validate its effectiveness.

    author = {Cai, Dongqi and Yao, Anbang and Chen, Yurong},
    title = {Dynamic Normalization and Relay for Video Action Recognition},
    booktitle = {Advances in Neural Information Processing Systems},

Earlier Publications

  • Learning visual knowledge memory networks for visual question answering
    Zhou Su, Chen Zhu, Yinpeng Dong, Dongqi Cai, Yurong Chen and Jianguo Li
    In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 [Paper]

  • Learning supervised scoring ensemble for emotion recognition in the wild
    Ping Hu, Dongqi Cai, Shandong Wang, Anbang Yao, and Yurong Chen
    In Proceedings of the 19th ACM International Conference on Multimodal Interaction (ICMI), 2017 [Paper]

  • HoloNet: towards robust emotion recognition in the wild
    Anbang Yao, Dongqi Cai, Ping Hu, Shandong Wang, Liang Sha, and Yurong Chen
    In Proceedings of the 18th ACM International Conference on Multimodal Interaction (ICMI), 2016 [Paper]

  • Adaptive Synopsis of Non-Human Primates’ Surveillance Video Based on Behavior Classification
    Dongqi Cai, Fei Su and Zhicheng Zhao
    In 22nd International Conference on MultiMedia Modeling (MMM), 2016 [Paper]

  • Deep CCA based super vector for action recognition
    Dongqi Cai and Fei Su
    In IEEE International Conference on Image Processing (ICIP), 2015 [Paper]

  • Local metric learning for EEG-based personal identification
    Dongqi Cai, Kai Liu and Fei Su
    In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015 [Paper]

  • An adaptive symmetry detection algorithm based on local features
    Dongqi Cai, Pengyu Li, Fei Su and Zhicheng Zhao
    In IEEE Visual Communications and Image Processing Conference (VCIP), 2014 [Paper]

  • A joint NHP's behaviour classification method based on sticky HDP-HMM
    Dongqi Cai and Fei Su
    In IEEE International Conference on Network Infrastructure and Digital Content (ICNIDC), 2014 [Paper]

  • HVS based visual quality assessment for digital cinema environment
    Dongqi Cai and Fang Wei
    In IEEE International Conference on Network Infrastructure and Digital Content (ICNIDC), 2010 [Paper]

Awards

  • 2022 Intel Beijing RYC Top-3 Volunteers
  • Intel China Employee of the Year Award 2021
  • Intel China Award 2017, Highest Annual Team Award of Intel China
  • Winner Team of EmotiW-AFEW 2017, out of 100+ Teams
  • Gordy Award 2016 (named after Intel’s co-founder Gordon Earle Moore), Highest Annual Research Award of Intel Labs
  • 1st Runner-up Team of EmotiW-AFEW 2016, out of ~100 Teams