• Dataset
    IOID Dataset for Instance of Interest Detection
    The IOID dataset is the first dataset for the Instance of Interest Detection (IOID) task. It is built on MSCOCO2017 and divided into a training set and a test set, covering 133 categories of Instances of Interest (IOIs). Each IOI is represented by an object category and a corresponding segmentation region on the image. [details] [download]
    NJU-DS400 Dataset for Depth-Aware Salient Object Detection
    The NJU-DS400 dataset contains over 1,000 stereo images collected from the Internet, 3D movies, and photographs taken with a Fuji W3 stereo camera, with salient object masks labeled by four volunteers. [details] [download]
    NJU-DS2000 Dataset for Depth-Aware Salient Object Detection
    The NJU-DS2000 dataset extends NJU-DS400 to 2,000 stereo images collected from the Internet, 3D movies, and photographs taken with a Fuji W3 stereo camera. Four volunteers were invited to label the ground-truth salient object masks. [details] [download]
    ViROI Dataset for Visual Relation of Interest Detection
    The ViROI dataset is the first dataset for Visual Relation of Interest Detection (VROID). It is built on IOID and MSCOCO2017, and is divided into a training set (25,091 images with 91,496 VROIs) and a test set (5,029 images with 18,268 VROIs). [details] [download]
    Code
    MMSF
    This code implements a multimodal fusion method that incorporates emotion cues for recognizing conversation styles in videos. It improves the discrimination of conversational videos and achieves state-of-the-art accuracy on the LVU video conversation style recognition benchmark, 10% higher than previous methods.
    VROID
    This code provides a download link for the dataset, instructions for installing the runtime environment, and scripts for training, evaluation, testing, and result visualization. Given an image, it detects the visual relation triples that may be of interest to humans and outputs the bounding boxes and categories of the objects involved. On the self-built ViROI dataset it achieves a Top-10 Recall of 30.75%, a Top-20 Recall of 38.79%, a Top-50 Recall of 49.60%, and a Top-100 Recall of 57.50%.
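    The Top-K Recall reported above measures the fraction of ground-truth relations recovered among the K highest-scored predictions. The sketch below is a minimal illustration assuming exact triple matching; the actual VROID evaluation also matches the subject and object bounding boxes against the ground truth, and all sample triples here are hypothetical:

    ```python
    def recall_at_k(predictions, ground_truth, k):
        """Fraction of ground-truth triples found among the top-K scored predictions.

        predictions: list of (triple, score) pairs, triple = (subject, predicate, object)
        ground_truth: list of triples
        """
        ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
        top_k = {triple for triple, _ in ranked[:k]}
        hits = sum(1 for triple in ground_truth if triple in top_k)
        return hits / len(ground_truth) if ground_truth else 0.0

    # Hypothetical predictions for one image, with confidence scores.
    preds = [(("person", "ride", "horse"), 0.9),
             (("person", "hold", "rope"), 0.4),
             (("horse", "stand on", "grass"), 0.7)]
    gt = [("person", "ride", "horse"), ("horse", "eat", "grass")]
    print(recall_at_k(preds, gt, 2))  # 0.5: one of two ground-truth triples recovered
    ```

    In practice the per-image recalls are averaged over the test set to produce the dataset-level Top-K numbers.
    
    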
    HOI-det
    This code constructs a multi-level conditional network for human-centered interaction understanding, analyzing human-object interactions from multiple perspectives. It achieves state-of-the-art results on HICO-Det and V-COCO, the two commonly used human-object interaction detection benchmarks, and outperforms previous methods in detecting various types of interactions for both common and uncommon objects.
    IOID
    This code provides download links for the dataset, instructions for installing the runtime environment, and scripts for training, evaluation, and testing. Given an image, it detects the bounding boxes and categories of the objects that may be of interest to humans, achieving a Precision of 68.47% and a Recall of 30.15% on the self-built IOID dataset.
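    Detection precision and recall as reported above are computed by matching predictions one-to-one against the ground truth. The sketch below illustrates the standard greedy matching scheme with bounding-box IoU; note this is an assumption for illustration only (IOID instances are annotated with segmentation regions, and the actual IOID evaluation code may match masks instead), and the sample boxes are hypothetical:

    ```python
    def iou(a, b):
        """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union else 0.0

    def precision_recall(pred, gt, iou_thr=0.5):
        """Greedy matching: a prediction is a true positive if its category
        matches an unmatched ground-truth instance with IoU >= iou_thr."""
        matched, tp = set(), 0
        for cat, box in pred:
            for i, (g_cat, g_box) in enumerate(gt):
                if i not in matched and cat == g_cat and iou(box, g_box) >= iou_thr:
                    matched.add(i)
                    tp += 1
                    break
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(gt) if gt else 0.0
        return precision, recall

    # Hypothetical detections: one correct person, one spurious dog.
    pred = [("person", (0, 0, 10, 10)), ("dog", (20, 20, 30, 30))]
    gt = [("person", (1, 1, 10, 10))]
    print(precision_recall(pred, gt))  # (0.5, 1.0)
    ```

    Precision counts how many detections are correct; recall counts how many ground-truth instances are found, which is why the two numbers can diverge widely.
    
    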
    Competition Results
    Video Relation Understanding Challenge
    Video Relation Understanding (VRU) is a technical challenge hosted by ACM Multimedia, the premier international conference on multimedia. MAGUS.Gamma won first place in the Visual Relation Detection task of the VRU Challenge 2019, and the related paper "Video Visual Relation Detection via Multi-model Feature Fusion" was published in the MM 2019 Grand Challenge track. [paper] [code&features]