MAGUS

Dataset

	COCO-GVR Dataset for Group Visual Relation Detection The COCO-GVR dataset is the first dataset designed for evaluating Group Visual Relation Detection (GVRD). It is built upon the MSCOCO dataset and is divided into a training set and a test set, containing 8,056 and 1,514 images, respectively. The dataset includes 80 object categories, and the Group Visual Relations (GVRs) in each image are represented in the form of triplets. [details] [download]
	IOID Dataset for Instance of Interest Detection The IOID dataset is the first dataset for the Instance of Interest Detection (IOID) task. It is built upon MSCOCO2017 and is divided into a training set and a test set, covering 133 categories of Instances of Interest (IOIs). Each IOI is represented by an object category and a corresponding segmentation region in the image. [details] [download]
	NJU-DS400 Dataset for Depth-Aware Salient Object Detection The NJU-DS400 dataset contains over 1000 stereo images collected from the Internet, 3D movies, and photographs taken by a Fuji W3 stereo camera, with salient object masks labeled by four volunteers. [details] [download]
	NJU-DS2000 Dataset for Depth-Aware Salient Object Detection The NJU-DS2000 dataset extends NJU-DS400 and includes 2000 stereo images collected from the Internet, 3D movies, and photographs taken by a Fuji W3 stereo camera. Four volunteers labeled the ground-truth salient object masks. [details] [download]
	ViROI Dataset for Visual Relation of Interest Detection The ViROI dataset is the first dataset for Visual Relation of Interest Detection (VROID). It is built upon IOID and MSCOCO2017, and is divided into a training set (25,091 images with 91,496 VROIs) and a test set (5,029 images with 18,268 VROIs). [details] [download]

Code

	GVRD The code provides the dataset download link, instructions for setting up the runtime environment, as well as scripts for training, evaluation, and testing. It performs detection of group visual relationship triplets in a given image, and outputs the bounding boxes and categories of the involved objects. On the self-constructed COCO-GVR dataset, it achieves a Top-10 mRecall of 14.71%, Top-20 mRecall of 22.19%, and Top-30 mRecall of 25.78%.
	MMSF This code implements an emotion-aware multimodal method for recognizing conversation styles in videos. Incorporating emotion makes conversational videos easier to tell apart, and the method achieves state-of-the-art results on the LVU video conversation style recognition dataset, with accuracy 10% higher than previous methods.
	VROID The code provides the dataset download link, instructions for setting up the runtime environment, and scripts for training, evaluation, testing, and visualization of the results. Given an image, it detects the visual relationship triplets that are likely to be of interest to humans, and outputs the bounding boxes and categories of the involved objects. On the self-built ViROI dataset, it achieves a Top-10 Recall of 30.75%, Top-20 Recall of 38.79%, Top-50 Recall of 49.60%, and Top-100 Recall of 57.50%.
	HOI-det This code implements a multi-level conditional network for human-centered interaction understanding, which models human-object interactions from multiple perspectives. It achieves state-of-the-art results on HICO-Det and V-COCO, two commonly used datasets for human-object interaction detection, and outperforms previous methods in detecting various types of interactions with both common and uncommon objects.
	IOID The code provides the dataset download link, instructions for setting up the runtime environment, and scripts for training, evaluation, and testing. Given an image, it detects the bounding boxes and categories of the objects that are likely to be of interest to humans, achieving a Precision of 68.47% and a Recall of 30.15% on the self-built IOID dataset.
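
The Top-K Recall figures quoted for the relation detection code above measure how many ground-truth triplets appear among the K highest-ranked predictions for an image. The following is a minimal sketch of that idea, not the evaluation script shipped with any of the releases. It assumes triplets are plain (subject, predicate, object) label tuples and that predictions are already sorted by confidence; the function name `top_k_recall` and the sample data are hypothetical. Real evaluation protocols typically also require the predicted boxes to overlap the ground-truth boxes, which is omitted here.

```python
from typing import List, Tuple

# Hypothetical simplified triplet: (subject, predicate, object) labels only.
# Bounding-box matching, which real protocols usually require, is omitted.
Triplet = Tuple[str, str, str]


def top_k_recall(ranked_predictions: List[Triplet],
                 ground_truth: List[Triplet],
                 k: int) -> float:
    """Fraction of distinct ground-truth triplets found in the top-k predictions.

    `ranked_predictions` is assumed to be sorted by descending confidence.
    """
    gt = set(ground_truth)
    if not gt:
        return 0.0  # convention chosen for this sketch: no ground truth, no recall
    top_k = set(ranked_predictions[:k])
    hits = len(gt & top_k)
    return hits / len(gt)


# Illustrative data, not taken from any of the datasets above.
preds = [
    ("person", "ride", "horse"),
    ("person", "hold", "cup"),
    ("dog", "on", "sofa"),
    ("person", "wear", "hat"),
]
gt = [("person", "ride", "horse"), ("person", "wear", "hat")]

print(top_k_recall(preds, gt, 2))  # 0.5: only one of two ground-truth triplets is in the top 2
print(top_k_recall(preds, gt, 4))  # 1.0: both are in the top 4
```

A dataset-level score would average this per-image value over the test set. A mean recall (mRecall) variant usually averages per relation category instead, but the exact definition used by each release should be taken from its own evaluation script.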

Competition Results

Video Relation Understanding Challenge

Video Relation Understanding (VRU) is a technical challenge hosted by ACM Multimedia, the premier international conference in multimedia. MAGUS.Gamma won first place in the Visual Relation Detection Task of VRU Challenge 2019, and the related paper "Video Visual Relation Detection via Multi-model Feature Fusion" was published in MM 2019 Grand Challenge. [paper] [code&features]