|
MMSF
This code implements an emotion-aware multimodal method for recognizing conversation styles in videos, which makes conversational videos easier to tell apart. It achieves state-of-the-art (SOTA) results on LVU, a video conversation style recognition dataset, with accuracy 10% higher than previous methods.
|
|
VROID
This code provides a link to download the dataset, instructions for installing the runtime environment, and scripts for training, evaluation, testing, and visualizing results. Given an image, it detects the visual relationship triples that are likely to interest humans and outputs the bounding boxes and categories of the objects involved. On the self-built ViROI dataset, it achieves a top-10 recall of 30.75%, a top-20 recall of 38.79%, a top-50 recall of 49.60%, and a top-100 recall of 57.50%.
|
|
HOI-det
This code implements a multi-level conditional network for human-centered interaction understanding, which models human-object interactions from multiple perspectives. It achieves state-of-the-art (SOTA) results on both HICO-Det and V-COCO, two commonly used datasets for human-object interaction detection, and outperforms previous methods in detecting various types of interactions for both common and uncommon objects.
|
|
IOID
This code provides links to download the dataset, instructions for installing the runtime environment, and scripts for training, evaluation, and testing. Given an image, it detects the bounding boxes and categories of the objects that are likely to interest humans, achieving a precision of 68.47% and a recall of 30.15% on the self-built IOID dataset.
|
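
The top-K recall figures quoted for VROID can be read as follows: of the ground-truth relationship triples, what fraction is recovered by the K highest-scoring predictions? The sketch below is only an illustration of that idea, not the repository's evaluation script. The function names, the triple layout, the IoU threshold of 0.5, and the one-to-one matching rule are all assumptions; the actual protocol used for ViROI may differ (for example, in how results are averaged across images).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
# Hypothetical triple layout: (subject label, subject box, predicate, object label, object box)
Triple = Tuple[str, Box, str, str, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    preds: List[Tuple[float, Triple]],  # (confidence score, predicted triple)
    gts: List[Triple],
    k: int,
    iou_thr: float = 0.5,  # assumed threshold, not taken from the source
) -> float:
    """Fraction of ground-truth triples matched by the top-k predictions."""
    if not gts:
        return 0.0
    top = sorted(preds, key=lambda p: p[0], reverse=True)[:k]
    matched = set()  # indices of ground-truth triples already matched
    for _, (ps, psb, pp, po, pob) in top:
        for i, (gs, gsb, gp, go, gob) in enumerate(gts):
            if i in matched:
                continue
            labels_ok = (ps, pp, po) == (gs, gp, go)
            if labels_ok and iou(psb, gsb) >= iou_thr and iou(pob, gob) >= iou_thr:
                matched.add(i)  # each ground-truth triple counts at most once
                break
    return len(matched) / len(gts)


# Toy data: two ground-truth triples and three scored predictions.
gt = [
    ("person", (0, 0, 10, 10), "rides", "horse", (5, 5, 20, 20)),
    ("dog", (30, 30, 40, 40), "chases", "ball", (50, 50, 55, 55)),
]
preds = [
    (0.9, gt[0]),                                                        # correct
    (0.8, ("person", (0, 0, 10, 10), "holds", "horse", (5, 5, 20, 20))),  # wrong predicate
    (0.1, gt[1]),                                                        # correct, low score
]
```

With this toy data, `recall_at_k(preds, gt, 1)` counts only the first ground-truth triple as found, while `recall_at_k(preds, gt, 3)` recovers both. This mirrors how the reported recall rises as K grows from 10 to 100.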