Person Search in Videos with One Portrait
Through Visual and Temporal Links

Qingqiu Huang1, Wentao Liu2, 3, Dahua Lin1

CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong1
Tsinghua University2      SenseTime Research3
Proceedings of European Conference on Computer Vision (ECCV) 2018


In real-world applications, e.g. law enforcement and video retrieval, one often needs to search for a certain person in long videos with just one portrait. This is much more challenging than the conventional settings for person re-identification, as the search may need to be carried out in environments that differ from the one where the portrait was taken. In this paper, we aim to tackle this challenge and propose a novel framework that takes into account the identity invariance along a tracklet, thus allowing person identities to be propagated via both the visual and the temporal links. We also develop a novel scheme called Progressive Propagation via Competitive Consensus, which significantly improves the reliability of the propagation process. To promote the study of person search, we construct a large-scale benchmark, which contains 127K manually annotated tracklets from 192 movies. Experiments show that our approach remarkably outperforms mainstream person re-id methods, raising the mAP from 42.16% to 62.27%.

Progressive Propagation via Competitive Consensus

Competitive Consensus

Competitive Consensus is a novel scheme for label propagation. Compared with conventional linear diffusion, it improves reliability by propagating the most confident information.
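To make the contrast with linear diffusion concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "propagating the most confident information" can be read as replacing the weighted sum over neighbours with a per-identity maximum. The function names, the sum-to-one normalisation, and the toy graph are all illustrative assumptions.

```python
def _normalise(row):
    """Rescale a score vector to sum to 1 (left unchanged if all zeros)."""
    total = sum(row)
    return [v / total for v in row] if total > 0 else row


def linear_diffusion_step(weights, probs):
    """Conventional update: sum the weighted votes of all neighbours."""
    n, k = len(probs), len(probs[0])
    return [
        _normalise([sum(weights[i][j] * probs[j][c] for j in range(n))
                    for c in range(k)])
        for i in range(n)
    ]


def competitive_consensus_step(weights, probs):
    """Assumed reading of competitive consensus: for each identity,
    keep only the strongest weighted neighbour vote (max, not sum)."""
    n, k = len(probs), len(probs[0])
    return [
        _normalise([max(weights[i][j] * probs[j][c] for j in range(n))
                    for c in range(k)])
        for i in range(n)
    ]


# Toy star graph (hypothetical data): node 0 is linked to nodes 1..4.
W = [
    [0.0, 0.5, 0.5, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0],
]
# Node 1 is confident about identity 0; nodes 2..4 lean weakly to identity 1.
P = [
    [0.50, 0.50],
    [0.90, 0.10],
    [0.35, 0.65],
    [0.35, 0.65],
    [0.35, 0.65],
]

linear = linear_diffusion_step(W, P)
competitive = competitive_consensus_step(W, P)
print("linear diffusion, node 0:", linear[0])
print("competitive consensus, node 0:", competitive[0])
```

In this toy graph, the summed update for node 0 follows the three uncertain neighbours, while the max-based update follows the single confident neighbour.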

Progressive Propagation

Progressive Propagation is a simple but effective scheme that accelerates the propagation process and reduces the effect of noise: at each iteration, it freezes the labels of a certain fraction of nodes according to their confidence.
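The freezing idea can be sketched as follows. This is a hedged illustration, not the authors' code. It assumes that confidence is the largest identity score of a node, that a fixed fraction of all nodes is frozen per iteration, and that a frozen node is set to a one-hot label. The names `max_vote_step`, `progressive_propagation`, and `freeze_fraction` are hypothetical.

```python
import math


def max_vote_step(weights, probs):
    """One propagation step (illustrative): per identity, take the
    strongest weighted neighbour vote, then rescale to sum to 1."""
    n, k = len(probs), len(probs[0])
    out = []
    for i in range(n):
        row = [max(weights[i][j] * probs[j][c] for j in range(n))
               for c in range(k)]
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else list(probs[i]))
    return out


def progressive_propagation(weights, probs, seeds, freeze_fraction=0.25,
                            step=max_vote_step):
    """Propagate labels, freezing the most confident unfrozen nodes
    after every iteration. Returns (labels, number_of_iterations)."""
    n = len(probs)
    probs = [list(p) for p in probs]           # work on a copy
    frozen = set(seeds)                        # seed nodes never change
    per_round = max(1, math.ceil(freeze_fraction * n))
    rounds = 0
    while len(frozen) < n:
        updated = step(weights, probs)
        for i in range(n):
            if i not in frozen:                # frozen nodes keep their label
                probs[i] = updated[i]
        # Rank the still-free nodes by confidence (assumed: max score).
        candidates = sorted((i for i in range(n) if i not in frozen),
                            key=lambda i: max(probs[i]), reverse=True)
        for i in candidates[:per_round]:
            label = probs[i].index(max(probs[i]))
            probs[i] = [1.0 if c == label else 0.0
                        for c in range(len(probs[i]))]
            frozen.add(i)
        rounds += 1
    return [p.index(max(p)) for p in probs], rounds


# Toy chain 0 - 1 - 2 - 3 (hypothetical data); nodes 0 and 3 are labelled.
W = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.2, 0.0],
    [0.0, 0.2, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]
P = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
labels, rounds = progressive_propagation(W, P, seeds={0, 3})
print(labels, rounds)
```

Because at least one node is frozen per iteration, the loop finishes after at most as many iterations as there are unlabelled nodes, and a frozen node can no longer be overwritten by noisy neighbours later on.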


Accuracy on CSM

Method           mAP    R@1    R@3    R@5    mAP    R@1    R@3    R@5
FACE           53.33  76.19  91.11  96.34  42.16  53.15  61.12  64.33
IDE            17.17  35.89  72.05  88.05   1.67   1.68   4.46   6.85
FACE+IDE       53.71  74.99  90.30  96.08  40.43  49.04  58.16  62.10
LP              8.19  39.70  70.11  87.34   0.37   0.41   1.60   5.04
PPCC-v (ours)  62.37  84.31  94.89  98.03  59.58  63.26  74.89  78.88
PPCC-vt (ours) 63.49  83.44  94.40  97.92  62.27  62.54  73.86  77.44
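For readers unfamiliar with the mAP columns above, the sketch below computes mean average precision in its standard textbook form. The exact evaluation protocol used for CSM is not described on this page, so this is an assumption-laden illustration; the function names and the toy rankings are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of booleans, best-ranked result first,
    True where the retrieved tracklet shows the queried person.
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)   # precision at this hit
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_query_relevance):
    """Mean of the per-query average precisions."""
    scores = [average_precision(r) for r in per_query_relevance]
    return sum(scores) / len(scores)


# Two toy queries (hypothetical rankings).
queries = [
    [True, False, True],   # hits at ranks 1 and 3
    [False, True],         # single hit at rank 2
]
print(mean_average_precision(queries))
```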

Examples of Search Results

CSM Dataset

Cast Search in Movies (CSM) contains more than 127K tracklets (with more than 11M person instances) of 1,218 cast members from 192 movies. The bounding boxes and identities of key frames are manually annotated. The bounding boxes of the other instances in each tracklet are obtained with a person detector based on Faster R-CNN.

Comparison between CSM and Related Datasets

task        search  re-id  re-id  re-id  re-id  det.+re-id  recog.
type        video   video  video  video  video  image       image
identities  1,218   1,261  300    200    1,501  8,432       2,356
tracklets   127K    20K    600    400    -      -           -
instances   11M     1M     44K    40K    32K    96K         63K

Examples of CSM



    title={Person Search in Videos with One Portrait Through Visual and Temporal Links},
    author={Huang, Qingqiu and Liu, Wentao and Lin, Dahua},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},

