Zhun Zhong 钟准

PhD Student at Xiamen University (XMU)


Google Scholar / GitHub

About me


I am currently a Ph.D. student in the Cognitive Science Department at Xiamen University, under the supervision of Prof. Shaozi Li. I received the M.S. degree in Computer Science and Technology from China University of Petroleum, Qingdao, China, in 2015, and the B.S. degree from the Information Engineering Department of East China University of Technology in 2012.

I am currently doing person re-identification research, under the co-supervision of Dr. Liang Zheng.

My research interests include person re-identification, object detection, and machine learning.


  • [2017.03.04] "Re-ranking Person Re-identification with k-reciprocal Encoding" is accepted to CVPR 2017. Code and the new CUHK03 training/testing protocol are available. [Link]
  • [2016.02.26] One paper is accepted to Neurocomputing 2017.
  • [2016.12.01] ID-discriminative Embedding baseline on Market-1501 is available. [Code]
  • [2016.09.01] One paper is accepted to Multimedia Tools and Applications 2016.


Re-ranking Person Re-identification with k-reciprocal Encoding
Zhun Zhong, Liang Zheng, Donglin Cao, Shaozi Li
To appear in CVPR, 2017
abstract / bibtex / PDF / Code and CUHK03 new training/testing protocol
  title={Re-ranking Person Re-identification with k-reciprocal Encoding},
  author={Zhong, Zhun and Zheng, Liang and Cao, Donglin and Li, Shaozi},
When considering person re-identification (re-ID) as a retrieval process, re-ranking is a critical step to improve its accuracy. Yet in the re-ID community, limited effort has been devoted to re-ranking, especially those fully automatic, unsupervised solutions. In this paper, we propose a k-reciprocal encoding method to re-rank the re-ID results. Our hypothesis is that if a gallery image is similar to the probe in the k-reciprocal nearest neighbors, it is more likely to be a true match. Specifically, given an image, a k-reciprocal feature is calculated by encoding its k-reciprocal nearest neighbors into a single vector, which is used for re-ranking under the Jaccard distance. The final distance is computed as the combination of the original distance and the Jaccard distance. Our re-ranking method does not require any human interaction or any labeled data, so it is applicable to large-scale datasets. Experiments on the large-scale Market-1501, CUHK03, MARS, and PRW datasets confirm the effectiveness of our method.
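
As a rough illustration of the idea in the abstract above, the following Python sketch re-ranks a small set of feature vectors. It is a simplified toy version and not the released code: it compares plain k-reciprocal neighbor sets instead of encoding them into a single vector, and the function names, the parameters `k` and `lam`, and the weighted-sum form of the final distance are assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_nearest(dists, i, k):
    """Return the indices of the k items closest to item i (i usually included)."""
    order = sorted(range(len(dists)), key=lambda j: dists[i][j])
    return set(order[:k])


def k_reciprocal(dists, i, k):
    """Keep only neighbors j of i for which i is also among j's k nearest."""
    return {j for j in k_nearest(dists, i, k) if i in k_nearest(dists, j, k)}


def rerank_distances(features, k=3, lam=0.3):
    """Combine the original distance with a Jaccard distance over
    k-reciprocal neighbor sets.

    Assumption for this sketch: final = (1 - lam) * jaccard + lam * original.
    """
    n = len(features)
    dists = [[dist(a, b) for b in features] for a in features]
    recip = [k_reciprocal(dists, i, k) for i in range(n)]

    final = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = recip[i] | recip[j]
            if union:
                jaccard = 1.0 - len(recip[i] & recip[j]) / len(union)
            else:
                jaccard = 1.0  # no shared evidence: treat as maximally distant
            final[i][j] = (1.0 - lam) * jaccard + lam * dists[i][j]
    return final


if __name__ == "__main__":
    # Two well-separated toy clusters standing in for two identities.
    feats = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    probe = 0
    final = rerank_distances(feats, k=3, lam=0.3)
    ranking = sorted(range(len(feats)), key=lambda j: final[probe][j])
    print("re-ranked gallery order for probe 0:", ranking)
```

Items that share k-reciprocal neighbors with the probe get a small Jaccard term and move up the ranking, while `lam` controls how much of the original distance is kept.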
Class-Specific Object Proposals Re-ranking for Object Detection in Automatic Driving
Zhun Zhong, Mingyi Lei, Donglin Cao, Jianping Fan, Shaozi Li
Neurocomputing, 2017
abstract / bibtex / PDF
  title={Class-Specific Object Proposals Re-ranking for Object Detection in Automatic Driving},
  author={Zhong, Zhun and Lei, Mingyi and Cao, Donglin and Fan, Jianping and Li, Shaozi},
Object detection often suffers from a large number of useless proposals, and selecting high-quality proposals remains a great challenge. In this paper, we propose a semantic, class-specific approach to re-rank object proposals, which consistently improves recall even with fewer proposals. We first extract features for each proposal, including semantic segmentation, stereo information, contextual information, CNN-based objectness, and low-level cues, and then score the proposals using class-specific weights learned by a structured SVM. The advantages of the proposed model are two-fold: 1) it can be easily integrated into existing proposal generators at little computational cost, and 2) it achieves a high recall rate under strict criteria even with fewer proposals. Experimental evaluation on the KITTI benchmark demonstrates that our approach significantly improves the recall of popular existing generators. Moreover, in the object detection experiment, our approach with 1,500 proposals still achieves higher average precision (AP) than baselines with 5,000 proposals.
Detecting Ground Control Points via Convolutional Neural Network for Stereo Matching
Zhun Zhong, Songzhi Su, Donglin Cao, Shaozi Li, Zhihan Lv
Multimedia Tools and Applications (MTA), 2016
abstract / bibtex / PDF
  title={Detecting ground control points via convolutional neural network for stereo matching},
  author={Zhong, Zhun and Su, Songzhi and Cao, Donglin and Li, Shaozi and Lv, Zhihan},
  journal={Multimedia Tools and Applications},
In this paper, we present a novel approach to detecting ground control points (GCPs) for the stereo matching problem. First, we train a convolutional neural network (CNN) on a large stereo set and compute the matching confidence of each pixel using the trained CNN model. Second, we present a GCP selection scheme based on the maximum matching confidence of each pixel. Finally, the selected GCPs are used to refine the matching costs, and the refined costs are optimized with the semi-global matching algorithm to improve the final disparity maps. We evaluate our approach on the KITTI 2012 stereo benchmark dataset. Our experiments show that the proposed approach significantly improves the accuracy of disparity maps.
Unsupervised domain adaption dictionary learning for visual recognition
Zhun Zhong, Zongmin Li, Runlin Li, Xiaoxia Sun
ICIP, 2015
abstract / bibtex / arXiv
  title={Unsupervised domain adaption dictionary learning for visual recognition},
  author={Zhong, Zhun and Li, Zongmin and Li, Runlin and Sun, Xiaoxia},
  journal={arXiv preprint arXiv:1506.01125},
In recent years, dictionary learning methods have been extensively applied to various computer vision recognition tasks and have produced state-of-the-art results. However, when the data instances of a target domain have a different distribution from those of a source domain, dictionary learning may fail to perform well. In this paper, we address the cross-domain visual recognition problem and propose a simple but effective unsupervised domain adaption approach, in which labeled data come only from the source domain. To bring the original data in the source and target domains into the same distribution, the proposed method forces nearest coupled data between the source and target domains to have identical sparse representations while jointly learning a dictionary for each domain, such that the learned dictionaries can reconstruct the original data in the source and target domains respectively. The sparse representations of the original data can then be used to perform visual recognition tasks. We demonstrate the effectiveness of our approach on standard datasets. Our method performs on par with or better than competitive state-of-the-art methods.

I like this website!
