Zhun Zhong 钟准

Ph.D. Student at Xiamen University (XMU)


Google Scholar / GitHub

About me


I am a fourth-year Ph.D. student in the Cognitive Science Department at Xiamen University, supervised by Prof. Shaozi Li. I received the M.S. degree in Computer Science and Technology from China University of Petroleum, Qingdao, China, in 2015, and the B.S. degree from the Information Engineering Department of East China University of Technology in 2012.

I am currently conducting person re-identification research as a joint Ph.D. student at the University of Technology Sydney, co-supervised by Prof. Yi Yang and Dr. Liang Zheng.

My research interests include person re-identification, object detection, and machine learning.


News

  • [2018.09.30] One paper is accepted to TIP 2019.
  • [2018.07.07] One paper is accepted to ECCV 2018.
  • [2018.02.24] One paper is accepted to CVPR 2018.

arXiv Papers

Random Erasing Data Augmentation
Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, Yi Yang
arXiv, 2017
abstract / bibtex / PDF / Code
@article{zhong2017random,
  title={Random Erasing Data Augmentation},
  author={Zhong, Zhun and Zheng, Liang and Kang, Guoliang and Li, Shaozi and Yang, Yi},
  journal={arXiv preprint arXiv:1708.04896},
  year={2017}
}
In this paper, we introduce Random Erasing, a new data augmentation method for training convolutional neural networks (CNNs). In training, Random Erasing randomly selects a rectangular region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated with most CNN-based recognition models. Albeit simple, Random Erasing is complementary to commonly used data augmentation techniques such as random cropping and flipping, and yields consistent improvement over strong baselines in image classification, object detection and person re-identification.
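The erasing procedure described above can be sketched in a few lines of NumPy. The hyperparameter names and default ranges below are illustrative assumptions, not necessarily the settings used in the paper; the released code has the exact values.

```python
import numpy as np

def random_erasing(img, p=0.5, area_range=(0.02, 0.2),
                   aspect_range=(0.5, 2.0), rng=None):
    """Erase a random rectangle in `img` (H x W [x C]) with random values.

    Illustrative sketch: `p` is the erasing probability, `area_range` the
    erased area as a fraction of the image, `aspect_range` the rectangle's
    aspect ratio. These ranges are assumptions for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() > p:          # skip erasing with probability 1 - p
        return img
    h, w = img.shape[:2]
    for _ in range(100):          # retry until the rectangle fits
        area = rng.uniform(*area_range) * h * w
        aspect = rng.uniform(*aspect_range)
        eh = int(round(np.sqrt(area * aspect)))
        ew = int(round(np.sqrt(area / aspect)))
        if 0 < eh < h and 0 < ew < w:
            top = rng.integers(0, h - eh)
            left = rng.integers(0, w - ew)
            out = img.copy()
            # fill the region with uniform random pixel values
            out[top:top + eh, left:left + ew] = rng.random((eh, ew) + img.shape[2:])
            return out
    return img
```

Because occluded variants are generated on the fly, the method composes freely with random cropping and flipping in any input pipeline.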


Publications

CamStyle: A Novel Data Augmentation Method for Person Re-identification
Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li, Yi Yang
IEEE Transactions on Image Processing (TIP), 2019
bibtex / PDF / Code
@article{zhong2019camstyle,
  title={CamStyle: A Novel Data Augmentation Method for Person Re-identification},
  author={Zhong, Zhun and Zheng, Liang and Zheng, Zhedong and Li, Shaozi and Yang, Yi},
  journal={IEEE Transactions on Image Processing},
  year={2019}
}

Generalizing A Person Retrieval Model Hetero- and Homogeneously
Zhun Zhong, Liang Zheng, Shaozi Li, Yi Yang
European Conference on Computer Vision (ECCV), 2018
abstract / bibtex / PDF / Code
@inproceedings{zhong2018generalizing,
  title={Generalizing A Person Retrieval Model Hetero-and Homogeneously},
  author={Zhong, Zhun and Zheng, Liang and Li, Shaozi and Yang, Yi},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2018}
}
Person re-identification (re-ID) poses unique challenges for unsupervised domain adaptation (UDA) in that classes in the source and target sets (domains) are entirely different and that image variations are largely caused by cameras. Given a labeled source training set and an unlabeled target training set, we aim to improve the generalization ability of re-ID models on the target testing set. To this end, we introduce a Hetero-Homogeneous Learning (HHL) method. Our method enforces two properties simultaneously: 1) camera invariance, learned via positive pairs formed by unlabeled target images and their camera style transferred counterparts; 2) domain connectedness, by regarding source / target images as negative matching pairs to the target / source images. The first property is implemented by homogeneous learning because training pairs are collected from the same domain. The second property is achieved by heterogeneous learning because we sample training pairs from both the source and target domains. On Market-1501, DukeMTMC-reID and CUHK03, we show that the two properties contribute indispensably and that very competitive re-ID UDA accuracy is achieved.
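The two HHL properties can be illustrated by how training pairs are assembled. The sketch below only enumerates the pair types; the image identifiers and the `_cam` suffix for a camera-style-transferred counterpart are assumptions for illustration, and the actual method trains a network on these pairs rather than merely listing them.

```python
def build_hhl_pairs(source_ids, target_ids):
    """Illustrative pair construction for Hetero-Homogeneous Learning.

    Returns (image_a, image_b, label) tuples, label 1 = positive match,
    0 = negative match. Identifier naming is an assumption of this sketch.
    """
    pairs = []
    # 1) Camera invariance (homogeneous: both images from the target domain).
    #    Each unlabeled target image is a positive pair with its
    #    camera-style-transferred counterpart.
    for t in target_ids:
        pairs.append((t, f"{t}_cam", 1))
    # 2) Domain connectedness (heterogeneous: images from both domains).
    #    Source and target images are treated as negative matching pairs,
    #    since source and target identity sets are entirely disjoint.
    for s in source_ids:
        for t in target_ids:
            pairs.append((s, t, 0))
    return pairs
```

The disjointness of source and target identity sets is what makes the cross-domain negatives safe: no source person can appear in the target set, so label 0 is always correct.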
Camera Style Adaptation for Person Re-identification
Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li, Yi Yang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
abstract / bibtex / PDF / Code
@inproceedings{zhong2018camera,
  title={Camera Style Adaptation for Person Re-identification},
  author={Zhong, Zhun and Zheng, Liang and Zheng, Zhedong and Li, Shaozi and Yang, Yi},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}
Being a cross-camera retrieval task, person re-identification suffers from image style variations caused by different cameras. Prior art implicitly addresses this problem by learning a camera-invariant descriptor subspace. In this paper, we explicitly consider this challenge by introducing camera style (CamStyle) adaptation. CamStyle can serve as a data augmentation approach that smooths the camera style disparities. Specifically, with CycleGAN, labeled training images can be style-transferred to each camera, and, along with the original training samples, form the augmented training set. This method, while increasing data diversity against over-fitting, also incurs a considerable level of noise. To alleviate the impact of noise, label smooth regularization (LSR) is adopted. The vanilla version of our method (without LSR) performs reasonably well on few-camera systems in which over-fitting often occurs. With LSR, we demonstrate consistent improvement in all systems regardless of the extent of over-fitting. We also report competitive accuracy compared with the state of the art.
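The label smooth regularization step can be sketched as cross-entropy against a smoothed target distribution, which softens supervision on the noisy style-transferred samples. The smoothing strength `eps=0.1` below is a common choice and an assumption of this sketch, not necessarily the paper's setting.

```python
import numpy as np

def lsr_loss(logits, label, num_classes, eps=0.1):
    """Cross-entropy with label smoothing regularization (LSR).

    The smoothed target puts (1 - eps) extra mass on the true class and
    spreads eps uniformly over all classes, so a noisy sample cannot
    force the network toward full confidence in one identity.
    """
    # numerically stable softmax log-probabilities
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    # smoothed one-hot target: eps/K everywhere, plus (1 - eps) on `label`
    target = np.full(num_classes, eps / num_classes)
    target[label] += 1.0 - eps
    return -(target * log_probs).sum()
```

With `eps=0`, this reduces to the standard cross-entropy used on the clean, original training images.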
Re-ranking Person Re-identification with k-reciprocal Encoding
Zhun Zhong, Liang Zheng, Donglin Cao, Shaozi Li
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
abstract / bibtex / PDF / Code and CUHK03 new training/testing protocol
@inproceedings{zhong2017re,
  title={Re-ranking Person Re-identification with k-reciprocal Encoding},
  author={Zhong, Zhun and Zheng, Liang and Cao, Donglin and Li, Shaozi},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017}
}
When considering person re-identification (re-ID) as a retrieval process, re-ranking is a critical step to improve its accuracy. Yet in the re-ID community, limited effort has been devoted to re-ranking, especially those fully automatic, unsupervised solutions. In this paper, we propose a k-reciprocal encoding method to re-rank the re-ID results. Our hypothesis is that if a gallery image is similar to the probe in the k-reciprocal nearest neighbors, it is more likely to be a true match. Specifically, given an image, a k-reciprocal feature is calculated by encoding its k-reciprocal nearest neighbors into a single vector, which is used for re-ranking under the Jaccard distance. The final distance is computed as the combination of the original distance and the Jaccard distance. Our re-ranking method does not require any human interaction or any labeled data, so it is applicable to large-scale datasets. Experiments on the large-scale Market-1501, CUHK03, MARS, and PRW datasets confirm the effectiveness of our method.
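The core idea can be sketched as follows: build each image's k-reciprocal neighbour set, measure set overlap with a Jaccard distance, and blend that with the original distance. This is a simplified sketch; the full method additionally expands the neighbour sets and encodes them as soft k-reciprocal feature vectors, and the blending weight `lam=0.3` is an assumption here.

```python
import numpy as np

def k_reciprocal_rerank(dist, k=4, lam=0.3):
    """Simplified k-reciprocal re-ranking over an (n, n) distance matrix.

    Two images i and j are k-reciprocal neighbours when each appears in
    the other's top-k list; true matches tend to satisfy this mutual
    condition far more often than hard negatives do.
    """
    n = dist.shape[0]
    topk = np.argsort(dist, axis=1)[:, :k]            # k-nearest neighbours
    knn = [set(row) for row in topk]
    recip = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    jaccard = np.zeros_like(dist, dtype=float)
    for i in range(n):
        for j in range(n):
            inter = len(recip[i] & recip[j])
            union = len(recip[i] | recip[j])
            jaccard[i, j] = (1.0 - inter / union) if union else 1.0
    # final distance: weighted combination of Jaccard and original distance
    return (1.0 - lam) * jaccard + lam * dist
```

No labels or human interaction appear anywhere in the computation, which is why the method scales to large unlabeled galleries.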
Class-Specific Object Proposals Re-ranking for Object Detection in Automatic Driving
Zhun Zhong, Mingyi Lei, Donglin Cao, Jianping Fan, Shaozi Li
Neurocomputing, 2017
abstract / bibtex / PDF
@article{zhong2017class,
  title={Class-specific object proposals re-ranking for object detection in automatic driving},
  author={Zhong, Zhun and Lei, Mingyi and Cao, Donglin and Fan, Jianping and Li, Shaozi},
  journal={Neurocomputing},
  year={2017}
}
Object detection often suffers from a plethora of useless proposals, and selecting high-quality proposals remains a great challenge. In this paper, we propose a semantic, class-specific approach to re-rank object proposals, which can consistently improve recall performance even with fewer proposals. We first extract features for each proposal, including semantic segmentation, stereo information, contextual information, CNN-based objectness and low-level cues, and then score them using class-specific weights learnt by a structured SVM. The advantages of the proposed model are two-fold: 1) it can be easily merged into existing generators at little computational cost, and 2) it can achieve a high recall rate under strict criteria even using fewer proposals. Experimental evaluation on the KITTI benchmark demonstrates that our approach significantly improves the recall performance of existing popular generators. Moreover, in the experiment conducted for object detection, even with 1,500 proposals, our approach can still achieve higher average precision (AP) than baselines with 5,000 proposals.
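At inference time the re-ranking step reduces to linear scoring with the class-specific weights. The sketch below assumes the per-proposal features are already extracted and the structured-SVM weights already learnt; the feature layout is illustrative.

```python
import numpy as np

def rerank_proposals(features, weights, top_n):
    """Score proposals with class-specific weights and keep the best.

    `features`: (num_proposals, d) matrix, e.g. segmentation, stereo,
    context, objectness and low-level cues concatenated per proposal.
    `weights`: (d,) class-specific weight vector from the structured SVM.
    Returns the indices of the top_n proposals and their scores.
    """
    scores = features @ weights          # linear class-specific score
    order = np.argsort(-scores)          # highest score first
    keep = order[:top_n]
    return keep, scores[keep]
```

Because the scorer is a single matrix-vector product followed by a sort, it can be bolted onto any existing proposal generator with negligible overhead, which is the first advantage claimed above.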
Detecting Ground Control Points via Convolutional Neural Network for Stereo Matching
Zhun Zhong, Songzhi Su, Donglin Cao, Shaozi Li, Zhihan Lv
Multimedia Tools and Applications (MTA), 2016
abstract / bibtex / PDF
@article{zhong2016detecting,
  title={Detecting ground control points via convolutional neural network for stereo matching},
  author={Zhong, Zhun and Su, Songzhi and Cao, Donglin and Li, Shaozi and Lv, Zhihan},
  journal={Multimedia Tools and Applications},
  year={2016}
}
In this paper, we present a novel approach to detect ground control points (GCPs) for the stereo matching problem. First, we train a convolutional neural network (CNN) on a large stereo set and compute the matching confidence of each pixel using the trained CNN model. Second, we present a ground control point selection scheme based on the maximum matching confidence of each pixel. Finally, the selected GCPs are used to refine the matching costs, and the refined costs are optimized with the semi-global matching algorithm to improve the final disparity maps. We evaluate our approach on the KITTI 2012 stereo benchmark dataset. Our experiments show that the proposed approach significantly improves the accuracy of disparity maps.
Unsupervised domain adaption dictionary learning for visual recognition
Zhun Zhong, Zongming Li, Runlin Li, Xiaoxia Sun
ICIP, 2015
abstract / bibtex / arXiv
@article{zhong2015unsupervised,
  title={Unsupervised domain adaption dictionary learning for visual recognition},
  author={Zhong, Zhun and Li, Zongmin and Li, Runlin and Sun, Xiaoxia},
  journal={arXiv preprint arXiv:1506.01125},
  year={2015}
}
Over the last years, dictionary learning methods have been extensively applied to various computer vision recognition applications and have produced state-of-the-art results. However, when the data instances of a target domain have a different distribution than those of a source domain, dictionary learning methods may fail to perform well. In this paper, we address the cross-domain visual recognition problem and propose a simple but effective unsupervised domain adaptation approach, in which labeled data come only from the source domain. To bring the original data in the source and target domains into the same distribution, the proposed method forces nearest coupled data between the source and target domains to have identical sparse representations while jointly learning dictionaries for each domain, where the learned dictionaries can reconstruct the original data in the source and target domains, respectively. The sparse representations of the original data can then be used to perform visual recognition tasks. We demonstrate the effectiveness of our approach on standard datasets. Our method performs on par with or better than competitive state-of-the-art methods.


My Friends

I like this website!
