Research

3D Computer Vision and Inverse Graphics

Perceiving the 3D geometric configuration of a scene is essential for many computer vision and robotics applications, such as autonomous vehicles and drones, mobile robot localization and mapping, obstacle avoidance, and path planning. To reconstruct 3D information from images precisely, our research topics include 3D reconstruction, stereo matching, multi-view stereo matching, LiDAR-stereo fusion, monocular depth estimation, and stereo confidence estimation. In addition, we are now interested in recovering humans and objects in full 3D.
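
To make one of these building blocks concrete, the sketch below constructs a stereo matching cost volume over a rectified image pair and reads off a winner-take-all disparity map. This is a minimal illustration under assumed names (`stereo_wta_disparity`, random tensors standing in for learned features, a hypothetical `max_disp` range), not the pipeline of any publication listed below.

```python
import torch

def stereo_wta_disparity(feat_left, feat_right, max_disp=64):
    """Winner-take-all disparity from an L1 cost volume.

    feat_left, feat_right: (B, C, H, W) features of a rectified stereo pair.
    Returns: (B, H, W) integer disparities in [0, max_disp).
    """
    B, C, H, W = feat_left.shape
    cost = feat_left.new_full((B, max_disp, H, W), float("inf"))
    for d in range(max_disp):
        # Left pixel x is compared against right pixel x - d.
        if d == 0:
            cost[:, d] = (feat_left - feat_right).abs().mean(dim=1)
        else:
            cost[:, d, :, d:] = (
                feat_left[..., d:] - feat_right[..., :-d]
            ).abs().mean(dim=1)
    return cost.argmin(dim=1)  # lowest matching cost per pixel

# Random tensors stand in for features from a learned extractor.
left, right = torch.randn(1, 32, 48, 96), torch.randn(1, 32, 48, 96)
print(stereo_wta_disparity(left, right, max_disp=16).shape)  # (1, 48, 96)
```

In practice the raw cost volume is aggregated and regularized (and a confidence measure estimated over it) before the disparity is read off, which is where most of the publications below operate.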

Canonical 3D Object Representation, ICCV'21

LiDAR-Stereo Fusion, TITS'20

Unsupervised Depth, TITS'20

Adversarial Confidence, TITS'20

Recent Related Publications (2018-2021)

Sunghun Joung, Seungryong Kim, Minsu Kim, Ig-Jae Kim, and Kwanghoon Sohn, “Learning Canonical 3D Object Representation for Fine-grained Recognition,” IEEE International Conference on Computer Vision (ICCV), 2021.

Sunok Kim, Dongbo Min, Seungryong Kim, and Kwanghoon Sohn, “Learning Adversarial Confidence Measures for Robust Stereo Matching,” IEEE Trans. on Intelligent Transportation Systems (TITS), 2020. (Impact Factor: 6.319)

Sunghun Joung, Seungryong Kim, Kihong Park, and Kwanghoon Sohn, “Unsupervised Stereo Matching for Confidential Correspondence Consistency,” IEEE Trans. on Intelligent Transportation Systems (TITS), 2020. (Impact Factor: 6.319)

Kihong Park, Seungryong Kim, and Kwanghoon Sohn, “High-precision Depth Estimation Using Uncalibrated LiDAR and Stereo Fusion,” IEEE Trans. on Intelligent Transportation Systems (TITS), vol. 21, no. 1, pp. 321-335, Jan. 2020. (Impact Factor: 6.319)

Sunok Kim, Dongbo Min, Seungryong Kim, and Kwanghoon Sohn, “Unified Confidence Estimation Networks for Robust Stereo Matching,” IEEE Trans. on Image Processing (TIP), vol. 28, no. 3, pp. 1299-1313, Mar. 2019. (Impact Factor: 9.340)

Sunok Kim, Seungryong Kim, Dongbo Min, and Kwanghoon Sohn, “LAF-Net: Locally Adaptive Fusion Networks for Stereo Confidence Estimation,” IEEE Conf. Computer Vision Pattern Recognition (CVPR), Jun. 2019. (Oral Presentation) (5.58% acceptance rate)

Sungil Choi, Seungryong Kim, Kihong Park, and Kwanghoon Sohn, “Learning Descriptor, Confidence, and Depth Estimation in Multi-view Stereo,” IEEE CVPR Workshop- 1st International Workshop on Deep Learning for Visual SLAM (CVPRW), Jun. 2018.

Kihong Park, Seungryong Kim, and Kwanghoon Sohn, “High-precision Depth Estimation with the 3D LiDAR and Stereo Fusion,” IEEE International Conference on Robotics and Automation (ICRA), May 2018.

Visual Correspondence

Numerous computer vision and computational photography applications require points on an object in one image to be matched with their corresponding points in another image, such as the wheel of one motorbike matched to the wheel of a different model. Establishing such dense correspondences across semantically similar images facilitates a variety of computer vision applications, including non-parametric scene parsing, semantic segmentation, object detection, and image editing. Our research topics include robust feature extraction and optimization techniques for establishing highly accurate matching fields, ranging from optical flow to semantic correspondence.
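
As a small illustration of the matching step such methods share, the sketch below computes a correlation volume between two feature maps and takes the nearest neighbour for each source location. It is a hedged sketch with assumed names (`dense_correspondence`, random normalized tensors in place of learned descriptors), not the method of any paper below.

```python
import torch
import torch.nn.functional as F

def dense_correspondence(feat_src, feat_tgt):
    """Nearest-neighbour matching on a cosine correlation volume.

    feat_src, feat_tgt: (B, C, H, W) L2-normalized feature maps.
    Returns: (B, H*W) index of the best target location per source pixel.
    """
    src = feat_src.flatten(2)                      # (B, C, H*W)
    tgt = feat_tgt.flatten(2)                      # (B, C, H*W)
    corr = torch.einsum("bci,bcj->bij", src, tgt)  # similarity volume
    return corr.argmax(dim=2)

feat_a = F.normalize(torch.randn(1, 64, 32, 32), dim=1)
feat_b = F.normalize(torch.randn(1, 64, 32, 32), dim=1)
print(dense_correspondence(feat_a, feat_b).shape)  # torch.Size([1, 1024])
```

Raw argmax matching is noisy; the works below differ mainly in how they learn the descriptors and how they aggregate or regularize this correlation volume.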

Cost Aggregation Transformers, NeurIPS'21

Deep Matching Prior, ICCV'21

DCTM Optimizer, TPAMI'20

Guided Semantic Flow, ECCV'20

Recent Related Publications (2018-2021)

Sunghwan Hong and Seungryong Kim, “Deep Matching Prior: Test-Time Optimization for Dense Correspondence,” IEEE International Conference on Computer Vision (ICCV), 2021.

Sangryul Jeon, Seungryong Kim, Dongbo Min, and Kwanghoon Sohn, “Pyramidal Semantic Correspondence Networks,” IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI). (Under Review)

Seungryong Kim, Dongbo Min, Stephen Lin, and Kwanghoon Sohn, “Dense Cross-Modal Correspondence Estimation with the Deep Self-Correlation Descriptor,” IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 2020. (Impact Factor: 17.861)

Seungryong Kim, Dongbo Min, Stephen Lin, and Kwanghoon Sohn, “Discrete-Continuous Transformation Matching for Dense Semantic Correspondence,” IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), vol. 42, no. 1, pp. 59-73, Jan. 2020. (Impact Factor: 17.861)

Sangryul Jeon, Dongbo Min, Seungryong Kim, and Kwanghoon Sohn, “Guided Semantic Flow,” European Conference on Computer Vision (ECCV), Sep. 2020.

Somi Jeong, Seungryong Kim, Kihong Park, and Kwanghoon Sohn, “Learning to Find Unpaired Cross-spectral Correspondences,” IEEE Trans. on Image Processing (TIP), vol. 28, no. 11, pp. 5394-5406, Nov. 2019. (Impact Factor: 9.340)

Seungryong Kim, Dongbo Min, Bumsub Ham, Stephen Lin, and Kwanghoon Sohn, “FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence,” IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), vol. 41, no. 03, pp. 581-595, Mar. 2019. (Impact Factor: 17.861)

Seungryong Kim, Dongbo Min, Somi Jeong, Sunok Kim, Sangryul Jeon, and Kwanghoon Sohn, “Semantic Attribute Matching Networks,” IEEE Conf. Computer Vision Pattern Recognition (CVPR), Jun. 2019. (25.2% acceptance rate)

Seungryong Kim, Stephen Lin, Sangryul Jeon, Dongbo Min, and Kwanghoon Sohn, “Recurrent Transformer Networks for Semantic Correspondence,” Neural Information Processing Systems (NeurIPS), Dec. 2018. (Spotlight Presentation) (3.46% acceptance rate)

Sangryul Jeon, Seungryong Kim, Dongbo Min, and Kwanghoon Sohn, “PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence,” European Conference on Computer Vision (ECCV), Sep. 2018. (31.8% acceptance rate)

Visual Recognition and Cognition

Understanding visual content enables analyzing images for scenes, objects, and attributes, and is the starting point for a number of applications, including image and video recognition, detection, and segmentation. We are interested in developing deep networks that recognize scenes at a human level. In addition, our work goes beyond recognition toward cognition, aiming at machines capable of higher-order reasoning and commonsense understanding of the world.
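
For a concrete starting point on the recognition side, the sketch below classifies a single image with an off-the-shelf ImageNet-pretrained backbone from torchvision; the file name `scene.jpg` is a placeholder, and this generic classifier is not a model from our papers.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Off-the-shelf ImageNet classifier; stands in for any recognition backbone.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("scene.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = model(img).softmax(dim=1)
print(probs.topk(5).indices)  # five most likely ImageNet classes
```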

AggMatch, Arxiv'21

Volumetric Transformer Networks, ECCV'20

Cylindrical Convolutional Networks, CVPR'20

Context-aware Emotion Recognition, ICCV'19

Recent Related Publications (2018-2021)

Jiwon Kim, Kwangrok Ryoo, Gyuseong Lee, Seokju Cho, Junyoung Seo, Daehwan Kim, and Seungryong Kim, “AggMatch: Aggregating Pseudo Labels for Semi-Supervised Learning,” arXiv, 2021.

Jiyoung Lee, Seungryong Kim, Sunok Kim, and Kwanghoon Sohn, "Learning Discriminative Action Tubelets for Weakly-Supervised Action Detection", IEEE Trans. on Image Processing (TIP), (Under Review)

Seungryong Kim, Sabine Süsstrunk, and Mathieu Salzmann, “Volumetric Transformer Networks,” European Conference on Computer Vision (ECCV), Sep. 2020.

Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim, Junghyun Cho, and Kwanghoon Sohn, “Cylindrical Convolutional Networks for Joint Object Category and Viewpoint Estimation,” IEEE Conf. Computer Vision Pattern Recognition (CVPR), Jun. 2020. (22% acceptance rate)

Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, and Kwanghoon Sohn, “Context-Aware Emotion Recognition Networks,” IEEE International Conference on Computer Vision (ICCV), Oct. 2019. (25% acceptance rate)

Sangryul Jeon, Dongbo Min, Seungryong Kim, and Kwanghoon Sohn, “Joint Learning of Semantic Alignment and Object Landmark Detection,” IEEE International Conference on Computer Vision (ICCV), Oct. 2019. (25% acceptance rate)

Jungin Park, Jiyoung Lee, Sangryul Jeon, Seungryong Kim, and Kwanghoon Sohn, “Graph Regularization Network with Semantic Affinity for Weakly-supervised Temporal Action Localization,” IEEE International Conference on Image Processing (ICIP), Sep. 2019.

Minsu Kim, Sunghun Joung, Kihong Park, Seungryong Kim, and Kwanghoon Sohn, “Unpaired Cross-Spectral Pedestrian Detection via Adversarial Feature Learning,” IEEE International Conference on Image Processing (ICIP), Sep. 2019.

Jungin Park, Sangryul Jeon, Seungryong Kim, Jiyoung Lee, Sunok Kim, and Kwanghoon Sohn, “Learning to Detect, Associate, and Recognize Human Actions and Surrounding Scenes in Untrimmed Videos,” ACM Multimedia Workshop- The 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild (MMW), Oct. 2018.

Kwanghoon Sohn, Ming-Hsuan Yang, Hyeran Byun, Jongwoo Lim, Jison Hsu, Stephen Lin, Euntai Kim, and Seungryong Kim, “CoVieW’18: The 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild,” ACM Multimedia (MM), Oct. 2018.

Deep Representation Learning

The performance of machine learning and deep learning methods depends heavily on the choice of data representation, and learning discriminative feature representations of objects and scenes is key to the success of computer vision tasks such as fine-grained image recognition, instance-level image retrieval, and person re-identification. In this direction, we explore designing feature representations that map inputs toward a common, canonical mode so that semantic recognition suffers less from deformation. We are also interested in training such representations with various learning paradigms, such as meta learning and reinforcement learning.
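
One standard recipe for such discriminative representations, e.g. for retrieval and re-identification, is metric learning with a triplet objective. The sketch below is illustrative only, with random embeddings in place of a trained network.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pulls matching embeddings together, pushes mismatched ones apart.

    Each input: (B, D) L2-normalized embeddings. PyTorch's built-in
    F.triplet_margin_loss is the library counterpart (it uses non-squared
    distances).
    """
    pos_dist = (anchor - positive).pow(2).sum(dim=1)
    neg_dist = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(pos_dist - neg_dist + margin).mean()

a, p, n = (F.normalize(torch.randn(8, 128), dim=1) for _ in range(3))
print(triplet_loss(a, p, n).item())
```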

Cost Aggregation Transformers, NeurIPS'21

Volumetric Transformer Networks, ECCV'20

Deep Self-Correlation, TPAMI'20

FCSS Descriptor, TPAMI'20

Recent Related Publications (2018-2021)

Seokju Cho*, Sunghwan Hong*, Sangryul Jeon, Yunsung Lee, Kwanghoon Sohn, and Seungryong Kim, “CATs: Cost Aggregation Transformers for Visual Correspondence,” Neural Information Processing Systems (NeurIPS), 2021.

Seungryong Kim, Dongbo Min, Stephen Lin, and Kwanghoon Sohn, “Dense Cross-Modal Correspondence Estimation with the Deep Self-Correlation Descriptor,” IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 2020. (Impact Factor: 17.861)

Seungryong Kim, Sabine Süsstrunk, and Mathieu Salzmann, “Volumetric Transformer Networks,” European Conference on Computer Vision (ECCV), Sep. 2020.

Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim, Junghyun Cho, and Kwanghoon Sohn, “Cylindrical Convolutional Networks for Joint Object Category and Viewpoint Estimation,” IEEE Conf. Computer Vision Pattern Recognition (CVPR), Jun. 2020. (22% acceptance rate)

Seungryong Kim, Dongbo Min, Bumsub Ham, Stephen Lin, and Kwanghoon Sohn, “FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence,” IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), vol. 41, no. 03, pp. 581-595, Mar. 2019. (Impact Factor: 17.861)

Seungryong Kim, Dongbo Min, Somi Jeong, Sunok Kim, Sangryul Jeon, and Kwanghoon Sohn, “Semantic Attribute Matching Networks,” IEEE Conf. Computer Vision Pattern Recognition (CVPR), Jun. 2019. (25.2% acceptance rate)

Multi-modal Learning

For decades, numerous computer vision and image processing applications, such as scene classification, object segmentation, and pedestrian detection, have been reformulated using different spectral modalities to overcome the inherent limitations of mono-spectral data. Other modalities, such as natural language and audio, can likewise complement visual information. In this direction, we are interested in developing multi-modal fusion techniques and their applications, such as object detection and scene recognition.
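
As a toy example of the fusion step, the sketch below runs separate RGB and thermal branches and concatenates their pooled features before a classification head; the branch sizes and the `LateFusionDetector` name are illustrative assumptions, not an architecture from the papers below.

```python
import torch
import torch.nn as nn

class LateFusionDetector(nn.Module):
    """Two-stream feature extraction with concatenation-based fusion."""

    def __init__(self, num_classes=2):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.rgb = branch(3)      # visible spectrum
        self.thermal = branch(1)  # long-wave infrared
        self.head = nn.Linear(64 + 64, num_classes)

    def forward(self, rgb, thermal):
        fused = torch.cat([self.rgb(rgb), self.thermal(thermal)], dim=1)
        return self.head(fused)

model = LateFusionDetector()
logits = model(torch.randn(2, 3, 128, 128), torch.randn(2, 1, 128, 128))
print(logits.shape)  # torch.Size([2, 2])
```

When the modalities are unpaired or misaligned, as in several works below, the interesting part is learning the correspondence or a shared feature space rather than simple concatenation.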

DUNIT, CVPR'20

Tri-modal Facial Expression, TIP'20

Cross-spectral Matching, TIP'19

Multispectral Pedestrian Detection, PR'18

Recent Related Publications (2018-2020)

Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn, “Tri-modal Recurrent Attention Networks for Emotion Recognition,” IEEE Trans. on Image Processing (TIP), 2020. (Impact Factor: 9.340)

Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn, “Audio-Visual Attention Networks for Emotion Recognition,” ACM Multimedia Workshop- Workshop on Audio-Visual Scene Understanding for Immersive Multimedia (MMW), Oct. 2018.

Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn, “Spatiotemporal Attention Based Deep Neural Networks for Emotion Recognition,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2018.

Image Translation and Manipulation

Image-to-image translation (I2I) has recently gained significant traction, to the point of being deployed in diverse applications such as super-resolution, photo-realistic image synthesis, colorization, and domain adaptation. We are exploring methods to translate or manipulate an image into a desired one using generative adversarial networks (GANs) in a weakly-supervised or unsupervised fashion. We are also interested in restoring degraded images with deep networks.
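
The sketch below shows the two losses that drive most unpaired I2I training: a least-squares adversarial term plus CycleGAN-style cycle consistency. The generators, discriminator, and weight `lam` are placeholders, not the objectives of any specific paper below.

```python
import torch
import torch.nn.functional as F

def unpaired_translation_loss(G_ab, G_ba, D_b, real_a, lam=10.0):
    """Adversarial + cycle-consistency objective for the A->B generator."""
    fake_b = G_ab(real_a)
    logits = D_b(fake_b)
    # Least-squares GAN term: push the discriminator's output on fakes to 1.
    adv = F.mse_loss(logits, torch.ones_like(logits))
    # Cycle consistency: translating A -> B -> A should recover the input.
    cycle = F.l1_loss(G_ba(fake_b), real_a)
    return adv + lam * cycle

# Toy 1x1-conv networks just to exercise the losses.
G_ab, G_ba = torch.nn.Conv2d(3, 3, 1), torch.nn.Conv2d(3, 3, 1)
D_b = torch.nn.Sequential(torch.nn.Conv2d(3, 1, 1),
                          torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten())
print(unpaired_translation_loss(G_ab, G_ba, D_b,
                                torch.randn(2, 3, 64, 64)).item())
```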

InstaFormer, CVPR'22

Deep Translation Prior, AAAI'22

DUNIT, CVPR'20

Single Image Deraining, TIP'20

Recent Related Publications (2018-2021)

Soohyun Kim, Jongbeom Baek, Jihye Park, Kyeongnyeon Kim, and Seungryong Kim, “InstaFormer: Instance-Aware Image-to-Image Translation with Transformer,” arXiv, 2021.

Sunwoo Kim, Soohyun Kim, and Seungryong Kim, “Deep Translation Prior: Test-time Training for Photorealistic Style Transfer,” arXiv, 2021.

Deblina Bhattacharjee, Seungryong Kim, Guillaume Vizier, and Mathieu Salzmann, “DUNIT: Detection-based Unsupervised Image-to-Image Translation,” IEEE Conf. Computer Vision Pattern Recognition (CVPR), Jun. 2020. (22% acceptance rate)

Jaehoon Cho, Seungryong Kim, Dongbo Min, and Kwanghoon Sohn, “Single Image Deraining Using Time-lapse Data,” IEEE Trans. on Image Processing (TIP), 2020. (Impact Factor: 9.340)

Seungryong Kim, Dongbo Min, Somi Jeong, Sunok Kim, Sangryul Jeon, and Kwanghoon Sohn, “Semantic Attribute Matching Networks,” IEEE Conf. Computer Vision Pattern Recognition (CVPR), Jun. 2019. (25.2% acceptance rate)