4th Workshop on Image/Video/Audio Quality in Computer Vision and Generative AI

Home

Important Dates/Links:

Submission Deadline: 15 December, 2024 -> 21 December, 2024
Submission Link: https://cmt3.research.microsoft.com/ImageQuality2025
Acceptance Notification Deadline: 5 January, 2025
Camera Ready Papers Submission Deadline: 10 January, 2025, at 11:59 PM Pacific Daylight Time.
Workshop Day: 28 February, 2025
Workshop Website: https://wacv2025-image-quality-workshop2.github.io/index.html
Authors of accepted papers are required to present their work live, either in-person or remote. If pre-recorded videos are used, the authors are required to join the meeting online to do Q&A live.

Description:

Many machine learning tasks and computer vision algorithms are susceptible to image/video/audio quality artifacts. Nonetheless, most visual learning and vision systems assume high-quality image/video/audio as input. In reality, noises and distortions are common in image/video/audio capturing and acquisition process. Oftentimes, artifacts can be introduced in the video compression, transcoding, transmission, decoding, and/or rendering process. All of these quality issues play a critical role on the performance of learning algorithms, systems and applications, therefore could directly impact the customer experience.

Topics:

This workshop addresses topics related to image/video/audio quality in machine learning, computer vision, and generative AI. The topics include, but are not limited to:

Impact of image/video/audio quality in traditional machine learning and computer vision use cases such as object detection, segmentation, tracking, and recognition;
Analyze, model and learn the quality impact from image/video/audio acquisition, compression, transcoding, transmission, decoding, rendering, and/or display;
Techniques used to improve image/video/audio quality in terms of
- brightening, color adjustment, sharpening, inpainting, deblurring, denoising, de-hazing, de-raining, demosaicing,
- removing artifacts such as shadows, glare, and reflections, etc.,
- resolution, frame rate, color gamut, dynamic range (SDR vs. HDR), etc.,
- noise/echo cancellation, speech enhancement, etc.;
- Film grain preservation and synthesis.
Novel image/video/audio quality assessment methodologies: full reference, reduced-reference, and non-reference;
Impact of image/video/audio quality in multi-modal use cases;
Evaluate image/video/audio quality produced by generative AI;
Techniques to detect and mitigate audio/video synchronization issue;
Techniques to measure the quality consistency across different types of content in video (such as ads, movies, streamed content, etc.);
Subjective image/video/audio quality data creation, cleaning, and validation in both lab and crowd-sourcing setups;
Datasets, statistics, and theory of image/video/audio quality;
Research, applications and system development of the above.

Schedule (MST)

Feb 28th, 2025, 8:20 AM – 5:30 PM

Time	Event	Duration
8:20-8:30am	Opening Remarks (Host: Joe Liu)	10 mins
8:30-9:30am	Keynote Keynote Speaker: Kevin Bowyer, "What makes a good quality face recognition training set?" (Host: Joe Liu)	60 mins
9:30-10:15am	Coffee Break	45 mins
10:15-11:15am	Oral Long Session I (Host: Yarong Feng)	60 mins
10:15-10:30am	DaBiT: Depth and Blur informed Transformer for Video Deblurring (in person) Presenter: Crispian Morris	15 mins
10:30-10:45am	Lights, Camera, Matching: The Role of Image Illumination in Fair Face Recognition (in person) Presenter: Gabriella Pangelinan	15 mins
10:45-11:00am	Quantifying Generative Stability: Mode Collapse Entropy Score for Mode Diversity Evaluation (in person) Presenter: Jens Duym	15 mins
11:00-11:15am	TE-NeRF: Triplane-Enhanced Neural Radiance Field for Artifact-Free Human Rendering (in person) Presenter: Sadia Mubashshira	15 mins
11:15-11:45am	Oral Short Session I (Host: Yarong Feng)	30 mins
11:15-11:20pm	LatentPS: Image Editing Using Latent Representations in Diffusion Models (online) Presenter: Zilong Wu	5 mins
11:20-11:25pm	IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion (online) Presenter: Tharun Anand	5 mins
11:25-11:30pm	Advancing Super-Resolution in Neural Radiance Fields via Variational Diffusion Strategies (online) Presenter: Shrey Vishen	5 mins
11:30-11:35pm	Similarity Trajectories: Linking Sampling Process to Artifacts in Diffusion-Generated Images (online) Presenter: Dennis Menn	5 mins
11:35-11:40pm	A Distortion Aware Image Quality Assessment Model (online) Presenter: Ha Thu Nguyen	5 mins
11:40-11:45pm	SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing (online) Presenter: Youshan Zhang	5 mins
11:45am-1:15pm	Lunch Break	90 mins
1:15-2:00pm	Oral Long Session II (Host: Qipin Chen)	45 mins
1:15-1:30pm	Unsupervised Generative Approach for Anomaly Detection to Enhance the Quality of Unseen Medical Datasets (in person) Presenter: Zhemin Zhang	15 mins
1:30-1:45pm	Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images (in person) Presenter: Zeyun Deng	15 mins
1:45-2:00pm	HipyrNet: Hypernet-Guided Feature Pyramid network for mixed-exposure correction (in person) Presenter: Aravind Shenoy	15 mins
2:00-2:15pm	Challenge Introduction (Host: Xiaonan Zhao)	15 mins
2:15-3:00pm	Oral Long Session III (Host: Qipin Chen)	45 mins
2:15-2:30pm	Temporally Streaming Audio-Visual Synchronization for Real-World Videos (in person) Presenter: Jordan Voas	15 mins
2:30-2:45pm	MambaTron: Efficient Cross-Modal Point Cloud Enhancement using Aggregate Selective State Space Modeling (in person) Presenter: Sai Tarun Inaganti	15 mins
2:45-3:00pm	Diffusion Prism: Enhancing Diversity and Morphology Consistency in Mask-to-Image Diffusion (in person) Presenter: Hao Wang	15 mins
3:00-3:45pm	Coffee Break	45 mins
3:45-4:20pm	Oral Short Session II (Host: Joe Liu)	35 mins
3:45-3:50pm	Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations (in person) Presenter: Tejas Anvekar	5 mins
3:50-3:55pm	Improving Human Pose-Conditioned Generation: Fine-tuning ControlNet Models with Reinforcement Learning (online) Presenter: Jeonghwan Lee	5 mins
3:55-4:00pm	PQD: POST-TRAINING QUANTIZATION FOR EFFICIENT DIFFUSION MODELS (online) Presenter: Jiaojiao Ye	5 mins
4:00-4:05pm	High-Fidelity 4x Neural Reconstruction of Real-time Path Traced Images (in person) Presenter: Zhiqiang Lao	5 mins
4:05-4:10pm	Confident Pseudo-labeled Diffusion Augmentation for Canine Cardiomegaly Detection (online) Presenter: Youshan Zhang	5 mins
4:10-4:15pm	Revealing Palimpsests with Latent Diffusion Models: A Generative Approach to Image Inpainting and Handwriting Reconstruction (in person) Presenter: Mahdi Champour	5 mins
4:15-4:20pm	LS-GAN: Human Motion Synthesis with Latent-space GANs (in person) Presenter: Avinash Amballa	5 mins
4:20-4:30pm	Closing Remarks (Host: Joe Liu)	10 mins
4:30-5:30pm	Poster Session	60 mins

Zoom Information for virtual presentations:

Topic: VAQ Workshop - WACV2025

Time: Feb 28, 2025 08:15 AM

Join Zoom Meeting

https://us06web.zoom.us/j/82700199161?pwd=SOAhhH2uWedGxcxvjn2vqRQD8tJmDW.1

Keynotes

Keynote Speaker: Kevin Bowyer

Title: "What makes a good quality face recognition training set?"

Abstract: This talk will start with comments on web-scraped, in-the-wild training sets. And possibly also on CVPR and FG reviewing. Then we will touch on the “identity problem” in training sets of images of persons who don’t exist. Lastly, we will describe an example of a training set with targeted synthetic enhancements as a means of training set augmentation to solve a specific problem. This talk should be entertaining and informative for a broad audience.

Bio: Kevin Bowyer is the Schubmehl-Prein Family Professor of Computer Science and Engineering at the University of Notre Dame. He is a Fellow of the AAAS, IEEE and IAPR, past EIC of the IEEE Transactions on Pattern Analysis and Machine Intelligence and the IEEE Transactions on Biometrics, Behavior, and Identity Science, recipient of a Technical Achievement Award from the IEEE Computer Society, and of the Meritorious Service Award and the Leadership Award from the IEEE Biometrics Council.

Submission Guidelines and Review Process:

Authors are encouraged to submit high-quality, original (i.e., not been previously published or accepted for publication in substantially similar form in any peer-reviewed venue including journal, conference or workshop) research.
All submissions should follow the same template as for the main WACV2025 conference. Please refer to the WACV Author Kit (Overleaf template, ZIP Archieve).
The main paper has an 8-page limit, references do not count toward this. There is no limit on the number of pages in the supplementary material. Only pdf files are accepted.
Unlike the main conference(WACV2025), the review process for this workshop has only one round, and is single-blind. Authors do not have to be anonymized when submitting their work.
Please submit your paper via this link: https://cmt3.research.microsoft.com/ImageQuality2025
Authors of accepted papers will be notified via email by: 5 January, 2025