The 1st International Workshop and Challenge on Disentangled Representation Learning for Controllable Generation

ICCV 2025 DRL4Real Workshop

01 Overview

Disentangled Representation Learning for Controllable Generation (DRL4Real)

Disentangled Representation Learning (DRL) shows promise for enhancing AI's fundamental understanding of the world, potentially addressing hallucination issues in language models and improving controllability in generative systems. Despite significant academic interest, however, DRL research remains largely confined to synthetic scenarios due to the lack of realistic benchmarks and unified evaluation metrics.

The ICCV 2025 DRL4Real Workshop aims to bridge this gap by introducing novel, realistic datasets and comprehensive benchmarks for evaluating DRL methods in practical applications. We focus on key areas including controllable generation and autonomous driving, exploring how DRL can advance model robustness, interpretability, and generalization.

02 Dataset (used for the competition)

The DRL4Real dataset is specifically designed for disentangled representation learning and controllable generation, covering multiple realistic scenarios and diverse disentanglement factors. It consists of two main categories aligned with our competition tracks:

Dataset Overview

  • General Image Subset
    Contains approximately 14,000 images across 13 distinct categories: car, cat, cattle, dog, eagle, facial, flower, fruit, horse, pigeon, tree, and vegetable, plus time-lapse photography. Some images in this subset are synthesized using text-to-image generative models, while others are derived from controllable-generation video sources or captured from real videos.
    • Single-factor variations: Our dataset provides both text descriptions and corresponding images showing factor changes.
    • Multi-factor variations: We primarily provide corresponding text descriptions with a limited number of images as examples.

    Single-Factor Manipulation Examples

    Here are demonstration videos showing examples of single-factor manipulation:

    Car Color Manipulation
    Hair Color Manipulation
    Rose Color Manipulation

    Multi-Factor Manipulation Examples

    Here are demonstration images showing an example of multi-factor manipulation (vase appearance and hand position):

    Flowers in Vase
    Flowers in Hand
    Vase in Hand

  • Autonomous Driving Subset
    A vehicle-centered dataset featuring 5 camera perspectives:
    • 4 outward-facing cameras positioned around the vehicle
    • 1 top-down overview camera

    Each sample consists of 100 frames extracted from videos captured by these 5 cameras, with simultaneous variations in 8 key factors, including vehicle speed (accelerate, constant, decelerate), weather conditions (clear, fog, light rain), lighting, and other environmental factors. The dataset includes 3 distinct vehicle types and more than 5 background scenarios, systematically generated using the CARLA simulator, totaling 20,000 high-quality images.
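    To make the per-sample layout concrete, here is a minimal loading sketch in Python. The directory structure and file naming (sample_root/cam0 … cam4, with 100 PNG frames each) are assumptions for illustration only, not the official release format:

```python
# Minimal loading sketch, assuming a hypothetical per-sample layout of
# sample_root/cam0 ... cam4, each holding 100 PNG frames. The official
# release may use a different directory structure and naming scheme.
from pathlib import Path

import numpy as np
from PIL import Image

def load_sample(sample_root: str) -> np.ndarray:
    """Stack one sample into a (5 cameras, 100 frames, H, W, 3) uint8 array."""
    cameras = []
    for cam_idx in range(5):  # 4 outward-facing cameras + 1 top-down camera
        frame_paths = sorted(Path(sample_root, f"cam{cam_idx}").glob("*.png"))
        frames = np.stack([np.asarray(Image.open(p).convert("RGB")) for p in frame_paths])
        cameras.append(frames)
    return np.stack(cameras)
```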

    Autonomous Driving Example

    Here is a demonstration video showing an example from the autonomous driving dataset:

    Autonomous Driving Scene

Dataset Download

Access our comprehensive dataset for disentangled representation learning and controllable generation:

Download DRL4Real Dataset

03 Challenge

Our competition is designed to advance the field of disentangled representation learning through realistic benchmarks and comprehensive evaluation metrics. The challenge is divided into two main tracks:

Competition Tracks

1. Single-Factor Track

This track provides over 13,000 images covering the scenarios described in the dataset section, along with corresponding text descriptions. Participants must use disentanglement models for this track, such as β-VAE, GEM, StyleGAN, EncDiff, CL-Dis, or STA.

Evaluation Metrics
  • Image Quality (50%): Measured using Fréchet Inception Distance (FID)
  • Latent Variable Disentanglement (50%): Evaluated using Bidirectional DCI, which extends the traditional DCI metric by also measuring how perturbations in real images affect the corresponding latent variables in the encoder
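As an illustration of the image-quality half of the score, the sketch below implements the standard FID formula on pre-extracted Inception features. It is a minimal reference assuming (N, D) feature arrays; the official scripts, and the Bidirectional DCI implementation, will be defined in the metrics PDF:

```python
# Minimal FID sketch over pre-extracted Inception features; the official
# challenge code may differ in feature extractor, preprocessing, and
# numerical details.
import numpy as np
from scipy.linalg import sqrtm

def fid(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to (N, D) feature arrays."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical noise
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```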

For more detailed technical information about the metrics, please see this PDF (to be released soon).

2. Multi-Factor Track

This track is divided into two sub-categories, each with two separate leaderboards:

2.1 General Image Dataset
  • Overall Leaderboard: Open to all methods. An LLM-based evaluator assesses:
    • Intensity scoring of attribute changes across a set of images
    • The ratio of intended attribute changes to unintended changes (closer to 1 is better; see the sketch after this list)
  • Disentanglement Leaderboard: Only for disentanglement models, using the same metrics as the Single-Factor Track
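One plausible reading of the ratio metric above is sketched below; this is only a hedged illustration, since the actual LLM prompting and scoring protocol will be fixed in the metrics PDF, and the function name and score format here are hypothetical:

```python
# Hypothetical sketch: aggregate per-image LLM scores for the intended
# attribute change and for unintended changes, then take their ratio.
# Per the description above, values closer to 1 are better.
def change_ratio(intended_scores: list[float], unintended_scores: list[float]) -> float:
    eps = 1e-8  # guard against division by zero
    return sum(intended_scores) / (sum(unintended_scores) + eps)
```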
2.2 Autonomous Driving Dataset
  • Overall Leaderboard: Open to all methods. Evaluation uses Structural Similarity Index Measure (SSIM) to compare reconstructed frames with ground truth
  • Disentanglement Leaderboard: Only for disentanglement models, using the same metrics as the Single-Factor Track
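For the SSIM-based overall leaderboard, a minimal sketch using scikit-image (assuming aligned uint8 RGB frames; the official evaluation pipeline may differ) could look like:

```python
# Mean SSIM between reconstructed frames and ground truth, assuming paired
# (H, W, 3) uint8 arrays; the official evaluation pipeline may differ.
import numpy as np
from skimage.metrics import structural_similarity

def sequence_ssim(reconstructed, ground_truth) -> float:
    """Average per-frame SSIM over a sequence of paired frames."""
    scores = [
        structural_similarity(r, g, channel_axis=-1, data_range=255)
        for r, g in zip(reconstructed, ground_truth)
    ]
    return float(np.mean(scores))
```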

For more detailed technical information about the metrics, please see this PDF (to be released soon).

Prizes for Winning Teams:

  • First Place: $800 USD (or equivalent prizes)
  • Second Place: $400 USD (or equivalent prizes)
  • Third Place: $200 USD (or equivalent prizes)

Winners will be invited to submit challenge papers to the DRL4Real 2025 Workshop. Accepted papers will be published in the ICCV 2025 Workshops Proceedings.

Important Dates

  • May 31, 23:59:59 AOE: Workshop Announcement and Training & Validation Dataset Release
  • June 23, 23:59:59 AOE: Final Test Dataset Release
  • June 28, 23:59:59 AOE: Workshop Paper Deadline
  • June 30, 23:59:59 AOE: Competition Submission Deadline
  • July 3, 23:59:59 AOE: Notification of Competition Results
  • July 7, 23:59:59 AOE: Challenge Paper Deadline
  • July 11, 23:59:59 AOE: Notification of Paper Acceptance
  • August 15, 23:59:59 AOE: Camera Ready Paper Deadline
  • October 19: Workshop and Competition Results Presentation at ICCV 2025

For more details about the challenge, please visit the Challenge page.

04 Call for Papers

The DRL4Real Workshop aims to bring together researchers and practitioners from academia and industry to discuss and explore the latest trends, challenges, and innovations in disentangled representation learning and controllable generation. We welcome original research contributions addressing, but not limited to, the following topics:

  • Disentangled representation learning
  • Controllable image and video generation
  • Scene understanding and generation in autonomous driving
  • Multi-modal disentanglement (image, video, text)
  • Disentanglement in generative models (GANs, VAEs, Diffusion Models)
  • Applications of disentangled representations in real-world scenarios
  • Few-shot and zero-shot learning with disentangled representations
  • Integration of large language models for controllable generation

Submission Details

Papers will be peer-reviewed and must comply with the ICCV 2025 proceedings style, format, and length requirements. The camera-ready deadline aligns with that of the main conference. Accepted papers must be registered and presented to ensure their inclusion in the IEEE Xplore Library. For details, refer to the ICCV 2025 Author Guidelines.

05 Invited Speakers

Coming Soon

06 People

Organizers

Xin Jin

Assistant Professor

Eastern Institute of Technology, Ningbo, China

jinxin@eitech.edu.cn

Qiuyu Chen

Ph.D. Candidate

Eastern Institute of Technology & Shanghai Jiao Tong University

qychen@eitech.edu.cn

Yue Song

Research Associate

California Institute of Technology (Caltech), USA

yue.song@unitn.it

Xihui Liu

Assistant Professor

The University of Hong Kong, Hong Kong, China

xihuiliu@eee.hku.hk

Shuai Yang

Assistant Professor

Peking University, Beijing, China

williamyang@pku.edu.cn

Tao Yang

Ph.D.

Xi'an Jiaotong University, Xi'an, China

yt14212@stu.xjtu.edu.cn

Ziqiang Li

Ph.D. Candidate

Eastern Institute of Technology & Shanghai Jiao Tong University

zqli@eitech.edu.cn

Jianguo Huang

Ph.D. Candidate

Eastern Institute of Technology, Ningbo, China

huangjgtmyz@gmail.com

Yuntao Wei

Ph.D. Candidate

Eastern Institute of Technology & The Hong Kong Polytechnic University

ytwei@eitech.edu.cn

Baao Xie

Research Associate

Institute of Digital Twin, EIT, Ningbo, China

bxie@idt.eitech.edu.cn

Nicu Sebe

Professor

University of Trento, Italy

niculae.sebe@unitn.it

Wenjun (Kevin) Zeng

Professor

Eastern Institute of Technology, Ningbo, China

kevzeng@eitech.edu.cn

Assistants

Zhenyu Hu

Research Engineer

Institute of Digital Twin, EIT, Ningbo, China

zhu@idt.eitech.edu.cn

Shengyang Zhao

Research Associate

Institute of Digital Twin, EIT, Ningbo, China

szhao@idt.eitech.edu.cn

Xin Li

Postdoc Research Fellow

University of Science and Technology of China, Hefei, China

xin.li@ustc.edu.cn

Jinming Liu

Ph.D. Candidate

Eastern Institute of Technology & Shanghai Jiao Tong University

jmliu206@sjtu.edu.cn

Qi Wang

Ph.D. Candidate

Eastern Institute of Technology & Shanghai Jiao Tong University

qiwang067@sjtu.edu.cn

Yunnan Wang

Ph.D. Candidate

Eastern Institute of Technology & Shanghai Jiao Tong University

wangyunnan@sjtu.edu.cn

Wenyao Zhang

Ph.D. Candidate

Eastern Institute of Technology & Shanghai Jiao Tong University

wy_zhang@sjtu.edu.cn

Yunzhong Zheng

Undergraduate

New York University, New York, USA

yz9712@nyu.edu

Contact

Xin Jin

jinxin@eitech.edu.cn

Assistant Professor, Eastern Institute of Technology, Ningbo, China

Qiuyu Chen

canghaimeng@sjtu.edu.cn

Ph.D. Candidate, Eastern Institute of Technology, Ningbo, China