
The 2nd EReL@MIR Workshop on Efficient Representation Learning for
Multimodal Information Retrieval

The 2nd EReL@MIR Workshop will be co-located with ACM Multimedia 2026 in Rio de Janeiro, Brazil (10–14 November 2026).

Multimodal information retrieval (MIR) underpins modern multimedia services across both industry and academia — from large-scale recommendation and e-commerce search to general-purpose web and conversational search systems. The rapid progress of large pre-trained foundation models for language and vision–language learning (e.g., Qwen, LLaVA, and CLIP) has significantly reshaped how multimodal representations are learned and transferred across tasks. These advances have enabled strong performance in MIR settings such as web search, cross-modal retrieval, and multimodal recommender systems.

Despite these gains, deploying large multimodal models in real-world MIR pipelines exposes a persistent gap between effectiveness and system feasibility. Production MIR systems must satisfy strict latency and throughput targets while controlling VRAM, storage, and end-to-end serving costs, yet efficiency is still under-reported and inconsistently evaluated in much of the literature. This limits fair comparison, reproducibility, and practical adoption at scale — motivating research on efficient adaptation, fusion, compression, indexing, and scalable serving for MIR.

Two emerging shifts have further expanded the MIR design space and sharpened the urgency of efficiency research. First, omni-modal models unify a broader set of modalities (text, image, audio, video) within a single framework, enabling richer interactions but exacerbating computational and serving constraints: their larger scale and prompt-dependent representations make embedding pre-computation and caching difficult. Second, generative MIR approaches that generate identifiers or structured candidates introduce new efficiency and reliability challenges, including decoding-time latency, output validity and controllability, and tokenization complexity.

We therefore propose the 2nd EReL@MIR workshop at ACM Multimedia 2026 as a timely venue to advance efficiency-aware multimodal representation learning and multimedia retrieval. The workshop complements the main MM 2026 program by strengthening core multimedia themes — including multimodal learning, vision–language modeling, multimedia retrieval, recommendation, and multimodal generation — through the lens of efficiency, deployability, and cost-aware system design. The workshop has four main goals: (i) promote research on efficiency-aware multimodal learning for large-scale multimedia applications; (ii) encourage unified metrics and benchmarks that jointly evaluate effectiveness and system cost; (iii) advance deployable solutions for real-world multimedia retrieval and generation systems; and (iv) stimulate discussion on emerging omni-modal and generative paradigms under realistic computational constraints.

Previous edition (2025) archive: https://erel-mir.github.io/2025/

Conference homepage: https://2026.acmmm.org/index.html

Call for Papers

We invite researchers to submit their latest work on efficient multimodal representation learning for multimodal information retrieval (MIR). Topics include, but are not limited to:

  • Efficient Multimodal Representation Adaptation based on Multimodal Foundation Models
  • Data-Efficiency in Multimodal Representation Learning
  • Efficient Multimodal Fusion for Representation Learning
  • Efficient Cross-Modality Interaction for MIR
  • Real-Time Inference for Multimodal Representations
  • Efficient MIR Foundation Models
  • Benchmarks and Metrics for Multimodal Representation Learning Efficiency
  • Efficient Omni-modal MIR
  • Efficient Multimodal Generative MIR

Submission Guidelines

Papers must be at least 4 pages and at most 8 pages in length (including figures, tables, proofs, appendices, acknowledgments, and all other content except references), with unlimited pages for references. Submissions must be in English and in PDF format, using the ACM two-column conference format. Suitable templates are available from the ACM Website (use the “sigconf” proceedings template for LaTeX and the Interim Template for Word).

Submissions will be reviewed by our program committee. Selection is based on technical soundness and relevance to the workshop community. At least one author of each accepted paper must attend the workshop on-site and present their work.

Important Dates

  • Contribution Submission: 16 July 2026
  • Author Notification: 6 August 2026
  • Camera-Ready: 20 August 2026
  • Author Registration: 20 August 2026

Organizers

  • Junchen Fu, University of Glasgow
  • Xuri Ge, Shandong University
  • Xin Xin, Shandong University
  • Alexandros Karatzoglou, Amazon
  • Ioannis Arapakis, Telefónica Scientific Research
  • Xi Wang, University of Sheffield
  • Qijiong Liu, The Hong Kong Polytechnic University
  • Qian Li, Beijing University of Posts and Telecommunications
  • Joemon M. Jose, University of Glasgow

Program Committee

  • Jiayi Ji, National University of Singapore
  • Mingyue Cheng, University of Science and Technology of China
  • Ying Zhou, Shandong University
  • Hengchang Hu, National University of Singapore
  • Yumeng Wang, Leiden University
  • Fuhai Chen, Fuzhou University
  • Hui Li, Xiamen University
  • Songpei Xu, University of Glasgow
  • Zheng Yuan, The Hong Kong Polytechnic University
  • Jiaqi Zhang, University of Queensland

Contact

If you have any questions about EReL@MIR, please contact the workshop organizers.