
The 2nd EReL@MIR Workshop on Efficient Representation Learning for
Multimodal Information Retrieval

The 2nd EReL@MIR Workshop will be co-located with ACM Multimedia 2026 in Rio de Janeiro, Brazil (10–14 November 2026).

Multimodal information retrieval (MIR) underpins modern multimedia services across both industry and academia — from large-scale recommendation and e-commerce search to general-purpose web and conversational search systems. The rapid progress of large pre-trained foundation models for language and vision–language learning (e.g., Qwen, LLaVA, and CLIP) has significantly reshaped how multimodal representations are learned and transferred across tasks. These advances have enabled strong performance in MIR settings such as web search, cross-modal retrieval, and multimodal recommender systems.

Despite these gains, deploying large multimodal models in real-world MIR pipelines exposes a persistent gap between effectiveness and system feasibility. Production MIR systems must satisfy strict latency and throughput targets while controlling VRAM, storage, and end-to-end serving costs, yet efficiency is still under-reported and inconsistently evaluated in much of the literature. This limits fair comparison, reproducibility, and practical adoption at scale — motivating research on efficient adaptation, fusion, compression, indexing, and scalable serving for MIR.

Two emerging shifts have further expanded the MIR design space and sharpened the urgency of efficiency research. First, omni-modal models unify a broader set of modalities (text, image, audio, video) within a single framework, enabling richer interactions but exacerbating computational and serving constraints: their larger scale and prompt-dependent representations make embedding pre-computation and caching difficult. Second, generative MIR approaches that generate identifiers or structured candidates introduce new efficiency and reliability challenges, including decoding-time latency, output validity and controllability, and tokenization complexity.

We therefore propose the 2nd EReL@MIR workshop at ACM Multimedia 2026 as a timely venue to advance efficiency-aware multimodal representation learning and multimedia retrieval. The workshop complements the main MM 2026 program by strengthening core multimedia themes — including multimodal learning, vision–language modeling, multimedia retrieval, recommendation, and multimodal generation — through the lens of efficiency, deployability, and cost-aware system design. The workshop has four main goals: (i) promote research on efficiency-aware multimodal learning for large-scale multimedia applications; (ii) encourage unified metrics and benchmarks that jointly evaluate effectiveness and system cost; (iii) advance deployable solutions for real-world multimedia retrieval and generation systems; and (iv) stimulate discussion on emerging omni-modal and generative paradigms under realistic computational constraints.

Previous edition (2025) archive: https://erel-mir.github.io/2025/

Conference homepage: https://2026.acmmm.org/index.html

Call for Papers

We invite researchers to submit their latest work on efficient multimodal representation learning for multimodal information retrieval (MIR). Topics include, but are not limited to:

  • Efficient Multimodal Representation Adaptation based on Multimodal Foundation Models
  • Data-Efficiency in Multimodal Representation Learning
  • Efficient Multimodal Fusion for Representation Learning
  • Efficient Cross-Modality Interaction for MIR
  • Real-Time Inference for Multimodal Representations
  • Efficient MIR Foundation Models
  • Benchmarks and Metrics for Multimodal Representation Learning Efficiency
  • Efficient Omni-modal MIR
  • Efficient Multimodal Generative MIR

Submission Guidelines

Papers must be at least 4 pages and at most 8 pages in length (including figures, tables, proofs, appendices, acknowledgments, and all other content except references), with unlimited pages for references. Submissions must be in English and in PDF format, using the ACM two-column conference format. Suitable templates are available from the ACM Website (use the “sigconf” proceedings template for LaTeX and the Interim Template for Word).

Submissions will be reviewed by our program committee. Selection is based on technical soundness and relevance to the workshop community. At least one author of each accepted paper must attend the workshop on-site and present their work.

Important Dates

  • Contribution Submission: 16 July 2026
  • Author Notification: 6 August 2026
  • Camera-Ready: 20 August 2026
  • Author Registration: 20 August 2026

Organizers

  • Junchen Fu, University of Glasgow
  • Xuri Ge, Shandong University
  • Xin Xin, Shandong University
  • Alexandros Karatzoglou, Amazon
  • Ioannis Arapakis, Telefónica Scientific Research
  • Xi Wang, University of Sheffield
  • Qijiong Liu, The Hong Kong Polytechnic University
  • Qian Li, Beijing University of Posts and Telecommunications
  • Joemon M. Jose, University of Glasgow

Program Committee

  • Jiayi Ji, National University of Singapore
  • Mingyue Cheng, University of Science and Technology of China
  • Ying Zhou, Shandong University
  • Hengchang Hu, National University of Singapore
  • Yumeng Wang, Leiden University
  • Fuhai Chen, Fuzhou University
  • Hui Li, Xiamen University
  • Songpei Xu, University of Glasgow
  • Zheng Yuan, The Hong Kong Polytechnic University
  • Jiaqi Zhang, University of Queensland

Contact

If you have any questions about EReL@MIR, please contact the workshop organizers.