Asia-Pacific Chapter of ACL 2025 Tutorial

Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers

About this tutorial

Time: 23 Dec, 14:00–17:30

Location: TBD

LLMs are now embedded in workflows that span languages, modalities, and tools. This raises safety challenges that outpace conventional "guardrails": jailbreaks and prompt injections, attributional safety failures under code-mixing, multimodal bypass via typography and icons, activation-level manipulation, and agentic risks from tool use.

This tutorial synthesizes the newest advances (2023–2025) and lays out open research questions around (i) failure modes in monolingual, multilingual, and multimodal settings; (ii) training-time and inference-time defenses (rejection SFT, RLHF/RLAIF, decoding-time safety, parameter/activation steering), one of which is sketched below; and (iii) evaluation and red-teaming pipelines that balance safety and utility.

We anchor the tutorial in recent results, including our own safety-related papers published at top-tier conferences, and connect them to emerging best practices from recent safety tutorials. The target audience is researchers and engineers with basic NLP knowledge who want the latest techniques and a research roadmap; the format is a half-day session with short demos and Q&A.

Target Audience

Prior Knowledge

Schedule

Reading List

We've compiled a comprehensive reading list of papers related to AI safety. If you have accepted papers in relevant areas, please submit a pull request to have them considered for inclusion in the tutorial.

Submit Your Accepted Papers

If your paper has been accepted at a major conference or journal and relates to LLM safety, multilingual AI, multimodal security, or agentic AI risks, we invite you to submit it via pull request. The organizers will review all submissions, and accepted papers will be covered in this tutorial and cited in our materials. We especially welcome recent work (2023–2025) on multilingual safety, multimodal jailbreaks, and agentic AI risks.

BibTeX

@misc{banerjee2025beyondguardrails,
  title={Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers},
  author={Somnath Banerjee and Rima Hazra and Animesh Mukherjee},
  year={2025},
  note={Tutorial at Asia-Pacific Chapter of ACL 2025, Mumbai},
  url={https://example.com}
}