Time: 23 Dec, 14:00 - 17:30
Location: TBD
LLMs are now embedded in workflows that span languages, modalities, and tools. This raises safety challenges that outpace conventional "guardrails": jailbreaks and prompt injections, attributional safety failures under code-mixing, multimodal bypass via typography and icons, activation-level manipulation, and agentic risks from tool use.
This tutorial synthesizes the newest advances (2023–2025) and lays out open research questions around (i) failure modes in monolingual / multilingual / multimodal settings, (ii) training-time and inference-time defenses (rejection SFT, RLHF/RLAIF, decoding-time safety, parameter/activation steering), and (iii) evaluation and red-teaming pipelines balancing safety and utility.
We anchor the tutorial in recent results, including our safety-related papers published at top-tier conferences, and connect them to emerging best practices from recent safety tutorials. The target audience is researchers and engineers with basic NLP knowledge who want the latest techniques and a research roadmap; the format is a half-day session with short demos and Q&A.
Expected participants: 60–90, based on recent safety tutorials' attendance and the cross-disciplinary interest in multilingual and agentic safety.
We've compiled a comprehensive reading list of papers related to AI safety. If your paper has been accepted at a major conference or journal and relates to LLM safety, multilingual AI, multimodal security, or agentic AI risks, we invite you to submit it via pull request for consideration. The organizers will review all submissions; accepted papers will be covered in the tutorial and cited in our materials. We especially welcome recent work (2023–2025) on multilingual safety, multimodal jailbreaks, and agentic AI risks.
@misc{banerjee2025beyondguardrails,
  title={Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers},
  author={Somnath Banerjee and Rima Hazra and Animesh Mukherjee},
  year={2025},
  note={Tutorial at Asia-Pacific Chapter of ACL 2025, Mumbai},
  url={https://example.com},
}