Publications
2025
ICLR
International Conference on Learning Representations (2025)
Differentially Private Steering for Large Language Model Alignment
ICLR
International Conference on Learning Representations (2025)
Protecting Against Simultaneous Data Poisoning Attacks
ICLR
International Conference on Learning Representations (2025)
Accuracy on the Wrong Line: On the Pitfalls of Noisy Data for Out-of-Distribution Generalisation
AISTATS
Artificial Intelligence and Statistics (2025)
2024
Robust Mixture Learning when Outliers Overwhelm Small Groups
NeurIPS
Conference on Neural Information Processing Systems (2024)
What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
NeurIPS
Conference on Neural Information Processing Systems (2024)
Provable Privacy with Non-Private Pre-Processing
ICML
International Conference on Machine Learning (2024)
The Role of Learning Algorithms in Collective Action
ICML
International Conference on Machine Learning (2024)
On the Growth of Mistakes in Differentially Private Online Learning: A Lower Bound Perspective
COLT
Conference on Learning Theory (2024)