Fang, R., Bindu, R., Gupta, A., & Kang, D. (2024). LLM agents can autonomously exploit one-day vulnerabilities (arXiv:2404.08144). arXiv. https://doi.org/10.48550/arXiv.2404.08144
Grabb, D., Lamparth, M., & Vasan, N. (2024). Risks from language models for automated mental healthcare: Ethics and structure for implementation. medRxiv. https://doi.org/10.1101/2024.04.07.24305462
Stranisci, M. A., & Hardmeier, C. (2025). What are they filtering out? A survey of filtering strategies for harm reduction in pretraining datasets (arXiv:2503.05721). arXiv. https://doi.org/10.48550/arXiv.2503.05721
Rahman-Jones, I. (2025, August 28). Hackers used Anthropic AI “to commit large-scale theft.” BBC News. https://www.bbc.com/news/articles/crr24eqnnq9o