Security and Privacy of Machine Learning

On this page, we list papers about security and privacy of machine learning by CrySP members.

AI-Generated Content Watermarking

  • Nils Lukas, Florian Kerschbaum: PTW: Pivotal Tuning Watermarking for Pre-Trained Image Generators. USENIX Security 2023. arXiv preprint

    This paper presents an improved watermarking scheme for AI-generated images that can withstand strong attacks. It relies on a machine-learning detector for the watermark.


  • Nils Lukas, Florian Kerschbaum: Pick your Poison: Undetectability versus Robustness in Data Poisoning Attacks against Deep Image Classification. arXiv preprint

    This paper presents a new defense against poisoning attacks that combines removal and detection after training. Combining the two improves robustness over previous work.

Privacy-Preserving Inference

  • Abdulrahman Diaa, Lucas Fenaux, Thomas Humphries, Marian Dietz, Faezeh Ebrahimianghazani, Bailey Kacsmar, Xinda Li, Nils Lukas, Rasoul Akhavan Mahdavi, Simon Oya, Ehsan Amjadian, Florian Kerschbaum: Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions. arXiv preprint

    This paper shows that high-degree polynomials can approximate activation functions while preserving accuracy, provided they are combined with proper loss functions during training. In turn, this makes inference using two-party computation much faster than in previous work.
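
    As a rough illustration of the idea of replacing an activation function with a polynomial, the sketch below fits a Chebyshev interpolant to ReLU and measures how the worst-case error changes with the degree. It is not the method from the paper: the interval [-5, 5], the degrees, and all function names are assumptions chosen for illustration, and the paper's co-design with loss functions during training is not modelled here. The evaluation uses only additions and multiplications, which is the kind of arithmetic that two-party computation protocols handle cheaply.

```python
import math


def relu(x):
    return max(0.0, x)


def chebyshev_coefficients(f, degree, lo, hi):
    """Coefficients of the Chebyshev interpolant of f on [lo, hi]."""
    n = degree + 1
    # Angles of the Chebyshev nodes; cos(theta) lies in [-1, 1].
    thetas = [math.pi * (k + 0.5) / n for k in range(n)]
    # Sample f at the nodes after mapping them to [lo, hi].
    values = [f(0.5 * (hi - lo) * math.cos(t) + 0.5 * (hi + lo)) for t in thetas]
    coeffs = [
        (2.0 / n) * sum(v * math.cos(j * t) for v, t in zip(values, thetas))
        for j in range(n)
    ]
    coeffs[0] *= 0.5  # the constant term carries a factor of 1/2
    return coeffs


def evaluate(coeffs, x, lo, hi):
    """Evaluate the Chebyshev series at x with Clenshaw's recurrence."""
    u = (2.0 * x - (lo + hi)) / (hi - lo)  # map x back to [-1, 1]
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * u * b1 - b2, b1
    return coeffs[0] + u * b1 - b2


def max_error(degree, lo=-5.0, hi=5.0, samples=1001):
    """Largest absolute deviation from ReLU on an evenly spaced grid."""
    coeffs = chebyshev_coefficients(relu, degree, lo, hi)
    grid = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(evaluate(coeffs, x, lo, hi) - relu(x)) for x in grid)


if __name__ == "__main__":
    for degree in (4, 8, 16, 32):
        print(degree, max_error(degree))
```

    Because ReLU has a kink at zero, the error shrinks only slowly as the degree grows, which hints at why a plain approximation may not be enough on its own and why training-time measures matter.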

Model Provenance Verification

  • Nils Lukas, Edward Ye Duo Jiang, Xinda Li, Florian Kerschbaum: SoK: How Robust is Image Classification Deep Neural Network Watermarking? S&P (Oakland) 2022. arXiv preprint

    This paper conducts a systematic study of several deep neural network watermarking schemes and removal attacks from the literature. It concludes that each scheme can be removed by an adaptively chosen attack and derives a combined attack that removes all watermarks. A set of interactive graphics for browsing the results is available here, and our source code is available here.

  • Masoumeh Shafieinejad, Jiaqi Wang, Nils Lukas, Xinda Li, Florian Kerschbaum: On the Robustness of the Backdoor-based Watermarking in Deep Neural Networks. ACM IH&MMSEC 2021. arXiv preprint

    This paper shows that the deep neural network watermarking scheme by Adi et al. (USENIX Security 2018) can be removed by simple model extraction attacks.

  • Tianhao Wang, Florian Kerschbaum: RIGA: Covert and Robust White-Box Watermarking of Deep Neural Networks. WWW 2021. arXiv preprint

    This paper removes the flaws identified in Uchida et al.'s scheme. White-box watermarking schemes are not designed to defend against model extraction attacks.

  • Nils Lukas, Yuxuan Zhang, Florian Kerschbaum: Deep Neural Network Fingerprinting by Conferrable Adversarial Examples. ICLR 2021. arXiv preprint

    This paper presents a new approach to defend against model extraction attacks. Our evaluation shows that it is successful against almost all attacks.

  • Buse Gul Atli, Yuxi Xia, Samuel Marchal, N. Asokan: WAFFLE: Watermarking in Federated Learning. SRDS 2021. arXiv preprint

    This paper explores how traditional white-box model watermarking schemes can be adapted for the federated learning setting, where the model owner who wants to insert the watermark is not necessarily the same entity as the data owners who may be incentivized to remove the watermark. [source code]

  • Sebastian Szyller, Buse Gul Atli, Samuel Marchal, N. Asokan: DAWN: Dynamic Adversarial Watermarking of Neural Networks. MM 2021. arXiv preprint

    Given that all white-box watermarking schemes are ineffective against model extraction attacks, this paper proposes a way to deter model extraction by embedding a watermark at the inference interface. [source code]

  • Tianhao Wang, Florian Kerschbaum: Attacks on Digital Watermarks for Deep Neural Networks. ICASSP 2019.

    This paper shows that the first deep neural network watermarking scheme, by Uchida et al. (ICMR 2017), is easy to detect and easy to remove by overwriting it.

Model Extraction

  • Sebastian Szyller, Vasisht Duddu, Tommi Gröndahl, N. Asokan: Good Artists Copy, Great Artists Steal: Model Extraction Attacks Against Image Translation Generative Adversarial Networks, 2021. arXiv preprint

    This paper shows that real-world image-translation models, based on generative adversarial networks, can be successfully extracted by an attacker, thus adding to the different types of models vulnerable to model extraction.

  • Mika Juuti, Sebastian Szyller, Alexey Dmitrenko, Samuel Marchal, N. Asokan: PRADA: Protecting against DNN Model Stealing Attacks. Euro S&P 2019. arXiv preprint

    This paper shows that deep neural network models can be extracted more effectively than with previously known techniques, and presents a technique to detect model extraction using queries with synthetic data. [source code]

  • Buse Gul Atli, Sebastian Szyller, Mika Juuti, Samuel Marchal, N. Asokan: Extraction of Complex DNN Models: Real Threat or Boogeyman? AAAI-EDSMLS 2020. arXiv preprint

    This paper systematically reproduces a state-of-the-art model extraction technique by Orekondy et al. (CVPR 2019) and shows that it can be detected within the assumed adversary model. It also revisits this adversary model to show that some of its assumptions are unrealistic, but concludes that detecting a strong but realistic model extraction adversary is likely to be very difficult. [source code available on request]

Membership Inference Attacks

  • Thomas Humphries, Matthew Rafuse, Lindsey Tulloch, Simon Oya, Ian Goldberg, Urs Hengartner, Florian Kerschbaum: Differentially Private Learning Does Not Bound Membership Inference. IEEE CSF 2023. arXiv preprint

    This paper shows that the theoretical bound on membership inference attacks can be broken with unmodified attacks just given an "unfortunate" data set.

  • Jiaxiang Liu, Simon Oya, Florian Kerschbaum: Generalization Techniques Empirically Outperform Differential Privacy against Membership Inference. arXiv preprint

    This paper shows that simple regularization techniques and early stopping have a slightly better accuracy vs. privacy trade-off than differential privacy when privacy is measured as resistance to practical attacks.


  • Tommi Gröndahl, N. Asokan: Effective writing style imitation via combinatorial paraphrasing. PETS 2020. arXiv preprint

    This paper shows how author profiling and deanonymization techniques based on stylometry can be evaded by transforming text in ways that remove or alter relevant category-indicative features. [source code]

  • Mika Juuti, Buse Gul Atli, N. Asokan: Making targeted black-box evasion attacks effective and efficient. AISEC 2019. arXiv preprint

    This paper investigates how an agile adversary with black-box knowledge can switch between different attack strategies in order to reduce the number of queries needed for targeted evasion attacks against deep neural networks.

  • Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, N. Asokan: All You Need is "Love": Evading Hate-speech Detection. AISEC 2018. arXiv preprint

    This paper systematically and empirically compares state-of-the-art hate-speech detection techniques and shows that they can be evaded easily. [source code available on request]


  • Yihan Wu, Xinda Li, Florian Kerschbaum, Heng Huang, Hongyang Zhang: Towards Robust Dataset Learning. arXiv preprint

    This paper shows that robustness guarantees can be incorporated into a training data set, such that all models trained with that data set will be robust to adversarial examples.