Introduction

As machine learning (ML) and artificial intelligence (AI) systems have garnered significant attention in recent years, concerns about their trustworthiness have surfaced. The main considerations include fairness, absence of bias, personal data protection, transparency, and robustness. Researchers have addressed these issues from various angles, yet a persistent legal and regulatory challenge remains: how can we verify that a system truly meets these standards without forcing developers to reveal proprietary assets, such as training data or model weights? From a legal scholar’s perspective, the question is how regulators, courts, and stakeholders can hold developers accountable under, for example, data protection or intellectual property laws when direct inspection of the model is not possible.

To understand one potential solution, it helps to first grasp some cryptographic and machine learning fundamentals. In basic terms, a machine learning model is trained on a dataset to learn patterns, resulting in “weights” that guide its predictions. These weights are often confidential trade secrets. Cryptography, particularly zero-knowledge proofs (ZKPs), offers methods to verify that a certain statement is true without revealing why it is true. Combining these ideas leads to zero-knowledge machine learning (zkML), where legal and regulatory goals – such as proving compliance – can potentially be achieved without disclosing sensitive details.

Trust Issues in ML Systems and Their Legal Context

Training a model can be costly, and the resulting weights are often considered the core of a company’s intellectual property. Regulatory bodies (e.g., data protection authorities under the GDPR in the EU) face a dilemma: how can they confirm that an ML system respects legal requirements if they cannot directly inspect it? Without direct access to the model’s inner workings, authorities must rely on the developer’s word, which creates a trust imbalance. For instance, a regulator may need to verify that no personal data from a specific individual were used in training (to comply with the right to erasure) or that a model did not rely on copyrighted content (to comply with intellectual property law). Yet simply asking the developer for assurances is not enough from a legal standpoint – there needs to be verifiable evidence.

Private companies also face trust issues when using third-party ML services: how do they know that the advertised model is indeed the one being deployed, especially if different subscription tiers promise different performance or compliance guarantees? Even governments must demonstrate to the public that their own algorithms adhere to legal and ethical standards, a requirement that is becoming more pronounced in evolving AI governance frameworks.

Foundational Cryptographic Concepts and Their Relevance

To address these trust gaps, we can look to cryptography. Zero-knowledge proofs (ZKPs) are a cryptographic technique allowing one party (the prover) to prove a statement is true (e.g., “this model did not use data X”) to another party (the verifier) without revealing why or how that is true. When applied to ML, these proofs become zero-knowledge machine learning (zkML), enabling a party to prove certain properties about an ML model – such as compliance with specific legal standards – without exposing its training data or weights.

In practice, zkML might rely on specialized proofs called SNARKs (Succinct Non-Interactive Arguments of Knowledge), which can compress complex statements about ML models into compact, easily checkable evidence. While these techniques are still evolving, ongoing research aims to make them efficient and scalable enough for real-world ML models.
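
To make the workflow concrete, the following Python sketch illustrates the commit-and-prove pattern that zkML approaches typically follow: the developer first publishes a binding commitment to the model weights, and later proves statements about the committed model without revealing it. The commitment step (a salted SHA-256 hash) is runnable as written; generate_proof and verify_proof are hypothetical placeholders standing in for a real SNARK backend, not an actual library API.

```python
import hashlib
import os

def commit_to_weights(weights_bytes: bytes) -> tuple[bytes, bytes]:
    """Publish a binding commitment to the model weights.

    A salted SHA-256 hash is used purely for illustration; real zkML
    systems use commitments compatible with the chosen proof system.
    """
    salt = os.urandom(32)
    digest = hashlib.sha256(salt + weights_bytes).digest()
    return digest, salt  # digest is published, salt stays with the prover

# --- Hypothetical SNARK interface (placeholders, not a real library) ---
def generate_proof(statement: str, private_inputs: dict) -> bytes:
    """Prover side: produce a succinct proof that `statement` holds for
    the committed weights, without revealing `private_inputs`."""
    raise NotImplementedError("provided by a SNARK backend")

def verify_proof(statement: str, commitment: bytes, proof: bytes) -> bool:
    """Verifier side: check the proof against the public commitment only."""
    raise NotImplementedError("provided by a SNARK backend")

# Conceptual usage:
# commitment, salt = commit_to_weights(model_weights)
# proof = generate_proof("accuracy >= 0.9 on test set T",
#                        {"weights": model_weights, "salt": salt})
# assert verify_proof("accuracy >= 0.9 on test set T", commitment, proof)
```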

Concrete Use Cases: From Compliance to Verification

One practical application of zero-knowledge machine learning is the verification of performance claims made by developers. When a developer asserts that a model satisfies specific accuracy requirements set by regulation or by contract, cryptographic proofs can be used to demonstrate that the model reaches the stated level of performance on a known test dataset. Crucially, this can be done without disclosing the model’s internal parameters. Although this capability remains in a relatively early phase of development, initial research has shown that such claims can indeed be supported using cryptographic methods. Nevertheless, extending these techniques to very large and complex models continues to pose technical difficulties.
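
As an illustration of what such a performance claim amounts to, the sketch below spells out the predicate a prover would establish in zero knowledge: that the (secret) model reaches a stated accuracy on an agreed test set. The model and test set here are toy placeholders; in a real zkML system this computation would be expressed inside the proof system rather than run in plain Python.

```python
from typing import Callable, Sequence, Tuple

def meets_accuracy_claim(
    model: Callable[[Sequence[float]], int],           # private: the model
    test_set: Sequence[Tuple[Sequence[float], int]],   # public: agreed test data
    claimed_accuracy: float,                           # public: the regulatory or contractual claim
) -> bool:
    """The statement a prover would establish in zero knowledge:
    'my secret model scores at least `claimed_accuracy` on this test set'."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set) >= claimed_accuracy

# Toy example: a threshold classifier standing in for a real model.
toy_model = lambda x: int(x[0] > 0.5)
toy_test_set = [([0.9], 1), ([0.2], 0), ([0.7], 1), ([0.1], 0)]
print(meets_accuracy_claim(toy_model, toy_test_set, 0.75))  # True
```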

Another promising direction involves compliance with data protection law, especially within the framework of the General Data Protection Regulation (GDPR)[1]. When an individual invokes the right to erasure, the data controller must ensure that personal information is not only deleted from storage systems but also no longer exerts any influence on trained algorithms. Zero-knowledge techniques provide a basis for what is known as verifiable unlearning – the ability to demonstrate that an individual’s data has ceased to affect the outputs of a model, all without revealing the underlying model or dataset. While some foundational work exists in this area, the creation of efficient and reliable mechanisms for verifiable unlearning remains a goal that has not yet been fully realized.
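
One possible way to frame the statement behind verifiable unlearning is sketched below: the prover shows that the deployed (committed) model results from training on a dataset that no longer contains the erased individual's records. This is a simplified, assumed formulation; the fingerprinting and training functions are illustrative placeholders, and a real proof would be carried out in zero knowledge rather than by running this code.

```python
import hashlib
from typing import Callable, Sequence

def record_fingerprint(record: bytes) -> str:
    """Canonical fingerprint of a data record (illustrative: SHA-256)."""
    return hashlib.sha256(record).hexdigest()

def unlearning_statement(
    training_data: Sequence[bytes],              # private: the post-erasure training set
    erased_fingerprints: set[str],               # public: fingerprints of the erased records
    train: Callable[[Sequence[bytes]], bytes],   # agreed, deterministic training procedure
    published_weights_commitment: str,           # public: commitment to the deployed model
) -> bool:
    """The claim a verifiable-unlearning proof could establish:
    1. none of the erased records appear in the training data, and
    2. the committed, deployed model is the result of training on that data."""
    no_erased_data = all(
        record_fingerprint(r) not in erased_fingerprints for r in training_data
    )
    retrained_weights = train(training_data)  # assumes deterministic training
    matches_commitment = (
        hashlib.sha256(retrained_weights).hexdigest() == published_weights_commitment
    )
    return no_erased_data and matches_commitment
```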

A further use case concerns compliance with intellectual property law. If regulatory authorities or rights holders seek assurance that a model was not trained using copyrighted content, zero-knowledge methods could make it possible to demonstrate that no overlap exists between the training data and a defined set of protected material. In practical terms, however, the implementation of such proofs depends on two difficult questions: how to define what counts as copyrighted data, and how to construct reference datasets or detection systems that are both comprehensive and legally valid.
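
The sketch below illustrates, in deliberately simplified form, the kind of disjointness claim involved: that no item in the (secret) training set matches an entry in a public register of protected works. The fingerprinting scheme and the register are assumptions made for illustration; how such a register would be compiled, and whether fingerprint matching is legally sufficient, are exactly the open questions noted above.

```python
import hashlib
from typing import Iterable

def fingerprint(item: bytes) -> str:
    """Illustrative content fingerprint (SHA-256 of the raw bytes).
    A real system would need robust, legally meaningful matching."""
    return hashlib.sha256(item).hexdigest()

def training_set_is_disjoint(
    training_items: Iterable[bytes],   # private: the developer's training data
    protected_register: set[str],      # public: fingerprints of protected works
) -> bool:
    """The statement a zero-knowledge proof would establish:
    no training item matches any entry in the protected register."""
    return all(fingerprint(item) not in protected_register for item in training_items)

# Toy usage:
register = {fingerprint(b"protected novel"), fingerprint(b"protected image")}
print(training_set_is_disjoint([b"public-domain text", b"licensed data"], register))  # True
print(training_set_is_disjoint([b"protected novel", b"licensed data"], register))     # False
```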

Finally, there is the matter of model authentication and service differentiation, which is particularly relevant to cloud-based AI platforms. A service provider may claim to deploy a model that has been reviewed or certified for legal compliance, but the customer has no direct means of confirming this. Here too, zero-knowledge proofs could serve to confirm that the model used for inference is indeed the one that underwent regulatory or contractual scrutiny. This form of cryptographic assurance would offer an elegant solution to the problem of trust between service providers and clients. Early research prototypes suggest this is achievable, but broader commercial adoption has yet to occur.
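
A simplified sketch of the underlying idea follows: the provider publishes a commitment to the certified model at audit time, and later deployments can be checked against that commitment. Here the check is reduced to a plain hash comparison for illustration; in a genuine zkML deployment the customer would never see the weights, and a zero-knowledge proof would instead attest that each response was computed by the committed model.

```python
import hashlib

def publish_commitment(certified_weights: bytes) -> str:
    """At certification time: publish a commitment to the audited model."""
    return hashlib.sha256(certified_weights).hexdigest()

def weights_match_commitment(deployed_weights: bytes, commitment: str) -> bool:
    """Naive check that the deployed model is the certified one.
    A zkML proof would establish this equality (and the inference itself)
    without exposing the weights to the customer."""
    return hashlib.sha256(deployed_weights).hexdigest() == commitment

# Toy usage with placeholder byte strings standing in for real weights:
certified = b"weights-v1.2-audited"
commitment = publish_commitment(certified)
print(weights_match_commitment(b"weights-v1.2-audited", commitment))     # True
print(weights_match_commitment(b"weights-v2.0-unreviewed", commitment))  # False
```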

Integrating Legal, Political, and Regulatory Perspectives Throughout

The political landscape also plays a crucial role. Mandatory zkML-based verification could face resistance from industry lobbies concerned about costs or the complexity of compliance. Policymakers must weigh the benefits of trustworthy, privacy-preserving evidence against potential burdens on innovation. As AI regulations evolve (e.g., the EU AI Act[2]), they may incorporate zkML-type proofs as a standard compliance tool, enabling regulators to demand cryptographic proof of adherence to law without forcing businesses to disclose their competitive secrets.

To be legally effective, these proofs must be admissible evidence. That means standards must be developed for what constitutes a valid zero-knowledge proof in a court or regulatory inquiry. This will involve collaboration among technologists, legal scholars, standards bodies, and government authorities. By establishing recognized criteria for zkML proofs, legal systems can integrate these cryptographic verifications seamlessly into enforcement processes.

Differentiating Present Reality from the Aspirational Future

It is important to distinguish what is currently achievable from what remains research-in-progress. While some basic zkML demonstrations exist, fully scalable, easily deployable systems are not yet widely implemented. The computational overhead of generating and verifying proofs is still high for large-scale models. Defining and detecting “undesirable” data types – such as personally identifiable or copyrighted content – remains challenging. Establishing trusted frameworks for data integrity and ensuring that proofs are tamper-resistant also requires more research and possibly complementary technologies like blockchain.

Yet the trend is promising. As efficiency improves and legal definitions clarify what must and must not be proved, zkML may transition from an experimental technique to a cornerstone of AI compliance frameworks.

Conclusion

These cryptographic techniques, though still maturing, could offer a new avenue for legal verification in the age of AI. By providing evidence of compliance without revealing sensitive details, zkML can help build trust among regulators, companies, and the public. As research evolves, what is now an aspirational technology could soon become a practical, legally recognized tool for ensuring that ML systems meet data protection, intellectual property, and other regulatory standards, all while preserving the confidentiality and competitiveness that drive innovation.

Suggested citation: Baumann Iago, zkML in support of regulatory compliance for AI systems, Blog of the LexTech Institute, 28 May 2025

[1] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation), OJ L 119/1.

[2] Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act), OJ L, 2024/1689.

Author(s) of this blog post

Master in Law student at the University of Neuchâtel and student-assistant at the LexTech Institute, with a particular interest in the legal implications of the digitization of society and in private international law.