Internet-Draft AI Preferences May 2025
Keller & Thomson Expires 23 November 2025
Workgroup: AI Preferences
Internet-Draft: draft-ietf-aipref-vocab-latest
Published: May 2025
Intended Status: Standards Track
Expires: 23 November 2025
Authors:
  P. Keller (Open Future)
  M. Thomson, Ed. (Mozilla)

A Vocabulary For Expressing AI Usage Preferences

Abstract

This document proposes a standardized vocabulary for expressing preferences related to how digital assets are used by automated processing systems. This vocabulary allows for the creation of structured declarations about restrictions or permissions for use of digital assets by such systems.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://ietf-wg-aipref.github.io/drafts/draft-ietf-aipref-vocab.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/.

Discussion of this document takes place on the AI Preferences Working Group mailing list (mailto:ai-control@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/ai-control/. Subscribe at https://www.ietf.org/mailman/listinfo/ai-control/.

Source for this draft and an issue tracker can be found at https://github.com/ietf-wg-aipref/drafts.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 23 November 2025.

1. Introduction

This document defines a common vocabulary of terms for automated systems that process digital assets. Its primary purpose is to enable machine-readable expressions of preferences about how such systems use digital assets, both for training AI models and for other forms of automated processing.

The terms defined by the vocabulary can be used to describe, in a standardized way, the types of uses that a declaring party may wish to explicitly restrict or allow. Preferences are then expressed as a grant or denial of permission concerning each of the types of use defined in the vocabulary. This ensures that preferences can be communicated, processed, and stored in a consistent and interoperable manner.

Neither the vocabulary nor the preferences that might be expressed with it prescribe how automated processing systems obtain or act on preferences. Separate documents will describe how preferences might be associated with assets. The vocabulary is designed to ensure that preference information can be exchanged between different systems and understood consistently.

The vocabulary is intended to work both in contexts where such preferences result in legal obligations (such as rights reservations made by rightholders in jurisdictions with conditional exceptions for text and data mining, or TDM) and in contexts where they do not. It is without prejudice to applicable laws and to the applicability of exceptions and limitations to copyright.

2. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

This document uses the following terms:

Asset:

A digital file or stream of data, usually with associated metadata.

Declaring party:

The entity that expresses a preference with regard to an Asset.

3. Statements of Preference

The vocabulary is a set of categories, each of which is defined to cover a class of usage for assets. Section 4 defines these categories in more detail.

A statement of preferences is made about an asset. It can assign a preference to each of the categories of use in the vocabulary. The preference for a category either allows or disallows the usage associated with that category.

A statement of preferences can express preferences about some, all, or none of the categories in the vocabulary, so it is possible that no preference is expressed for a given category of use.

Some categories describe a proper subset of the usages of other categories. A preference that is expressed for the more general category applies if no preference is expressed for the more specific category.

For example, the TDM category might be assigned a preference that allows the associated usage. In the absence of any preference regarding the AI Training category, that usage would also be allowed, because AI Training is a subset of the TDM category. Conversely, an explicit preference regarding AI Training might disallow that usage, while other usage within the TDM category remains allowed.

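After processing a statement of preferences, the recipient can assume that each category of use has a preference in one of three states: "allowed", "disallowed", or "unknown". A category is "unknown" when no preference is expressed for it or for any more general category that contains it.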
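The rule that a more general category's preference applies when a more specific category has none can be sketched as a walk up the category hierarchy until an expressed preference is found. The following is a minimal Python sketch under stated assumptions, not a normative algorithm: the category identifiers ("tdm", "ai-training", "genai-training"), the "resolve" function, and the representation of a statement as a mapping from category to a boolean (true to allow, false to disallow) are all hypothetical, since this document does not define a serialization.

```python
from enum import Enum
from typing import Mapping, Optional


class Preference(Enum):
    """The three states a recipient can assign to a category of use."""
    ALLOWED = "allowed"
    DISALLOWED = "disallowed"
    UNKNOWN = "unknown"


# Hypothetical identifiers for the Section 4 categories, each mapped to
# its more general parent category (None marks the most general one).
PARENT: dict[str, Optional[str]] = {
    "tdm": None,
    "ai-training": "tdm",
    "genai-training": "ai-training",
}


def resolve(category: str, statement: Mapping[str, bool]) -> Preference:
    """Return the effective preference for `category`.

    `statement` contains only the categories the declaring party expressed a
    preference about (True = allow, False = disallow). When `category` has no
    expressed preference, the nearest more general category is consulted.
    """
    current: Optional[str] = category
    while current is not None:
        if current in statement:
            return Preference.ALLOWED if statement[current] else Preference.DISALLOWED
        current = PARENT[current]  # raises KeyError for categories not in PARENT
    return Preference.UNKNOWN


# The example from this section: TDM allowed, AI Training disallowed.
example = {"tdm": True, "ai-training": False}
for name in PARENT:
    print(name, resolve(name, example).value)
```

Applied to the example above, a statement that allows "tdm" and disallows "ai-training" resolves "tdm" to allowed and both "ai-training" and "genai-training" to disallowed, while an empty statement resolves every category to unknown. Raising KeyError for a category missing from the hypothetical PARENT table is only one possible way to treat categories a recipient does not recognize.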

4. Vocabulary Definition

This section defines the categories of use in the vocabulary.

The figure below shows the relationship between these categories:

Figure 1: Opt-out vocabulary overview (Text and Data Mining, containing AI Training, containing Generative AI Training; [possibly]: additional use cases)

This list of categories might be expanded in the future to address additional use cases, should consensus emerge among stakeholders. In addition to the categories defined in this vocabulary, some systems that implement it are expected to extend the list with additional categories for their particular needs.

4.1. Text and Data Mining (TDM) Category

The act of using one or more assets in the context of any automated analytical technique aimed at analyzing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations.

The overarching TDM category is based on the definition of Text and Data Mining in Article 2(2) of [EUCD2019].

4.2. AI Training Category

The act of training machine learning models or artificial intelligence (AI).

The use of assets for AI Training is a proper subset of TDM usage.

4.3. Generative AI Training Category

The act of training General Purpose AI models that have the capacity to generate text, images or other forms of synthetic content, or the act of training other types of AI models that have the purpose of generating text, images or other forms of synthetic content.

The use of assets for Generative AI Training is a proper subset of AI Training usage.

5. Usage

The vocabulary is used by referencing the terms defined in Section 4, either directly or via mappings, in accordance with their definitions in this document.

5.1. More Specific Instructions

A recipient of a statement of preferences that follows this model might receive more specific instructions in two ways:

  • Extensions to the vocabulary might define more specific categories of usage. Preferences about more specific categories override those of any more general category.

  • Statements of preferences are general-purpose, machine-readable statements that cannot override contractual agreements or more specific statements.

For instance, a statement of preferences might indicate that the use of an asset is disallowed for AI Training. If arrangements, such as contracts, exist that explicitly permit the use of that asset, those arrangements likely apply, unless their terms explicitly say otherwise.

The vocabulary does not preclude the use of other specific categories. Any statement of preference based on this vocabulary shall not be interpreted as restricting the use of the work(s) strictly for the purpose of search and discovery, as long as no restriction is declared through search-specific means such as [RFC9309].
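One way a recipient might apply the search and discovery rule is to base the decision for that purpose solely on the search-specific signal, ignoring the statement of preferences. The sketch below assumes robots.txt rules are that signal and uses Python's standard urllib.robotparser module. That module predates [RFC9309] and does not implement all of its matching rules (for example, it applies the first matching rule in a group rather than the most specific one), so it is an approximation. The "may_use_for_search" function and its parameters are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def may_use_for_search(robots_txt: str, user_agent: str, url: str,
                       ai_preferences: dict[str, bool]) -> bool:
    """Hypothetical check for using `url` strictly for search and discovery.

    A statement of preferences based on the vocabulary is not read as
    restricting search, so `ai_preferences` is deliberately ignored; only the
    search-specific signal (assumed here to be robots.txt) is consulted.
    """
    del ai_preferences  # intentionally unused, see docstring
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network access
    return parser.can_fetch(user_agent, url)


robots = "User-agent: *\nDisallow: /private/\n"
print(may_use_for_search(robots, "ExampleBot", "https://example.com/page",
                         {"tdm": False}))
```

In this sketch, a statement that disallows "tdm" does not change the outcome for "/page", which the robots.txt rules permit, whereas a path under "/private/" is excluded by the search-specific rule regardless of the AI preferences passed in.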

5.2. Vocabulary Extensions

Systems referencing the vocabulary must not introduce additional categories that include existing categories defined in the vocabulary or otherwise include additional hierarchical relationships.

6. Security Considerations

TODO Security

7. IANA Considerations

This document has no IANA actions.

8. References

8.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.
[RFC9309]
Koster, M., Illyes, G., Zeller, H., and L. Sassman, "Robots Exclusion Protocol", RFC 9309, DOI 10.17487/RFC9309, <https://www.rfc-editor.org/rfc/rfc9309>.

8.2. Informative References

[EUCD2019]
European Union, "Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market", <https://eur-lex.europa.eu/eli/dir/2019/790/oj>.

Acknowledgments

The following individuals have been involved in the drafting of the proposal:

Authors' Addresses

Paul Keller
Open Future
Martin Thomson (editor)
Mozilla