AI Preferences                                                 P. Keller
Internet-Draft                                               Open Future
Intended status: Standards Track                         M. Thomson, Ed.
Expires: 24 November 2025                                        Mozilla
                                                             23 May 2025


            A Vocabulary For Expressing AI Usage Preferences
                     draft-ietf-aipref-vocab-latest

Abstract

   This document proposes a standardized vocabulary for expressing
   preferences related to how digital assets are used by automated
   processing systems.  This vocabulary allows for the creation of
   structured declarations about restrictions or permissions for use of
   digital assets by such systems.  The vocabulary is agnostic to the
   means by which it is conveyed.  The definitions in the vocabulary
   facilitate a shared understanding between entities that express such
   preferences and those that use the associated digital assets.

About This Document

   This note is to be removed before publishing as an RFC.

   The latest revision of this draft can be found at https://ietf-wg-
   aipref.github.io/drafts/draft-ietf-aipref-vocab.html.  Status
   information for this document may be found at
   https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/.

   Discussion of this document takes place on the AI Preferences Working
   Group mailing list (mailto:ai-control@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/browse/ai-control/.  Subscribe at
   https://www.ietf.org/mailman/listinfo/ai-control/.

   Source for this draft and an issue tracker can be found at
   https://github.com/ietf-wg-aipref/drafts.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 24 November 2025.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Statements of Preference
   4.  Vocabulary Definition
     4.1.  Text and Data Mining (TDM) Category
     4.2.  AI Training Category
     4.3.  Generative AI Training Category
   5.  Usage
     5.1.  More Specific Instructions
     5.2.  Vocabulary Extensions
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Acknowledgments
   Authors' Addresses

1.  Introduction

   This document defines a common vocabulary of terms for automated
   systems that process digital assets.  The primary purpose of this
   vocabulary is to enable machine-readable expressions of preferences
   about how digital assets are used by automated processing systems, in
   the context of training AI models and other forms of text and data
   mining (TDM).

   The terms defined by the vocabulary can be used to describe, in a
   standardized way, the types of uses that a declaring party may wish
   to explicitly restrict or allow.  Preferences are then expressed as a
   grant or denial of permission concerning each of the types of use
   defined in the vocabulary.  This ensures that preferences can be
   communicated, processed, and stored in a consistent and interoperable
   manner.

   The vocabulary is neutral to the technical details of how systems act
   on preferences.  It is designed to ensure that preference information
   can be exchanged between different systems and consistently
   understood.

   The vocabulary is intended to govern the use of digital assets for
   the training of AI models and other forms of automated processing.
   It does not concern itself with the mechanisms involved in obtaining
   digital assets (i.e., crawling).

   The vocabulary is intended to be usable both where expressing
   preferences results in legal obligations and where there are no
   associated legal protections.  That is, preferences can be expressed
   to invoke specific protections, or they can be made without any
   presumption of specific legal consequences.  Potential legal
   obligations include rights reservations made by rightholders in
   jurisdictions with conditional exceptions on copyright protections.
   Expressing preferences is without prejudice to applicable laws,
   including the applicability of exceptions and limitations to
   copyright.

2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This document uses the following terms:

   Asset:
      A digital file or stream of data, usually with associated
      metadata.
   Declaring party:
      The entity that expresses a preference with regard to an Asset.

3.  Statements of Preference

   The vocabulary is a set of categories, each of which is defined to
   cover a class of usage for assets.  Section 4 defines these
   categories in more detail.

   A statement of preference is made about an asset.  Statements of
   preferences can assign preferences to each of the categories of use
   in the vocabulary.  Preferences regarding each category can be
   expressed either to allow or disallow the usage associated with the
   category.

   A statement of preferences can express preferences about some, all,
   or none of the categories from the vocabulary.  This can mean that no
   preference is expressed for a given usage category.

   Some categories describe a proper subset of the usages of other
   categories.  A preference that is expressed for the more general
   category applies if no preference is expressed for the more specific
   category.

   For example, the TDM category might be assigned a preference that
   allows the associated usage.  In the absence of any statement of
   preference regarding the AI Training category, that usage would
   also be allowed, as AI Training is a subset of the TDM category.  In
   comparison, an explicit preference regarding AI Training might
   disallow that usage, while permitting other usage within the TDM
   category.

   After processing a statement of preferences, the recipient can assume
   that each category of use has a preference in one of three states:
   "allowed", "disallowed", or "unknown".
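
   The fallback from a more specific category to a more general one
   can be illustrated with a short, non-normative sketch.  The category
   labels ("tdm", "ai-training", "genai-training"), the PARENT table,
   and the resolve() function are assumptions made for illustration
   only; this document does not define identifiers or a concrete
   syntax for statements of preference.

```python
from enum import Enum
from typing import Dict, Optional


class Preference(Enum):
    ALLOWED = "allowed"
    DISALLOWED = "disallowed"
    UNKNOWN = "unknown"


# Hypothetical labels: each category maps to the more general category
# that contains it, or to None for the most general category.
PARENT: Dict[str, Optional[str]] = {
    "tdm": None,
    "ai-training": "tdm",
    "genai-training": "ai-training",
}


def resolve(statement: Dict[str, Preference], category: str) -> Preference:
    """Return the effective preference for one category of use.

    The most specific expressed preference wins.  If nothing is
    expressed for the category, the next more general category is
    consulted.  If no category in the chain has a preference, the
    result is UNKNOWN.
    """
    current: Optional[str] = category
    while current is not None:
        pref = statement.get(current)
        if pref is not None and pref is not Preference.UNKNOWN:
            return pref
        current = PARENT[current]
    return Preference.UNKNOWN
```

   Under these assumptions, a statement that only allows "tdm"
   resolves to "allowed" for "ai-training", while a statement that
   also disallows "ai-training" resolves to "disallowed" for both
   "ai-training" and "genai-training" and leaves "tdm" allowed.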

4.  Vocabulary Definition

   This section defines the categories of use in the vocabulary.

   Figure 1 shows the relationship between these categories:

    .-------------------------------------------------.
   |                                                   |
   |               Text and Data Mining                |
   |                                                   |
   |   .-------------------------------------------.   |
   |  |                .------------------------.   |  |
   |  |               |                          |  |  |
   |  |               |                          |  |  |
   |  |  AI Training  |  Generative AI Training  |  |  |
   |  |               |                          |  |  |
   |  |               |                          |  |  |
   |  |                '------------------------'   |  |
   |   '-------------------------------------------'   |
    '-------------------------------------------------'

              Figure 1: Relationship Between Categories of Use

   This list of specific use cases may be expanded in the future, should
   a consensus emerge between stakeholders, to include categories that
   address additional use cases as they emerge.  In addition to these
   categories defined in the vocabulary, it is also expected that some
   systems implementing this vocabulary may extend this list with
   additional categories for their particular needs.

4.1.  Text and Data Mining (TDM) Category

   The act of using one or more assets in the context of any automated
   analytical technique aimed at analyzing text and data in digital form
   in order to generate information which includes but is not limited to
   patterns, trends and correlations.

   The overarching TDM category is based on the definition of Text and
   Data Mining in Article 2(2) of [EUCD2019].

4.2.  AI Training Category

   The act of training machine learning models or artificial
   intelligence (AI).

   The use of assets for AI Training is a proper subset of TDM usage.

4.3.  Generative AI Training Category

   The act of training General Purpose AI models that have the capacity
   to generate text, images or other forms of synthetic content, or the
   act of training other types of AI models that have the purpose of
   generating text, images or other forms of synthetic content.

   The use of assets for Generative AI Training is a proper subset of AI
   Training usage.
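
   The subset relationships among the three categories can be captured
   in a small data structure.  The following non-normative sketch is
   one possible representation; the labels and the function names
   containing_categories() and is_proper_subset() are hypothetical and
   are not defined by this document.

```python
from typing import Dict, List, Optional

# Hypothetical labels for the three categories defined in this section.
# Each entry names the next more general category, or None at the top.
PARENT: Dict[str, Optional[str]] = {
    "tdm": None,                      # Text and Data Mining
    "ai-training": "tdm",             # proper subset of TDM
    "genai-training": "ai-training",  # proper subset of AI Training
}


def containing_categories(category: str) -> List[str]:
    """List the category itself, then each more general category."""
    chain: List[str] = []
    current: Optional[str] = category
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def is_proper_subset(narrow: str, broad: str) -> bool:
    """True if every use in `narrow` is also a use in `broad`,
    and the two categories are not the same."""
    return narrow != broad and broad in containing_categories(narrow)
```

   Because the relationship is transitive, this representation treats
   Generative AI Training as a proper subset of TDM as well as of AI
   Training.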

5.  Usage

   The vocabulary is used by referencing the terms defined in Section 4,
   directly or via mappings, in accordance with how they are defined in
   this document.

5.1.  More Specific Instructions

   A recipient of a statement of preferences that follows this model
   might receive more specific instructions in two ways:

   *  Extensions to the vocabulary might define more specific categories
      of usage.  Preferences about more specific categories override
      those of any more general category.

   *  Statements of preferences are general-purpose, machine-readable
      statements that cannot override contractual agreements or more
      specific statements.

   For instance, a statement of preferences might indicate that the use
   of an asset is disallowed for AI Training.  If arrangements, such as
   contracts, exist that explicitly permit the use of that asset, those
   arrangements likely apply, unless the terms of the arrangement
   explicitly say otherwise.

   The vocabulary does not preclude the use of other specific
   categories.  Any statement of preference based on this vocabulary
   shall not be interpreted as restricting the use of the work(s)
   strictly for the purpose of search and discovery as long as no
   restriction is declared through search-specific means such as
   [RFC9309].

5.2.  Vocabulary Extensions

   Systems referencing the vocabulary must not introduce additional
   categories that include existing categories defined in the vocabulary
   or otherwise include additional hierarchical relationships.
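
   A system that accepts extension categories could check this
   constraint mechanically.  The non-normative sketch below assumes
   one reading of the rule: an extension may add a more specific
   category beneath a vocabulary category (as Section 5.1
   anticipates), but may not declare any relationship in which a
   vocabulary category is the narrower side.  The labels, the
   (narrower, broader) pair representation, and the function
   extension_violations() are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical labels for the categories defined in Section 4.
VOCABULARY = {"tdm", "ai-training", "genai-training"}


def extension_violations(relations: Iterable[Tuple[str, str]]) -> List[str]:
    """Report relations that place a vocabulary category inside another.

    Each relation is a (narrower, broader) pair declared by an
    extension.  Any pair whose narrower side is a vocabulary category
    is flagged, because it would either wrap an existing category in a
    new one or restate and alter the hierarchy this document defines.
    """
    problems: List[str] = []
    for narrower, broader in relations:
        if narrower in VOCABULARY:
            problems.append(
                f"{broader!r} would include vocabulary category {narrower!r}"
            )
    return problems
```

   Under this reading, a relation such as ("genai-image-training",
   "genai-training") produces no violations, whereas ("tdm",
   "all-automated-use") is reported because the new category would
   include TDM.  Both example labels are invented for illustration.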

6.  Security Considerations

   TODO Security

7.  IANA Considerations

   This document has no IANA actions.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

   [RFC9309]  Koster, M., Illyes, G., Zeller, H., and L. Sassman,
              "Robots Exclusion Protocol", RFC 9309,
              DOI 10.17487/RFC9309, September 2022,
              <https://www.rfc-editor.org/rfc/rfc9309>.

8.2.  Informative References

   [EUCD2019] European Union, "Directive (EU) 2019/790 of the European
              Parliament and of the Council of 17 April 2019 on
              copyright and related rights in the Digital Single
              Market", 17 May 2019,
              <https://eur-lex.europa.eu/eli/dir/2019/790/oj>.

Acknowledgments

   The following individuals have been involved in the drafting of the
   proposal:

   *  Cullen Miller, Spawning.ai

   *  Sebastian Posth, Liccium

   *  Leonard Rosenthol, Adobe

   *  Laurent Le Meur, EDRLab

Authors' Addresses

   Paul Keller
   Open Future
   Email: paul@openfuture.eu


   Martin Thomson (editor)
   Mozilla
   Email: mt@lowentropy.net