AI Preferences Working Group Materials

AI Preferences - IETF 124 Minutes

Monday, 3 November

mnot: (slides) updated milestone to August 2026; still in question

Goal: go through feedback on the definitions.

We need certainty on these terms first; we will get to other things if we can.

Suresh: Productive discussions need to rely on these as a foundation

Overview (Martin) slides

Mark: Happy to talk about the four terms. Order is based on “solidity”. Goal is to get solid things out of the way first

Foundation Models

Roberta: are foundation models related to the article on foundation agents that are constructed on top of LLMs? There is a 300-page article (https://arxiv.org/abs/2504.01990).

Martin: no, not familiar with that

Roberta: how is search part of this conversation? It seems to be unused

Mark: search was added early on because we needed a way to allow search for that application so that the intent was clear. In discussion, some want more fine-grained search controls, but that was the original intent.

Martin: my understanding is that an exception would be necessary. Those building search use foundation models; some are exclusively foundation models. They have been tweaked but are still foundation models.

Bradley Silver: question on fine-tuning. Right now the focus is on foundation models because they are the broadest foundation upon which others are based. For some, the focus will be on fine-tuning, creating different apps on top of the foundation model. These seem distinct enough to warrant being dealt with separately. The value/use of the data used to fine-tune could be very different, e.g., medical journals used to fine-tune a more generic model. The focus on foundation models could become less important over time.

Alissa: if we imagine the foundation model production category were an AI production category, would the full fan of possibilities be the total span of everything anyone can do with AI? i.e., is this a case of input + output = everything? (please corroborate)

Martin: I think it would cover a lot but it might not cover everything

Alissa: if the AI output category was the foundation model category, would THAT cover it all?

Martin: I don’t know that there would be things that couldn’t be called a foundation model but would be called AI. Maybe this is different for everyone.

Alissa: other entities have defined these terms, if we can reuse we should

Martin: all of the definitions agree, except that some specifics might disagree (e.g., number of parameters). Most are very much in agreement. Self-supervised learning is different, as some definitions hedge it with the term “probably”.

Alissa: because the definitions are in flux, a set of use cases covering all the subjects of discussion, to test against every time the definitions change, would be helpful.

Martin: we all have our own tests; a shared set would be helpful, but it might be early.

Mark: would a wiki page work?

Alissa: yes

Ted: IANAL, but lawyers Shepardize: they go back through precedents to look at what is cited in context. Manual search produced citations that you then followed at great pain. I worry we are focused on how and not why somebody would express a preference. We are not shipping an org chart, but we are close. Given we want to express a preference, what is the middle ground between those who create and those who identify with a preference? Many small creators, for example, could answer if asked whether an expression would be possible using short, crisp words.

Three broad categories: 1) citations back from an interaction; 2) asking for an answer; 3) I don’t want either 1 or 2, I want an interaction. That is a conversation that can be had in places like a fanfic community. This is AI preferences; think about the people that need to be expressing them.

Shay: came up with a couple of definitions: 1) historical search, where a pointer back to the original content exists; 2) pulling data to permanently modify a model; 3) summarizing without modifying.

Is there a broader category? We don’t have one; there are other definitions that need to happen first.

Eric: thoughts similar to the last two comments. Can we be quantitative about dimensions of use? E.g., a lot of it is how much data gets aggregated and mixed in vs. how much is preserved. Foundation model building mixes much together and context is lost; the difference is less about things being separate than about how many things have been pulled together. Can we have a quantitative scale, from 1:1 in/out to “aggregated according to this scale”?

Nick: I like the proposed changes. A way to help decide could be that AI crawlers need to know themselves which category they fit into. Teams in companies who build small tools may not know where they fit in; this may force deeper analysis of who does what, but if crawlers can self-categorize, that is a good metric.
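Nick’s self-categorization point pairs with publishers attaching per-category preferences. A purely illustrative sketch of how such an attachment might look in a robots.txt-style file; the directive name and category tokens here are assumptions for illustration, not agreed WG syntax:

```
# Hypothetical robots.txt-style preference attachment.
# "Content-Usage", "search", and "train-ai" are illustrative tokens only.
User-Agent: *
Content-Usage: search=y, train-ai=n
Allow: /
```

A crawler that can classify its own activity into one of the discussed categories would then know which token governs its use of the content.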

Roberta: when you say other ML techniques, I think about foundation models too. Taking a foundation model and seeing it as an LLM is a way to see how agents will interact within a system. Put in that perspective, there are people in chat talking about singular models, which is broader; if we are only talking about deep learning we are missing some things. Is this a common issue?

Mark: there is a broader category

Any other feedback? It may not be worth polling. Ted’s suggestion could be fruitful. A large part of the complexity is that the use cases are broader than the search & chat context Ted covered.

Farzaneh: we need to be clear that by publisher we don’t only mean those with protected content. E.g., a social media platform may be the owner of the content.

Mark: interpreting as we need attachment use cases

Farzaneh: yes, there needs to be transparency between the declarant and those on whose behalf the declaration is made.

Timid Zehta: want more information on articulating the IP/copyright of the declaring party. It seems counterproductive for expressing preferences. While many want expression because they are the copyright holder, others want cost(? please corroborate) of the provider making the declaration. Tying back to other legal rights might reduce efficacy and duplicate controls/legislation that already exist.

See the slides for 4.4 Search

Martin: “I think the previous search definition was better” (audience laughs)

Alissa: It is easier to consider without nesting

Martin: getting sense that nesting doesn’t make sense, according to Ted’s principle

Meredith: search is the only category that lay users don’t understand. Users care why you are using a tool, not what tool you are using: is this competitive and substitutional? The search definition is good; I don’t think it matters what tool you use, and imprecise definitions are fine. Users care if you create a book that competes in the market, not that you create one for yourself, even if the same tool is used.

Martin: there is a second-order effect: something used innocently is then used destructively.

Meredith: can you distinguish a children’s book creator from another use, such as a disabled person creating a simplified/altered version?

Martin: good point. The counterpoint earlier was that open-weight models have no hope of control over what someone does downstream, no ability to differentiate between good and competitive uses. Maybe you mean we shouldn’t try.

Meredith: little bit

Mirja: people who think they have a preference and want to express it may get something else; there is no way to match it to expectations. What would be better is to point out two features: reference & asset.

Martin: I disagree

Mark: you say these two constraints are not adequate?

Mirja: they might be but they also may not be

Martin: providing the location is the very definition of search, we are trying to narrow things down to something that may not match what people are intending to do

Mallory: instead of search, could we define the broader category as derivative? i.e., a summary or a blurb. CMSes and other things train web admins to build metadata for search, e.g., title, thumbnail. Search is different from any other kind of derivative: not just the webpage but also the attached metadata. This is an important special case, but other things are out there where no metadata taxonomy is needed; there might be more intelligent ways to smash things together. It is important to create a derivative definition that includes search but covers more.

Bradley: the 2nd bullet (asset can only be represented) gets us in trouble. Also concerned about bootstrapping training into these categories. It puts a blindfold on, because those expressing preferences don’t know what kind of training will be required. This dilutes the power of making an informed preference. Biggest problem with nesting: search is about saying “yes”, everything else is about the right to say “no”. This creates a lack of clarity. It should be brought to a higher level so it is clear what preferences are expressed.

Timid Robot: search is good, evocative, but too generic. Solution: add a modifier, e.g., indexing search, citation search, etc.

Alissa: comments are addressed by Krishna’s draft. Encourage us to bring those concepts in.

Martin: we struggled to do that; we thought it could be “this plus” additional constraints. We were not sure it was in the charter, but it is a reasonable thing to do. E.g., an excerpt could be the entire thing.

Alissa: the fact is we can’t ignore it; it is not worth doing something super simple if it fails to work in the broader context of search.

Nate: I run a small travel blog, and agree it is important to talk about why this is even important. This is a simple definition; I could explain it to my travel blogger friends. The concern is that the line between search and other AI applications is blurred. Without the 2nd bullet point the definition is problematic. The why is: is there a fundamental fair exchange where enough people make it back to the website? The mere existence of a reference isn’t enough, because no user will click 91 times (for 91 pages of results).

Farzaneh: Krishna’s draft should be reconsidered. We need to go back and see what can be taken; it is not perfect but… I am a small operator, but somebody else expresses preferences on my behalf. I don’t know what they do on my behalf when I get busy.

Mark: have been working with Krishna’s draft, he has just opened new issues in the draft. If you see things that are missing from Krishna’s drafts let us know.

Suresh: one more data point from MAPRG: how many folks ever change the defaults? Concern exists over whether the defaults do the right thing.

Victoria: many expressing preferences would find some things missing. Trying to nest search within AI output is confusing; we should separate it out.

Martin: we have overwhelming feedback, consider the separation done.

Timid Robot: asking for more clarity from Victoria: are the unintended consequences more about the lack of limits?

Victoria: all these definitions sweep up a lot, but search tools now do not include verbatim content; they summarize, helpfully for some users more than others. It isn’t just about search.

AI Output

Mark: this is new post-Zurich but we want any reflection we can get

Erin: the idea was to focus only on output; the use of “clients” re-introduces the internals that we tried to exclude. There has to be a way to strongly define this as an external boundary, not just internals talking to each other.

Martin: we have a problem with agentic browsing in this context: how do you tell it is a human on the other end?

Erin: clients can be software too

Martin: does the entity that talks to the human have a mandate to protect what happens downstream? From the outside it seems to be a monolithic thing, but systems take a tiny piece and solve it, potentially across administrative domains, which might be integrated, etc. If one of those components takes in an asset that is relied on 3 levels downstream, how do those components fulfill the promise?

Erin: if you are going to express a pref about end state output, nobody should be trying to dictate actions of the internals

EKR: say I have a procedure and a corpus of data. It spits out 50 exemplars and the user hates them. Now foundation model is on, AI output is off; is this legal? I don’t think the org boundary is right: create a foundation model that generates code (a true vibe coder). Is that AI output?

Roberta: the idea is that the agents talk to each other, with no human interaction in the middle and no visibility for the last node. I think it is a design issue. We are building systems where an AI is a human in the system. We are not covering it such that opt-out is clear, yet also not stopping the progression of data. It is hard to have all 3 groups agreeing at the same level of control.

Elaine: this is written to be broad. Can someone explain intention. Don’t understand 3rd paragraph.

I think this is fairly new; if you have suggestions, please send them.

Martin: Erin might be better person to define intent

Elaine: Hard to offer suggestions when I don’t understand intent

Martin: happy to poke on specific concerns.

Mark: good feedback, becoming clear more work is needed

Elaine: if we delete that sentence, who screams? What would be missing?

Alissa: I have read it 30 times but don’t know what it means. Having use cases helps. If there is no use case where an expressed preference needs the organizational boundary, e.g., if a SaaS trains a model and then the model is used, maybe there is no need to distinguish between the model training and the output of the model.

Martin: how would you test the difference between respecting a preference and not?

Alissa: how does it happen today?

Martin: there is a simple test - did this crawler request this resource

Alissa: translated to the real world there is a test - did my stuff end up in that output

Martin: in very specific circumstances can we test whether this expression was respected or not?

Alissa: everyone will make their own judgements.

Martin: but as a systems builder you need to know where the line sits, and as an expresser you should be able to draw that line. Fundamental question about “in the generation of outputs to the system” in 4.3 AI Output: under this definition you could use this system to do anything other than generating outputs; is that OK?
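The “did this crawler request this resource” test that Martin calls simple can be sketched in code. A minimal sketch, assuming a Combined Log Format access log; the crawler name, path, and log line below are hypothetical examples, not from the discussion:

```python
# Minimal sketch: scan access-log lines for a request to a given path
# by a given crawler user agent. Assumes Combined Log Format.
import re

LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def crawler_requested(log_lines, crawler_name, resource_path):
    """Return True if any logged request for resource_path came from
    a user agent containing crawler_name (case-insensitive)."""
    for line in log_lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        if (m.group("path") == resource_path
                and crawler_name.lower() in m.group("agent").lower()):
            return True
    return False

sample = [
    '203.0.113.7 - - [03/Nov/2025:10:00:00 +0000] '
    '"GET /article.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
]
print(crawler_requested(sample, "examplebot", "/article.html"))  # True
```

The contrast Martin and Alissa draw is that the equivalent output-side test (“did my stuff end up in that output?”) has no such mechanical check.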

Ted: definitely more confused now than earlier. What changes if I take AI out of this? If I have preferences about how my prices are used, I don’t (might?) care about whether somebody builds a price comparator; if you build a weather system where my expression allows an aggregation of weather data, that is meaningful. All we can do is write down what the preferences meant to those who expressed them, and then allow systems to try to interpret them.

Mark: maybe we need to re-charter based on this

Martin: Bradley Silver’s draft is clear about intent, we haven’t had much discussion (ed: this is important but I didn’t understand context, please flesh out)

Elaine: do we need something about model training in AI output?

Martin: right now everything is in jeopardy, but that is a good point. Concern is if we don’t allow some training, but without search maybe it doesn’t matter.

Meredith: a lot of the unintended consequences come back to substitutive use; it may solve some issues but create downstream problems.

Roberta: one example for AI prefs: suppose we are trying to get maximal data about medication information and provide it to users. The whole system will have training etc., but a medication might not be open to market yet and still be crawled; the information could put a user at risk, and they might not know that the content was generated by an AI. Even internally, those models are classified as high risk. The behavior of someone who ignores preferences must be considered; lots of crawlers ignore robots.txt. Is liability/enforcement out of scope?

Victoria: +1 to unintended consequences, not sure we adequately value all the use cases, or cost vs. benefit. That is not a task for standards bodies, that is policy etc.

Martin: section 3.2 might address your concerns.

Victoria: does address to some extent, but will still have impacts.

EKR: on the 3rd paragraph, the undesirable consequence: even if you stipulate this is nested, there are other types of training. I am not saying it can’t be fixed, but you will have trouble.

Lila: I agree with those who advocate layperson understandability, and will try to come up with language. Also concerned about substitutive use, but competition is important for society; we need to think about the balance between what tools give and narrowing what people can do on the web. Section 3.2 is very important: we need to maintain that there are good reasons not to follow the standards. Flexibility is important; more risk-averse entities may not take advantage of it, so concerns remain, especially for those needing accessibility.

Wednesday, 5 November

(Scribe: Christopher Patton)

Discussion on top-level category

Chair summary