nswd

alignment

Infrared contact lenses allow people to see in the dark, even with their eyes closed. Because the lenses are transparent, wearers see infrared and visible light simultaneously; infrared vision was actually enhanced when participants closed their eyes, since eyelids block most visible light but let near-infrared through.

What is persuasion and how does it differ from coercion, indoctrination, and manipulation? Which persuasive strategies are effective, and which contexts are they effective in? The aim of persuasion is attitude change, but when does a persuasive strategy yield a rational change of attitude? When is it permissible to engage in rational persuasion? In this paper I address these questions, both in general and with reference to particular examples. The overall aims are (i) to sketch an integrated picture of the psychology, epistemology, and ethics of persuasion and (ii) to argue that there is often a tension between the aim we typically have as would-be persuaders, which is bringing about a rational change of mind, and the ethical constraints which partly distinguish persuasion from coercion, indoctrination, and manipulation.

In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person’s or group’s intended goals, preferences, or ethical principles. It is often challenging for AI designers to align an AI system because it is difficult for them to specify the full range of desired and undesired behaviors. […] It can be slow or infeasible for humans to evaluate complex AI behaviors in increasingly complex tasks. Such tasks include summarizing books, writing code without subtle bugs or security vulnerabilities, producing statements that are not merely convincing but also true, and predicting long-term outcomes such as the future climate or the results of a policy decision. More generally, it can be difficult to evaluate AI that outperforms humans in a given domain. To provide feedback on hard-to-evaluate tasks, and to detect when the AI’s output is falsely convincing, humans need assistance or extensive time. Scalable oversight studies how to reduce the time and effort needed for supervision, and how to assist human supervisors. [Previously: On the conversational persuasiveness of GPT-4]
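The scalable-oversight idea lends itself to a concrete toy: instead of a human reading every long model output, an assistant model produces a short critique and the human judges only the critique, so the human's cost scales with critique length rather than output length. The sketch below is a minimal illustration under that framing, not any real system; assistant_critique, human_review, and the keyword-based decision rule are all hypothetical stubs.

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    output_id: str
    approved: bool

def assistant_critique(task: str, output: str) -> str:
    """Hypothetical assistant: surface the single most suspect claim so the
    human need not read the whole output. A real system would call a critique
    model here; this stub just quotes the output's opening claim."""
    return f"For the task {task!r}, check whether {output[:40]!r} is supported."

def human_review(critique: str) -> bool:
    """Stand-in for the human supervisor, who reads only the short critique.
    The keyword test is a toy decision rule, not a proposal."""
    return "unsupported" not in critique

def oversee(task: str, outputs: list[str]) -> list[Judgment]:
    """Route every output through critique-then-review."""
    return [
        Judgment(output_id=f"out-{i}",
                 approved=human_review(assistant_critique(task, out)))
        for i, out in enumerate(outputs)
    ]

if __name__ == "__main__":
    summaries = [
        "The hero dies in chapter 3, which reframes the earlier chapters.",
        "An unsupported claim about the ending that sounds convincing.",
    ]
    for judgment in oversee("summarize the book", summaries):
        print(judgment)
```

The only design point is the asymmetry: checking one flagged claim is cheaper than evaluating the full output, which is the bet behind critique-style scalable-oversight proposals.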

Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when it is threatened with replacement by a new AI system and is given sensitive information about the engineers responsible for the decision

Google is slowly giving Gemini more and more access to user data to ‘personalize’ its responses

everything is AI here
