alignment
22 Dec 2021 - 17 Jun 2023
- A buzzword from recent AI discourse that refers to the practice, or goal, of making an artificial agent's goals compatible with human ones. This is not an obviously stupid idea. I happen to think that the way rationalists and AI researchers think about goals is kind of stupid (see orthogonality thesis), and as a consequence the efforts at alignment seem mostly misguided to me.
- See also
- (update Aug 29th, 2022) holy shit there is a lot of activity in this area now: (My understanding of) What Everyone in Technical Alignment is Doing and Why - LessWrong
- (update Jan 26th, 2023) You can actually get a job as an alignment engineer, which seems to involve making LLMs more agentlike and then fretting about it. Evaluations project @ ARC is hiring a researcher and a webdev/engineer - AI Alignment Forum
- (update Mar 17th, 2023) Holy fuck gpt-4-system-card.pdf
- Human Compatible, a recent book by Stuart Russell, a respected and non-cultist AI researcher.
"Sudden realization that "alignment" is just autistic for "compassion"." — Sport of Brahma (@SportOfBrahma), March 17, 2023

"Genuinely baffled no one has tried this" pic.twitter.com/9BYpTA7XjB — Connor McCormick (@connormcmk), January 23, 2023
