A Ladder of Creativity: What Jazz Piano Taught Me about Generative AI
“Lesser artists borrow. Great artists steal.” — attributed to Igor Stravinsky
I never intended to become a computer scientist or a jazz pianist. But I think I was drawn to computer science and jazz piano for similar reasons: both of these exercised my creative skills. Whether I was thinking about how to make a system scalable or maintainable, or figuring out what to play next in my solo, I was challenged to find elegant solutions to new problems. There’s no right answer, but there sure are plenty of wrong ones.
I’ve had a lot of time to think about what it means for me to be “creative.” It’s still a bit surreal to me we have models that aren’t just seemingly capable of creative work, but also everywhere: ChatGPT feels like it’s always just one click away. But is it really being “creative?” Can a large language model (LLM) ever be creative?
It’s often said that “lesser artists borrow, greater artists steal.” So I’d like to propose — very informally — a ladder of creativity, based on my own experience as a classical and jazz pianist: (1) memorization, (2) borrowing, and (3) stealing. The first stage is memorization: the ability to reproduce, verbatim, something that follows exactly-specified instructions. Then comes “borrowing:” the ability to reference, “sample,” or draw from past experiences fluently. And lastly, we have “stealing:” the ability to take borrowed ideas and make them your own by synthesizing something that feels original. Let’s talk about these rungs, and where today’s models might sit.
I’ll admit this isn’t an empirically testable definition of creativity (yet), but I still think this exercise can give us insight. We’ll start with one of the main languages of piano: sheet music.
On Memorization
I started out, like many other pianists, with classical training. In classical piano you are expected to memorize your entire repertoire: an ever-growing collection of pieces. Every piece one can expect to learn is usually written down in some form that dictates how you should play the piece. When you “memorize” a piece, for starters, you recall every note from memory. You have to also think about how you approach every note to create something nice to listen to, which we call the “interpretation.” These choices are sometimes quite constrained based on the sheet music.
Figure 1: Sheet music for Nocturne, Op. 9 No. 2, F. Chopin. Published by Leipzig: Fr. Kistner (1880).
To see what I mean, I’ve placed a piece of music representative of what a classical pianist generally uses to learn music (Figure 1). This paper encodes a list of instructions for the pianist to execute. Each group of double lines — one for each hand — is filled with a set of notes: those little oval things with lines attached to them. The x-axis is time, and the y-axis is pitch, so you can tell when to press a key. The shape of the note’s “stem,”1 whether the “oval” is filled, and how many dots are next to the note tells you the note’s duration. Places where you don’t play anything are marked with special symbols called “rests.”
We’re missing many more pieces. A couple important ones are dynamics, or how loudly/softly to play, and tempo, or how “fast” the piece is. For historical reasons, these are generally written as vibes in Italian, ranging from “super duper soft” (pianississimo, abbreviated “ppp”) to “super duper loud” (fortississimo, abbreviated “fff”). Tempo markings are similarly vibey: we see “walking speed” in the above piece (Andante), i.e., “slow-ish,” “upbeat” (Allegro), “upbeat-ish” (Allegretto), or the refreshingly objective beats per minute. This is to say nothing of articulations like accents (“>”, emphasize note relative to others at current dynamic level), staccato markings (“.”, detach notes), or slurs (curved lines over multiple notes; no gaps in sound between notes), or keyboard forearm clusters.2
This is a lot of direction, and I’ve still left out many moving parts for simplicity. There is absolutely room for creativity in interpreting a piece, but all this direction sends a message: if you execute all of the written instructions verbatim, you’ve completed the first step in making music. In other words, the first benchmark of mastery is your competence in executing someone else’s ideas.
Are LLMs the same? One step in building LLMs is training them to predict the next word. In some cases, LLMs produce excerpts from books or articles verbatim — pure memorization — but more generally, LLMs are trained to learn what makes text statistically plausible, which isn’t necessarily creative. This setup rewards competence in reproducing existing distributions of text.
But I do want to caution against a false dichotomy between creativity, which feels like a philosophical term, and likelihood, a statistical notion. We’d never say that a classical musician simply regurgitates notes, and it would be unfair to say that LLMs regurgitate text verbatim. It feels like there should be something more than “just the words” when we think about LLM “creativity,” just like there’s something more than “just playing the notes” in classical music.
An alternative here is to entertain “performance” as a type of “inference.” Interpreting a piece of music would be like conditional generation, where we have a program of set pieces prepared by a musician for, say, a recital. Much like sheet music, there’s something vaguely algorithmic about this: the classical musician receives a “query” from their program (a piece). Conditioned on the piece, the musician recalls a well-practiced interpretation of the requested piece.
This feels a little closer, but I find this uninspiring: no one pays money to go see a player piano at the Lincoln Center.3 And few pay for LLM subscriptions just to produce facts or passages from memory. Clearly, there’s something more than invariable rote memorization at play in LLM outputs.
But what is that “something?” Let’s see if we can learn something from jazz piano.
On Borrowing
A core tenet of jazz piano is improvisation. You are not expected to — and are discouraged from — memorizing every note of your performance, because there’s some expectation that you’re flexing some creative muscles in the moment. But there’s still some structure: instead of sheet music like classical piano, jazz piano uses lead sheets. I’ve put one below:
Figure 2: A jazz lead sheet for “All of Me.”
The jazz lead sheet (Figure 2) looks sparser, which is nice. Fewer instructions, easier to play, right?4 And while there’s only one set of lines, most of us have two hands — so there’s some redundancy built in. Lastly, the music is in what looks like the musical equivalent of Comic Sans. This doesn’t affect what you’re supposed to play, but it does give the music a delightfully informal air, as if to say: “Welcome! It’s okay if you don’t play this right! The only rule is to have fun! :D”
Sadly, jazz piano is not so simple. Let’s see what’s the same: the x-axis is still time, and the y-axis is pitch. The in-filling of the notes and the stem shape still indicate duration. There’s not much articulation, so that’s one less thing to worry about. Tempo markings are in English vibes rather than Italian vibes and tell you whether to “swing” (we’ll get back to that). There’s only one “row” of notes per line, but your free hand still has a job: it’s dictated by the chord symbols, or letters and exponent-looking numbers. In general, the left hand accompanies (“comps”) the right hand with some non-repeating permutation of notes belonging to the chord.5
To my classically-minded brain, when I was starting jazz, comping felt like solving competition math problems while playing catch, except the ball comes at you faster if your solution isn’t elegant enough. Fun!
Notation-wise, the big letter is the “root” (lowest note) of the chord, and the numbers are the positive y-offset of the highest note (5 if unspecified) from the root.6 Once you figure out which combination of notes to play, you have to move your hand. On average, you have a little under a second to do all of this mentally and physically, depending on the tempo and the pace of the chord changes, so the goal is to chain together an easy-to-play sequence of note-permutations that still sounds good. Once you’ve managed to get through the main tune, the song “loops” and the chords repeat while you solo, playing whatever melody you want in the right hand, and continue to comp with your left hand (or both hands if it’s someone else’s turn to solo).
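If it helps to see the decoding spelled out, here’s a toy Python sketch of this simplified notation. The scale offsets, MIDI numbers, and the flatted-seventh tweak are illustrative assumptions, not real voicing practice (see footnote 6 for caveats I’m glossing over):

```python
# Toy decoder for the simplified chord notation described above:
# the letter is the root, the number is the scale degree of the top
# note (5 if unspecified). Offsets follow a major scale; flatting the
# seventh on "7" chords is an illustrative assumption, not a full
# treatment of jazz harmony.
MAJOR_SCALE_SEMITONES = {1: 0, 3: 4, 5: 7, 7: 11, 9: 14}
ROOTS = {"C": 60, "D": 62, "E": 64, "F": 65, "G": 67, "A": 69, "B": 71}  # MIDI

def chord_notes(symbol):
    """'C' -> root, third, fifth; 'G7' also stacks the (flatted) seventh."""
    root = ROOTS[symbol[0]]
    top = int(symbol[1:]) if len(symbol) > 1 else 5
    degrees = [d for d in (1, 3, 5, 7, 9) if d <= top]
    notes = [root + MAJOR_SCALE_SEMITONES[d] for d in degrees]
    if 7 in degrees:  # dominant-seventh convention: lower the 7th a half step
        notes[degrees.index(7)] -= 1
    return notes
```

For example, `chord_notes("G7")` yields MIDI notes for G, B, D, and F, from which the comping hand then picks a permutation.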
So, even with seemingly fewer constraints than classical music, being creative isn’t trivial. Rules float in the background, but you’re deprived of the signposting afforded by classical piano. No, this problem is terrifyingly underdetermined: there’s a universe of variations of the song consistent with the sheet music.7 No structure — pure entropy — gets you a very unique solo, but I doubt many would enjoy it.8 Too much structure, and you risk receiving the underhanded compliment: “You must be a great classical pianist!”
So, how does one improve? You can practice tunes and various scales, but for me, the most helpful step was a bit of “imitation learning:” you look at experts doing “the right thing” and copy them verbatim. So, every day on my way to and back from the lab, I’d listen to Bill Evans and Oscar Peterson in the car. Honestly, I didn’t set out to do imitation learning — I just wanted to listen to jazz. But a riff would catch my ear and I’d feel compelled to go home and learn it, note-for-note. Slowly, my jazz sense improved, and I could work these snippets of legendary jazz solos into my own playing on demand. Yet something still wasn’t there. Why wasn’t I playing “jazz?”
I think LLMs today stand at a similar creative plateau. Just like a professional musician has been exposed to a plethora of musical ideas, LLMs have been trained on massive textual corpora, which have provided these models with the building blocks for solving a variety of math problems, following instructions, writing code — even composition and editing. They’re even able to recombine these building blocks in novel ways and surpass the capabilities of most humans on many tasks. There’s clearly some prior over “quality” that LLMs have successfully internalized.9
But building blocks alone aren’t enough. The coolest riffs and flashiest scales won’t self-assemble into a great jazz solo: reaching the proverbial mountaintop isn’t merely a matter of technical perfection or novelty. Nor do I believe that the textual building blocks of LLMs automatically yield creativity. Yes, LLMs can produce original texts and write coherent stories given a plot summary. They even have style — so much so that certain phrasings and vocabulary are typecast as “ChatGPT-generated.” This gives LLM-generated text its distinctive syntactical stench: word-by-word, each response is different, yet there’s an inescapable sameness to it all.
So — what’s the secret ingredient?
On Stealing
Stealing is the act of taking possession of something that is not one's own. Here, I mean stealing in an artistic sense: when every group of notes seems to have some name, how can we make music sound original?10
In computer science, a Markov chain is a process defined over a set of “states,” where the next state depends solely on the current one. Think of basic autocomplete: the next words are suggested based on the last word typed. If you’ve ever tried typing messages by mindlessly tapping the suggested words, you might’ve noticed how the text quickly becomes slop. It’s English-like, but not really meaningful. We can imagine a jazz pianist operating on the same principle: the “black-box jazz pianist.” The black-box jazz pianist has studied thousands of riffs and can perfectly recall them. When a new chord progression comes, they pick a riff or turnaround mindlessly, at random, just like a Markov chain (Figure 3).
Figure 3: An example of a Markov chain with transition probabilities between different jazz riffs describing a black-box jazz pianist that chooses things to play at random that sound “locally” good, but weird to listen to in context. Riff/lick names auto-generated by ChatGPT-4o — I’m trying to make a point, not suggest a real generative model of a jazz solo.
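To make the black-box jazz pianist concrete, here is a minimal Python sketch. The riff names and transition probabilities are made up for illustration, in the same spirit as Figure 3’s auto-generated labels:

```python
import random

# A toy "black-box jazz pianist": the next riff depends only on the
# current riff (the Markov property). Riff names and transition
# probabilities are invented for illustration.
TRANSITIONS = {
    "blues lick":      [("ii-V run", 0.5), ("pentatonic riff", 0.5)],
    "ii-V run":        [("blues lick", 0.7), ("pentatonic riff", 0.3)],
    "pentatonic riff": [("blues lick", 0.4), ("ii-V run", 0.6)],
}

def play_solo(start, n_riffs, rng=random):
    """Chain riffs together, choosing each one based only on the last."""
    solo = [start]
    for _ in range(n_riffs - 1):
        choices, weights = zip(*TRANSITIONS[solo[-1]])
        solo.append(rng.choices(choices, weights=weights)[0])
    return solo
```

Every local transition is plausible, yet the chain has no idea where the solo is headed, which is exactly the problem.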
I wish I were that good at jazz piano, but maybe I should be thankful: this type of pianist produces a flashy, but musically empty, solo.11 Even with perfect technique, you either end up with awkward, stilted music, stitched roughly riff to riff, or a wall of sound reminiscent of dial-up Internet.12 It’s the musical equivalent of hitting random autocomplete buttons on your phone. Flash without direction, no matter how well-prepared or memorized, is nothing artistic.
However, good solos are often composed of pre-existing riffs, well-known musical motifs, or even quotes from other songs. Books like Elements of the Jazz Language are dotted with annotated solos for study, not to mention the advent of YouTube transcriptions. While both a good jazz pianist and a black-box jazz pianist have a massive knowledge base of musical ideas, they’re clearly making different choices when putting them together.
I think this is what Stravinsky alluded to when he said that great artists steal (the good merely borrow). Nothing is new under the sun: many ideas, motifs, and musical riffs have already been played. But jazz piano is not about the notes you play (or don’t play) — it is about how you guide these ideas to create something new, yet pleasing: not mimicry, not randomness, but synthesis. People go to concerts to listen to the music, not to analyze music. I continue to work on this as a jazz musician — and I think this is precisely where today’s models fall short.
How do we describe the difference between a jazz solo and ChatGPT?
Can ChatGPT Play the Blues?
I don't think that today's AI models are creative — yet.
The sign of a good artist is not how well they emulate. Remember the wave of Studio Ghibli-themed photos earlier this year? Every person in AI — myself included — seemed to have a week where we all wanted to create Ghiblified versions of our photos. But I can’t help but feel that something was lost here. It’s not just that “everyone can make this, so it has no value” or “only the original has value,” or even “only humans can make art.” It’s not even that it didn’t look “good enough.”
I think my discomfort with calling this type of work “creativity” is deeper: the visuals aren’t the whole story. The model, in a sense, pulled off a masterful show of borrowing, but to me, it was “merely” an amazing technical achievement, rather than sparks of some nascent ChatGPT artistic movement. In a sense, it’s the artistic equivalent of some black-box jazz pianist: built on pieces of previous art and strung together coherently, but not creatively.
What makes me care about a piece of art? It’s certainly not because it’s visually pleasing: that’s a rather functionalist view of art. The purpose of art isn’t just to be “good-looking:” we don’t canonize forgers’ fakes in the art world. No, I find art valuable because it is a statement made with agency: art is communication, an imprint of a process left by someone who had something to say.13 Taking shortcuts towards art — by memorizing a solo, by stringing riffs together without direction — defeats the purpose of making art. Rather, I care about art because I find it fascinating. A good jazz solo, painting, or book captures my full attention: I don’t notice what it’s “based on” unless I’m trying to study the piece of art. It just “fits in” with the world.
I feel similarly when creating art. In jazz, we know not just where we’ve been — what I just played — but where we’re going — the chords I’ll need to tackle next. Thus, there's something delightfully non-Markovian about a good solo: no single riff or motif is new, but you choose what you play looking both forwards and backwards.14 They say that “there are no wrong notes in jazz” — and that’s true, so long as what you play next justifies this.
I don’t think we’re completely in the dark on generative AI creativity. Today’s models are already impressive, and are trained using a combination of techniques that reinforce some form of imitation (supervised learning), and still others that encourage some more abstract reward (e.g., RLVR, RLHF). It reminds me of the ladder we’ve laid out: memorizing and borrowing via emulation, before attempting to be original. But there’s a non-trivial jump between these techniques, which have produced models that are “black-box jazz musicians,” and teaching creativity.15 That’s no surprise: “creativity” is hard to operationalize. How “random” is too “random?” How “unoriginal” is too much? How do we even measure these things?16
I don’t know how to formulate these yet.17 Maybe these elements are so unquantifiable and intangible that we’ll never make progress here. But I’d like to believe otherwise. When I solo, I try to criss-cross between moments of novelty and predictable moments, even quoting other songs at times. Solos build: many great solos start simply, with just a few notes here and there, building into displays of show-stopping technique. There are contexts where certain artistic choices are better than others. Being a jazz pianist doesn’t mean discovering completely new riffs or chords: it’s about learning basic skills first, then figuring out which contexts to apply them in. And I think the same goes for being creative. All this is to say: there’s some learnable signal here that we haven’t figured out how to operationalize.
I want to end with an observation: in the past year, as a field, we’ve squeezed the crap out of the reasoning/math/coding angle. This makes sense, because these are easy-to-verify tasks, and a lot of ML researchers are already familiar with these areas. We seem to be close to learning in which contexts certain techniques are useful (cf. 2025 IMO Gold by DeepMind). This is in every respect a tremendous achievement. But can’t we dream even bigger?18
As I was once told in classical piano: “Once you’ve mastered all the notes, all the dynamics, everything that’s on the page — the real work begins.”
Acknowledgements
Special thanks to Victoria Stafford and Zheng Yan for providing extensive comments on drafts of this piece.
LLM usage: All ideas are "human-made:" the first version of this article was written without the assistance of LLMs to avoid hitting myself in the head with a giant irony hammer. ChatGPT-4o was lightly used during revisions to improve the writing clarity.
Comments? Questions? Angry reactions? Reach out to Trenton at ctrenton at umich dot edu.
About the author (or, how dare this guy talk like this?)
Trenton Chang is a (hopefully) final-year PhD student at the University of Michigan in Computer Science & Engineering. Academically, he studies steering AI models towards high-level principles and goals. Personally, he is attempting to figure out and steer towards his own high-level principles and goals. Trenton has been playing the piano for 20+ years, and used to be decent at classical piano (enough for national awards + showing up on NPR as a kid). Today, he sneaks out of the lab from time to time to perform at the Lowertown Bar as the keyboardist in the Argonauts Jazz Trio in Ann Arbor, MI.
“Stems” can also be connected to one another with lines. You’ll notice connected single, double, and triple black lines in the above example.
I’ve left a million things out: key signatures and time signatures, which I see as cognitive conveniences to “group” related keys (slices of the y-axis) and temporal information (slices of the x-axis) in common ways; accidentals, which give you slightly higher y-axis (pitch) resolution; and tempo modulation (ritardando and accelerando), not to mention the myriad other bits of articulation and the entire concept of pedals (there are three, but we really just use the right one and sometimes the left one). This is not to mention, of course, special cases like two-note slurs, which should be played long-short, or exceptions to those special cases like the opening of Mozart’s K. 570, or three staves, which, contrary to notation, do not require the user to become Zaphod Beeblebrox. The beginning or aspiring musician is welcome to take an introductory music theory course to learn more. The pedantic musician is welcome to send further angry emails at the address specified.
You can make such a performance more exciting by checking np.random.rand() < 0.01 every few notes and playing a wrong note if it evaluates to True. Every concert, you increment the random seed. Just kidding. Actually, maybe not.
Of course, this isn’t true, even for classical piano! Exhibit A, from my experience, is this section from Ondine in Gaspard de la Nuit: you go from soft and flow-ey sounds, and then have to execute this nice, super slow 4 bar segment at a painfully soft volume, with a consistent sound, and then jump to extremely loud arpeggios up and down the keyboard. This section teaches you a lot about musicality, and holding your sneezes.
Unless you don’t want them to belong to the chord! You are always allowed to pick a different chord. The #2 most important thing is for this chord to sound good. The most important thing is to tell the bassist.
These offsets are with respect to what’s called a “scale,” or a named grouping of related notes. Sometimes you’ll also see a fraction-looking chord like Gmaj7/A, in which case the “denominator” is the root. I have pretended that augmented/diminished chords don’t exist for simplicity.
In machine learning we are also solving very underspecified problems. A common way to deal with this is regularization (a.k.a. shrinkage in statistics or economics), or choosing a model with the right “inductive bias” such that discovering a “good” solution is easier. It adds some structure, simplifying the final output. It reminds me of Chopin’s advice: “Simplicity is the final achievement. After one has played a vast quantity of notes and more notes, it is simplicity that emerges as the crowning reward of art.”
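For the curious, here is a minimal one-dimensional sketch of that shrinkage effect, where the penalty term pulls the least-squares coefficient toward zero. The setup and numbers are purely illustrative:

```python
# Toy illustration of regularization-as-shrinkage: one-feature least
# squares for y ≈ beta * x. The ridge penalty lam pulls the estimated
# coefficient toward 0; lam = 0 recovers ordinary least squares.
def ridge_1d(xs, ys, lam):
    """Closed-form ridge estimate: sum(x*y) / (sum(x*x) + lam)."""
    xy = sum(x * y for x, y in zip(xs, ys))
    xx = sum(x * x for x in xs)
    return xy / (xx + lam)
```

With data generated as y = 2x, the unregularized estimate is exactly 2, and any positive penalty shrinks it below 2: structure is bought at the price of a simpler, more biased answer.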
Here, I mean in a “mainstream,” pop-culture sense. And for the rest of y’all, uh, here, have a Klavierstucke IX from Stockhausen. Not enough entropy for ya? Just give the good ‘ol Schoenberg knob a whirl. Still want to debate aesthetics? Get back to me after this one.
Of course, LLMs are also trained today using much more sophisticated techniques than imitation learning. As a stretch, we could think of standard self-supervised next-word prediction as a kind of learning from expert demonstrations, where the “expert demonstration” is “whatever text is out there.” We can even lump SFT/instruction-tuning under this umbrella, but many post-training techniques, such as RLHF, fall outside this scope. There, I wonder if our reward models are just not attuned to producing creative responses — alternatively, “Do Reward Models Have Taste?”
To be clear, I don't mean stealing as in accusations of stealing intellectual property and other creative labor in the curation of massive pre-training datasets. That's a serious issue, but out of scope here.
For the aspiring pianists: imagine that you’ve just finished a performance and you go talk to an audience member. Which would you rather hear: “wow, your fingers are really fast” or “wow, that was really beautiful?” I know which one I liked more, personally.
If you’re a jazz pianist, work through some Coltrane changes and pick a run of four random 8th notes to use as a motif for the entire head. Bring the speed up to ~260-300 BPM (quarter). See?
A fun thought experiment suggested by a friend: what if we “faked” this process with AI? Generated aesthetically pleasing art, prompted the model to generate some artistic persona/motivation behind this, and presented it initially as original work before the reveal? I actually have no idea how I feel about this.
Many musicians, myself included, sometimes sing what we play (or are about to play — the timing is unclear to me) when we reach a “flow state,” i.e., are super locked-in. I’m not sure why this seems to help, but perhaps this has to do with the “lookahead,” non-Markovian mode of creativity. Maybe creativity is planning, and we need some “lookahead”/limited bidirectional context to “solve” creativity? Perhaps to sing, I need to anticipate; to anticipate, I need to plan for what the music will sound like in the next ~5-10s, or more. All of this is mere speculation, but we should go stick some electrodes on musicians’ heads (ooooh! me!) and see if we notice anything.
The LLM-savvy reader will no doubt notice that, while the black-box jazz pianist is Markovian by construction, that’s not how modern LLMs sample text. Fair! Most modern LLMs operate based on some form of autoregressive generation, conditioning on all prior tokens. But my point stands: similarly to the black-box jazz pianist, there’s no inherent “lookahead” or plan built in, no? You might imagine a slightly more advanced black-box jazz pianist that goes “oh no, I already played this riff, let me try something different at random.” There’s no sense of destination even though such a pianist is no longer strictly memoryless/Markovian.
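A quick sketch of this contrast, with made-up riff names: both pickers below choose without any lookahead, but only the first is memoryless.

```python
import random

# Made-up riff vocabulary, for illustration only.
RIFFS = ["blues lick", "ii-V run", "pentatonic riff", "quote"]

def markov_next(history, rng=random):
    # Markov: may consult history[-1] only. (A uniform choice stands in
    # for a riff-to-riff transition table here.)
    return rng.choice(RIFFS)

def autoregressive_next(history, rng=random):
    # Autoregressive: conditions on the entire history (here, by
    # avoiding repeats), yet still has no plan for where the solo goes.
    fresh = [r for r in RIFFS if r not in history] or RIFFS
    return rng.choice(fresh)
```

The second picker is strictly less repetitive, but neither has any notion of destination; conditioning on the past is not the same as planning for the future.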
Thus, we introduce CREATIVE-BENCH, a collection of 1,523,647 procedurally generated pieces of music, and a simulated piano as an evaluation harness…sorry. I thought this was arXiv.
Of course there are ways to measure randomness or novelty. But when we optimize for these metrics, we impose, implicitly or explicitly, priors on what counts as “quality,” which may or may not align with what (subjectively) counts. The question is: is there some “good enough” combination of these metrics? How would we even know?
Hotter question: do incentives exist in our field for us to “dream even bigger?” I don’t know! Discuss!