I Trained an AI on My Writing for 14 Days. Did It Actually Sound Like Me?

The experiment

For 14 days I tried to build an AI that writes like me. Specifically: I wanted to know whether I could ingest my last 200 published essays and produce a system that, when given a new topic, would write a 1,500-word essay that my regular readers would think I wrote. The success criterion was a blind reader test at the end — show 5 long-time readers a mix of human-me and AI-me articles, see who could tell which was which. I’d been skeptical going in. I came out… still mostly skeptical, but with one specific area where I was genuinely impressed and a clearer view of where the limits actually live.

Day 0 baseline

Starting state: 200 essays, roughly 350,000 words of my published writing. A Claude Pro subscription. A Custom GPT setup. A weekend of free time before work started consuming my evenings. The hypothesis: style is replicable; substance isn’t.

Day 1-3: data prep

Spent the first three days cleaning my corpus. Removed essays that weren’t representative (commissioned work I wrote in someone else’s voice, two early-career pieces I’d rather forget, transcripts of talks I gave). What was left was 152 essays of “my actual voice.”

I extracted: average sentence length (16.4 words, surprisingly), opener patterns (I open with a scene or anecdote 78% of the time), transition phrases I use too much (“And so,” is mine, apparently), banned phrases I never use (“In conclusion,” “Let’s dive in,” “Furthermore”). I made a style fingerprint document — about 600 words long — that summarized this.
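
The countable parts of a fingerprint like this are easy to script. Here is a minimal Python sketch, assuming plain-text essays and a naive regex sentence splitter. The phrase lists are placeholders seeded from the examples above. Opener classification (scene vs. argument) is left out because it really needs a human label. This isn’t necessarily how the numbers above were produced.

```python
import re
from collections import Counter

# Placeholder phrase lists; swap in your own habits and never-use phrases.
PET_TRANSITIONS = ["And so,"]
BANNED_PHRASES = ["In conclusion", "Let's dive in", "Furthermore"]

def normalize(text: str) -> str:
    # Map curly apostrophes to straight ones so "Let’s" matches "Let's".
    return text.replace("\u2019", "'")

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def style_fingerprint(essays: list[str]) -> dict:
    """Compute rough, countable style statistics over a corpus of essays."""
    essays = [normalize(e) for e in essays]
    sentences = [s for essay in essays for s in split_sentences(essay)]
    lengths = [len(s.split()) for s in sentences]
    hits = Counter()
    for essay in essays:
        for phrase in PET_TRANSITIONS + BANNED_PHRASES:
            hits[phrase] += essay.count(phrase)  # records zero counts too
    return {
        "sentences": len(sentences),
        "avg_sentence_words": round(sum(lengths) / len(lengths), 1) if lengths else 0.0,
        "phrase_hits": dict(hits),
    }
```

Banned phrases that show up with nonzero counts in your own corpus are a sign the list is wrong, not that you need to stop using them.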

Day 4-7: building the style fingerprint

Created a Claude Project with the style fingerprint plus 20 sampled essays (rotated each session) plus a system prompt telling Claude to write in my voice using these as anchors.
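
Roughly, a setup like this amounts to one block of context handed to the model. The sketch below assembles it in Python, assuming the fingerprint and essays are plain strings. `build_project_context` and the instruction wording are hypothetical, and “rotated each session” is modeled as seeding the sampler with a session number. It only builds the text and doesn’t call any model API.

```python
import random

def build_project_context(fingerprint: str, essays: list[str], session: int,
                          k: int = 20) -> str:
    """Combine instructions, the style fingerprint, and k sampled essays into one string."""
    # Seeding with the session number gives a different but reproducible draw per session.
    rng = random.Random(session)
    samples = rng.sample(essays, k=min(k, len(essays)))
    parts = [
        # Hypothetical instruction wording.
        "Write in the author's voice, using the style notes and essays below as anchors.",
        "## Style fingerprint\n" + fingerprint,
    ]
    parts += [f"## Example essay {i}\n{text}" for i, text in enumerate(samples, start=1)]
    return "\n\n".join(parts)
```

Seeding per session keeps each rotation reproducible, so a good or bad output can be traced back to the exact samples behind it.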

First test: I gave it the same prompt I’d give a freelancer. “Write a 1500-word essay on whether AI productivity claims hold up in real workflows.” Output: competent but generic. The voice was close but the cadence was off. Long sentences in the wrong places. Banned phrases sneaking in.

Second test, with explicit style rules (“never use ‘It’s important to note,’ favor specific anecdotes over general claims, vary sentence length aggressively”): noticeably better. The voice landed about 80% of the time.

Third test, refining the prompt with anti-tells (“Don’t open with ‘In a world where’ or ‘In today’s…’ or any throat-clearing”): the voice landed maybe 85% of the time. Improving, but not magic.
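
Anti-tells like these can also run as a mechanical check on finished drafts, which catches the banned phrases a model lets slip through. A minimal sketch, assuming simple case-insensitive substring and opener matching. The rule lists are placeholders taken from the phrases quoted above, and `lint_draft` is a hypothetical name.

```python
import re

# Placeholder rule lists, seeded from the phrases quoted in the text.
BANNED = ["It's important to note", "In conclusion", "Let's dive in", "Furthermore"]
THROAT_CLEARING_OPENERS = [r"^In a world where", r"^In today's"]

def normalize(text: str) -> str:
    # Map curly apostrophes to straight ones so "It’s" matches "It's".
    return text.replace("\u2019", "'")

def lint_draft(draft: str) -> list[str]:
    """Return a list of rule violations found in a draft (empty list means clean)."""
    text = normalize(draft)
    lowered = text.lower()
    problems = [f"banned phrase: {p!r}" for p in BANNED if p.lower() in lowered]
    opening = text.lstrip()
    for pattern in THROAT_CLEARING_OPENERS:
        if re.match(pattern, opening, flags=re.IGNORECASE):
            problems.append(f"throat-clearing opener: {pattern!r}")
    return problems
```

Substring matching is crude (it would also flag a banned phrase you quote on purpose), but it’s cheap enough to run on every draft before a human reads it.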

Day 8-11: iteration on output

The middle stretch was about getting from “sounds like me-ish” to “actually sounds like me.” Most progress came from three changes:

1. Sampling specific essays per topic. When the AI wrote about productivity, I gave it my productivity essays as the in-context examples. When it wrote about tools, I gave it tool review examples. Topic-matched samples beat general samples by a wide margin.

2. Adding the rejection prompts. “Read this draft. If any sentence sounds like AI marketing copy or a LinkedIn motivational post, rewrite it.” Second-pass cleanup made a bigger difference than improving the first-pass prompt.

3. Specific anecdote injection. I gave the AI access to a “stories” file — a few hundred specific anecdotes from my life it could draw on. With access to those, the output felt much more like me, because the specifics were mine even when the prose was Claude-aided.
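
Put together, the three changes form a small two-pass pipeline. The sketch below assumes each essay and story carries hand-assigned topic tags and ranks them by tag overlap. That ranking is a stand-in: the post doesn’t say how topic matching was done, and embeddings would be another option. `generate` is a hypothetical callable wrapping whatever model you use, the rejection prompt is the one quoted in step 2, and the cap of five anecdotes is arbitrary.

```python
from typing import Callable

REJECTION_PROMPT = (
    "Read this draft. If any sentence sounds like AI marketing copy or a "
    "LinkedIn motivational post, rewrite it."
)

def top_k_by_tags(items: list[dict], topic_tags: set[str], k: int) -> list[dict]:
    """Keep up to k items sharing at least one tag with the topic, most overlap first."""
    scored = [(len(topic_tags & set(item["tags"])), item) for item in items]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [item for _, item in scored[:k]]

def write_essay(topic: str, topic_tags: set[str], essays: list[dict],
                stories: list[dict], generate: Callable[[str], str],
                k_examples: int = 20, k_stories: int = 5) -> str:
    examples = top_k_by_tags(essays, topic_tags, k_examples)   # change 1: topic-matched samples
    anecdotes = top_k_by_tags(stories, topic_tags, k_stories)  # change 3: anecdote injection
    prompt = "\n\n".join(
        [f"Write an essay on: {topic}"]
        + [f"Example essay:\n{e['text']}" for e in examples]
        + [f"Anecdote you may use:\n{s['text']}" for s in anecdotes]
    )
    draft = generate(prompt)                            # first pass
    return generate(f"{REJECTION_PROMPT}\n\n{draft}")   # change 2: rejection pass
```

Keeping the cleanup as a separate call, rather than folding the rules into the first prompt, mirrors the observation that the second pass did more than prompt tweaks.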

Day 12-14: blind reader test

Five long-time readers. Six articles. Three I wrote, three the AI wrote with my coaching. Ranked from “definitely written by you” to “definitely not.”

The result: readers correctly identified me as the author 64% of the time, but correctly identified the AI as the author only 47% of the time. That’s slightly worse than a coin flip: on the AI articles, readers did no better than guessing. When fed a good topic with a strong anecdote seed, the AI fooled readers about half the time.
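
Scoring a test like this takes two small steps: collapse each ranking into a binary human/AI guess (an assumption, since readers ranked on a scale and the cutoff is a choice), then compute accuracy per true author. With a sample this small, it also helps to check how far a result sits from pure guessing. The sketch below does both, using an exact two-sided binomial test against p = 0.5. The function names are hypothetical.

```python
from math import comb

def accuracy(judgments: list[tuple[str, str]], author: str) -> float:
    """Share of judgments on articles truly by `author` that guessed `author`."""
    relevant = [(truth, guess) for truth, guess in judgments if truth == author]
    if not relevant:
        raise ValueError(f"no judgments for {author!r}")
    return sum(truth == guess for truth, guess in relevant) / len(relevant)

def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k correct out of n under pure guessing (p = 0.5)."""
    # Under p = 0.5 outcome i has probability comb(n, i) / 2**n; sum every outcome
    # that is no more likely than the observed one.
    observed = comb(n, k)
    return sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed) / 2**n
```

As an illustration only: if the AI-article figure came from 15 binary judgments (five readers times three articles, an assumption, since only percentages are reported), 47% would be 7 of 15, and `two_sided_binomial_p(7, 15)` returns 1.0. In other words, that result can’t be told apart from a coin flip.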

The qualitative feedback was the more interesting part. Readers who could tell said:

  • “The voice was there but the *point* wasn’t yours.”
  • “Two of these felt like you on auto-pilot.”
  • “This one didn’t have anything I disagreed with, which is suspicious — you usually take a contrarian angle and these were all middle-of-the-road.”

That’s the limit. The voice was replicable. The judgment — what to argue, what to push against, what to refuse to soften — wasn’t.

The data

| Metric | Result |
|---|---|
| Corpus size (after cleaning) | 152 essays, ~265,000 words |
| Style fingerprint length | 600 words |
| Topic-matched samples per generation | 20 |
| Time invested | ~28 hours over 14 days |
| API cost | ~$45 across iterations |
| Blind reader accuracy on AI articles | 47% (worse than coin flip) |
| Blind reader accuracy on human articles | 64% |

Would I use this in production?

For exploratory drafts and content I’d otherwise not write at all? Yes. Topic exploration that I can sharpen into something stronger? Yes. The AI version of me is a good first-draft writer if I supply the angle and the anecdotes.

For finished work that goes out under my name? No. Even at 85% voice match, the missing 15% is exactly what my readers come back for. Style is replicable. Judgment isn’t. That’s the line I won’t cross, and the experiment made the line clearer, not blurrier.

If you want to try this: the highest-leverage moves are the style fingerprint document (extract your patterns), topic-matched in-context examples, and the rejection-pass cleanup. Skip fine-tuning the model — for individual writers, in-context examples plus a tight style fingerprint beats fine-tuning at this scale.
