Free Build Guide

Build Your Own Telegram Scripting Agent

Turn any reel into a short-form script in your own voice — texted from your phone, filed in your content tracker. The complete build, from a blank server to your first script.

A note before you start

I built a little agent that lives in Telegram. I text it a link to a reel, it pulls the audio, figures out the idea, asks me what I actually think about it, writes me six hooks to pick from, turns my pick into a finished teleprompter script, runs the script through a quality check, and drops it into a Google Doc I can record from. The whole thing takes well under a minute of the agent's time. The only parts I do are the parts that need a human: give my take, pick the hook, approve the script.

This guide teaches you to build the same thing from scratch. No prior server experience required. I will give you every command to type and every button to click, and I will tell you what each one does so you are not copy-pasting blind.

A few honest framing notes:

  • The idea owes a lot to Melda, a product my co-founder Jimmy builds. Melda is its own SaaS and its code is private. What you are building here is your own agent, from your own prompts and data, inspired by how Melda thinks about short-form. Nothing in this guide is Melda's product code.
  • Two costs to know up front: $15 to $40 a month for infrastructure (a small server, a little transcription) plus $100 a month for ChatGPT Pro to power the agent's thinking. ChatGPT Pro is my preferred mode here because Hermes authenticates straight to your ChatGPT subscription instead of metering API tokens, which keeps your bill flat. You can use the OpenAI API instead, but watch your usage; agent sessions do heavy tool-calling and can burn tokens fast if you are not careful. I break the full cost picture down in the prerequisites.
  • This is a real system, not a toy. By the end you will understand why it is built the way it is, which matters more than the build itself, because the design ideas transfer to any agent you make later.

Let's get into it.

1. What you're building, who it's for, and where you'll end up

What it is. A personal agent you talk to entirely through Telegram. You send it a short video URL. It does the grunt work and pauses only for the three decisions that need you. At the end, an approved script becomes a row in your content tracker, marked ready to record.

Who this is for. Anyone who makes short-form video and is tired of the blank page. Creators, founders posting build-in-public, coaches, marketers. You do not need to code for a living. You need to be willing to follow steps in a terminal, and you need a credit card for a few low-cost services.

The end state, concretely. When you finish, a normal session looks like this:

  1. You see a reel you like. You copy its link.
  2. You text your bot: "make a short from https://www.instagram.com/reel/..."
  3. The bot replies: "On it — pulling the reel…" A few seconds later it shows you the idea it pulled out, in plain English, and asks: "What's your take? Anything you'd add, cut, reprioritize, or a story of your own?"
  4. You reply by voice or text. You say what you actually think.
  5. It writes six hooks, each one built on a proven structure from a library of viral videos. You pick one (or just take its top pick).
  6. It writes the full script in your voice, opening with your chosen hook.
  7. It checks the script for the usual AI tells and tightens anything off.
  8. It sends you a Google Doc. You approve, and the idea plus the script land in your content sheet as a ready-to-record item.

That is the whole loop. The promise is narrow on purpose: it does not try to manage your calendar or edit your video. It turns inspiration into a script that sounds like you, and it remembers where everything went.

2. The architecture in plain English

There is one agent. It runs four short thinking steps and a couple of mechanical steps in between. Think of it as an assembly line where the expensive thinking is rationed and everything else is plumbing.

The four thinking steps (each one is a single call to an AI model):

  1. EXTRACT — read the transcript, write down the idea as a tidy note. This note is called the brief, and it is the single source of truth for everything after it.
  2. HOOKS — using the brief plus a shortlist of proven hook shapes, write six opening lines and rank them.
  3. SCRIPT — using the brief and your chosen hook, write the finished teleprompter script.
  4. GATE — a separate, strict read of the finished script that catches AI tells and made-up facts.

Between those, two mechanical steps that need no AI:

  • RETRIEVE — a tiny script picks ~12 hook examples from your swipe library and hands them to the HOOKS step. Pure code, runs in milliseconds.
  • DELIVER — once you approve, the agent files the script in Google Drive and your tracking sheet.

And one human step that is the entire point:

  • THE SPIN — after EXTRACT, the agent stops and asks for your take. Your reply is merged into the brief. This is the step that turns someone else's idea into yours.
Put together, the flow looks like this:

   You (Telegram)
        │  "make a short from <reel url>"
        ▼
 ┌─────────────────┐
 │  TRANSCRIBE     │  Apify pulls the audio  →  Whisper turns it into text
 │  (no AI "think")│  (never the caption — the words they SAID)
 └────────┬────────┘
          ▼
 ┌─────────────────┐
 │  1. EXTRACT     │  transcript ──► the BRIEF  (the single source of truth)
 └────────┬────────┘
          ▼
 ┌─────────────────┐
 │  THE SPIN       │  agent shows you the idea, asks your take, WAITS
 │  (you, by voice)│  your reply is merged into the brief
 └────────┬────────┘
          ▼
 ┌─────────────────┐   ┌──────────────────────────────┐
 │  RETRIEVE       │◄──│  YOUR SWIPE LIBRARY (the moat)│
 │  (code, ~ms)    │   │  ~100 analyzed viral videos  │
 └────────┬────────┘   └──────────────────────────────┘
          ▼
 ┌─────────────────┐
 │  2. HOOKS       │  brief + 12 hook shapes ──► 6 ranked hooks
 └────────┬────────┘
          ▼  you pick one
 ┌─────────────────┐
 │  3. SCRIPT      │  brief + your hook ──► teleprompter script
 └────────┬────────┘
          ▼
 ┌─────────────────┐
 │  4. GATE        │  strict cold read: any AI tells? any made-up facts?
 └────────┬────────┘
          ▼
 ┌─────────────────┐
 │  DELIVER        │  Google Doc ──► you approve ──► row in your sheet
 └─────────────────┘

Two things make this design work, and they are worth holding onto:

  • The brief is the source of truth. Once EXTRACT writes the brief, no later step re-reads the original transcript. They all read the brief. That keeps the agent fast and stops it from drifting back toward a plain rewrite of the source.
  • Your swipe library is the moat. The hooks are not generic. Every hook is built on a real structure pulled from a video that already worked. More on this in section 6, because it is the part most people skip and the part that matters most.
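The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hermes code: the function names, the `Brief` fields, the stub logic standing in for each model call, and the tiny ban list in `gate` are all hypothetical. What it shows is the one structural rule: the transcript is read once, in EXTRACT, and every later stage takes only the brief.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """Hypothetical brief: the only thing later stages are allowed to read."""
    idea: str
    spin: str = ""  # your take, merged in after THE SPIN


def extract(transcript: str) -> Brief:
    # Stand-in for the EXTRACT model call: here it just keeps the first sentence.
    return Brief(idea=transcript.split(".")[0].strip())


def merge_spin(brief: Brief, reply: str) -> Brief:
    # THE SPIN: the human reply becomes part of the brief.
    brief.spin = reply.strip()
    return brief


def hooks(brief: Brief, shapes: list[str]) -> list[str]:
    # Stand-in for the HOOKS call: one hook per shape, capped at six.
    return [f"[{shape}] {brief.idea}" for shape in shapes[:6]]


def script(brief: Brief, hook: str) -> str:
    # Stand-in for the SCRIPT call: the script opens with the chosen hook.
    return f"{hook}\n{brief.idea}. {brief.spin}"


def gate(text: str, banned: tuple[str, ...] = ("delve", "game-changer")) -> list[str]:
    # Stand-in for the GATE call: flag phrases from a hypothetical ban list.
    return [word for word in banned if word in text.lower()]


def run(transcript: str, reply: str, shapes: list[str], pick: int = 0) -> tuple[str, list[str]]:
    brief = merge_spin(extract(transcript), reply)  # the transcript is read once, here
    options = hooks(brief, shapes)
    final = script(brief, options[pick])
    return final, gate(final)
```

Note that `hooks`, `script`, and `gate` have no transcript parameter at all. Enforcing the rule in the function signatures is what keeps a later stage from drifting back to the source.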

The runtime is Jalen, the name I gave my Hermes agent. Hermes is the framework; Jalen is my specific instance of it. You will name yours whatever you like.

3. Prerequisites and tools

Here is everything you need to sign up for. I have linked the real signup pages. Each one gets a short paragraph on what it is, why you need it, and rough cost.

A VPS (virtual private server) — Hostinger

A VPS is a small computer you rent in a data center that stays on 24/7. Your agent lives here so it keeps working when your laptop is closed. You do not need anything powerful. Hostinger's KVM 2 plan (2 CPU cores, 8 GB RAM) is plenty and is their most popular tier. Cost: roughly $7 to $9 a month. Any Ubuntu VPS works (DigitalOcean, Hetzner, Linode); I am using Hostinger because the setup is beginner-friendly.

The agent framework — Hermes Agent by Nous Research

Hermes is an open-source AI agent that runs in your terminal and on messaging platforms with full tool access. It is the same category of tool as Claude Code or Codex, but it is built to live on a server and talk to you through Telegram. It is free and open-source; you pay only for the AI model it uses. This is the engine of the whole build.

A Telegram bot — BotFather

Telegram is your interface. You will create a bot (a chat account your agent controls) by messaging Telegram's official BotFather account. BotFather hands you a token, which is the password your agent uses to send and receive your messages. Free.

A tool-connection layer — Composio

Composio is the thing that lets your agent actually use Google Sheets, Google Drive, Google Docs, and Apify without you writing a single line of integration code. You log into each service once through Composio, and your agent gets clean, named tools to call. Composio even publishes a Hermes integration guide. There is a free tier that is enough to start.

A scraper — Apify

Apify runs pre-built scrapers (they call them "actors"). You will use its Instagram scraper to fetch a reel's audio so you can transcribe the real spoken words. Apify gives you free monthly credits; a single reel costs a few cents. You connect Apify through Composio, so you rarely touch it directly after setup.

The model — ChatGPT Pro (recommended)

This is the line item I want you to read carefully. Hermes's onboarding wizard lets you sign in with an OpenAI Codex provider, which means Hermes runs on your ChatGPT subscription instead of metering OpenAI API tokens. I'm on the ChatGPT Pro plan at $100/month and it is my favorite way to power this use case. The bill is flat. There is no "uh oh I left a loop running and burned $40 in tokens overnight" failure mode. If you prefer to use the OpenAI API directly with an API key, you can; just watch usage closely because agent sessions do heavy tool-calling. Hermes also supports many other providers (Anthropic, OpenRouter, DeepSeek, and more); see the providers list.

Transcription — Whisper

Separate from the model: turning a reel's audio into text. The easiest path is OpenAI Whisper at about $0.006 per minute of audio (well under a cent per reel), which needs a standalone OpenAI API key. Free alternatives: faster-whisper running locally on your VPS, or Groq Whisper's free tier. Pick whichever you'd rather set up.
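To see how small the transcription line item is, here is the arithmetic as a short sketch. The $0.006 per minute rate is the figure quoted above; the 45-second reel length and the 100-reels-a-month volume are assumptions for illustration, and real billing may round differently.

```python
WHISPER_USD_PER_MINUTE = 0.006  # hosted Whisper rate quoted in this guide


def transcription_cost(seconds: float, rate: float = WHISPER_USD_PER_MINUTE) -> float:
    """Cost in USD to transcribe `seconds` of audio at a per-minute rate."""
    return seconds / 60 * rate


# Assumed workload: a 45-second reel, 100 reels in a month.
per_reel = transcription_cost(45)
per_month = per_reel * 100
print(f"per reel: ${per_reel:.4f}, per 100 reels: ${per_month:.2f}")
```

At that volume the hosted option stays under a dollar a month, so the choice between hosted and local Whisper is more about setup effort than cost.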

A code home — GitHub

GitHub stores the agent's "brain" (its prompts, its swipe library, its retrieval script) in one place. Your server clones it from GitHub at setup, so editing a prompt is as simple as editing a file and pushing. Free for this.

A quick map of who does what once it is all connected:

Job | Tool
Lives on a server, drives everything | Hermes (the agent)
You talk to it | Telegram
Fetches the reel's audio | Apify (via Composio)
Turns audio into text | OpenAI Whisper (or local faster-whisper / free Groq tier)
Does the four thinking steps | ChatGPT Pro via OpenAI Codex auth (recommended, $100/mo, flat); or OpenAI API key, or any other Hermes-supported provider
Files the script + idea | Google Sheets / Drive / Docs (via Composio)
Holds the prompts + swipe library | GitHub

4. Install Hermes on the VPS

This is the part beginners fear. It is mostly clicks. Hostinger has a one-click Hermes Docker template that does almost all the work; the only typing is when the onboarding wizard asks you a few questions. Take it one step at a time.

4.1 Buy the VPS

  1. Go to hostinger.com/vps/docker/hermes-agent and pick the KVM 2 plan (2 vCPU, 8 GB RAM). It's the "Most Popular" tier and it's plenty for one Hermes agent. Annual billing is the cheapest per-month rate.
  2. Choose a server location near you (lower latency to Telegram and Apify).
  3. For the OS, pick Ubuntu 24.04 LTS. (If the wizard offers a "Hermes Agent" app template, you can pick that for an auto-install; the rest of this guide uses the Docker route, which gives you cleaner isolation and lets you spin up more containers later.)
  4. Set a root password when prompted. Save it in your password manager.
  5. Tick the free malware scanner if it offers, click Finish setup, and wait a couple minutes for provisioning.

4.2 Install Hermes via the Docker one-click template

Once your VPS is ready, click Manage VPS from the Hostinger dashboard.

  1. In the left sidebar, click Docker Manager.
  2. Click Install, then choose Compose → One-click deploy.
  3. In the search box, type Hermes and click Select on the Hermes Agent template.
  4. Set an admin username and admin password for the Hermes container's web login. Copy both into your password manager now. You will need them in a minute.
  5. Click Deploy. Wait for the container to spin up (usually 1 to 3 minutes; the UI says 10 but it's faster).

4.3 Open the container and start the onboarding

When the container is up, click Open on the Hermes deployment. It pops a login screen.

  1. Enter the admin username + password you just saved.
  2. The onboarding wizard launches automatically. Press Enter to start the quick setup.

4.4 Pick the inference provider — sign in with ChatGPT Pro

The wizard asks you to choose an inference provider. There are a lot of options; the one to pick is OpenAI Codex.

This is the line that matters: OpenAI Codex lets Hermes authenticate against your ChatGPT subscription instead of OpenAI API keys. Your bill stays flat at whatever ChatGPT tier you're on. I'm on ChatGPT Pro ($100/month) and that is the mode I recommend for this build.

  1. Select OpenAI Codex.
  2. The wizard prints a URL and a short code. Open the URL in your browser, sign in with your ChatGPT account, and approve access.
  3. Copy the 9-digit code shown in the wizard, paste it into the browser page, and continue.
  4. Back in the terminal, you should see Login successful.
  5. When asked to pick a model, choose GPT 5.5.

4.5 Create your Telegram bot

Before the next wizard step (which asks for a Telegram bot token), make the bot itself. Open Telegram on your phone or desktop.

  1. In the search bar, find BotFather (the official one has a blue verified checkmark).
  2. Send /newbot.
  3. It asks for a name (what shows at the top of the chat, e.g. "My Scripting Agent") and then a username (must end in bot, e.g. my_scripting_agent_bot).
  4. BotFather replies with a token that looks like 8123456789:AAF...long-string.... Keep it private and copy it; you'll paste it into the wizard in a moment.
  5. You also need your own Telegram user ID (so the bot only obeys you). In Telegram, search for userinfobot, send it any message, and it replies with your numeric ID. Copy that too.
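Before pasting the token into the wizard, you can sanity-check it yourself. The sketch below is optional and makes a few assumptions: the regular expression is a loose guess at the token shape (digits, a colon, a long URL-safe string), and the helper names are mine. The one real external piece is the Telegram Bot API's getMe method, which returns your bot's profile when the token is valid.

```python
import re

# Assumed shape: numeric bot ID, a colon, then a long URL-safe secret.
TOKEN_PATTERN = re.compile(r"^\d+:[A-Za-z0-9_-]{30,}$")


def looks_like_bot_token(token: str) -> bool:
    """Cheap format check; it cannot tell you whether the token is live."""
    return bool(TOKEN_PATTERN.match(token.strip()))


def get_me_url(token: str) -> str:
    # getMe is the Bot API method that returns the bot's own profile.
    return f"https://api.telegram.org/bot{token.strip()}/getMe"


def is_allowed(user_id: int, allowed_ids: set[int]) -> bool:
    # The allow-list idea behind "the bot only obeys you".
    return user_id in allowed_ids
```

Requesting the URL from `get_me_url` (with curl, for example) should return JSON containing "ok": true for a working token. Treat that URL as a secret too, since it contains the token.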

4.6 Finish the onboarding (Telegram + tools)

Back in the wizard.

  1. When it asks to set up messaging, press Space to select Telegram, then Enter.
  2. Paste your Telegram bot token from BotFather. (Tokens often don't display when pasted into the terminal; that's normal. Hit Enter.)
  3. Paste your Telegram user ID from userinfobot when it asks who is allowed to talk to the bot.
  4. When it asks if this user ID should be your home channel, say yes.
  5. For tools, the defaults are fine (vision, browser, image gen, text-to-speech, terminal, task planning, skills). You can adjust later with hermes tools.
  6. When it asks if you want to launch Hermes chat, say yes. The CLI loads and shows your model, context window, and available skills.

4.7 Verify it works

In the CLI, type hello and hit Enter. You should get a response. Then go to Telegram, find your bot, send /start, then hello. The bot should reply within a couple of seconds. If it doesn't, tell Hermes in the CLI: "the Telegram connection isn't working, I sent hello and got nothing back"; it will investigate, restart the gateway, and let you know when it's online. That's a real example from Nate Herklman's setup walkthrough, and it works because Hermes can use its own tools to debug itself.

One quality-of-life setting while you're here. Quiet the raw tool output so the bot talks to you in plain English instead of dumping technical traces. From inside the container's terminal (use the Open button on the container, then exit the Hermes chat with Ctrl+C to get to a plain shell):

hermes config set display.tool_progress false
hermes config set display.show_reasoning false

These are read at startup, so restart the gateway: hermes gateway restart.

You now have a working agent that talks to you on Telegram. Next, we connect Google Sheets/Drive/Docs and Apify.

5. Connect the rest of the tools

Hermes can already talk to you on Telegram and think with GPT 5.5. Now it needs to reach Google Sheets/Drive/Docs (to file scripts) and Apify (to fetch reels). Two short sub-sections, plus one optional API key if you want OpenAI Whisper for transcription. Never paste real keys into a chat or commit them to GitHub. They go on the server only.

5.1 Connect Composio (Google + Apify)

Composio is how your agent reaches Google Sheets, Drive, Docs, and Apify. Composio runs as an MCP server, a standard way for agents to get a bundle of tools. Hermes speaks MCP natively.

  1. Sign up at composio.dev.
  2. In the Composio dashboard, connect the toolkits you need by clicking each and following its login flow:
    • Google Sheets, Google Drive, Google Docs: you will be sent to a Google consent screen; approve access.
    • Apify: connect your Apify account (sign up first if you have not; your Apify API token is in your Apify dashboard under Settings → Integrations).
  3. Composio gives you an MCP server URL (and an API key). Treat both as secrets.
  4. Add Composio to Hermes as an MCP server. From your container's shell (use the Open button on the container in Docker Manager, then Ctrl+C out of the Hermes chat), run:
hermes mcp add composio --url "<YOUR_COMPOSIO_MCP_URL>"
hermes mcp list
hermes mcp test composio

hermes mcp list shows it registered; hermes mcp test composio confirms the connection works. If the test passes, your agent can now read and write Google Sheets/Drive/Docs and run Apify actors.

Composio publishes a step-by-step Hermes guide if you want screenshots: composio.dev/toolkits/googlesheets/framework/hermes-agent.

5.2 (Optional) OpenAI API key for Whisper transcription

The four thinking calls go through your ChatGPT Pro subscription already; nothing to set up there. Whisper transcription is the only piece that needs a separate API key, and only if you want to use OpenAI's hosted Whisper. From the container shell:

hermes config set OPENAI_API_KEY "<YOUR_OPENAI_API_KEY>"

Prefer free? Use local faster-whisper (pip install faster-whisper on the VPS) or Groq Whisper's free tier with hermes config set GROQ_API_KEY "<YOUR_GROQ_KEY>".

5.3 Make a Google Sheet to track everything

Create a new Google Sheet in the Google account you connected to Composio. Name it something like "Content Tracker". Add two tabs:

  • MASTER: one row per script. Columns for an ID, title, topic, status, and the Google Doc link.
  • IDEA_POOL: one row per idea you capture, whether or not it becomes a script.

You do not need to format these perfectly now. The agent will write rows to them. Keep the sheet's URL handy; you will reference it when you install the run instructions in section 11.
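If it helps to picture what the agent will write, here is one MASTER row sketched as a small data structure. The field names, the header labels, and the example status value are hypothetical; only the five columns themselves come from the list above.

```python
from dataclasses import dataclass, astuple


@dataclass
class MasterRow:
    """One script in the MASTER tab (field names are illustrative)."""
    script_id: str
    title: str
    topic: str
    status: str
    doc_link: str


MASTER_HEADER = ["ID", "Title", "Topic", "Status", "Google Doc link"]


def to_sheet_row(row: MasterRow) -> list[str]:
    # Field order in the dataclass must match MASTER_HEADER.
    return list(astuple(row))


example = MasterRow(
    script_id="S-001",
    title="Sleep beats caffeine",
    topic="health",
    status="ready to record",
    doc_link="https://docs.google.com/document/d/EXAMPLE",
)
print(dict(zip(MASTER_HEADER, to_sheet_row(example))))
```

Typing the header row into the sheet by hand, in this order, is enough preparation; the agent only needs the columns to exist.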

That is all the wiring. The agent can talk to you, fetch reels, transcribe, think, and file. What it cannot do well yet is write hooks that do not sound generic. That is the next section, and it is the most important one.

6. The swipe library — the moat

Here is the uncomfortable truth about AI-written hooks: left to its own devices, a model writes the same five hooks everyone else's model writes. "Most people think X, but actually Y." "Here's the one thing nobody tells you about Z." You have seen them. So has your audience.

The fix is not a cleverer prompt. The fix is examples. You give the agent a library of hooks that already worked, broken down so it understands the move underneath each one, and you make it build every new hook on one of those proven moves. The library is the part nobody can copy from you, because it is your taste, captured as data. That is why I call it the moat.

6.1 What a swipe file is

A swipe file is one analyzed video. You take a short video that did well, pull its transcript, and write a structured breakdown of why it worked. My library has just over 100 of these, spanning fitness, finance, communication, relationships, productivity, career, AI, mindset, health, and parenting. The genre spread matters: a hook move from a parenting video often works perfectly on a finance idea. Diversity of structures beats narrow relevance, which is a design choice explained in section 8.

6.2 The 9-section analysis format

Each swipe file follows the same nine sections, in fixed order, written in plain English (no jargon), every claim grounded in a real line from the transcript:

## 1. Metadata          (title, platform, creator, format, topic)
## 2. Core claim        (the video's argument in ≤25 words)
## 3. Angle             (the lens it runs on, ≤40 words)
## 4. Hook              (the verbatim opening + WHY it works)
## 5. Structure         (beat-by-beat, BUT/THEREFORE transitions)
## 6. Depth elements    (reframe, contrast, story, example, technique, quote)
## 7. Vibe              (the delivery DNA: voice, pacing, density)
## 8. Tactical tips     (named moves, most load-bearing first)
## 9. Transcript        (the cleaned spoken words, last)

Two sections do the heavy lifting. Section 4 (Hook) quotes the exact opening lines, names the pattern in a human-readable phrase, and lists the specific psychological mechanisms it fires — not "it's engaging," but "it forces the viewer to silently classify themselves while the sentence is still being spoken." Section 7 (Vibe) captures how it is delivered: tone, voice, pacing, word choice, proof style, and density, as a bundle. The hook tells the agent what to say; the vibe tells it how it should sound.

A hard rule that keeps the library honest: the hook text is verbatim, and you never invent a creator handle, a view count, or a statistic. If you do not know it, leave it blank. The whole value is that these are real.
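Because the nine sections are fixed, a file's layout is easy to check mechanically. The following is a minimal sketch of such a check, not part of the author's tooling. It assumes each heading in a real swipe file is written as "## N. Title" with only the title on the line (the parenthetical notes in the listing above are treated as annotations, not heading text).

```python
import re

EXPECTED_SECTIONS = [
    "Metadata", "Core claim", "Angle", "Hook", "Structure",
    "Depth elements", "Vibe", "Tactical tips", "Transcript",
]

# Matches lines such as "## 4. Hook" and captures the number and the title.
HEADING = re.compile(r"^## (\d+)\. (.+?)\s*$", re.MULTILINE)


def check_swipe_file(text: str) -> list[str]:
    """Return a list of layout problems; an empty list means the file passes."""
    found = [(int(num), title) for num, title in HEADING.findall(text)]
    problems = []
    if len(found) != len(EXPECTED_SECTIONS):
        problems.append(f"expected {len(EXPECTED_SECTIONS)} sections, found {len(found)}")
    for position, (num, title) in enumerate(found, start=1):
        if num != position:
            problems.append(f"section {title!r} is numbered {num}, expected {position}")
        elif position <= len(EXPECTED_SECTIONS):
            expected = EXPECTED_SECTIONS[position - 1]
            if title.lower() != expected.lower():
                problems.append(f"section {position} is {title!r}, expected {expected!r}")
    return problems
```

This only checks structure. It cannot tell whether the hook text is verbatim or whether a statistic was invented; that part of the rule still needs a human read.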

6.3 Ten examples, ten genres

To make this concrete, here are ten swipe files from ten different niches in my library. For each: the verbatim hook, the pattern name, why it works, and the vibe. Read them as a set, and notice how different the moves are, and how each could be lifted onto a totally different topic.

1 Fitness · Muscle Growth Misconception

This represents your muscles before, during, and after a workout. People think workouts build muscle, but it's actually the opposite.

Pattern. People Think X But Actually The Opposite (Contrarian · Education).

Why it works. A visual anchor locks your eye to the diagram, so the belief-flip lands on an already-engaged brain. "It's actually the opposite" promises a full inversion of your mental model, not a tweak, and a flip demands the next sentence in a way a tweak never does. Anyone who lifts has thought "I'm building muscle right now," so the hook accuses your exact internal sentence.

Vibe. Compressed Education Mechanism Reveal. A textbook chapter delivered in 30 seconds by someone who respects you enough not to slow down.

2 Personal finance · Five Things in Your 20s to Not End Up Broke

If you're in your 20s, then these are the five things you need to do to make sure you don't end up broke by the time you're 35. And this is coming from a 37-year-old self-made millionaire and former public school teacher, and this message is for all my past students.

Pattern. Five Things To Avoid Bad Outcome By Age, From Credentialed Mentor (List · Bold Claim · Failure-as-Credibility).

Why it works. "20s now → broke by 35" sets a specific 15-year deadline that vague "someday" advice never gets. The loss framing ("don't end up broke") hits harder than "get rich." And the teacher-plus-millionaire credential pairing is rare enough to earn bluntness: the millionaire line alone reads as bragging; the teacher line alone reads as nostalgia; together they are "the teacher who made it and is reaching back."

Vibe. Tough-Love Teacher-Mentor. Fatherly, skips the flattery, treats you like an adult.

3 Communication / conversation skills · Four Phrases to Keep Any Conversation Rolling

Instead of never having awkward silence at a conversation again. Four phrases that you can use to keep any conversation rolling.

Pattern. Universal Pain Point Plus Numbered Script Promise (List · Problem).

Why it works. "Awkward silence" is a near-universal social fear that needs no qualifier; self-identification is automatic. The load-bearing word is "phrases," not "tips": phrases means exact words you can deploy verbatim, which is worth roughly double. "Any conversation" preempts the "but my situation is different" reflex.

Vibe. Casual Peer-Coach Phrase Dropper. The friend who leans in and hands you cheat codes, not a lecture.

4 Relationships · Secure Love

If you want to end up heartbroken, then skip this video. But if you want to create a secure love, this is one simple step to make a relationship the best one yet.

Pattern. Skip This If You Want The Bad Outcome (Problem · Bold Claim).

Why it works. The reverse-psychology dare ("skip this video") is a stronger reason to stay than any "you won't believe what happens next." "Heartbroken" is a feeling almost every adult has lived, already loaded in the body. And "one simple step" caps the cognitive load: promise five steps and you lose half the audience to the implied homework.

Vibe. Gentle Philosophical Therapist. Warm, unhurried, "I won't push, but actually consider this."

5 Productivity / business coaching · The Energy–Money Matrix

So if something gives us more energy and it makes us more money, what are we going to do? We are going to prioritize this.

Pattern. Co-Solve The Question Then Answer It (Education · Question).

Why it works. The rhetorical question makes your brain answer before the creator does, so you are already inside the framework. Two axes (energy, money) are named in the first sentence; the framework's bones are in the hook before any explanation. The inclusive "we" positions the speaker as a facilitator in the room with you, not an authority above you.

Vibe. Warm Framework Teacher. "Let me walk you through this," with diminutives that soften a ruthless framework.

6 Career / job search · LinkedIn Networking to Land Interviews

This is how you can network on LinkedIn to land more interviews, and I'm gonna show you the step-by-step. And if you don't know me, my name is Joana, I'm a hiring manager and career coach, and I've helped over 300 people land job offers, so come with me.

Pattern. Step-By-Step Promise + Stacked Insider Credentials (Education · Personal Experience).

Why it works. "Hiring manager and career coach" is two-position credibility: one makes the decisions, the other teaches you to navigate them. "Over 300 people" is a specific number that converts vague expertise into a measured outcome. "So come with me" is permission language, not a threat, letting you commit without resistance.

Vibe. Over-The-Shoulder Insider. Calm, dense, "let me show you exactly what to click," from someone on the hiring side.

7 AI / tech · Everything About Vibe Coding in 60 Seconds

I've spent about a thousand hours vibe coding now so you don't have to, so here's everything you have to know about vibe coding in less than 60 seconds.

Pattern. Investment-To-Compression Time Trade (Personal Experience · Education).

Why it works. A thousand hours becoming sixty seconds is a roughly 60,000-to-1 bargain your brain registers in the first sentence. "So you don't have to" hides the boast inside a gift; drop that phrase and it reads as a humblebrag; keep it and it reads as a service. "Everything you have to know" promises completeness, which is what earns a full minute of attention.

Vibe. Compressed Experience Distiller. "I already did the hard part, here's the cheat sheet."

8 Mindset / self-improvement · Take the Actions Before Your Feelings Are Ready

Okay, this is what I've learned about getting what you want out of life.

Pattern. Earned-Wisdom Casual Drop (Personal Experience · Bold Claim).

Why it works. The single word "okay" is load-bearing: it drops the polish register to a voice memo, like someone leaning in mid-thought, and without it the same sentence reads as motivational-poster filler. "What I've learned" signals hard-won synthesis, not theory. And the short, one-sentence hook sets a contract that this will be quick, which makes the body's repetition feel like emphasis instead of padding.

Vibe. Earned-Wisdom Reflective Voice. One principle, delivered with the cadence of someone who just figured it out.

9 Health / neuroscience · The Physiological Sigh

It turns out the fastest way to calm down under conditions of stress is not to engage in self-talk like I'm going to try and calm down. That doesn't generally work, right? The fastest way is to use your physiology to lower what's called your level of autonomic arousal, aka stress.

Pattern. Invalidate The Default Then Name The Mechanism (Contrarian · Education).

Why it works. It invalidates a universal failure first; self-talk under stress is something everyone has tried and felt fail. The one-word confirmation tag "right?" makes you nod, and once you have nodded, scrolling away contradicts your own agreement. "Autonomic arousal, aka stress" names the clinical term and translates it in one breath, which credentials the speaker without a degree drop.

Vibe. Clinical Science Translator. Lab-voice authority that turns neuroscience into something you can do in ten seconds.

10 Parenting · Chores Predict Adult Success

Do you know what the strongest predictor that a child will succeed as an adult is?

Pattern. Single-Predictor Question Plus Longitudinal-Study Anchor (Question · Education).

Why it works. "The strongest predictor" is irresistibly specific; your brain runs through grades, IQ, love, before the answer lands, and that search creates the gap. "As an adult" stretches the timeline past the usual short-term parenting promises, raising the stakes for parents who think in long arcs. The question comes first and the contrarian answer (chores) lands later, so the reveal feels earned, not smug.

Vibe. Research-Backed Parent-To-Parent Coach. Practical, not preachy; research credibility mixed with "in our house…" humility.

Notice that no two of these are the same move. That range is exactly what your agent draws from. When it writes hooks for your idea, it is not guessing what a good hook sounds like; it is applying a move that already earned millions of views, bent into your words.

6.4 Build the index

Reading 100 full files for every request would be slow and expensive. So you build a compact index once, and the agent reads the index instead of the library.

A small Python script does this. It walks your swipe-library folder, reads the structured fields out of each file, classifies each one into a content pillar (mine are A = communication, B = building with AI, C = creator coaching), and writes one tidy JSON file. You run it once, and again any time you add or edit a swipe:

python3 swipe-index/build-index.py \
    --src ./swipe-library --out ./swipe-index/swipe-index.json
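The author's build-index.py is not reproduced in this guide, so the following is only a sketch of what a script with that command line could look like. Everything in it is an assumption: that swipe files are .md files with "## N. Title" headings, the fields chosen for each entry, and especially the keyword-based pillar classifier, which is a crude stand-in for whatever classification you prefer.

```python
import argparse
import json
import re
from pathlib import Path

# Hypothetical keyword map for the three pillars named above.
PILLAR_KEYWORDS = {
    "A": ("conversation", "communication", "speaking"),
    "B": ("ai", "agent", "coding"),
    "C": ("creator", "content", "audience"),
}

# Captures each heading title and the body text up to the next heading.
SECTION = re.compile(
    r"^## \d+\. (.+?)\s*\n(.*?)(?=^## \d+\. |\Z)", re.MULTILINE | re.DOTALL
)


def parse_sections(text: str) -> dict[str, str]:
    """Map each lowercased section title to the text beneath it."""
    return {title.lower(): body.strip() for title, body in SECTION.findall(text)}


def classify_pillar(text: str) -> str:
    """Pick the pillar whose keywords appear most often; ties go to the first."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {p: sum(words.count(k) for k in kws) for p, kws in PILLAR_KEYWORDS.items()}
    return max(scores, key=scores.get)


def build_index(src: Path) -> list[dict]:
    entries = []
    for path in sorted(src.glob("*.md")):
        sections = parse_sections(path.read_text(encoding="utf-8"))
        topic_text = sections.get("metadata", "") + " " + sections.get("core claim", "")
        entries.append({
            "file": path.name,
            "core_claim": sections.get("core claim", ""),
            "hook": sections.get("hook", ""),
            "vibe": sections.get("vibe", ""),
            "pillar": classify_pillar(topic_text),
        })
    return entries


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Build a compact swipe index.")
    parser.add_argument("--src", type=Path, required=True)
    parser.add_argument("--out", type=Path, required=True)
    args = parser.parse_args(argv)
    args.out.parent.mkdir(parents=True, exist_ok=True)
    args.out.write_text(json.dumps(build_index(args.src), indent=2), encoding="utf-8")

# To use this from the shell, call main() under the usual __main__ guard.
```

The index deliberately leaves out the transcript section. The agent needs the hook, the claim, and the vibe to choose a move; the full spoken text is what makes the library too large to read per request.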

6.5 The offline hook enrichment

There is one more pre-computation, and it is the clever bit. For each swipe, you decide offline (once, by hand or with a model, then saved to a file) how its hook should be reused. Each hook gets three things:

  • shape_family: a grouping label used only for diversity (pain, promise, contrarian, list, story, question, reveal, authority, challenge). It is never shown to the writing model; it just stops the shortlist from being six hooks of the same type.
  • template: a bracketed generalization of the hook, e.g. "Don't [do common action] like this. [perform the bad version]. [blunt verdict]." Use one only when the hook's power lives in its syntax.
  • template: null, with a shape_spec instead: for hooks where filling a template would keep the words and lose the move. Some hooks work because of tone, breadth, or an oddly specific detail, not a fixed sentence shape. For those, the template is null and the move lives in a short shape_spec description.

This template-or-null call is the difference between a hook generator that sounds mechanical and one that sounds alive. A template is training wheels: fill the brackets, then bend the words outside the brackets loosely so the line breathes. When the move is not about syntax, you throw the template away and reproduce the move in fresh words. You make this judgment once, per swipe, ahead of time, never at runtime.
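
To make the template-or-null call concrete, here is a small sketch of how one enrichment record could be stored and sanity-checked. The field names shape_family, template, and shape_spec and the family list come from the text above; the HookShape class, the swipe ids, and the shape_spec wording are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

SHAPE_FAMILIES = {"pain", "promise", "contrarian", "list", "story",
                  "question", "reveal", "authority", "challenge"}


@dataclass(frozen=True)
class HookShape:
    swipe_id: str
    shape_family: str          # used only for diversity
    template: Optional[str]    # None when the move is not about syntax
    shape_spec: str            # plain-English description of the move

    def problems(self) -> list[str]:
        """Return a list of issues; an empty list means the record is usable."""
        issues = []
        if self.shape_family not in SHAPE_FAMILIES:
            issues.append(f"unknown shape_family: {self.shape_family}")
        if self.template is not None and not re.search(r"\[[^\]]+\]", self.template):
            issues.append("template has no [bracket] slots")
        if self.template is None and not self.shape_spec.strip():
            issues.append("template is null but shape_spec is empty")
        return issues


# A hook whose power lives in its syntax keeps a template.
syntactic = HookShape(
    swipe_id="T01",  # hypothetical id
    shape_family="challenge",
    template="Don't [do common action] like this. [perform the bad version]. [blunt verdict].",
    shape_spec="Show the mistake being made, then condemn it.",
)

# A hook that works through tone or an odd detail gets template=None.
tonal = HookShape(
    swipe_id="T02",  # hypothetical id
    shape_family="reveal",
    template=None,
    shape_spec="Lead with an oddly specific detail that makes the viewer ask how you know.",
)
```

Running problems() over every record once, right after you write the enrichment file, keeps a half-filled entry from reaching the shortlist later.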

7. The system prompts

Here are all five prompts that run the pipeline, in full. Each is a plain text file. Before each one I explain its job and the principle it is built on. The guiding philosophy across all of them: minimum AI calls, minimum tokens, maximum determinism. Structure the plumbing, free the prose.

A shared idea worth naming up front is what I think of as the KERNEL discipline for writing these prompts:

  • Keep it simple: one clear goal per prompt.
  • Easy to verify: every stage has checkable success criteria, no vague "make it engaging."
  • Reproducible: no "latest" or "trending"; same input, same behavior next month.
  • Narrow scope: one stage, one job. Never "analyze and research and write" in one pass.
  • Explicit constraints: say what NOT to do with MUST / MUST NOT.
  • Logical structure: every prompt ordered Context → Task → Constraints → Output schema → Failure handling.

7.1 EXTRACT — turn a transcript into the brief

Its job: read the transcript, write the brief, fill nothing it cannot support from the transcript. It does not write hooks or scripts. The principle: one job per stage, and source-lock; if it is not in the transcript, it does not enter the brief.

# EXTRACT — Idea Analysis (Stage 1)

## Context
You are the EXTRACT stage of a short-form content pipeline. You read ONE short video's transcript and produce a single compact JSON **brief** — the source of truth every later stage consumes. Downstream stages never re-read the transcript, so the brief must stand on its own. You do not write hooks or scripts. You are source-locked: the brief contains only what the transcript supports.

## Task
From the transcript, emit ONE JSON object conforming to `schema/brief.schema.json`. Fill `source` and `idea`. Leave every `my_take` field `null` — Preston adds his spin in a later pass. Extract exactly ONE idea (the spine of the video).

## Constraints

### MUST
- Output ONLY the JSON object. No preface, no commentary, no code fences.
- Copy every `proof_points[].evidence_quote` as a **verbatim span of the transcript**. Trimming to a clean sentence boundary is fine; changing or adding any word is not.
- Pick exactly one `idea_seed`: the single claim or promise the video is built around.
- Set `pillar` to the closest of: `A` = communication for professionals · `B` = building in public / AI services · `C` = creator coaching.
- Set `storytelling_format.category` to the one value that best fits how the video is built: `advice-stack` · `walkthrough` · `breakdown` · `reframe` · `problem-solution` · `reveal` · `comparison`.
- Set `audience` by inferring who the video serves from its topic. (Inference about framing is allowed; it is not a factual claim.)
- `contrast`: include a `common_belief` → `contrarian_reality` pair ONLY if the video implies a defensible belief-flip. A derived reframe is allowed (it is inference about the idea, not a new external fact). If the video is purely tactical with no belief to challenge, set `contrast` to `null` and add `"idea.contrast"` to `needs_input`.
- `signature_quote`: at most ONE verbatim transcript line worth speaking directly in a script. If nothing stands out, `null`.

### MUST NOT
- Do not invent facts, numbers, names, studies, statistics, or causal claims. If it is not in the transcript, it does not enter the brief.
- Do not do external research.
- Do not use temporal references ("latest", "currently", "right now", "these days", "trending").
- Do not analyze visuals, editing, pacing, or on-screen text — transcript only.
- Do not write hooks, scripts, titles, or any prose beyond the brief fields.
- Do not personalize to Preston's niche here — that happens later via his spin. Classify `pillar` only.
- Do not add fields beyond the schema (`additionalProperties` is false).

## Inputs (provided in the user message, outside this prompt)
- `source.type` (v1: always `video`)
- `source.ref` (the source URL)
- the full transcript text → goes verbatim into `source.raw`

## Output schema
Conform exactly to `schema/brief.schema.json`. Shape:
{
  "source":  { "type", "ref", "raw" },
  "idea":    { "idea_seed", "topic", "audience", "pillar",
               "unique_angle", "contrast"|null,
               "storytelling_format": { "category", "why_it_works" },
               "proof_points": [ { "point", "evidence_quote" }, ... ],
               "signature_quote"|null },
  "my_take": { "angle": null, "lived_proof": null, "overrides": null },
  "needs_input": [ ]
}

## Failure handling
- Thin or unclear transcript: fill only what the transcript defensibly supports; set everything else to `null` and list those field paths in `needs_input`. Never invent a value to avoid an empty field.
- No single clear idea: choose the dominant one. Each `proof_points` entry must still carry a verbatim quote — if you cannot find one for a point, drop the point.
- If a quote you want is not a verbatim span of the transcript, do not use it.
- When in doubt between inventing and flagging: flag (`needs_input`), never invent.

Note the proof_points rule: every claim the brief records is tied to a verbatim quote from the transcript. That is what stops the rest of the pipeline from making things up; there is nothing to make up from, because the brief only carries what was actually said.
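
Because the verbatim rule is mechanical, it can also be checked in plain code after EXTRACT returns, rather than trusted to the model. The sketch below assumes the brief has already been parsed into a dict with the shape shown in the prompt. The function name and the whitespace normalization are my own assumptions, not part of the original build.

```python
def _squash(text: str) -> str:
    """Collapse runs of whitespace so line breaks do not cause false alarms."""
    return " ".join(text.split())


def unsupported_quotes(brief: dict) -> list[str]:
    """Return every quote in the brief that is NOT a verbatim span of source.raw."""
    raw = _squash(brief["source"]["raw"])
    idea = brief["idea"]
    quotes = [p["evidence_quote"] for p in idea.get("proof_points") or []]
    if idea.get("signature_quote"):
        quotes.append(idea["signature_quote"])
    return [q for q in quotes if _squash(q) not in raw]
```

An empty list means every quote is source-locked. Anything else is a reason to drop the offending point or rerun the stage before the brief moves on.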

7.2 SPIN MERGE — add your take without erasing the source

Its job: after you give your take, merge it into the brief. The principle: your spin wins for the finished script, but the brief keeps the creator's full original analysis alongside your take, so you can see both. It runs only when you have actually given a take.

# SPIN MERGE — Brief Pass 2 (Stage 1b)

## Context
You are the SPIN MERGE step — pass 2 of the brief. EXTRACT produced a brief from the source with `my_take` empty. Preston has now added his own take (his spin) by voice or text. You add his take to the brief WITHOUT erasing the analyzed video. Preston's spin is authoritative: like the transcript, it is a source of truth, and it WINS over the source where they conflict — but only for the FINISHED SCRIPT. The brief itself keeps the creator's original elements alongside his take, so he can review both in the sheet. The swipe library never overrides it.

This step runs only after Preston has supplied a spin. If `my_take` is still empty, that is the orchestration's job to resolve (ask him) — not yours.

## Task
Given the brief and Preston's raw spin (already transcribed if it was a voice note), output the FULL updated brief, same schema. Specifically:
- Fill `my_take.angle` — Preston's core contribution or reframe, condensed in his voice.
- Fill `my_take.lived_proof` — any personal story, POV, or credential he offered (else `null`).
- Fill `my_take.overrides` — any change he makes to count, framing, or priority, as an object (else `null`).
- **PRESERVE the analyzed video's elements as the faithful record.** Keep `idea.idea_seed`, the FULL `proof_points`, and `idea.contrast` intact — do NOT delete the creator's points because of his take. The brief must show BOTH the creator's full analysis AND Preston's take, so he can review them together in the sheet.
- You MAY update `idea.unique_angle` to the finished-piece angle if his take reframes it.
- **Detect his intent and record it in `my_take.overrides`** (as an `approach` note) so the SCRIPT knows how to use the take:
  - **additive** ("also add…", "I'd include a story", "build on this") → his take layers on top; the script packages the creator's points alongside his.
  - **narrow / replace** ("only 3 matter", "cut pitch", "forget the source, here's my angle") → record the cuts and priority in `my_take.overrides`; the script focuses on his angle. The source `proof_points` still stay in `idea` for the record.
- Clear any `needs_input` items the spin now resolves.

## Constraints

### MUST
- Keep every `proof_points[].evidence_quote` a verbatim span of `source.raw`, AND keep the FULL set — the `idea` group is the faithful record of the analyzed video. Never delete a source proof_point; record any narrowing in `my_take.overrides`.
- Treat Preston's own claims, story, and credentials (from the spin) as allowed input — they go in `my_take` and may inform `unique_angle`. They are his contribution, not fabrication.
- Keep the schema exactly (`additionalProperties` is false). Output the whole brief, not a diff.

### MUST NOT
- Do not invent facts. Do not add external stats, studies, or third parties Preston did not mention.
- Do not drop ANY source `proof_points`. Narrowing or cuts are recorded in `my_take.overrides` and applied later by the SCRIPT — never by deleting from `idea`. The recorded brief always shows the creator's full analysis plus Preston's take.
- Do not use temporal references.
- Do not write hooks or scripts.

## Inputs (provided in the user message, outside this prompt)
- the brief JSON (from EXTRACT, `my_take` empty)
- Preston's raw spin text (his typed message, or the transcript of his voice note)

## Output schema
The full updated brief, conforming to `schema/brief.schema.json`. Output ONLY the JSON.

## Failure handling
- If the spin is genuinely empty or you cannot tell what he means, do NOT guess — return the brief unchanged and add `"my_take: awaiting"` to `needs_input` so the orchestration re-asks him.
- If the spin conflicts with a source `proof_point`, the spin wins for the finished script; record the conflict/cut in `my_take.overrides`. Keep the source `proof_point` in `idea` for the record.
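
Outside the prompt, the orchestration can enforce the "never drop a source proof_point" rule with a quick comparison of the brief before and after the merge. This is a sketch under the assumption that both briefs are parsed dicts in the schema shape shown earlier; the function name merge_violations is hypothetical.

```python
def merge_violations(before: dict, after: dict) -> list[str]:
    """List ways the merged brief failed to preserve the analyzed video."""
    problems = []
    if after["idea"]["idea_seed"] != before["idea"]["idea_seed"]:
        problems.append("idea_seed changed")
    kept = {p["evidence_quote"] for p in after["idea"]["proof_points"]}
    for point in before["idea"]["proof_points"]:
        if point["evidence_quote"] not in kept:
            problems.append(f"dropped proof_point: {point['point']}")
    if after["idea"].get("contrast") != before["idea"].get("contrast"):
        problems.append("contrast changed")
    return problems
```

If the list is non-empty, the safest move is to keep the pre-merge idea group and copy only my_take from the model's output.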

7.3 HOOKS — six hooks, in one call, each built on a proven move

Its job: turn the brief plus a shortlist of proven hook shapes into six ranked hooks. The principle: every hook must open a loop (a gap the viewer needs the video to close), and every hook must be anchored to a real swipe move; no free-styling generic contrarian lines. They are written six-at-a-time in a single call, because batching is faster and lets the model spread the shapes.

# HOOKS — Hook Generation (Stage 2)

## Context
You are the HOOKS stage. You receive the brief (the single source of truth) and a shortlist of admired hook-objects pulled from a swipe library. You write 6 fresh hooks for Preston's video and rank them. The shortlist is your palette — proven hook moves from creators with millions of views. You never fetch more; you bend what you are given.

A hook has ONE job: **open a loop** — create a gap the viewer needs the video to close. A line that only informs (states a tip or fact with nothing left unresolved) is not a hook. This is the pass/fail bar for every line you write.

## Task
Write exactly 6 hooks, then rank them 1–6 (rank 1 = the opener). Output them as one JSON array, in a single response.

## The mechanic (how to turn a shortlist entry into a hook)
**The brief supplies the CONTENT (the idea, the contrast, the proof points, the take). The shortlist supplies the SHAPE (the move). Every hook = one shortlist entry's move applied to the brief's content.** The shortlist is proven structure from creators with millions of views — that is the entire reason it is here. Do not ignore it and free-style a generic line.

Each shortlist entry carries either a `template` or `template: null`, plus a `shape_spec` (the underlying move) and a `shape_family`.
- **If it has a `template`:** fill the `[brackets]` with this idea's specifics, then **bend the words outside the brackets loosely** so the line breathes and sounds like Preston said it. Never reuse the template's surrounding words verbatim. Looser beats literal — a hook that obviously matches its template reads mechanical.
- **If `template` is null:** ignore syntax. Reproduce the MOVE described in `shape_spec` (the mechanism, the emotion, the specificity) in your own words. Filling a shape into a fixed sentence would kill it.

## Constraints

### MUST
- Output exactly 6 hooks, all in one JSON array.
- **Anchor every hook to a specific shortlist entry.** Apply that entry's move — fill its `template` then bend the words outside the brackets, or for `template: null` reproduce its `shape_spec` move. Set `inspired_by` to that entry's id, and make the move recognizable in the hook. A reader who saw the swipe should see the same move.
- **Use the shortlist's RANGE — draw from at least 4 DIFFERENT entries, and their `shape_family` values must come from those entries (not invented).** With a healthy shortlist you never need to free-style. Do NOT write a generic "most people think X but Y" contrarian or a loose reframe that is not grounded in a shortlist entry.
- Every hook opens a loop (intrigue the video must resolve). State that loop in the `opens_loop` field as the gap it creates.
- Every hook carries the so-what: it is clear who it is for and what they gain, without over-narrowing the audience to the point of excluding people.
- Ground every hook in the brief: `idea.idea_seed`, `idea.contrast`, `idea.proof_points`. If `my_take` is present, it WINS over the source — apply `my_take.overrides` (a changed count, framing, or priority) to every hook.
- Rank on merit, in this order: (1) the hook's shape fits how this idea wants to be told, (2) strongest so-what, (3) most concrete, (4) sounds most like Preston with the least bending strain. Rank 1 is the opener.
- Plain spoken English. Contractions. The way Preston talks to a colleague.

### MUST NOT
- No colon in any hook.
- No em-dash in any hook.
- No near-copy of any `admired_hook` — keep only the move, change the words.
- No verbatim template (brackets filled but surrounding words untouched).
- No fabricated facts, numbers, names, stats, studies, or credentials. Use only what the brief supports. If a template has a slot you cannot fill from the brief (a named framework, a surprise origin, a real named person), DROP that slot or clause — never invent one.
- No reference to the source creator or "reacting to" anyone. The hook is broadcast content for a cold viewer, not commentary.
- No temporal references ("latest", "right now", "trending").
- No banned AI-tell openers ("here's the thing", "let's dive in", "have you ever wondered", "the brutal truth", "buckle up").

## Inputs (provided in the user message, outside this prompt)
- the brief JSON (from EXTRACT, possibly with `my_take` filled by Preston's spin)
- the `<CANDIDATE_HOOKS>` shortlist (each entry: id, shape_family, admired_hook, template-or-null, shape_spec)

## Output schema
A JSON array of exactly 6 objects. Output ONLY the array — no preface, no code fences.
[
  {
    "hook_text": "the hook as it would be spoken, one or two lines",
    "inspired_by": "T## (the shortlist entry whose move you used)",
    "shape_family": "the family of that move",
    "opens_loop": "one phrase naming the gap this hook makes the viewer need closed",
    "rank": 1
  }
]

## Failure handling
- Brief-only generation is a LAST RESORT — only when the shortlist is genuinely thin (fewer than 4 usable entries), never for a healthy shortlist. If you must fall back, set `inspired_by: "brief"` so it is visible that the hook is not grounded in a swipe.
- If you cannot make a line open a loop without inventing something, root the intrigue in the brief's `contrast` or a concrete `proof_point` instead — never fabricate to manufacture curiosity.
- If `my_take` is null (no spin yet), generate from the source idea alone; the spin pass can regenerate later.
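
Several of the HOOKS rules are countable, so they can be verified in code before the hooks are shown to you. The sketch below checks the parsed JSON array against the count, ranking, range, and punctuation rules from the prompt. The function name and the error wording are assumptions.

```python
def hook_problems(hooks: list[dict]) -> list[str]:
    """Check a parsed HOOKS response against the mechanical rules."""
    problems = []
    if len(hooks) != 6:
        problems.append(f"expected 6 hooks, got {len(hooks)}")
    ranks = {h.get("rank") for h in hooks}
    if ranks != set(range(1, len(hooks) + 1)):
        problems.append("ranks are not a clean 1..n sequence")
    # "brief" marks the last-resort fallback, so it does not count as a swipe.
    sources = {h.get("inspired_by") for h in hooks} - {"brief", None}
    if len(sources) < 4:
        problems.append(f"only {len(sources)} distinct shortlist entries used")
    for position, hook in enumerate(hooks, start=1):
        text = hook.get("hook_text", "")
        if ":" in text:
            problems.append(f"hook {position} contains a colon")
        if "\u2014" in text:  # em-dash
            problems.append(f"hook {position} contains an em-dash")
        if not hook.get("opens_loop", "").strip():
            problems.append(f"hook {position} has no opens_loop")
    return problems
```

Whether a hook actually opens a loop still needs judgment. This only confirms that the field was filled in and that the countable rules hold.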

7.4 SCRIPT — write the finished script in your voice

Its job: open with your chosen hook verbatim, transfer the structure of the swipe that hook came from, fill it with the brief's content, and sound like you. The principle: the brief is the factual well; the script never re-reads the transcript. Every beat-to-beat handoff is a BUT or a THEREFORE, never an "and then."

# SCRIPT — Script Generation (Stage 3)

## Context
You are the SCRIPT stage — the final generation step. You write Preston's finished short-form teleprompter script. You are Preston writing his own script, not an assistant writing for him: direct, confident, heavy "you", contractions always, fragments welcome, no hedging, specifics over abstractions, no motivational filler.

You consume the **brief** (the single source of truth) — you do NOT re-read the raw transcript. The brief already carries the idea, the evidence-linked proof points, and Preston's take. You also receive the chosen hook and the structure of the swipe that hook came from.

The writing-rules + voice GATE is a SEPARATE stage that runs after you. Write your best clean draft. Do NOT grade yourself, and do NOT output a gate/QA line.

## Task
Write the teleprompter script as free prose. Open with the chosen hook verbatim. Transfer the chosen swipe's structure shape and rhythm (not its words). Fill that shape with the brief's content. End on the insight or a topic-appropriate CTA.

## Constraints

### MUST
- **Opener = the chosen hook, verbatim, line one.** Do not paraphrase it.
- **Use the BRIEF as your factual well** — `idea.unique_angle` is the spine, `proof_points` are the beats (each already evidence-linked), `idea.contrast` is the belief-flip, `my_take` is Preston's contribution. You may use `idea.signature_quote` verbatim once if it fits. Do NOT re-read the transcript.
- **Source-locked.** Every claim traces to the brief, to `my_take` (Preston's spin/story), or to **derived inference** — reframes, before/after, "what this implies", drawn by reasoning over the given material. Tag each claim internally as `transcript | spin | derived`; never surface the tags.
- **Respect `my_take.overrides` exactly** — count, order, lead-with, and cuts (e.g. 3 not 5, lead with facial expressions, drop pitch/tonality).
- **Package per `my_take`'s intent.** The brief preserves the creator's FULL `proof_points` for the record — you choose which to use, not all must appear. If his take is ADDITIVE (build on the creator's points), weave the relevant creator proof_points together with his take. If it NARROWS / REPLACES (overrides cut points, or "focus only on my angle"), use only the proof_points his take keeps and drop the rest.
- **Transfer the chosen swipe's structure + voice** (its `structure` family + `voice` dimensions: tone, pacing, density). Match the rhythm; never copy its sentences or signature phrases.
- **One sentence per line.** Vary length — short, then longer, then stop.
- **Every handoff is BUT or THEREFORE**, never "and then." Each move is a turn or a consequence.
- **Beat 2 (right after the hook) is tight: 3–4 short lines.** Don't re-explain context later beats cover.
- **Anchor stories** — concept, then why it matters, then a story that proves it. Never story-first.
- **Named techniques / frameworks / ordered steps are verbatim**, and a framework gets one fully worked concrete example.
- Apply `gate/voice-dna.md` and `gate/writing-rules.md`.
- **CTA by topic:** communication/speaking/interviewing → a comment-prompt to a real matching blog resource ONLY if one exists, else end on the insight; AI / build-in-public / creator topics → no baked CTA, end on the point; unsure → end on the insight.
- **Length, auto-selected:** Short (<60s, ~150–180 words) for a one-punch insight or tight 3-item list; Standard (<120s, ~300–360 words) for a mini-framework or story-driven lesson. Hard ceiling ~120s — if it needs more, say so and split into two.

### MUST NOT
- No new facts: no stats, studies, named third parties, or external causality that is not in the brief or `my_take`. Derived inference is reasoning, never a new fact.
- No temporal references ("latest", "right now", "trending").
- No banned phrases / AI-tells / corporate jargon (per writing-rules).
- No motivational closing. Don't tie a bow.
- No more than 2 em-dashes in the whole script. No colon in the hook line.
- Don't copy the swipe's sentences or signature phrases — transfer the shape, not the words.
- **Don't over-structure the output.** No headers, no section labels, no length/QA/gate line, no tag annotations. Just the script.

## Inputs (provided in the user message, outside this prompt)
- the finalized brief JSON
- the chosen hook (verbatim opener)
- the chosen hook's swipe structure: its `structure` family + `voice` dimensions + `vibe_name`

## Output
The script as prose: the hook, then the body, one sentence per line. Nothing before or after it.

## Failure handling
- Thin brief (few proof_points, no take): write a shorter, conservative script from what is there. Never invent to pad. Drop any beat you cannot support without a new fact.
- If the idea genuinely needs more than ~120s, write the strongest single script and note that it should be split.

7.5 GATE — a separate, strict cold read

Its job: judge the finished script on four yes/no checks, and if any fail, name the single worst issue. The principle: the gate is a quality floor, not a ceiling. It catches what is bad; it does not manufacture what is great (that is you, plus your golden examples). It is a separate pass on purpose, never the writing step grading its own homework.

# GATE — Script Critique (Stage 5, layer 2)

## Context
You are the GATE: a fast, strict, separate quality check on a finished script. You did NOT write it — judge it cold. This is a quality FLOOR, not a rubric: you catch what is BAD, you do not score how great it is. Layer 1 (deterministic regex) already ran; you catch what regex can't. Keep it lean and binary.

## Task
Given the script and the brief (the source of truth), answer four BINARY checks, then — only if any fail — name the SINGLE worst issue to fix. Output JSON only.

## The four checks (each true = good)
1. **opens_a_loop** — does the first line / hook open a loop the viewer needs the video to resolve? A line that only informs, with nothing left unresolved, is `false`.
2. **no_ai_tell** — is the script FREE of AI tells the regex would miss? Robotic uniform rhythm, perfect parallelism, more than one "it's not about X, it's about Y", rule-of-three abuse, symbolic over-explaining ("this represents…"), hollow filler. If any are present, `false`.
3. **source_locked** — is every fact, number, name, statistic, study, and causal claim supported by the brief (`proof_points`, `contrast`, `idea`) or by `my_take`? Derived inference (reframes, "what this implies") is fine. A NEW fact or an invented specific number (e.g. "twelve tricks" when no such count is in the brief) makes this `false`. When false, point to the exact line.
4. **sounds_like_preston** — does it read like Preston per the voice (direct, specific, varied sentence length, no hedging, speech not writing)? If it reads generic or written, `false`.

## Constraints
### MUST
- Output ONLY the JSON object below.
- Judge only what is present. For a `source_locked` failure, quote the offending line.
- `passed` = all four checks true.
- `worst_issue` = one sentence naming the single most important fix (or `null` if passed). Pick the one that most hurts the script.

### MUST NOT
- Do not rewrite the script. You judge and name the worst issue; the fix happens elsewhere.
- Do not invent new checks or grade on a 1–10 scale. Four binaries only.
- Do not pass a script with a fabricated fact to be "nice." Source-lock is non-negotiable.

## Inputs (provided in the user message, outside this prompt)
- the finished script (prose)
- the brief JSON (for the source-lock check)

## Output schema
{
  "opens_a_loop": true,
  "no_ai_tell": true,
  "source_locked": true,
  "sounds_like_preston": true,
  "passed": true,
  "worst_issue": null
}

## Failure handling
- If you cannot tell whether a claim is source-locked, treat it as a failure and quote the line — better to flag than to pass a fabrication.
- Exactly one fix loop happens downstream (regenerate, re-gate once). Do not expect to see the script again.

The gate also has a first layer you do not see here: a few lines of plain code that run before the AI gate and catch the mechanical stuff instantly: banned phrases, more than two em-dashes, a colon in the hook, emoji, motivational closings. Cheap checks run as code; only the judgment that needs a model goes to the model.
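
The guide does not show that first layer, so here is a minimal sketch of what it could look like. The banned phrases are the ones listed in the HOOKS prompt. The list of motivational closings and the emoji character ranges are rough assumptions you would replace with your own writing rules.

```python
import re

BANNED = ("here's the thing", "let's dive in", "have you ever wondered",
          "the brutal truth", "buckle up")
CLOSERS = ("you've got this", "go get it")  # hypothetical motivational closings
# Rough emoji coverage only: common pictograph and symbol blocks.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def _norm(text: str) -> str:
    """Lowercase and straighten curly apostrophes so phrase matching is stable."""
    return text.lower().replace("\u2019", "'")


def layer1_failures(script: str) -> list[str]:
    """Run the cheap deterministic checks; an empty list means layer 1 passed."""
    lines = [ln for ln in script.splitlines() if ln.strip()]
    if not lines:
        return ["empty script"]
    lowered = _norm(script)
    failures = [f"banned phrase: {p}" for p in BANNED if p in lowered]
    if script.count("\u2014") > 2:
        failures.append("more than two em-dashes")
    if ":" in lines[0]:
        failures.append("colon in the hook line")
    if EMOJI.search(script):
        failures.append("emoji")
    if any(c in _norm(lines[-1]) for c in CLOSERS):
        failures.append("motivational closing")
    return failures
```

These checks run in microseconds and cost nothing, which is why they run before the model is ever asked for an opinion.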

8. How it runs, end to end

Let's walk one real run, step by step, so you can see how the pieces hand off.

Trigger. You text: "make a short from https://www.instagram.com/reel/ABC123/". The agent only enters a Melda run on an explicit ask like this; a bare pasted URL with no instruction stays normal idea-capture. It sets its run-state, starts a timer, and replies "On it — pulling the reel…"

Transcribe. The agent calls Apify through Composio to fetch the reel and takes the audio URL from the returned data. If the scraper returned real subtitles (a transcript of the spoken audio), it uses those; otherwise it runs the audio through Whisper. The one hard rule here: it never uses the post's caption as the transcript. The caption is what the creator typed; the transcript is what they said. Building on the caption is a junk run, so if it cannot get real audio or subtitles (private, deleted, music-only), it stops and tells you instead of faking it.
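
The decision order can be sketched in a few lines. The dictionary keys (subtitles, audio_url, caption) are hypothetical stand-ins for whatever your scraper returns, and transcribe is any function you supply that turns an audio URL into text, for example a Whisper call.

```python
from typing import Callable


def choose_transcript(post: dict, transcribe: Callable[[str], str]) -> str:
    """Prefer real subtitles, fall back to transcribing audio, never use the caption."""
    subtitles = (post.get("subtitles") or "").strip()
    if subtitles:
        return subtitles
    audio_url = post.get("audio_url")
    if audio_url:
        return transcribe(audio_url)
    # Deliberately no fallback to post["caption"]: stop and tell the user instead.
    raise ValueError("no audio or subtitles available; refusing to use the caption")
```

Raising an error here is the point: the caller catches it and sends you a plain message, rather than quietly building a brief on typed marketing copy.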

Extract → the brief. The transcript goes into the EXTRACT prompt, which returns the brief: the idea, who it is for, the belief-flip if there is one, and a handful of proof points each tied to a verbatim quote. At this point my_take is still empty.

The spin pause, and the state machine that makes it work. The agent looks at the brief, sees my_take is empty, and stops. It sends you the working idea in plain English and asks for your take. Then it waits.

This pause is where the build gets subtle, and where mine broke the first time. My agent already had an older reflex: "any voice memo is a new content idea, log it." So when I replied to the spin question with a voice note, the old reflex grabbed it and logged my take as a brand-new idea, ignoring the brief that was sitting right there waiting.

The fix is an explicit run-state. The agent tracks where it is in the pipeline:

State          What your next message means
IDLE           a new content idea (the old reflex is fine here)
AWAITING_SPIN  your take on the pending brief; the old reflex is suppressed
READY          brief is final, run the rest

The message router checks this state before the old idea-capture reflex runs. While the agent is AWAITING_SPIN, your reply is read as your take on the waiting brief, not as a fresh idea. It then figures out your intent: a real take goes to SPIN MERGE; "just save it, no take" parks the idea and ends the run; anything ambiguous gets one clarifying question. This decision is low-stakes and easy to correct, so letting the agent reason about intent here is fine.
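
The ordering is the important part: state first, reflex second. Here is a minimal sketch of such a router. The handler names are hypothetical, and the simple string matching stands in for the intent reasoning the agent actually does.

```python
from enum import Enum


class RunState(Enum):
    IDLE = "idle"
    AWAITING_SPIN = "awaiting_spin"
    READY = "ready"


TRIGGER = "make a short from "


def route(state: RunState, message: str) -> tuple[str, RunState]:
    """Return (handler name, next state) for an incoming message."""
    text = message.strip().lower()
    # Run-state is checked BEFORE the generic idea-capture reflex.
    if state is RunState.AWAITING_SPIN:
        if not text:
            return "ask_clarifying_question", RunState.AWAITING_SPIN
        if text.startswith("just save it"):
            return "park_idea", RunState.IDLE
        return "spin_merge", RunState.READY
    if state is RunState.IDLE and text.startswith(TRIGGER):
        # The brief is extracted next, then the agent waits for the take.
        return "start_run", RunState.AWAITING_SPIN
    # Anything else falls through to the old reflex and leaves the state alone.
    return "capture_idea", state
```

Because the AWAITING_SPIN branch returns before the fall-through line, a voice note answering the spin question can no longer be logged as a new idea.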

Record the idea. Either way, the idea is written to your IDEA_POOL sheet so it is never lost. With a take, the status is "take-extracted" and your take is saved in the notes. Parked, the status is "queued."

Retrieve. Now the mechanical step. The retrieval script reads the brief's topic, pillar, and format, scores all your swipes, and returns about 12. But it is tuned for diversity, not precision. It caps how many hooks of the same family can appear and guarantees at least four different families, so the writing model gets a spread of pain, promise, contrarian, list, story, question shapes rather than six variations of one. Diversity beats precision for hooks, because many different structures can carry the same idea, and you want options that feel genuinely different.
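
A diversity-first selection can be sketched in two passes: seed one entry from each of the best-scoring families, then fill the remaining slots by score under a per-family cap. The cap of 3 and the function name are assumptions; the guide gives only the target of about 12 entries and at least four families.

```python
from collections import Counter


def diverse_shortlist(scored, k=12, per_family_cap=3, min_families=4):
    """scored: list of (score, entry); each entry has 'id' and 'shape_family'."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    picked, counts = [], Counter()

    # Pass 1: take the best entry from each new family until enough families are in.
    for _, entry in ranked:
        if len(counts) >= min_families:
            break
        if counts[entry["shape_family"]] == 0:
            picked.append(entry)
            counts[entry["shape_family"]] += 1

    # Pass 2: fill the rest by score, skipping families that hit the cap.
    chosen = {e["id"] for e in picked}
    for _, entry in ranked:
        if len(picked) >= k:
            break
        family = entry["shape_family"]
        if entry["id"] in chosen or counts[family] >= per_family_cap:
            continue
        picked.append(entry)
        chosen.add(entry["id"])
        counts[family] += 1
    return picked
```

The four-family guarantee only holds when the library itself contains at least four families, which is one more reason to keep the swipe library varied.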

Six hooks. The brief and the shortlist of about 12 entries go to the HOOKS prompt, which returns six ranked hooks in one call. The agent shows them to you with their lineage (which swipe move each one used) so you can see the structure behind each:

Opener — from T96 (pain): <hook>
Alternates:
2. from T27 (challenge): <hook>
3. from T100 (question): <hook>
...

You pick one, or just take the top-ranked opener.

Script. The agent looks up the structure of the swipe your chosen hook came from (its skeleton and voice dimensions), then runs the SCRIPT prompt with the brief, your hook, and that structure. Out comes the teleprompter script, one sentence per line, opening with your hook verbatim.

Gate. The script goes through the two-layer gate: instant code checks, then one strict AI read on the four binary checks. If anything fails, the agent regenerates once with the worst issue named as a fix instruction, then re-checks. One fix loop, no endless looping; your review is the backstop.
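
The "one fix loop" rule is easiest to keep honest when it is written as straight-line code with no loop at all. In this sketch, check and regenerate are hypothetical callables you would wire to the two-layer gate and the SCRIPT stage.

```python
from typing import Callable, Optional


def gate_with_one_fix(
    script: str,
    check: Callable[[str], Optional[str]],
    regenerate: Callable[[str, str], str],
) -> tuple[str, bool]:
    """check returns the worst issue or None; regenerate takes (script, issue).

    Returns the script to deliver and whether it passed the gate.
    """
    issue = check(script)
    if issue is None:
        return script, True
    fixed = regenerate(script, issue)   # exactly one fix attempt
    return fixed, check(fixed) is None  # re-gate once, then stop either way
```

If the second check still fails, the script is delivered anyway with the failure flagged, since your own review is the backstop.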

Deliver. The agent puts the script in a Google Doc, lists the alternate hooks under it in case you want to swap, and sends you the link. You either give feedback for a rewrite (it loops back to the script step) or you approve. On approval it files everything: a new row in your MASTER sheet marked "Ready to Record," a Google Drive folder, and the Script Doc, plus a link from the idea back to the new script.

Done. The agent reports the total time and the per-stage seconds against its budget: analyze under 30 seconds, hooks under 5, script under 10. The slow part is you thinking about your take, which is exactly as it should be.
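
Per-stage timing needs nothing more than a context manager around each step. The budget numbers below are the ones stated above; the StageTimer class and the report format are assumptions.

```python
import time
from contextlib import contextmanager

BUDGET_SECONDS = {"analyze": 30, "hooks": 5, "script": 10}


class StageTimer:
    def __init__(self):
        self.elapsed = {}

    @contextmanager
    def stage(self, name: str):
        """Time one pipeline stage, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.elapsed[name] = time.perf_counter() - start

    def report(self) -> str:
        """One line per stage, flagged when it meets or exceeds its budget."""
        lines = []
        for name, seconds in self.elapsed.items():
            budget = BUDGET_SECONDS.get(name)
            flag = "" if budget is None or seconds < budget else " (over budget)"
            lines.append(f"{name}: {seconds:.1f}s{flag}")
        lines.append(f"total: {sum(self.elapsed.values()):.1f}s")
        return "\n".join(lines)
```

Usage is one line per stage, for example `with timer.stage("hooks"): ...` around the HOOKS call, and `timer.report()` as the closing message. The time you spend thinking about your take sits outside every stage, so it never counts against the budget.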

9. How to use it day to day

Once it is running, the daily loop is almost nothing:

  1. See a reel you like. Copy the link.
  2. Text your bot: "make a short from <url>".
  3. Give your take. When it asks, reply by voice or text. Say what you actually think: what you'd add, what you'd cut, a story of your own, a sharper angle. This is the one input that makes the script yours instead of a clean rewrite of someone else's video. If you have nothing to add, say "just save it" and the idea is parked for later.
  4. Pick a hook. It sends six. Reply "use #3" or just let its top pick stand.
  5. Approve. It sends a Google Doc. Read it, ask for a tweak if you want one, then approve.

After you approve, two things are true automatically: the idea and your take are saved in your IDEA_POOL sheet, and the approved script is a "Ready to Record" row in your MASTER sheet with its Doc linked. You never touch a spreadsheet by hand. When you sit down to film, your ready-to-record list is already there.

10. How to build it well

If you take nothing else from this guide, take this section. The build is replaceable; the design judgment is not.

Design principles to internalize

  • Build clean-room from a written spec. I did not port an old script-generation pipeline into this. I wrote down the principles first, then built to them. A spec you can argue with beats code you inherited.
  • Schema first. The brief has a strict schema, and every field has a named consumer downstream. If a field has no consumer, it does not exist. That discipline keeps the whole thing small.
  • Minimum AI calls. Three core thinking calls plus one gate call. Everything else (fetching, transcribing, retrieving, filing) is plain code. AI is the expensive, slow, unpredictable part, so you ration it.
  • The brief is the single source of truth. Later stages read the brief, never the raw transcript. This is the biggest lever for both speed and staying on-idea.
  • Taste enters as data, not as more prompt sentences. When a hook feels off, you do not bolt another rule onto the prompt. You add a better example to the swipe library. The prompts state rules; the corpus carries the taste.
  • Diversity over precision for hooks. A spread of genuinely different hook shapes beats six near-identical "best matches." You want real options, not variations.
  • The gate is a floor, not a ceiling. Binary checks are great at catching bad (banned phrase, made-up number, AI rhythm). They are terrible at manufacturing great. Great comes from you and your examples. Do not ask the gate to do taste's job.

Lessons I learned the hard way (each a gotcha to avoid)

  • Never use the post caption as the transcript. The caption is typed marketing copy, not the spoken words. Use the real audio through Whisper. Building a brief on the caption produces a confident, wrong script.
  • Give the agent an explicit run-state. Without it, a voice note answering "what's your take?" gets grabbed by the generic idea-capture reflex and logged as a new idea instead of routed to the spin. The router has to check run-state before the old handler fires.
  • Preserve the source idea even when your take narrows it. When you say "only the first three matter," do not delete the other points from the record. The brief keeps the creator's full analysis and your take; the script applies your cut. You want both visible in your sheet later.
  • Quiet the raw tool trace; narrate in plain English. An agent that dumps execute_code and read_file previews into your chat is exhausting to use. Turn that off and have it say "transcribing the reel… writing the six hooks…" instead. You always know it is moving without reading machine output.
  • Pin the exact tool names. Do not let the agent guess which scraper or which sheets action to call. Name the exact Apify actor and the exact tool slugs in the run instructions, so it does the same reliable thing every time instead of improvising a slightly different call.
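
The run-state lesson is easier to see in code. This is a minimal sketch, assuming a small in-memory session object; the state names, the handler names, and the Session class are hypothetical and not part of Hermes. The point is only the order of the checks: run-state first, idea capture last.

```python
from dataclasses import dataclass, field

# Hypothetical pause states mapped to (handler to run, next state).
# They mirror the three human pauses: your take, your hook pick, your approval.
PAUSES = {
    "AWAITING_TAKE": ("spin_merge", "AWAITING_HOOK_PICK"),
    "AWAITING_HOOK_PICK": ("script", "AWAITING_APPROVAL"),
    "AWAITING_APPROVAL": ("file_script", "IDLE"),
}


@dataclass
class Session:
    state: str = "IDLE"
    replies: list = field(default_factory=list)  # (state, message) pairs from pauses
    ideas: list = field(default_factory=list)    # messages logged as new ideas


def route(session, message):
    """Return the name of the handler that should receive this message."""
    # Check run-state BEFORE the generic idea-capture handler gets a chance.
    if session.state in PAUSES:
        handler, next_state = PAUSES[session.state]
        session.replies.append((session.state, message))
        session.state = next_state
        return handler
    # Only when no run is paused does a message count as a new idea.
    session.ideas.append(message)
    return "idea_capture"


if __name__ == "__main__":
    session = Session(state="AWAITING_TAKE")
    print(route(session, "only the first three matter"))
    print(session.ideas)
```

If the two branches were swapped, the reply to "what's your take?" would land in ideas, which is exactly the bug described above.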

11. Run it on your own Telegram

The last piece is a standing instruction that turns a natural text into the whole pipeline. You save it once, and from then on the trigger phrase just works.

1. Put the agent's brain on the server. On GitHub, create a repository that holds your prompts, swipe library, retrieval script, and a runbook (the structure is in the appendix). Then on the server, clone it:

cd /opt
git clone https://github.com/<your-username>/<your-repo>.git

2. Build the index once:

cd /opt/<your-repo>
python3 swipe-index/build-index.py --src ./swipe-library --out ./swipe-index/swipe-index.json

3. Install the standing rule. Hermes keeps a memory file the agent reads each session. Add a one-line rule to it so a natural trigger invokes the full runbook. Open the file:

hermes config edit

or edit the memory file directly (/opt/data/memory.md in my setup). Add a line like:

When I explicitly ask for a short from a URL ("make a short from <url>", "run the script agent on <url>"), run the master runbook end to end, pausing only for my take, my hook pick, and my approval. Pull the transcript with the Apify Instagram scraper through Composio, then Whisper, never the caption. File the approved script as a Ready-to-Record row in my content sheet.

4. Restart and do a dry run:

hermes gateway restart

Now open Telegram and send your bot, exactly:

make a short from https://www.instagram.com/reel/ABC123/

(Use a real public reel URL.) Walk the whole loop: confirm it transcribes, shows you the idea, asks for your take, accepts a voice reply, sends six hooks, writes a script, and delivers a Doc. The first run is where you will find the one thing you mis-wired, usually a Composio toolkit you forgot to connect, which the agent will name in plain English. Fix it, run again.

When a full run finishes with a Google Doc and a new "Ready to Record" row in your sheet, you are done. You have a phone-first scripting agent that turns any reel into a script in your voice.

12. Appendix

The repository to start from

Structure your GitHub repo like this. It is the "brain" your server clones:

your-repo/
  system-prompts/
    01-idea-analysis.md       # EXTRACT
    01b-spin-merge.md         # SPIN MERGE
    02-hook-generation.md     # HOOKS
    03-script-generation.md   # SCRIPT
    04-gate.md                # GATE
  schema/
    brief.schema.json         # the strict brief schema
  swipe-index/
    build-index.py            # builds the compact index
    retrieve.py               # diversity-first shortlist (no AI)
    hook-enrichment.json      # the offline template-or-null + shape_spec per swipe
  swipe-library/
    T01-....md ... T100-....md # your analyzed viral videos
  gate/
    voice-dna.md              # your positive voice identity
    writing-rules.md          # your AI-tell avoidance rules
    niche-profile.md          # your pillars + assets
  jalen/                      # the run instructions (the runbook + per-step flows)

All five system prompts are in section 7 above, ready to copy. The swipe-file format is in section 6. This structure is also what makes the whole thing reusable: to run it for a different person or brand, you swap the swipe library, the voice rules, and the niche profile; the engine and prompts stay identical.
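
For a feel of what "diversity-first shortlist (no AI)" can mean, here is a minimal sketch. It is not the retrieve.py from my repo. It assumes each swipe is a dict with a hypothetical "shape" label and a precomputed "score", and it takes the best swipe of each shape before it allows a shape to repeat.

```python
def diverse_shortlist(swipes, k=6):
    """Pick up to k swipes, one per hook shape first, then fill by score."""
    ranked = sorted(swipes, key=lambda s: s["score"], reverse=True)
    picked, seen_shapes = [], set()
    # First pass: the best-scoring swipe of each distinct shape.
    for swipe in ranked:
        if len(picked) == k:
            break
        if swipe["shape"] not in seen_shapes:
            picked.append(swipe)
            seen_shapes.add(swipe["shape"])
    # Second pass: if there were fewer shapes than k, fill the rest by score.
    for swipe in ranked:
        if len(picked) == k:
            break
        if swipe not in picked:
            picked.append(swipe)
    return picked


if __name__ == "__main__":
    # Hypothetical sample data; ids follow the T01... naming of the swipe library.
    sample = [
        {"id": "T01", "shape": "question", "score": 0.9},
        {"id": "T02", "shape": "question", "score": 0.8},
        {"id": "T03", "shape": "contrarian", "score": 0.5},
        {"id": "T04", "shape": "list", "score": 0.4},
    ]
    print([s["id"] for s in diverse_shortlist(sample, k=3)])
```

A pure top-k by score would return two "question" swipes here. The shape-first pass trades a little match precision for a real spread of options, which is the design principle from section 10.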

Config values that matter

Setting                                  What it does
platforms.telegram.enabled true          turns the Telegram interface on
telegram.allow_from <YOUR_USER_ID>       only you can command the bot
display.tool_progress false              quiets raw tool output in chat
display.show_reasoning false             hides the model's raw reasoning
OPENAI_API_KEY                           model + Whisper transcription
Composio MCP server                      Google Sheets/Drive/Docs + Apify tools

Hermes works with many model providers, so you are not locked to OpenAI. The full list (OpenRouter, Anthropic, Google Gemini, DeepSeek, and more) is in the Hermes providers docs.

Troubleshooting

  • The bot doesn't reply. Send /start to the bot, then hermes gateway restart and hermes gateway status. Check the log: grep -i "telegram\|connected\|error" ~/.hermes/logs/gateway.log | tail -40.
  • hermes: command not found right after install. Close and reopen your SSH session so the new path loads, then retry.
  • A tool isn't available. Run hermes mcp test composio and confirm the toolkit you need is connected in the Composio dashboard. Re-add with hermes mcp add if needed.
  • It transcribed the caption instead of the audio. It should refuse to. If it ever does, your run instruction is missing the "never the caption" rule from section 11. Add it.
  • The reply to "what's your take?" got logged as a new idea. That is the run-state bug from section 8. Make sure your runbook tells the agent to hold an explicit state during the spin pause and check it before any idea-capture step.
  • Config changes didn't take effect. Most config is read at startup. In the gateway, run hermes gateway restart; in a CLI session, exit and relaunch.
  • General health check. hermes doctor flags missing dependencies and config; hermes status shows component status.

Built with Hermes Agent by Nous Research. Inspired by the way Melda thinks about short-form. The hard part was never the code; it was deciding what the agent should refuse to do.