Most AI chatbots live on web widgets that get ignored. The ones people actually use show up where they already are — and on iPhone, that's iMessage. Blue bubbles get opened, replied to, and screenshotted. They feel like a person.
The good news is, putting GPT-4o, Claude, or any other model on iMessage is mechanically simple. The hard part is making the experience feel native at scale — typing indicators that fire at the right moment, replies that arrive in human-sized chunks, conversations that remember context, and webhooks that don't double-send when something retries.
This post walks through the full architecture for an iMessage AI bot in 2026 — first the minimum viable version, then the six production patterns that separate a demo from a product.
The architecture in one diagram
Every iMessage AI bot — ours, Sendblue's, OpenClaw, every internal one we've seen — boils down to the same five hops:
- User sends iMessage → lands on a Blooio-managed Apple ID
- Blooio webhook → POSTs message.received to your server
- Your server → reads conversation history, calls the LLM
- LLM response → your server posts back to Blooio's REST API
- Blooio delivery → message lands in the user's Messages app as a native blue bubble
The whole loop is stateless except for the conversation store. You never need a Mac, you never touch Apple's infrastructure, and Blooio absorbs all the iMessage delivery, retry, and read-receipt mechanics.
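The second hop's payload, as the handlers in this post read it, can be narrowed from untrusted JSON with a small type guard. The field names (id, type, data.from, data.body, data.message_id, data.attachments) are inferred from the code in this post, not from Blooio's documentation, so treat this shape as an assumption and check it against the real schema.

```typescript
// Assumed attachment and event shapes, inferred from the fields used in this post.
interface Attachment {
  url: string;
  mime_type?: string;
}

interface MessageReceivedEvent {
  id: string;
  type: "message.received";
  data: {
    message_id: string;
    from: string;
    body: string;
    conversation_id?: string;
    attachments: Attachment[];
  };
}

function isAttachment(a: unknown): a is Attachment {
  return typeof a === "object" && a !== null && typeof (a as Attachment).url === "string";
}

// Returns a typed event for message.received payloads, or null for anything else.
function parseMessageReceived(payload: unknown): MessageReceivedEvent | null {
  if (typeof payload !== "object" || payload === null) return null;
  const e = payload as Record<string, unknown>;
  const id = e.id;
  if (e.type !== "message.received" || typeof id !== "string") return null;

  const rawData = e.data;
  if (typeof rawData !== "object" || rawData === null) return null;
  const data = rawData as Record<string, unknown>;

  const from = data.from;
  const messageId = data.message_id;
  if (typeof from !== "string" || typeof messageId !== "string") return null;

  const rawBody = data.body;
  const rawConversation = data.conversation_id;
  return {
    id,
    type: "message.received",
    data: {
      message_id: messageId,
      from,
      body: typeof rawBody === "string" ? rawBody : "",
      conversation_id: typeof rawConversation === "string" ? rawConversation : undefined,
      // Drop malformed attachment entries instead of failing the whole event.
      attachments: Array.isArray(data.attachments) ? (data.attachments as unknown[]).filter(isAttachment) : [],
    },
  };
}
```

Returning null (rather than throwing) lets the handler answer 200 for event types it doesn't care about, so the provider doesn't retry them.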
The 50-line version
Here's the minimum viable bot. Webhook in, Claude in the middle, iMessage out.
```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const BLOOIO_TOKEN = process.env.BLOOIO_TOKEN!;

// Webhook endpoint: Blooio POSTs events here.
export async function POST(req: Request) {
  const event = await req.json();
  // Acknowledge (2xx) anything that isn't an inbound message so it isn't retried.
  if (event.type !== "message.received") return new Response("ok");
  const { from, body } = event.data;

  // One-shot call with no conversation history yet (see pattern 3).
  const reply = await anthropic.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 400,
    messages: [{ role: "user", content: body }],
  });
  // Concatenate every text block in the response.
  const text = reply.content.map((block) => (block.type === "text" ? block.text : "")).join("");

  // Send the reply back to the sender as an iMessage via Blooio's REST API.
  if (text) {
    await fetch("https://backend.blooio.com/v2/messages", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "authorization": `Bearer ${BLOOIO_TOKEN}`,
      },
      body: JSON.stringify({ to: from, body: text }),
    });
  }
  return new Response("ok");
}
```
Deploy that as a Vercel (Next.js) route handler, call it from a Cloudflare Worker's fetch handler, or serve it from any HTTPS endpoint that speaks Web-standard Request/Response. Register it as a webhook in the Blooio dashboard and you have a working iMessage AI bot. Text your Blooio number, and Claude replies.
That's it for "does it work". Now the hard part.
Production pattern 1: Typing indicators
A bare reply lands instantly. That feels robotic — humans don't compose 400 tokens in 80ms. Fire a typing indicator the moment the webhook arrives, then let the LLM call complete. Blooio's typing endpoint stops automatically when you send the next real message:
```typescript
// Show the "…" indicator to the sender as soon as the webhook arrives.
await fetch("https://backend.blooio.com/v2/typing", {
  method: "POST",
  headers: { "authorization": `Bearer ${BLOOIO_TOKEN}`, "content-type": "application/json" },
  body: JSON.stringify({ to: from, typing: true }),
});
```
The result is a UX that mirrors a real human: tap to send, the "…" indicator appears, then the reply lands. Engagement in our customer base goes up materially on bots that do this versus ones that don't.
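Awaiting the typing request before starting the LLM call adds a network round trip to every reply, and a failed typing request shouldn't cost the user their answer. A minimal sketch of one way to handle both, where `withTyping` and its `sendTyping` argument are hypothetical names (wrap the /v2/typing POST above in `sendTyping`):

```typescript
// Start the typing indicator without blocking on it, run the slow work
// (e.g. the LLM call), and never let a failed indicator stop the reply.
async function withTyping<T>(
  sendTyping: () => Promise<unknown>,
  work: () => Promise<T>,
): Promise<T> {
  // Kick off the indicator; catch so a failure can't become an unhandled rejection.
  const typing = sendTyping().catch((err) => {
    console.warn("typing indicator failed", err);
  });
  const result = await work();
  // Let the indicator request settle first so it can't land after the reply.
  await typing;
  return result;
}
```

In the handler this would wrap the model call, e.g. `const reply = await withTyping(() => fetch(typingUrl, opts), () => anthropic.messages.create(params));`, with `typingUrl` and `opts` standing in for the request shown above.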
Production pattern 2: Message splitting
LLM replies often arrive as three paragraphs in one block, while iMessage conversations are culturally short and spread across multiple bubbles. A single 600-character bubble screams chatbot. The same response delivered as three 200-character bubbles, each separated by a short delay, reads like a person typing.
Split on sentence boundaries, cap at 280 chars per bubble, then send sequentially with a 700–1200ms jitter between sends. Combined with typing indicators between sends, the bot feels alive.
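```typescript
// Split an LLM reply into iMessage-sized bubbles on sentence boundaries.
function splitReply(text: string, maxChars = 280): string[] {
  // A sentence is a run of non-terminators plus any trailing run of . ! ?
  // ("Wait!!" and "So..." stay whole instead of losing their extra punctuation).
  const sentences = text.match(/[^.!?]+[.!?]*|[.!?]+/g) ?? [text];
  const bubbles: string[] = [];
  let current = "";
  for (const s of sentences) {
    // Start a new bubble when adding this sentence would exceed the cap.
    // A single sentence longer than maxChars still becomes one oversized bubble.
    if ((current + s).length > maxChars && current.trim()) {
      bubbles.push(current.trim());
      current = s;
    } else {
      current += s;
    }
  }
  if (current.trim()) bubbles.push(current.trim());
  return bubbles;
}
```
Production pattern 3: Conversation memory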
The 50-line version sends a single message to Claude with no history. That works for one-shot Q&A and nothing else. Real bots need state.
The simplest store is Postgres or Cloudflare D1: one row per conversation keyed by phone number, holding the rolling message list. Cap it at the last 20 messages (or ~8k tokens) to keep costs predictable.
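```typescript
// db is assumed to be a node-postgres Pool; query() resolves to { rows, rowCount }.
const history = await db.query(
  "SELECT role, content FROM messages WHERE phone = $1 ORDER BY created_at DESC LIMIT 20",
  [from],
);
const reply = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 400,
  system: "You are a helpful assistant texting on iMessage. Keep replies short and casual.",
  // Rows come back newest-first; reverse them into chronological order.
  messages: [...history.rows.reverse(), { role: "user", content: body }],
});
// After replying, INSERT both the user's message and the reply into `messages`
// so the next webhook sees this turn in its history.
```
For longer-running agents (anything past ~50 messages), switch to summarised memory: keep the last 10 verbatim and replace older turns with a rolling summary the model regenerates every ~20 turns.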
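A minimal sketch of that summarised-memory scheme, kept storage-agnostic. `Memory`, `buildContext`, `maybeResummarize`, and the `summarize` callback are hypothetical names; `summarize` would be a separate LLM call you supply. The 10 and 20 defaults mirror the numbers above.

```typescript
type Turn = { role: "user" | "assistant"; content: string };

interface Memory {
  summary: string;           // rolling summary of turns[0..summarizedThrough)
  summarizedThrough: number; // index of the first turn NOT folded into the summary
  turns: Turn[];             // full log, oldest first
}

const BASE_PROMPT = "You are a helpful assistant texting on iMessage. Keep replies short and casual.";

// Context for the next model call: the summary rides in the system prompt and
// every turn not yet folded into it is sent verbatim, so the verbatim window stays
// bounded by roughly keepVerbatim + resummarizeEvery turns.
function buildContext(mem: Memory): { system: string; messages: Turn[] } {
  const system = mem.summary
    ? `${BASE_PROMPT}\n\nSummary of the earlier conversation: ${mem.summary}`
    : BASE_PROMPT;
  return { system, messages: mem.turns.slice(mem.summarizedThrough) };
}

// Every `resummarizeEvery` turns, fold everything except the last `keepVerbatim`
// turns into the summary. Only turns not yet summarised are sent to `summarize`.
async function maybeResummarize(
  mem: Memory,
  summarize: (previousSummary: string, newlyOld: Turn[]) => Promise<string>,
  keepVerbatim = 10,
  resummarizeEvery = 20,
): Promise<Memory> {
  const n = mem.turns.length;
  if (n === 0 || n % resummarizeEvery !== 0) return mem;
  const cutoff = Math.max(0, n - keepVerbatim);
  const newlyOld = mem.turns.slice(mem.summarizedThrough, cutoff);
  if (newlyOld.length === 0) return mem;
  const summary = await summarize(mem.summary, newlyOld);
  return { ...mem, summary, summarizedThrough: cutoff };
}
```

Feeding the previous summary plus only the newly aged-out turns keeps each summarisation call small, instead of re-reading the whole thread every time.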
Production pattern 4: Idempotent webhook handling
Webhook systems retry on a non-2xx response; Blooio retries up to 3 times with exponential backoff. If your handler crashes after replying but before returning 200, the user gets the reply again on every retry.
Dedupe on event.id before any side effect:
```typescript
// webhook_log.event_id needs a PRIMARY KEY or UNIQUE constraint for ON CONFLICT to fire.
const inserted = await db.query(
  "INSERT INTO webhook_log (event_id) VALUES ($1) ON CONFLICT DO NOTHING RETURNING event_id",
  [event.id],
);
if (inserted.rowCount === 0) return new Response("ok"); // already processed
```
Run this first, before the LLM call. Missing dedupe is the single biggest source of "the bot sent me the same thing four times" complaints we hear.
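The same claim-before-work check can run on Redis instead of Postgres. A minimal sketch, assuming a node-redis v4-style client whose `set(key, value, { NX, EX })` resolves to "OK" when the key was newly written and null when it already existed; `claimEvent`, the `webhook:` key prefix, and the 24-hour TTL are illustrative choices, not Blooio requirements.

```typescript
// Structural type for the one Redis call we need (node-redis v4-style SET NX EX).
interface SetNxClient {
  set(key: string, value: string, options: { NX: true; EX: number }): Promise<string | null>;
}

// True the first time an event ID is seen, false on every retry of it.
// The key expires after ttlSeconds so the dedupe set doesn't grow forever.
async function claimEvent(redis: SetNxClient, eventId: string, ttlSeconds = 86_400): Promise<boolean> {
  const result = await redis.set(`webhook:${eventId}`, "1", { NX: true, EX: ttlSeconds });
  return result === "OK";
}
```

In the handler: `if (!(await claimEvent(redis, event.id))) return new Response("ok");`. One trade-off applies to both variants: if the handler fails after claiming but before replying, the retry is swallowed, so you may want to delete the key (or row) in the error path to give the retry another attempt.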
Production pattern 5: Read receipts and presence
Marking the user's message as read before replying makes the interaction feel real. iMessage's read receipts (the small "Read 9:41 AM" label) only show if the recipient has them enabled, but Blooio gives you the inbound signal regardless. Mark messages as read on the server side as soon as you start processing:
```typescript
// Mark the inbound message as read as soon as processing starts.
await fetch(`https://backend.blooio.com/v2/messages/${event.data.message_id}/read`, {
  method: "POST",
  headers: { "authorization": `Bearer ${BLOOIO_TOKEN}` },
});
```
Pair it with typing indicators and you get the full human-feel sequence: ✓ read → "…" → bubble → bubble → bubble.
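Put together with message splitting, that sequence might be orchestrated as below. This is a sketch under assumptions: `Transport` is a hypothetical wrapper around the Blooio read, typing, and send calls shown in this post, and `generate` stands for the LLM call followed by `splitReply`. The delay and randomness are injectable so the ordering can be tested without real timers.

```typescript
// Hypothetical wrapper around the Blooio calls from this post.
interface Transport {
  markRead(): Promise<void>;
  setTyping(): Promise<void>;
  send(text: string): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Random delay in [minMs, maxMs].
function jitter(minMs: number, maxMs: number, rand: () => number = Math.random): number {
  return minMs + Math.round(rand() * (maxMs - minMs));
}

// ✓ read → "…" → bubble → bubble → bubble
async function respond(
  t: Transport,
  generate: () => Promise<string[]>, // e.g. the LLM call followed by splitReply
  wait: (ms: number) => Promise<void> = sleep,
  rand: () => number = Math.random,
): Promise<void> {
  await t.markRead();  // read receipt as soon as processing starts
  await t.setTyping(); // "…" while the model works
  const bubbles = await generate();
  for (let i = 0; i < bubbles.length; i++) {
    if (i > 0) {
      // Re-show "…" and pause 700-1200 ms so follow-up bubbles arrive at typing speed.
      await t.setTyping();
      await wait(jitter(700, 1200, rand));
    }
    await t.send(bubbles[i]);
  }
}
```

The first bubble goes out as soon as the model finishes, since the user has already been watching the typing indicator during generation; only follow-up bubbles get the artificial pause.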
Production pattern 6: Image and voice handling
Modern users send screenshots. Modern models can read them. If your bot ignores attachments, the whole experience collapses the moment someone sends a receipt or a meme.
Blooio webhooks include attachment URLs alongside the text body. Pipe the image straight into a multi-modal model:
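```typescript
// One user turn: the typed text (if any) plus every image attachment.
const content: Anthropic.MessageParam["content"] = [];
if (body) content.push({ type: "text", text: body });
for (const attachment of event.data.attachments ?? []) {
  if (attachment.mime_type?.startsWith("image/")) {
    // URL image source: Anthropic fetches the URL, so it must be publicly reachable.
    content.push({
      type: "image",
      source: { type: "url", url: attachment.url },
    });
  }
}
// Nothing the model can read (e.g. only an unsupported attachment): skip the call.
if (content.length === 0) return new Response("ok");
const reply = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 400,
  messages: [{ role: "user", content }],
});
```
For voice memos, transcribe with Whisper or Gemini's native audio first, then prepend the transcript to the text message.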
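A minimal sketch that folds voice memos into the same user turn. `buildUserContent` and the `transcribe` callback are hypothetical (the callback would wrap Whisper or Gemini audio), the attachment shape is assumed from the code above, and images are placed before the text, which Anthropic's vision guidance suggests tends to work well.

```typescript
type Attachment = { url: string; mime_type?: string };

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; source: { type: "url"; url: string } };

// Build one user turn from typed text, image attachments and voice memos.
async function buildUserContent(
  body: string,
  attachments: Attachment[],
  transcribe: (audioUrl: string) => Promise<string>,
): Promise<ContentPart[]> {
  const images: ContentPart[] = [];
  const transcripts: string[] = [];
  for (const a of attachments) {
    if (a.mime_type?.startsWith("image/")) {
      images.push({ type: "image", source: { type: "url", url: a.url } });
    } else if (a.mime_type?.startsWith("audio/")) {
      transcripts.push(await transcribe(a.url));
    }
  }
  // Voice transcripts are prepended to whatever the user typed.
  const text = [...transcripts.map((t) => `[voice memo] ${t}`), body].filter(Boolean).join("\n");
  // An empty array means nothing usable arrived; the caller should skip the LLM call.
  return text ? [...images, { type: "text", text }] : images;
}
```

Depending on the transcription service, iMessage audio may need converting to a supported format before `transcribe` can handle it.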
Claude MCP: when the agent is the bot
Everything above assumes you're writing the orchestration code. In 2026 there's a second pattern: the agent itself drives the API through the Model Context Protocol.
Blooio runs a hosted MCP server at https://mcp.blooio.com/v4. Add it to Claude Desktop, Claude Code, Cursor, or any MCP client and the model gets send_message, list_conversations, and read_messages as native tools. Now you can tell Claude "text my brother and let him know I'll be 20 minutes late" and it does — through your real iMessage thread, blue bubble and all.
For developer agents this collapses the bot into one prompt. For consumer bots, the webhook pattern above still wins because you control the orchestration loop.
Real-world: 139K+ AI iMessages on Blooio
This isn't theoretical. Aneu runs a consumer-facing Social AI that lives entirely inside iMessage. The architecture is exactly what's described above — Blooio webhook in, Claude in the middle, Blooio out — with all six production patterns layered on.
In its first 60 days the bot delivered 139,000+ messages with 99.96% delivery reliability and absorbed 568% month-over-month conversation growth without a single infrastructure change. Users complete entire onboarding flows, ask questions, and share images inside iMessage threads they already use for friends and family. The retention difference versus the same product on a web widget was, in Aneu's words, "not even close."
Frequently asked questions
- Do I need a Mac to run an iMessage AI bot?
- No. Blooio handles all the iMessage infrastructure on its own Mac fleet, so your server only needs to be a public HTTPS endpoint. You can deploy on Cloudflare Workers, Vercel, AWS Lambda, or a single VPS — no Apple hardware required.
- Is iMessage subject to A2P 10DLC registration?
- No. 10DLC applies to SMS sent over US carrier networks. iMessage travels over Apple's infrastructure end-to-end, so your AI chatbot bypasses carrier filtering and the multi-week 10DLC registration process entirely. You still have to follow Apple's anti-spam guidelines.
- Which LLM works best for iMessage bots?
- Any chat-capable model works. Claude Sonnet 4.5 is our default for tone and instruction-following in short replies; GPT-4o is strong on tool-calling; Gemini 2.0 Flash is the cheapest at scale. For image-heavy conversations, all three handle inline image inputs.
- How do I handle conversation memory at scale?
- Store the last 20 messages per phone number verbatim. For longer conversations (past ~50 messages), keep the last 10 verbatim and replace older turns with a rolling summary regenerated every ~20 turns. This caps token cost while keeping the bot aware of the full conversation arc. Postgres, D1, or any KV store works.
- How do I prevent double-replies on webhook retries?
- Dedupe on the webhook event ID before any side effect. Insert the event ID into a table with a unique constraint on it (Postgres ON CONFLICT DO NOTHING) or claim it with Redis SETNX, and only proceed if the insert was new. Run this check first, before the LLM call, not after.
- Can the bot read images and screenshots?
- Yes. Blooio webhooks include attachment URLs alongside the message body. Pass them to a multi-modal model as image content parts (an image block with a url source for Claude Sonnet 4.5, an image_url part for GPT-4o) and the model will see what the user sent.
- What does this cost to run?
- Blooio pricing starts at $29/month for a dedicated iMessage number. LLM cost is typically $0.001–$0.02 per conversation turn depending on the model and history length. A bot handling 1,000 conversations per day with Claude Sonnet runs roughly $20–$60/day in LLM spend.
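The daily figure is conversations × turns per conversation × cost per turn. The post doesn't state a turn count, so assume around five turns per conversation: the quoted $20–$60/day for 1,000 conversations then works out to $0.004–$0.012 per turn, inside the $0.001–$0.02 per-turn range above. A small helper for plugging in your own numbers:

```typescript
// Back-of-envelope daily LLM spend = conversations/day × turns/conversation × $/turn.
// turnsPerConversation is an estimate you supply; the figures below are assumptions.
function dailyLlmSpend(
  conversationsPerDay: number,
  turnsPerConversation: number,
  costPerTurn: [low: number, high: number],
): [number, number] {
  const turns = conversationsPerDay * turnsPerConversation;
  return [turns * costPerTurn[0], turns * costPerTurn[1]];
}

// 1,000 conversations/day at an assumed 5 turns each, $0.004–$0.012 per turn.
const [low, high] = dailyLlmSpend(1_000, 5, [0.004, 0.012]);
console.log(`$${low.toFixed(0)}–$${high.toFixed(0)} per day`); // $20–$60 per day
```

History length drives the per-turn cost, which is why capping or summarising memory (pattern 3) moves this number more than model choice alone.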
Ship your iMessage AI bot this weekend
Get a Blooio number, plug in your LLM, and start handling real conversations in blue bubbles by Monday.
What we'd build differently in 2026
If we were starting an iMessage AI product from scratch today: lean harder on Claude MCP for any developer-facing use case (it removes 80% of the glue code), default to multi-modal from day one (users send screenshots constantly), and treat the conversation store as a first-class product surface — not a Postgres table buried behind the LLM.
iMessage is no longer the experimental channel. It's the one users open first, reply to fastest, and trust the most. Build for that, and the bot wins on engagement before the model even matters.

