Do I need a Mac to run an iMessage AI bot?

No. Blooio handles all the iMessage infrastructure on its own Mac fleet, so your server only needs to be a public HTTPS endpoint. You can deploy on Cloudflare Workers, Vercel, AWS Lambda, or a single VPS — no Apple hardware required.

Is iMessage subject to A2P 10DLC registration?

No. 10DLC applies to SMS sent over US carrier networks. iMessage travels over Apple's infrastructure end-to-end, so your AI chatbot bypasses carrier filtering and the multi-week 10DLC registration process entirely. You still have to follow Apple's anti-spam guidelines.

Which LLM works best for iMessage bots?

Any chat-capable model works. Claude Sonnet 4.5 is our default for tone and instruction-following in short replies; GPT-4o is strong on tool-calling; Gemini 2.0 Flash is the cheapest at scale. For image-heavy conversations, all three handle inline image inputs.

How do I handle conversation memory at scale?

Store the last 20 messages per phone number verbatim. Past that, replace older messages with a rolling summary regenerated every 20 turns. This caps token cost while keeping the bot aware of the full conversation arc. Postgres, D1, or any KV store works.

How do I prevent double-replies on webhook retries?

Dedupe on the webhook event ID before any side effect. Insert the event ID into a dedupe table (Postgres ON CONFLICT or Redis SETNX) and only proceed if the insert was new. Run this check first, before the LLM call, not after.

Can the bot read images and screenshots?

Yes. Blooio webhooks include attachment URLs alongside the message body. Pass them to a multi-modal model as image content parts (image blocks with a URL source for Claude Sonnet 4.5, image_url parts for GPT-4o) and the model will see what the user sent.

What does this cost to run?

Blooio pricing starts at $29/month for a dedicated iMessage number. LLM cost is typically $0.001–$0.02 per conversation turn depending on the model and history length. A bot handling 1,000 conversations per day with Claude Sonnet runs roughly $20–$60/day in LLM spend.

Can I use the same code for RCS and SMS fallback?

Yes. Blooio's REST API accepts the same outbound message payload for iMessage, RCS, and SMS — the API picks the best available channel per recipient. Your bot logic doesn't change; the bubble color does.

How to Build an iMessage AI Bot in 2026 (Step-by-Step)

Most AI chatbots live on web widgets that get ignored. The ones people actually use show up where they already are — and on iPhone, that's iMessage. Blue bubbles get opened, replied to, and screenshotted. They feel like a person.

The good news is, putting GPT-4o, Claude, or any other model on iMessage is mechanically simple. The hard part is making the experience feel native at scale — typing indicators that fire at the right moment, replies that arrive in human-sized chunks, conversations that remember context, and webhooks that don't double-send when something retries.

This post walks through the full architecture for an iMessage AI bot in 2026 — first the minimum viable version, then the six production patterns that separate a demo from a product.

The architecture in one diagram

Every iMessage AI bot — ours, Sendblue's, OpenClaw, every internal one we've seen — boils down to the same five hops:

User sends iMessage → lands on a Blooio-managed Apple ID

Blooio webhook → POSTs message.received to your server

Your server → reads conversation history, calls the LLM

LLM response → your server posts back to Blooio's REST API

Blooio delivery → message lands in the user's Messages app as a native blue bubble

The whole loop is stateless except for the conversation store. You never need a Mac, you never touch Apple's infrastructure, and Blooio absorbs all the iMessage delivery, retry, and read-receipt mechanics.
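
Since every hop after the webhook depends on the payload, it can help to pin its shape down before writing the handler. The sketch below types only the fields that the snippets in this post actually read (id, type, data.from, data.body, data.conversation_id, data.message_id, data.attachments). Treat the interface names and the optionality of each field as assumptions for illustration, not Blooio's published schema.

```typescript
// Shapes inferred from the fields used in this post's snippets.
// Assumption: names and optionality are illustrative, not an official schema.
interface Attachment {
  url: string;
  mime_type?: string;
}

interface MessageReceivedEvent {
  id: string; // event ID, used later for dedupe
  type: "message.received";
  data: {
    from: string;
    body?: string;
    conversation_id?: string;
    message_id?: string;
    attachments?: Attachment[];
  };
}

// Narrow an untrusted JSON payload before the handler touches its fields.
function isMessageReceived(payload: unknown): payload is MessageReceivedEvent {
  if (typeof payload !== "object" || payload === null) return false;
  const p = payload as { id?: unknown; type?: unknown; data?: unknown };
  if (typeof p.id !== "string" || p.type !== "message.received") return false;
  if (typeof p.data !== "object" || p.data === null) return false;
  return typeof (p.data as { from?: unknown }).from === "string";
}
```

A handler would call isMessageReceived on the parsed body and return a 2xx for anything that fails the check, so unknown event types are acknowledged rather than retried.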

The 50-line version

Here's the minimum viable bot. Webhook in, Claude in the middle, iMessage out.

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const BLOOIO_TOKEN = process.env.BLOOIO_TOKEN!;

export async function POST(req: Request) {
  const event = await req.json();
  // Acknowledge anything that isn't an inbound message so it isn't retried.
  if (event.type !== "message.received") return new Response("ok");

  const { from, body } = event.data;

  // Single-turn call: no history yet (production pattern 3 adds it).
  const reply = await anthropic.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 400,
    messages: [{ role: "user", content: body }],
  });

  // Take the first text block; skip the send if the model returned none.
  const first = reply.content[0];
  const text = first?.type === "text" ? first.text : "";
  if (!text) return new Response("ok");

  // Post the reply back through Blooio's REST API.
  await fetch("https://backend.blooio.com/v2/messages", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "authorization": `Bearer ${BLOOIO_TOKEN}`,
    },
    body: JSON.stringify({ to: from, body: text }),
  });

  return new Response("ok");
}

Deploy that to a Cloudflare Worker, Vercel function, or any HTTPS endpoint, register it as a webhook in the Blooio dashboard, and you have a working iMessage AI bot. Text your Blooio number — Claude replies.

That's it for "does it work". Now the hard part.

Production pattern 1: Typing indicators

A bare reply lands instantly. That feels robotic: humans don't compose 400 tokens in 80ms. Fire a typing indicator the moment the webhook arrives, then let the LLM call complete. The indicator stops automatically when you send the next real message:

TypeScript

await fetch("https://backend.blooio.com/v2/typing", {
  method: "POST",
  headers: { "authorization": `Bearer ${BLOOIO_TOKEN}`, "content-type": "application/json" },
  body: JSON.stringify({ to: from, typing: true }),
});

The result is a UX that mirrors a real human: tap to send, the "…" indicator appears, then the reply lands. Engagement in our customer base goes up materially on bots that do this versus ones that don't.

Production pattern 2: Message splitting

LLM replies often arrive as three paragraphs in one block. iMessage culture is short and multi-bubble: a single 600-character bubble screams chatbot, while the same response delivered as three 200-character bubbles, each separated by a short delay, reads like a person typing.

Split on sentence boundaries, cap at 280 chars per bubble, then send sequentially with a 700–1200ms jitter between sends. Combined with typing indicators between sends, the bot feels alive.

TypeScript

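function splitReply(text: string, maxChars = 280): string[] {
  // Split only after sentence-ending punctuation that is followed by whitespace,
  // so "$3.50" and "wait..." stay in one piece.
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter(Boolean);
  const bubbles: string[] = [];
  let current = "";
  for (let s of sentences) {
    // A single sentence longer than one bubble gets hard-wrapped.
    while (s.length > maxChars) {
      if (current) {
        bubbles.push(current);
        current = "";
      }
      bubbles.push(s.slice(0, maxChars));
      s = s.slice(maxChars);
    }
    // Pack sentences into the current bubble until the next one would overflow.
    const candidate = current ? `${current} ${s}` : s;
    if (candidate.length > maxChars) {
      bubbles.push(current);
      current = s;
    } else {
      current = candidate;
    }
  }
  if (current) bubbles.push(current);
  return bubbles;
}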
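
The sending half of this pattern (sequential sends, 700–1200ms of jitter, a typing indicator between bubbles) can be sketched roughly as follows. BubbleTransport, jitterMs, and sendBubbles are hypothetical names introduced here; in a real handler the two transport methods would wrap the /v2/typing and /v2/messages calls shown elsewhere in this post. The delay function is injectable so the loop can be exercised without real waits.

```typescript
// Hypothetical transport: wrap your own typing and send calls behind it.
interface BubbleTransport {
  typing(to: string): Promise<void>;
  send(to: string, body: string): Promise<void>;
}

// Uniform random delay between minMs and maxMs (inclusive after rounding).
function jitterMs(minMs = 700, maxMs = 1200, rand: () => number = Math.random): number {
  return Math.round(minMs + rand() * (maxMs - minMs));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Send bubbles in order, showing a typing indicator and pausing between them.
async function sendBubbles(
  to: string,
  bubbles: string[],
  transport: BubbleTransport,
  pause: (ms: number) => Promise<void> = sleep,
): Promise<void> {
  let first = true;
  for (const bubble of bubbles) {
    if (!first) {
      await transport.typing(to); // "…" before every bubble after the first
      await pause(jitterMs());
    }
    await transport.send(to, bubble);
    first = false;
  }
}
```

Sends are awaited one at a time on purpose: firing them in parallel would risk bubbles arriving out of order. On serverless platforms, check that the function is allowed to keep running for the total delay.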

Production pattern 3: Conversation memory

The 50-line version sends a single message to Claude with no history. That works for one-shot Q&A and nothing else. Real bots need state.

The simplest store is Postgres or Cloudflare D1: one row per message, keyed by phone number and ordered by created_at, so the handler can pull the most recent turns for a conversation. Cap the window at the last 20 messages (or ~8k tokens) to keep costs predictable.

TypeScript

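// Assumes `db` is a node-postgres Pool; other clients return rows differently.
const { rows } = await db.query(
  "SELECT role, content FROM messages WHERE phone = $1 ORDER BY created_at DESC LIMIT 20",
  [from],
);

// The query returns newest-first; the model needs oldest-first.
const history = rows.reverse();
// Start the window on a user turn so the model never sees a reply without its prompt.
while (history.length > 0 && history[0].role !== "user") history.shift();

const reply = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 400,
  system: "You are a helpful assistant texting on iMessage. Keep replies short and casual.",
  messages: [...history, { role: "user", content: body }],
});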

For longer-running agents — anything past ~50 messages — switch to summarised memory: keep the last 10 verbatim and replace older turns with a rolling summary the model regenerates every ~20 turns.
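
One way to structure summarised memory is as three small pure functions: one that splits history into "summarise" and "keep verbatim", one that decides when to regenerate the summary, and one that folds the summary into the system prompt. This is a sketch under our own assumptions: the Turn type and all three function names are hypothetical, and the summary itself would come from a separate LLM call that isn't shown.

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Split stored history into turns to summarise and turns to keep verbatim.
function partitionHistory(turns: Turn[], keepVerbatim = 10): { older: Turn[]; recent: Turn[] } {
  const cut = Math.max(0, turns.length - keepVerbatim);
  return { older: turns.slice(0, cut), recent: turns.slice(cut) };
}

// Regenerate the summary only every `every` turns so the extra LLM call stays occasional.
function shouldResummarise(totalTurns: number, every = 20): boolean {
  return totalTurns > 0 && totalTurns % every === 0;
}

// Fold the rolling summary into the system prompt; recent turns go in `messages`.
function buildSystemPrompt(base: string, summary: string | null): string {
  return summary ? `${base}\n\nSummary of the earlier conversation:\n${summary}` : base;
}
```

Putting the summary in the system prompt rather than in a fake user turn keeps the message list a faithful record of what was actually said.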

Production pattern 4: Idempotent webhook handling

Most webhook systems retry on a non-2xx response, and Blooio retries up to 3 times with exponential backoff. If your handler crashes after replying but before returning 200, the user gets the reply twice.

Dedupe on event.id before any side effect:

TypeScript

const inserted = await db.query(
  "INSERT INTO webhook_log (event_id) VALUES ($1) ON CONFLICT DO NOTHING RETURNING event_id",
  [event.id],
);
if (inserted.rowCount === 0) return new Response("ok"); // already processed

Run this first, before the LLM call. Skipping it is the single biggest source of "the bot sent me the same thing four times" complaints we hear.
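
If you'd rather not tie the check to Postgres, the same claim-then-proceed logic fits behind a tiny set-if-absent store. The class below is an in-memory stand-in with the semantics you would get from Redis SETNX plus a TTL. InMemoryDedupe and its claim method are hypothetical names, and because the state lives in process memory it only suits tests or a single long-lived server, not serverless deployments.

```typescript
// In-memory stand-in for a set-if-absent store with expiry.
// Assumption: a real deployment would back this with Redis or a database.
class InMemoryDedupe {
  private seen = new Map<string, number>(); // event ID -> expiry timestamp (ms)
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs = 24 * 60 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns true the first time an ID is claimed, false on a repeat within the TTL.
  claim(eventId: string): boolean {
    const t = this.now();
    const expiry = this.seen.get(eventId);
    if (expiry !== undefined && expiry > t) return false;
    this.seen.set(eventId, t + this.ttlMs);
    return true;
  }
}
```

In the handler, a false return from claim(event.id) means the event was already seen, so respond with 200 and do nothing else.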

Production pattern 5: Read receipts and presence

Marking the user's message as read before replying makes the interaction feel real. iMessage's read receipts (the small "Read 9:41 AM" label) only show if the recipient has them enabled, but Blooio gives you the inbound signal regardless. Mark messages as read on the server side as soon as you start processing:

TypeScript

await fetch(`https://backend.blooio.com/v2/messages/${event.data.message_id}/read`, {
  method: "POST",
  headers: { "authorization": `Bearer ${BLOOIO_TOKEN}` },
});

Pair it with typing indicators and you get the full human-feel sequence: ✓ read → "…" → bubble → bubble → bubble.
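
To keep that ordering explicit, and easy to unit test, one option is to build the sequence as data before executing it. The sketch below is illustrative only: Step and planReply are hypothetical names, and in a live handler the read and first typing steps would fire before the LLM call, since the bubbles aren't known until the model responds.

```typescript
type Step =
  | { kind: "read"; messageId: string }
  | { kind: "typing" }
  | { kind: "send"; body: string };

// Build the ordered sequence: read receipt first, then typing before each bubble.
function planReply(messageId: string, bubbles: string[]): Step[] {
  const steps: Step[] = [{ kind: "read", messageId }];
  for (const body of bubbles) {
    steps.push({ kind: "typing" }, { kind: "send", body });
  }
  return steps;
}
```

An executor would then walk the array and map each step kind to the matching API call, which keeps the network code separate from the sequencing logic.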

Production pattern 6: Image and voice handling

Modern users send screenshots. Modern models can read them. If your bot ignores attachments, the whole experience collapses the moment someone sends a receipt or a meme.

Blooio webhooks include attachment URLs alongside the text body. Pipe the image straight into a multi-modal model:

TypeScript

// Build one user turn that carries the text plus any image attachments.
const content: Anthropic.ContentBlockParam[] = [];
if (body) content.push({ type: "text", text: body });
for (const attachment of event.data.attachments ?? []) {
  if (attachment.mime_type?.startsWith("image/")) {
    content.push({
      type: "image",
      source: { type: "url", url: attachment.url },
    });
  }
}

// Nothing the model can read (e.g. an unsupported attachment only): acknowledge and stop.
if (content.length === 0) return new Response("ok");

const reply = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 400,
  messages: [{ role: "user", content }],
});

For voice memos, transcribe with Whisper or Gemini's native audio first, then prepend the transcript to the text message.
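
The routing around that transcription step might look like the sketch below: sort attachments by MIME type, transcribe the audio ones with whichever service you pick (not shown), then prepend the transcripts to the text. The function names, the "[Voice memo transcript]" label, and the assumption that voice memos arrive with an audio/ MIME type are ours, not something Blooio documents here.

```typescript
interface Attachment {
  url: string;
  mime_type?: string;
}

// Sort attachments into images, audio, and everything else by MIME prefix.
function partitionAttachments(attachments: Attachment[]) {
  const images: Attachment[] = [];
  const audio: Attachment[] = [];
  const other: Attachment[] = [];
  for (const a of attachments) {
    const mime = a.mime_type ?? "";
    if (mime.startsWith("image/")) images.push(a);
    else if (mime.startsWith("audio/")) audio.push(a);
    else other.push(a);
  }
  return { images, audio, other };
}

// Prepend each transcript to the text body, one per line, skipping empty parts.
function withTranscripts(body: string, transcripts: string[]): string {
  const prefix = transcripts
    .map((t) => t.trim())
    .filter((t) => t.length > 0)
    .map((t) => `[Voice memo transcript] ${t}`);
  return [...prefix, body].filter((part) => part.length > 0).join("\n");
}
```

Labelling the transcript tells the model that the text came from speech, which helps it tolerate transcription errors.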

Claude MCP: when the agent is the bot

Everything above assumes you're writing the orchestration code. In 2026 there's a second pattern: the agent itself drives the API through the Model Context Protocol.

Blooio runs a hosted MCP server at https://mcp.blooio.com/v4. Add it to Claude Desktop, Claude Code, Cursor, or any MCP client and the model gets send_message, list_conversations, and read_messages as native tools. Now you can tell Claude "text my brother and let him know I'll be 20 minutes late" and it does — through your real iMessage thread, blue bubble and all.

For developer agents this collapses the bot into one prompt. For consumer bots, the webhook pattern above still wins because you control the orchestration loop.

Real-world: 139K+ AI iMessages on Blooio

This isn't theoretical. Aneu runs a consumer-facing Social AI that lives entirely inside iMessage. The architecture is exactly what's described above — Blooio webhook in, Claude in the middle, Blooio out — with all six production patterns layered on.

In its first 60 days the bot delivered 139,000+ messages with 99.96% delivery reliability and absorbed 568% month-over-month conversation growth without a single infrastructure change. Users complete entire onboarding flows, ask questions, and share images inside iMessage threads they already use for friends and family. The retention difference versus the same product on a web widget was, in Aneu's words, "not even close."

Ship your iMessage AI bot this weekend

Get a Blooio number, plug in your LLM, and start handling real conversations in blue bubbles by Monday.

What we'd build differently in 2026

If we were starting an iMessage AI product from scratch today: lean harder on Claude MCP for any developer-facing use case (it removes 80% of the glue code), default to multi-modal from day one (users send screenshots constantly), and treat the conversation store as a first-class product surface — not a Postgres table buried behind the LLM.

iMessage is no longer the experimental channel. It's the one users open first, reply to fastest, and trust the most. Build for that, and the bot wins on engagement before the model even matters.