Durable retries
Configure exponential backoff, custom retry predicates, and per-error policies. A 500 from OpenAI no longer wakes up your on-call rotation at 4 a.m.
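To make those three ideas concrete, here is a minimal, SDK-independent sketch of exponential backoff combined with a retry predicate. It is not Trigger.dev's implementation: `RetryPolicy`, `backoffDelayMs`, `retryWithBackoff`, and `isServerError` are hypothetical names chosen for illustration, and the "retry only 5xx responses" rule is just one example of a per-error policy.

```typescript
// Sketch only: a generic retry helper, not the Trigger.dev SDK.
// Every name in this block is hypothetical.

interface RetryPolicy {
  maxAttempts: number;   // total attempts, including the first one
  factor: number;        // multiplier applied to the delay after each failure
  minTimeoutMs: number;  // delay after the first failed attempt
  maxTimeoutMs: number;  // ceiling for any single delay
  shouldRetry: (error: unknown) => boolean; // custom retry predicate
}

// Delay to wait after the given failed attempt (1-based), capped at maxTimeoutMs.
function backoffDelayMs(attempt: number, policy: RetryPolicy): number {
  const raw = policy.minTimeoutMs * Math.pow(policy.factor, attempt - 1);
  return Math.min(raw, policy.maxTimeoutMs);
}

// Example per-error policy: retry server errors (5xx), give up on everything else.
function isServerError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const status = (error as { status?: unknown }).status;
  return typeof status === "number" && status >= 500;
}

// Calls fn until it succeeds, the predicate rejects the error,
// or maxAttempts is reached. The sleep function is injectable for testing.
async function retryWithBackoff<T>(
  fn: (attempt: number) => Promise<T>,
  policy: RetryPolicy,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      const outOfAttempts = attempt >= policy.maxAttempts;
      if (outOfAttempts || !policy.shouldRetry(error)) throw error;
      await sleep(backoffDelayMs(attempt, policy));
    }
  }
}
```

With `minTimeoutMs: 1000` and `factor: 2`, the waits between attempts grow as 1 s, 2 s, 4 s, and so on until they hit the ceiling. A production version would usually add random jitter so that many failing runs do not retry in lockstep.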
Trigger.dev runs your background jobs and AI workflows in plain TypeScript — with durable retries, fair queues, cron without timeouts, and traces engineers actually read.
No DSL. No YAML. No separate workflow language to maintain. Your tasks live in your repo, deploy with one CLI command, and stay in lock-step with the code that calls them.
Built-in idempotency keys mean duplicate triggers (webhooks, retries, Stripe events) never produce duplicate side effects.
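The idea behind an idempotency key can be sketched in a few lines. The example below is an in-memory illustration only, not how the platform stores keys: `IdempotencyStore` and its `run` method are hypothetical names, and a real system would persist keys durably and expire them after some window.

```typescript
// Sketch only: in-memory idempotency, hypothetical names throughout.

class IdempotencyStore<T> {
  // Maps each key to the promise of the first call made with that key.
  private readonly results = new Map<string, Promise<T>>();

  // Runs handler at most once per key. Later calls with the same key
  // receive the first call's promise instead of triggering new work.
  // Note: a rejected promise stays cached here; a real store would
  // decide explicitly whether failures may be retried under the same key.
  run(key: string, handler: () => Promise<T>): Promise<T> {
    const existing = this.results.get(key);
    if (existing !== undefined) return existing;
    const pending = handler();
    this.results.set(key, pending);
    return pending;
  }
}
```

A webhook handler would typically derive the key from something the sender already guarantees to be unique, such as the event ID, so that a redelivered event maps to the same key and the side effect runs once.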
Runs that started on v3.4.1 finish on v3.4.1. Mid-flight code changes can’t corrupt long-running workflows.
npx trigger.dev@latest deploy ships your tasks. No separate infra service to maintain alongside your app.
// tasks/ai-research-agent.ts
import { task, retry } from "@trigger.dev/sdk/v3";

// tavily, openai, notion, slack, batch, summarise, and pageId are assumed to be
// defined elsewhere in your app; they are not part of the Trigger.dev SDK.
export const aiResearchAgent = task({
  id: "ai-research-agent",
  retry: { maxAttempts: 3, factor: 2 },
  queue: { concurrencyLimit: 50 },
  run: async (payload: { brief: string }) => {
    const sources = await retry.onThrow(
      () => tavily.search(payload.brief)
    );
    const notes = await openai.extract(sources);
    const summaries = await batch(notes, summarise);
    await notion.append({ pageId, summaries });
    await slack.post({ channel: "#research" });
  },
});
Everything teams reach for the moment a job has to outlive a serverless request — bundled, versioned, and observable from the same dashboard.
Durable schedules that run for hours, not ten minutes. Schedules use cron syntax, live in your repo, and are traced in the same UI as the rest of your event-driven tasks.
Per-tenant concurrency limits, priority lanes, and fairness across customers — so a single noisy account can’t starve the rest of your fleet.
Every step, every argument, every retry, every log line streamed live to your dashboard. Diff two runs side-by-side to find the regression in seconds.
Tool-calling chains that survive 30-minute LLM reasoning loops. Stateful, resumable, expressible in plain async/await — no DAG language to learn.
Email, Slack, PagerDuty, webhook. Filter by task, version, deploy, or error class — so the noisy ones never make it through to your phone.
Set a concurrency limit per tenant, per plan, or per priority lane. Trigger.dev guarantees one customer’s spike won’t starve the others — without you writing a single line of throttling logic.
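For intuition, fairness with a per-tenant limit can be modelled as a round-robin over tenants that skips anyone already at their cap. The sketch below is a simplified model, not Trigger.dev's scheduler: `FairQueue`, `Job`, and the single shared `perTenantLimit` are all assumptions made for illustration.

```typescript
// Sketch only: round-robin fairness with a per-tenant concurrency cap.
// Hypothetical names; not the platform's internals.

interface Job {
  tenant: string;
  id: string;
}

class FairQueue {
  private readonly perTenantLimit: number;
  private readonly pending = new Map<string, Job[]>();   // waiting jobs per tenant
  private readonly running = new Map<string, number>();  // in-flight count per tenant
  private readonly rotation: string[] = [];              // tenants in round-robin order

  constructor(perTenantLimit: number) {
    this.perTenantLimit = perTenantLimit;
  }

  enqueue(job: Job): void {
    let jobs = this.pending.get(job.tenant);
    if (jobs === undefined) {
      jobs = [];
      this.pending.set(job.tenant, jobs);
      this.rotation.push(job.tenant);
    }
    jobs.push(job);
  }

  // Returns the next runnable job, visiting tenants in rotation order and
  // skipping tenants with no work or with too many jobs already in flight.
  next(): Job | undefined {
    for (let i = 0; i < this.rotation.length; i++) {
      const tenant = this.rotation[i] as string;
      const jobs = this.pending.get(tenant) ?? [];
      const inFlight = this.running.get(tenant) ?? 0;
      if (jobs.length === 0 || inFlight >= this.perTenantLimit) continue;
      // Move the chosen tenant to the back so other tenants go first next time.
      this.rotation.splice(i, 1);
      this.rotation.push(tenant);
      this.running.set(tenant, inFlight + 1);
      return jobs.shift();
    }
    return undefined;
  }

  // Frees one concurrency slot for the job's tenant.
  complete(job: Job): void {
    const inFlight = this.running.get(job.tenant) ?? 0;
    this.running.set(job.tenant, Math.max(0, inFlight - 1));
  }
}
```

In this model a tenant that enqueues a thousand jobs still only gets one turn per rotation and never more than `perTenantLimit` slots at once, which is why a quiet tenant's single job is picked up on the next turn rather than waiting behind the backlog.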
Every run emits OpenTelemetry spans by default. Pipe them into Datadog, Honeycomb, or Grafana and keep the dashboards your team already trusts.
Apache 2.0 core. Run it on your own Kubernetes, or stay on managed cloud — same SDK, same dashboard, same SLO targets.
Open a run, scrub the timeline, jump to the failing span. The fastest path from “something looks off” to “we found it.”
A side-by-side with the DIY stack most teams reach for first — cron + Redis + BullMQ + Sentry — before they realise the maintenance debt.
Run the free tier in production. Move up only when concurrency, retention, or volume calls for it. Self-host the open-source core forever, on us.
Audited monthly. Public status page back to 2024.
Median time from trigger() to worker pick-up, measured eu-west-2.
Single attempt cap on Scale. Resumable indefinitely via checkpoints.
GDPR-aligned, ISO 27001 controls, HIPAA available.
BullMQ is a great primitive — you still have to build the orchestration on top: deployments, versioning, a UI, alerts, fair queues, retention, multi-region failover. Trigger.dev is that whole layer, plus a TypeScript SDK that lets you express multi-step jobs without inventing a state machine.
Yes. The OSS core is Apache 2.0 and ships with a Helm chart. You can run the entire platform on your Kubernetes, keep the data on your side, and pay nothing — or move to managed cloud later with zero code changes. Several Series-B teams run self-hosted in production today.
Runs pin to the deployed version they started on. A 90-minute job that began on v3.4.1 will finish on v3.4.1, even if you ship three new deploys while it’s in flight. New runs pick up the latest version. No mid-flight corruption, no “why is half my queue using last week’s schema?”
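Version pinning can be pictured as each run recording a version at start time and resolving its code from that record afterwards. The sketch below illustrates that idea only: `VersionedRegistry`, `deploy`, `startRun`, and `execute` are hypothetical names, and the real platform's mechanics are not shown here.

```typescript
// Sketch only: pin each run to the version that was latest when it started.
// Hypothetical names; not the platform's API.

type TaskHandler = (payload: string) => string;

class VersionedRegistry {
  private readonly handlers = new Map<string, TaskHandler>();
  private latest: string | undefined;

  // Registers a new version and makes it the default for future runs.
  deploy(version: string, handler: TaskHandler): void {
    this.handlers.set(version, handler);
    this.latest = version;
  }

  // A new run records the latest version at the moment it starts.
  startRun(): { version: string } {
    if (this.latest === undefined) throw new Error("nothing deployed");
    return { version: this.latest };
  }

  // Every later step resolves its handler from the run's pinned version,
  // regardless of what has been deployed since.
  execute(run: { version: string }, payload: string): string {
    const handler = this.handlers.get(run.version);
    if (handler === undefined) throw new Error(`unknown version ${run.version}`);
    return handler(payload);
  }
}
```

The key design choice is that old handlers are kept around after a deploy, so in-flight runs never see code they did not start with.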
Both. The hard upper bound for a single attempt is 24 hours on Scale, 1 hour on Pro. With Trigger.dev’s checkpointing, a job that lives longer than its attempt window is automatically suspended and resumed across infrastructure boundaries. Customers run multi-day workflows in production.
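As a rough mental model, a job that outlives one attempt saves its progress and a later attempt picks up from there. The sketch below shows application-level checkpointing, which is a simplification and may differ from how the platform actually suspends and resumes a run: `Checkpoint`, `runAttempt`, and `runToCompletion` are hypothetical names, and a step budget stands in for the attempt time limit.

```typescript
// Sketch only: application-level checkpoint and resume.
// Hypothetical names; a step budget stands in for the attempt window.

interface Checkpoint {
  nextIndex: number; // first item not yet processed
  total: number;     // work accumulated so far
}

// Processes items until the attempt's step budget is spent,
// then returns a checkpoint the next attempt can resume from.
function runAttempt(
  items: number[],
  from: Checkpoint,
  maxStepsPerAttempt: number,
): { checkpoint: Checkpoint; done: boolean } {
  let nextIndex = from.nextIndex;
  let total = from.total;
  let steps = 0;
  while (nextIndex < items.length && steps < maxStepsPerAttempt) {
    total += items[nextIndex] as number;
    nextIndex++;
    steps++;
  }
  return { checkpoint: { nextIndex, total }, done: nextIndex >= items.length };
}

// Drives attempts until the job is finished, round-tripping the checkpoint
// through JSON to mimic it being stored between attempts.
function runToCompletion(
  items: number[],
  maxStepsPerAttempt: number,
): { total: number; attempts: number } {
  if (maxStepsPerAttempt < 1) throw new Error("maxStepsPerAttempt must be >= 1");
  let checkpoint: Checkpoint = { nextIndex: 0, total: 0 };
  let attempts = 0;
  let done = false;
  while (!done) {
    const restored: Checkpoint = JSON.parse(JSON.stringify(checkpoint));
    const result = runAttempt(items, restored, maxStepsPerAttempt);
    checkpoint = result.checkpoint;
    done = result.done;
    attempts++;
  }
  return { total: checkpoint.total, attempts };
}
```

Because each attempt starts from the stored checkpoint rather than from zero, the total amount of work is the same however many attempts it is spread across.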
Pay only for runs you execute. No per-seat charges — every plan includes unlimited team members. The Pro tier covers 100k runs / month; beyond that you pay a per-run rate that drops as volume increases. Scale plans get committed-use discounts and dedicated capacity lanes.
Yes. Every run emits OTel spans by default. We provide first-party exporters for Datadog, Honeycomb, New Relic, Grafana Tempo and any OTLP-compatible collector. You keep the dashboards your team already trusts — we add the span data they were missing.
First-class TypeScript and JavaScript SDK (Node, Bun, Deno). The platform also exposes a runtime API so you can invoke tasks from any language. Build extensions let you bundle Python scripts, Prisma, FFmpeg, Puppeteer and apt packages directly into your task environment.
An agent task is just a regular task that calls an LLM with tools. Trigger.dev gives it the durability the rest of your tasks have — per-tool retries, span-level tracing, idempotent tool calls, structured outputs, and resumable conversations that outlive any single request. No DAG language to learn, no separate orchestration runtime.
SOC 2 Type II audited annually, GDPR-aligned, ISO 27001 controls in place. HIPAA and FedRAMP are available under Scale agreements. EU customers can pin all data to eu-west-2; US customers to us-east-1. VPC peering and bring-your-own-cloud are available on Scale.
Yes. Our solutions team has migration playbooks for each. The TypeScript surface is intentionally close enough that an Inngest function ports in 10-15 minutes. Temporal workflows take longer because of the durable-promise model — we cover both patterns in our migration docs.
Trigger jobs from a webhook handler in your app (one HTTP call) and let the platform absorb the burst. Fair queues guarantee no single tenant can starve your fleet. We’ve seen 50k+ events / minute land cleanly with linear horizontal scaling on Scale plans.
Yes: 10,000 runs / month, 5 concurrent runs, and 24h retention, with no time limit. Plus the entire OSS core is Apache 2.0, free to self-host with no functional limits. Most teams keep their staging environments on the free tier permanently.
Ship your first durable task in under five minutes. Free until you scale. No credit card. No sales call.