Build vs. buy · A field report from year six

What it actually costs to build your own virtual classroom.

We've been building one since 2020. Ten million sessions later, here's the math nobody runs before they start — and the long tail of work nobody warns you about until you've already hired the team.

No marketing tricks. No fake totals. Numbers below are sourced and dated; assumptions are disclosed.

TL;DR
Updated May 1, 2026 · By Ayush Agrawal, CEO & Co-founder, Pencil Learning Technologies

Should you build or buy a virtual classroom platform in 2026? Almost always: buy.

Building a production-grade virtual classroom — one that runs at 99.99% reliability across every browser, every device, and every school network, with the compliance posture a district will sign off on — costs ~$30M over three years with a 50-person engineering team, across 80 distinct production components. The hard parts aren't the WebRTC or the SFU; they're SOC 2 Type II evidence (which has to age in real time), native iOS and Android apps, the testing matrix across browsers and devices, and 24×7 on-call. Even Class Technologies, after raising $160M+ in venture funding, built their UX on top of Zoom rather than build their own real-time video.

Pencil Spaces, by comparison, handles this from $3,000/month on Scale (and from $6/month on our entry tier) — for the same SFU, whiteboard, recording, transcription, AI summaries, native mobile, scheduling, identity, and compliance posture. We've been operating it in production since 2020, with 10M+ sessions delivered across customers in 60+ countries at 99.95% measured uptime.

Read the full math: the $30M anchor · the 80-component stack · the 50-person team · infrastructure economics · the CPaaS / BBB shortcut math · the interactive calculator.

Total cost of build · Three-year horizon
Bay Area · 99.99% reliability · 2026
$30M
USD 30,000,000 · estimated, methodology disclosed below

To stand up a virtual classroom platform that's roughly comparable to what you can buy from us today — running at 99.99% reliability across every browser, every device, every school network. Three years. Fifty people you'll need to hire and retain. Infrastructure that bills by the second. Compliance that takes longer to pass than your cap table takes to close.

Time
36 mo
From kickoff to a production-grade platform
Headcount
50 FTE
Engineering, design, security, support — loaded by Year 3
Surface area
80
Components in the production stack you can't skip
Annual burn (Y3+)
$16M
Ongoing burn from year three onward, every year
Source: Pencil Learning Technologies operational data and field analysis, 2020–2026. Compensation bands per Levels.fyi (SF Bay Area, accessed May 2026). Full methodology in §02–§04.
First · the question we get most

"But with Cursor and APIs, can't I just vibe-code this?"

We get this email a couple of times a month. Almost always from a smart engineer who just shipped a polished prototype over a weekend — Cursor in one window, a managed video SDK, a transcription API, a CRDT library, an auth provider. It works. They feel ~80% done.

We've been that engineer. So have a lot of our customers, before they became our customers. So this isn't a lecture — it's the thing we wish someone had said to us in 2020.


What AI & APIs actually shorten

The prototype. A working four-person room with a shared whiteboard is genuinely a weekend.
The boilerplate. Auth flows, settings, billing UI, marketing site — fast.
The integration glue. Wiring four services together is something Claude does well.
The first drafts. Tests, docs, runbooks. Quality is good.

What AI & APIs cannot shorten

The compliance posture. APIs are SOC 2 vendors; you are still the auditee. Your auditor signs off after evidence operates for nine months. AI doesn't compress time.
The mobile binaries. Two app stores, two audio stacks, two device-fragmentation taxes. There's no API that ships native apps for you.
The composition tax. Every API you use is a vendor whose SLA, pricing, and roadmap you don't control. At scale you accept the lock-in or you rebuild.
The integration surface. Five APIs is twenty integration edges — error states, version drift, retry storms, billing reconciliation when one is down.
The on-call. When an SFU pod thrashes at 4pm Eastern, the pager doesn't get answered by Claude. A human reads logs.
The procurement cycle. A district CTO accepts the auditor's letter, not your confidence. Your prototype isn't in the procurement packet.
You can prototype this in a weekend. You can't ship it in a year.
You can compose the parts. You can't compose the operations underneath them.

If you've read that and still feel certain your team is the exception, run the calculator below with your numbers. We'll happily lose the argument if your model produces a smaller number than ours — and if it does, we'd genuinely like to hear about it. We learn from the engineers who pull this off; they're rare, but they exist, and they're not the buyers we're talking to here.

§ 01 · The estimation gap

What founders think they need vs. what ships.

The first — and largest — cost overrun on every internal video build is the gap between the Friday-afternoon whiteboard estimate and the actual surface area of a production system.

The whiteboard estimate

"How hard could it be?"

We'll grab an open-source video SDK, drop in a whiteboard, plug it into our LMS, and ship it in a quarter. Two engineers and a designer. Maybe $300K?

A · WebRTC video calls (4–8 participants)
B · A whiteboard component (we'll use tldraw)
C · Auth and a Postgres database
D · A scheduling page on top of Cal.com
E · Call it a v1 and iterate from feedback

Not wrong about the components. Wrong about everything else — the percentage of work each one represents, the operational tax of running them, and the second-derivative cost of maintaining them once you have real users.

The shipping reality

What actually has to exist.

You can ship a working prototype in a quarter. You cannot ship a platform a tutoring program will sign a multi-year contract for, in less than three years.

01 · Real-time media: SFU cluster, signaling, TURN/STUN, fallbacks for restrictive networks
02 · Whiteboard sync: CRDT engine, conflict resolution, persistence, version history
03 · Cloud recording: compositor, encryption, multi-region storage, retention policy
04 · Live transcription, AI summaries, action item extraction — per call, at scale
05 · Mobile: native iOS, native Android, App Store review, OS-version regression
06 · SSO, SAML, FERPA contracts, SOC 2 Type II, COPPA review, GDPR, pen tests
07 · Observability, on-call rotation, status page, customer support tooling
08 · The seventy-two other things on the next page

The naive list is roughly 15% of the work. The other 85% is where startups die — quietly, six quarters in, after the founders have already told the board it's “almost done.”

§ 02 · The architecture you can't avoid

Eighty components. None of them optional.

These are the actual systems we run in production. Skip any one of them and you ship a prototype, not a platform. Most teams discover them after their first outage.

01 · Real-time media (8)
SFU cluster (managed or self-hosted)
Signaling server with reconnect semantics
TURN servers, geo-distributed
STUN, ICE, NAT traversal logic
Adaptive bitrate, simulcast, SVC
Echo cancellation, AGC, noise suppression
Active speaker detection
Bandwidth estimation, congestion control
02 · Whiteboard & co-creation (7)
Vector canvas engine, hardware-accelerated
CRDT-based real-time sync (Yjs, Automerge)
Conflict resolution & offline merge
Persistence, snapshots, time travel
Pressure-sensitive ink, palm rejection
PDF / image / equation embeds
Export pipeline (PDF, PNG, SVG)
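The conflict-resolution line in cluster 02 is the one most teams underestimate. The core obligation — a merge that is commutative, associative, and idempotent, so every peer converges on the same state regardless of sync order — can be illustrated with the simplest possible CRDT, a last-writer-wins register. This is a teaching sketch only; Yjs and Automerge use far richer structures, and all names and values below are illustrative:

```typescript
// Minimal last-writer-wins (LWW) register — the simplest possible CRDT.
// The merge must be commutative, associative, and idempotent so that
// every peer converges to the same value no matter the sync order.
type LwwRegister<T> = { value: T; timestamp: number; peerId: string };

function merge<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  // Later timestamp wins; ties break deterministically on peer id,
  // so two peers merging in either order reach the same result.
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.peerId > b.peerId ? a : b;
}

// Two peers edit the same stroke color while offline, then sync.
const fromTablet: LwwRegister<string> = { value: "red", timestamp: 1001, peerId: "tablet" };
const fromLaptop: LwwRegister<string> = { value: "blue", timestamp: 1002, peerId: "laptop" };

// Merge order must not matter: both directions converge on "blue".
const mergedAB = merge(fromTablet, fromLaptop);
const mergedBA = merge(fromLaptop, fromTablet);
```

Even this toy version surfaces the real design questions — clock skew, tombstones, offline merge — that a production whiteboard engine has to answer for every shape on the canvas.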
03 · Recording & playback (6)
Server-side compositor (FFmpeg pipeline)
Multi-track audio mixing
Layout switching, screenshare overlay
Encrypted storage, signed playback URLs
Adaptive streaming player (HLS / DASH)
Retention, deletion, FERPA-compliant audit
04 · AI & transcription (6)
Streaming speech-to-text pipeline
Speaker diarization, multi-channel
Session summary & action items
Searchable transcript index
Real-time captions, multi-language
Coaching analysis, talk-time ratios
05 · Native mobile (5)
iOS app, Swift / SwiftUI, WebRTC bindings
Android app, Kotlin, libwebrtc, Pixel + Samsung + Xiaomi parity
App Store / Play Store review & appeals
Push notifications, deep links, calendar handoff
Background audio, CallKit, ConnectionService
06 · Identity & access (5)
OAuth, SAML, OIDC, WorkOS integration
SCIM provisioning, deprovisioning
Roles, permissions, multi-tenancy
Magic links, MFA, session management
Audit logs, exportable, immutable
07 · Scheduling & calendars (5)
Booking engine, availability, recurring rules
Two-way Google / Outlook / Apple calendar sync
Timezone, DST, RRULE correctness
Reminders: email, SMS, push, with delivery proof
No-show, reschedule, cancellation flows
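The "timezone, DST, RRULE correctness" line in cluster 07 hides the classic recurrence bug: store a weekly lesson as a UTC timestamp, add 604,800 seconds per week, and the lesson drifts an hour the week DST flips. A sketch of the failure under assumed offsets (the offset function below is illustrative, not a real tz database):

```typescript
// Naive recurrence adds whole weeks to a UTC timestamp. Correct recurrence
// re-resolves the LOCAL wall time against the zone's UTC offset for every
// occurrence. Offsets here are illustrative (UTC-5 standard / UTC-4
// daylight, with a hypothetical DST switch in week 2), not a tzdb.
const HOUR = 3600;
const WEEK = 7 * 24 * HOUR;

function offsetForWeek(week: number): number {
  return week < 2 ? -5 * HOUR : -4 * HOUR; // hypothetical DST boundary
}

// A lesson at 16:00 local wall time, as seconds since local midnight.
const localLesson = 16 * HOUR;

// Week 0: UTC = local time minus the UTC offset (16:00 − (−5h) = 21:00 UTC).
const week0Utc = localLesson - offsetForWeek(0);

// Naive schedule: keep adding whole weeks to the UTC timestamp.
const naiveWeek3Utc = week0Utc + 3 * WEEK;

// Correct schedule: re-derive UTC from the local wall time each week.
const correctWeek3Utc = 3 * WEEK + localLesson - offsetForWeek(3);

// After the DST flip, the naive occurrence is a full hour late locally.
const driftSeconds = naiveWeek3Utc - correctWeek3Utc; // 3600
```

The fix — persist the local wall time plus the zone, and resolve to UTC per occurrence — is exactly what RRULE semantics require, and exactly what naive epoch math gets wrong.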
08 · Compliance & security (5)
SOC 2 Type II audit, evidence collection
FERPA contracts & data flows
COPPA, GDPR, CCPA, state-specific student privacy
Encryption at rest, in transit, key rotation
Pen testing, vulnerability scanning, bug bounty
09 · Ops & observability (4)
Logs, metrics, traces (managed or self-hosted)
On-call rotation, runbooks, postmortems
Public status page, incident comms
Multi-region failover, DR drills
10 · Billing & admin (3)
Usage metering (per-user-hour, per-recording)
Stripe integration, taxes, invoices, dunning
Org admin: seats, billing, exports
11 · Support tooling (3)
Help center, in-app help, AI support agent
Session replay, debug bundle export
Customer-facing dashboards, exportable reports
12 · i18n & accessibility (3)
Localization framework, RTL, plurals
WCAG 2.1 AA, screen reader, keyboard parity
Live captioning surfaced through UI
13 · Trust & safety (5)
Recording disclosure & consent capture (parental, where required)
In-session reporting flows: report a participant, post-session reports
Content moderation: file-upload scanning, link safety, screenshare guardrails
Abuse detection: keyword filters, ML escalation, human review queue
Background-check integration for tutors (Checkr or equivalent)
14 · Integrations & rostering (5)
LTI 1.3: deep linking, Names & Roles, Assignment & Grade Service
Clever, ClassLink, Google Classroom rostering
OneRoster CSV + REST sync, district-by-district edge cases
Public API: rate limits, OAuth client management, SDK upkeep
Webhooks: signing, retries, dead-letter queue, replay
15 · Reporting & analytics (5)
Per-student & per-cohort engagement reporting
Parent reports: weekly recap, attendance, time-on-task
Admin dashboards: program health, tutor utilization, no-show rates
Scheduled report delivery (email, CSV, PDF)
Event pipeline → warehouse → dbt → BI tool
16 · Internal tooling & ops (5)
Feature flags, gradual rollouts, kill switches per tenant
Tenant impersonation (with audit log) for support
Migration tools: import from prior platform, rollback safety
Internal admin console: billing exceptions, refunds, manual provisioning
Tenant-level config: branding, retention policy, allowed integrations
16
distinct subsystems, each owned by an engineer who knows it deeply
80
components, none of which a tutoring program will pay for if you skip it
3–5x
average underestimation of timeline by founders we've talked to about building
100%
of teams who skipped #08 (compliance) at v1 told us they regretted it
§ 03 · The team you'll need to hire

Fifty people by year three. None of them junior.

What it actually takes to run a system at 99.99% reliability across every browser, every device, every school network. The asterisks aren't on whether you'll hire these roles — they're on whether you'll be able to retain them in a market where every one of these humans is being recruited weekly by every other company that's serious about real-time.

Loaded compensation reflects SF Bay Area senior IC ranges (Levels.fyi P75) plus ~1.4× loaded multiplier (employer taxes, benefits, equity, equipment, recruiting). We've left specific dollar figures off the table on purpose — they're public on Levels.fyi if you want them. Three-year payroll alone: ~$23M. And no, this isn't the team that ships your differentiated product. This is the team that just ships the plumbing underneath it.
§ 04 · The infrastructure bill

Per-hour economics, before you've added any feature.

Public list pricing from each vendor as of 2026. Assumes a modest scale of 10,000 user-hours per month after Year 1 — the rough size of a 200-tutor program running five hours a day. Bigger operations get worse ratios.

TURN bandwidth (relay fallback) · ~$0.40 / GB · managed TURN
When direct peer connections fail — school networks, corporate firewalls, symmetric NATs — media routes through TURN servers. In our experience ~15% of traffic relays, and TURN bills both legs of every relayed stream, so a four-person user-hour generates roughly 5.4 GB of billable relay traffic when it falls back.
10,000 user-hr/mo × 5.4 GB × 15% × $0.40/GB
=~$3,200 / mo
SFU compute & bandwidth · ~$0.004 / participant-min · managed SFU
Either you operate your own media-server cluster — which means an SRE on-call — or you buy a managed service. At 10K user-hours that's 600K participant-minutes minimum, and rises with multi-party.
10,000 user-hr × 60 min × $0.004
=~$2,400 / mo
Live transcription · ~$0.0043 / sec · streaming STT
Streaming speech-to-text for live captions and post-session transcripts. Billed per second of audio per channel. This is the line item that surprises every buyer.
10,000 hr × 3,600 sec × $0.0043
=~$155,000 / mo
Recording compute & storage · ~$0.50 / hr compute + $0.023 / GB / mo
Compositor needs dedicated CPU. Output averages ~600 MB per recorded hour. Storage compounds because most programs keep recordings for the academic year minimum.
10,000 hr × $0.50 + (rolling 60 TB × $0.023)
=~$6,400 / mo
AI session summaries · ~$0.05 / session · LLM API
Per-session summary, action items, talk-time analysis. Cheap per call. Adds up at volume, and the prompt engineering is real product work.
~3,300 sessions/mo × $0.05
=~$165 / mo
Whiteboard sync & persistence · ~$2K / mo · managed CRDT
CRDT sync as a managed service, or self-hosted on your own infrastructure (which costs an engineer instead). At scale, both options converge in price.
Pro tier + connection-based overage
=~$2,500 / mo
General cloud (AWS / GCP) · Postgres, Redis, CDN, queues, KMS
Application servers, database (with read replica for compliance reporting), Redis, S3, CloudFront, WAF, Secrets Manager, Lambda, queues. Nothing fancy.
Conservative production budget at this scale
=~$15,000 / mo
Vendors & tooling · email, SMS, observability, error tracking, …
Transactional email, SMS, push notifications, observability, error tracking, status page, secrets vault, security scanning, CI/CD. Each one is small. Together they're a salary.
Combined SaaS line items
=~$8,000 / mo
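Summing the line items above gives the monthly figure — a sketch using this page's stated results and assumed 2026 rates, handy for substituting your own scale:

```typescript
// Monthly infrastructure estimate at 10,000 user-hours/month, summing the
// §04 line items. Per-unit rates are this page's assumptions, not quotes.
const monthly = {
  turnBandwidth: 3_200,                     // relay fallback, as estimated above
  sfu: 10_000 * 60 * 0.004,                 // 600K participant-min → ~2,400
  transcription: 10_000 * 3_600 * 0.0043,   // streaming STT → ~154,800
  recording: 10_000 * 0.5 + 60_000 * 0.023, // compute + rolling 60 TB → ~6,380
  aiSummaries: 3_300 * 0.05,                // per-session LLM calls → ~165
  whiteboardSync: 2_500,
  generalCloud: 15_000,
  vendorsTooling: 8_000,
};

const totalPerMonth = Object.values(monthly).reduce((a, b) => a + b, 0);
const totalPerYear = totalPerMonth * 12;
// totalPerMonth ≈ $192,000; totalPerYear ≈ $2.3M
```

Swap in your own user-hours and rates and the shape of the answer doesn't change: transcription dominates, and every line scales up with usage.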
~$192,000 / month
in infrastructure alone before you ship a single differentiated feature. Roughly $2.3M/year — rising with usage, not falling. And this is at modest scale. Multiply by 10x when you sell to a school district.
$2.3M
infrastructure / yr at 10K hrs/mo
§ 05 · The compliance tax

Things you can't ship without. Things that take a year.

Edtech buyers will not sign a contract without these. School districts have a procurement checklist; tutoring programs are inheriting it. Each line below has a real audit, a real lead time, and a real invoice.

Requirement
Why it exists, and what passing it actually means
Lead time
Year-1 cost
SOC 2 Type II
Controls audit covering security, availability, confidentiality. Type II requires a six- to twelve-month observation window, evidence of every control operating continuously. Every enterprise buyer asks for it.
9–12 months
$170,000
FERPA framework
Education records privacy law. Requires data flow mapping, retention policies, a Data Processing Agreement template, and counsel review. Districts will not sign without a FERPA-compliant DPA on letterhead.
6–8 weeks
$15,000
COPPA review
Children's online privacy. Required if any user is under 13, which in tutoring is most of them. Requires verifiable parental consent flows and counsel sign-off on data practices.
4–6 weeks
$10,000
GDPR + UK / EU DPA
Standard contractual clauses, transfer impact assessment, DPO appointment if you sell into Europe. Once you do, a single non-conforming export can cost you the account.
8–12 weeks
$25,000
State student-data laws
California SOPIPA, Illinois SOPPA, NY Ed Law 2-d, Texas SB 820, plus 30+ others. Each state has its own DPA template; vendors maintain a contract library.
Rolling
$30,000
External penetration test
Third-party security firm attempts to break into your system, files a report, you remediate, they retest. Many enterprise buyers want one within the last 12 months.
6 weeks + remediation
$35,000
Cyber insurance
Required by most enterprise contracts. Premiums depend on your data volumes and SOC 2 status. A real claim is rare; the certificate is the product.
2 weeks
$25,000
Privacy counsel retainer
Outside counsel for DPA negotiations, breach response readiness, contract red-lines, regulatory updates. You will need them more than you think.
Ongoing
$30,000
Year-one compliance cost
Plus ~400 engineering hours implementing controls, evidence pipelines, and audit automation.
Concurrent
~$340,000
§ 06 · Hidden costs & war stories

The bugs nobody warns you about.

Every one of these is a specific incident we, or peers running production WebRTC, have personally debugged. They are not in the architecture diagram. They are why real-time video has been called the hardest commodity-grade software to ship.

Safari, 2024

The getUserMedia race condition that only fired on iOS 17.4 with non-Apple Bluetooth headsets.

Audio device selection nondeterministically picked the wrong input. Reproduced in production for ~3% of mobile users. Took two senior engineers six weeks to root-cause — the bug was in WebKit, not our code. Workaround required redesigning our audio capture flow.

You don't fix this with a Stack Overflow answer. You fix it by building enough logging infrastructure to see one bad call out of every thirty.

Engineering cost: ~480 hours · Discovered: 14 months in
Android, ongoing

Echo cancellation behaves differently on every Samsung, Pixel, and Xiaomi device.

Hardware AEC quality varies by chipset. Some devices ship aggressive noise gates that cut quiet voices entirely. Some Xiaomi MIUI builds bypass standard audio APIs. Mid-range Samsung devices in India have tested differently than Samsung devices in Korea.

You ship a software AEC fallback. You build a device-specific config table. You buy twenty cheap Android phones and keep them on a shelf. You hire someone whose job is this table.

Hardware lab budget: $40K/yr · Permanent ownership: 0.5 FTE
Network edges

The codec negotiation that crashes when one user is on Firefox and another is on Safari 14.

H.264 vs. VP8 vs. VP9 vs. AV1 negotiation across browsers, OS versions, and hardware acceleration paths is a maze of partial support. Older Safari only does H.264 baseline. Some Firefox builds require an AV1 fallback. The spec says one thing; implementations do another.

You write a compatibility matrix. You maintain it. You re-test it every Chrome release. The matrix lives in your codebase and silently grows.

Browsers in test matrix: 23 · OS combinations: 41
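The negotiation logic behind that matrix ultimately reduces to intersecting two capability sets under a preference order — and failing hard when the intersection is empty. A toy sketch (the capability lists are illustrative, not an accurate browser matrix):

```typescript
// Toy codec negotiation: intersect two peers' supported codecs and pick
// the most preferred mutual option. Capability lists are illustrative;
// the real matrix varies per browser version, OS, and hardware path.
const PREFERENCE = ["av1", "vp9", "vp8", "h264-baseline"] as const;
type Codec = (typeof PREFERENCE)[number];

function negotiate(a: Codec[], b: Codec[]): Codec | null {
  const mutual = new Set(a.filter((c) => b.includes(c)));
  for (const codec of PREFERENCE) if (mutual.has(codec)) return codec;
  return null; // no mutual codec: the call cannot be established
}

// Hypothetical pairing: a modern Chrome peer meets an older Safari that
// only does H.264 baseline — the session silently downgrades.
const chromePeer: Codec[] = ["av1", "vp9", "vp8", "h264-baseline"];
const oldSafariPeer: Codec[] = ["h264-baseline"];
const chosen = negotiate(chromePeer, oldSafariPeer); // "h264-baseline"
```

The hard part isn't this ten-line function — it's keeping the capability lists truthful across every browser release, which is what the 23 × 41 matrix above actually pays for.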
Production, 4pm Eastern

The 4pm Eastern spike that exposes every concurrency bug at once.

Tutoring is a 4–7pm business. Every weekday, your concurrent session count jumps 8× in 20 minutes. You discover that your SFU autoscale group warms slower than the surge, that your Postgres write queue has a hot row, that your recording cluster runs out of file descriptors.

None of this is visible in dev. It is visible at 4:08pm Eastern when a tutor can't get into a session and emails the founder.

Concurrent sessions, peak: ~1,400 · Off-peak: ~80
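The arithmetic of that surge is worth making explicit: any demand arriving within one pod-warmup window has to be absorbed by capacity that is already warm, because no autoscaler can help you inside its own lag. A back-of-envelope sketch with illustrative numbers:

```typescript
// Back-of-envelope for the 4pm problem: if demand ramps from `base` to
// `base * surge` concurrent sessions over `rampMin` minutes, and a new
// media pod takes `warmupMin` minutes to become usable, the sessions that
// arrive inside one warmup window must land on capacity that is ALREADY
// warm — the autoscaler is blind inside its own lag.
function prewarmNeeded(
  base: number,
  surge: number,
  rampMin: number,
  warmupMin: number
): number {
  const arrivalsPerMin = (base * (surge - 1)) / rampMin; // new sessions/min
  return Math.ceil(arrivalsPerMin * warmupMin);          // must be pre-warmed
}

// Illustrative numbers echoing the pattern above: 80 → 640 concurrent
// sessions over 20 minutes, pods that take 5 minutes to warm.
const headroom = prewarmNeeded(80, 8, 20, 5); // 140 sessions of warm headroom
```

That headroom is paid for every weekday, whether or not the surge arrives — which is why surge economics never show up in a dev-environment cost estimate.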
Cellular & jitter

The student in São Paulo on cellular data with 28% packet loss.

Real student. Real session. The connection survives because of bandwidth estimation, adaptive bitrate, retransmission, FEC, and a custom jitter buffer that we tuned for six months on a Brazilian cellular profile we'd never seen before.

On a vibe-coded MVP, this session drops at the 90-second mark. The student gives up. The tutoring program loses the contract. You never know.

Network conditions tested: 18 profiles · Tuning iterations: 40+
School networks

The school district that blocks UDP. All of it.

Many K-12 networks restrict UDP entirely or rate-limit aggressively, which kills standard WebRTC. You need TCP fallback through TURN-over-TLS on port 443 — the same port as HTTPS — or your sessions don't connect at all.

This isn't optional. It's an entire customer segment. You discover it the first time you do a district pilot.

% of K-12 networks affected: ~22% · Engineering effort: 6 weeks
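Concretely, the fallback is an ICE configuration that forces every candidate through a TURN relay speaking TLS on port 443. A sketch of the shape such a config takes — hostname and credentials are placeholders, and a production config would also keep UDP and STUN paths for networks that allow them:

```typescript
// ICE configuration for UDP-hostile school networks: route all media
// through a TURN relay on TLS port 443, which a district firewall sees
// as ordinary outbound HTTPS. Hostname and credentials are placeholders.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

function restrictiveNetworkConfig(turnHost: string, user: string, secret: string) {
  const relay: IceServer = {
    // turns: (TURN over TLS) on 443 with TCP transport — the combination
    // that traverses firewalls that block or throttle UDP entirely.
    urls: [`turns:${turnHost}:443?transport=tcp`],
    username: user,
    credential: secret,
  };
  return {
    iceServers: [relay],
    // Skip host / server-reflexive candidates; go straight to relay.
    iceTransportPolicy: "relay" as const,
  };
}

// In the browser, this object is what you pass to `new RTCPeerConnection(...)`.
const config = restrictiveNetworkConfig("turn.example.com", "tutor", "s3cret");
```

Detecting *when* to apply this config — per network, per session, without burning relay bandwidth for everyone — is the part that takes the six weeks.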
Sunday night, 11:14pm

The email from a tutoring director who can't get into her session and has 4,000 students booked tomorrow.

AI cannot answer this email. AI cannot SSH into the affected pod, read the logs, notice that a transcoder worker has wedged on a malformed audio packet, restart it, and send a reassuring reply at 11:23pm.

That requires a human. That human is on your payroll. You did not put them on your spreadsheet because you didn't know yet that this email exists.

Avg P1 response time we hold ourselves to: <15 min · On-call rotation: 4 engineers
Chrome release notes

The Chrome 121 update that silently changed how MediaStreamTrack constraints behave.

Chrome ships a major release every four weeks. Most are quiet. Some break your code. You don't see this in your CI; you see it on a Wednesday morning when your error rate goes from 0.4% to 9%.

You read every Chrome release note. You subscribe to webrtc-discuss. You file bugs upstream. This is permanent overhead and it never reduces.

Major browser releases / yr: 32 · Average breaking changes: 2–4
§ 07 · Run the math

Your numbers. Your scale. The same answer.

Adjust the sliders to your situation. The model uses Bay Area senior comp, public list pricing for infrastructure, and 2026 vendor rates. Try lowering everything as far as the model allows — the number doesn't get small.

Total billable user-hours across all your sessions per month. A 200-tutor program running 5 hours/day lands around 10,000.

10,000 user-hours / month

Most build programs reach a serviceable v1 in year three. We're modeling cumulative cost across the buildout.

3 years

Lean = wrap a vendor SDK and skip mobile / compliance for v1. Standard = the team breakdown above. Enterprise = adds redundancy, security depth, and dedicated support.

SOC 2 + FERPA is the floor for any K-12 buyer. Anything less means you can't sign your first real contract.

Total cost of build
Over 3 years, at 10,000 hrs/mo
$30M
Engineering payroll, loaded $23M
Infrastructure & vendors $6.9M
Compliance & legal $0.78M
vs. buy · Pencil Spaces Pro tier
Same workload on Pencil Spaces Pro: ~$43K total over the same horizon.
You'd save $30M — and three years.
Pro tier: $6/month base + $0.12 per user-hour. Scale tier (the right fit for production programs) starts at $3,000/month, custom-quoted at higher volumes.
Model assumes Bay Area loaded comp, modest mobile scope, and standard ed-tech compliance. Real builds tend to overrun this by 20–40%; this is the optimistic case.
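The totals above reduce to a small function — useful if you want to rerun the model with your own assumptions. All rates are this page's assumptions (Bay Area loaded comp, 2026 list pricing), not quotes:

```typescript
// The §07 totals, reduced to functions so the model can be rerun.
function buildCostThreeYears(): number {
  const payroll = 23_000_000;    // loaded engineering payroll over 3 years (§03)
  const infra = 2_300_000 * 3;   // §04 run rate at 10K user-hours/month
  const compliance = 780_000;    // §05 year one plus renewals
  return payroll + infra + compliance;
}

function buyCost(years: number, userHoursPerMonth: number): number {
  // Pro tier as quoted above: $6/month base + $0.12 per user-hour.
  const months = years * 12;
  return 6 * months + 0.12 * userHoursPerMonth * months;
}

const build = buildCostThreeYears(); // ≈ $30.7M — the page's ~$30M anchor
const buy = buyCost(3, 10_000);      // ≈ $43K — the page's Pro-tier comparison
```

Lower every input as far as you plausibly can and the ratio between the two numbers barely moves — which is the point of the calculator.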
§ 08 · The vibe-coding reckoning

"But I'll just have AI write it."

We use AI every day. We ship faster because of it. And we are the team most qualified to tell you what AI cannot do for a production virtual classroom — not because it's incapable, but because the work it can't do is the work that matters.


What AI does shorten

Scaffolding the prototype. A working video + whiteboard prototype is a weekend with Claude or Cursor. You'll feel like you're 80% done. You're 5%.
Boilerplate components. Auth flows, settings pages, billing UI, marketing site, dashboards — AI is genuinely fast at these. So fast that boilerplate is no longer a moat.
API glue and CRUD. Wiring services together, writing migrations, drafting endpoints. Routine. AI excels at routine.
First drafts of tests. Claude can generate a full test scaffold for code it just wrote. The fixtures are usually fine.
Documentation, runbooks, internal docs. Hours of work compressed into minutes. Quality is genuinely good.

What AI cannot shorten

The Safari bug only 3% of users hit. Reproducing a non-deterministic WebRTC bug in WebKit isn't a coding problem. It's an instrumentation problem you didn't know you had.
The 4pm Eastern surge. Autoscaling tuned to a real traffic pattern, with a real cost ceiling, is a months-long observability project. Not a prompt.
SOC 2 Type II. An auditor needs evidence of controls operating for nine months. AI cannot fast-forward time. You wait, or you don't ship to enterprise.
The relationship with the school district. The first sale to a real customer is months of conversations, redlines, security questionnaires, and trust. Not a prompt.
The 11pm email from a panicked director. The reply requires reading the logs, knowing the system, knowing the person. None of that is something you can outsource to an LLM in 2026.
The taste decisions. What goes in the product. What doesn't. Where the line is between “feature” and “noise.” This is the part the founder has to do, and it doesn't scale with model size.

AI takes the prototype from weeks to weekends. It does not take the platform from years to months.

— What we tell every founder who asks
§ 09 · The opportunity cost

The thing your competitor builds, while you build this.

Every dollar and engineer-month you sink into rebuilding undifferentiated infrastructure is a dollar and engineer-month not going to your actual product. Same starting line. Two different finishes.

Year three, building it yourself

Where most teams who chose to build are, on month thirty-six.

Month 06 · Prototype works in the demo. Cracks in the first real pilot.
Month 12 · First mobile app submitted. Apple rejects. Twice.
Month 18 · First district pilot. UDP-blocked network. Sessions don't connect.
Month 24 · SOC 2 audit kicks off. Six months of evidence collection ahead.
Month 30 · First real outage. Public postmortem. Two engineers leave.
Month 36 · Platform is roughly comparable to Pencil Spaces. Differentiated product: not yet started.

Year three, buying it

Where teams who bought infrastructure and built differentiation are, on month thirty-six.

Month 01 · Pencil Spaces or Carbon API live in production. Real sessions running.
Month 03 · First differentiated workflow shipped — the thing only your team could build.
Month 09 · First school district contract closed. Compliance inherited from us.
Month 15 · Product-market fit visible in retention curves. Hiring around the product.
Month 24 · Second product line. Engineering bandwidth available because nobody is on call for an SFU.
Month 36 · Category leader in your vertical. Three years of compounding focus on what only you can do.
§ 10 · Why we know

We've spent six years so you don't have to.

These aren't projections. They're the operating envelope of the platform you can buy from us today — running in production, every day, since 2020.

10M+
virtual classroom sessions delivered, across customers in 60+ countries
99.95%
platform uptime measured against an external synthetic monitor
~6 yrs
of compounding investment in the real-time stack you'd be re-deriving
60
full-time team members building this — engineers, designers, support, ops — distributed across multiple countries

I've been the founder on the other side of this conversation. In 2020 my co-founder Amogh and I walked away from senior tech-leadership roles at companies like Meta and Google to make the bet on building a real-time, virtual-classroom-grade platform from first principles. The decision wasn't wrong — we had a thesis about what tutoring needed that nothing on the market did. The cost — in years, in headcount, in the long tail you read above — was real.

The thing nobody told us — and the thing I now tell every founder who asks — is that the infrastructure isn't the product. It's the part of the product you're forced to ship before you get to ship the part you actually care about.

Six years later, that infrastructure is what we sell. As a full-stack platform, called Pencil Spaces. As an embeddable API, called Carbon. We built it once, at unsustainable cost, because we had to. You don't.

If you want to build the differentiated product on top of it — the curriculum, the matching engine, the reporting layer, the AI workflow — we'd love to hear about it. That's where the actual moat is. The video plumbing isn't.

And lest this sound like just our story —

What the well-funded teams who tried this actually had to raise.

We're not the only company that's tried to build a virtual classroom platform from scratch. Two of the most prominent venture-backed attempts of the last five years, and what they had to raise to ship something:

Engageli
$47.5M
total raised, two rounds
Series A: $33M led by Maveron, Corner Ventures, et al.
Founded by veteran ed-tech operators with deep institutional backing. Focused on higher-ed and enterprise; built much of the real-time stack from first principles.
Class Technologies (class.com)
$160–$169M
total raised, four-to-five rounds
Series B: $105M led by SoftBank Vision Fund 2 (July 2021).
Built virtual-classroom UX on top of Zoom's Meeting APIs — outsourced the hardest layer of the stack to a third party, and still had to raise nine figures to ship around it.

Read that second card carefully. Class raised more than five times this page's build-cost anchor — and didn't even build their own real-time video. They wrapped Zoom and built UX on top. Even that was a $160M+ undertaking. Engageli, building closer to the metal, raised $47.5M and runs leaner. Neither story is a knock on the founders — both companies have credible operators. It's a knock on the assumption that this category is cheap to ship. It isn't. Our receipts are above; these are the receipts the rest of the industry has filed.

Sources: public funding announcements, Crunchbase, Tracxn, PRNewswire (Class Series B, July 2021). Verify before relying.

Authorship & method
Ayush Agrawal — CEO and co-founder, Pencil Learning Technologies. Numbers above are sourced from public list pricing for managed real-time infrastructure, Levels.fyi P75 senior IC compensation for SF, and our own operating experience running Pencil Spaces and Carbon.dev. The model is conservative and intentionally favors the build case where assumptions are ambiguous — because we've watched real builds overrun even the optimistic version of this math.
§ 11 · Two ways to skip the build

If you're not building it, buy the right version.

Two paths, depending on whether you want a finished product or the infrastructure underneath one. We sell both, because we built both.

For tutoring programs & districts

Pencil Spaces

The full virtual classroom, branded for your program, ready in a week. Everything in the stack diagram — we run it; you teach.

The full real-time stack — video, whiteboard, recording, transcription, AI summary
Native iOS and Android apps
SSO, SCIM, FERPA-compliant DPAs, SOC 2 Type II
Custom branding, your domain, your colors, your name on the door
Unlimited seats, billed by the user-hour
From $6/month for solo tutors, custom for Scale
For developers building their own product

Carbon

The same infrastructure, exposed as an API. Embed real-time video, whiteboard, and persistence into your own app. Skip the build.

WebRTC video, persistent whiteboard, co-browsing, chat
Drop-in components, or compose your own UI on the SDK
Cloud recording, transcription, AI summary — opt-in
Single integration replaces Zoom Meeting APIs, Whereby, Hyperbeam, self-hosted tldraw
Same proven infrastructure, billed by the participant-minute
Pencil Spaces is built on Carbon — that's the proof point
§ 12 · The shortcuts you'll ask about

"But what if we just buy minutes... or use BigBlueButton?"

Two paths come up on every Scale call. Each looks dramatically cheaper than building from scratch — and it is, for the ~5% of the stack it covers. Here's where the other 95% reappears.

Shortcut #1 · Buy minutes from a CPaaS

Twilio. Agora. Daily. 100ms. Chime SDK. Same answer for all of them.

The pitch is reasonable: instead of operating your own real-time media stack, you pay a managed vendor by the participant-minute and they handle the SFU, the TURN, the scaling, the codec wars, the surge load on the third Tuesday in October. Public list pricing across the category, as of 2026:

Vendor
HD video, $ / participant-min
Worth knowing
Twilio Programmable Video
$0.004
Twilio announced End-of-Life (December 2023) effective December 2024, extended the deadline to December 2026 (March 2024), then reversed course in October 2024 and reinstated the product. Three years of strategic uncertainty for any program built on it.
Agora HD
$0.004
Per-resolution pricing — HD is $3.99/1000 min, Full HD is $8.99/1000, UltraHD higher still. Audio billed separately at $0.99/1000.
Daily.co
$0.004
Flat per-participant-minute, audio-only at $0.00099. Recording adds $0.01349/min, storage another $0.003/min.
100ms
$0.004
Acquired by Razorpay; 10K free participant-minutes/mo; recording $0.0135/min, transcription $0.017/min on top.
Amazon Chime SDK
$0.0017
Cheapest in the category. Note: parent service Amazon Chime was shut down February 2026; the SDK survives, for now. Media capture and replication billed separately.
Public list pricing as of early 2026. Standard volume rates; enterprise discounts apply at scale. Verify current rates before relying.

At our reference scale of 10,000 user-hours per month — that's 600,000 participant-minutes — the CPaaS line item lands around $2,000–$2,400/month at the category-standard $0.004/min, or ~$1,000/month on Chime SDK. Year one: ~$25K. Cheap. Cheaper than running your own SFU cluster, by a meaningful margin.

Now what that doesn't include. Recording is separate. Transcription is separate. The whiteboard is separate. Mobile SDKs are separate. Compliance is separate. Scheduling, billing, identity, support tooling, reporting, integrations — all the other 79 components in §02 — are separate. The CPaaS replaces cluster 01 of sixteen. You still owe the other fifteen, and you still owe the engineering team in §03 to build them, integrate them, and operate them.

The vendor-strategy tax. Twilio's path is the cautionary tale. December 2023: EOL announced for December 2024. Customers begin migrating, mid-stride, to Zoom Video SDK (Twilio's recommended replacement). March 2024: extended to December 2026. Customers roll back planning. October 2024: reversed entirely — Twilio Video stays. Three years of strategic uncertainty hanging over every tutoring program built on the API. Picking a CPaaS isn't picking software; it's picking which vendor's strategic mood your roadmap will track. Chime's parent product, Amazon Chime, was shut down in February 2026. The SDK is still there. For now.

Pencil Spaces handles this workload from $1,200/month on our entry tier ($6/month base + $0.12 per user-hour overage) — less than the CPaaS would charge for just the video minutes. Scale tier, the right fit for production programs, starts at $3,000/month (custom-quoted at higher volumes) and includes the SFU plus the other 79 components: whiteboard, recording, transcription, AI summaries, mobile apps, scheduling, identity, compliance posture, on-call, and the integration glue between all of it. The arithmetic is plain: same money or less than the CPaaS for ~80× the surface area, and the strategic-direction risk is ours, not yours.
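The arithmetic above is simple enough to check in a few lines. A quick sketch of the reference workload, using the public list rates quoted in the table (illustrative only — verify current rates; the Pencil figure is simplified to treat every user-hour as overage-billed):

```python
# Reference workload from this page: 10,000 user-hours per month.
USER_HOURS = 10_000
PARTICIPANT_MINUTES = USER_HOURS * 60          # 600,000 participant-minutes

# Public list rates quoted above (USD per participant-minute).
CPAAS_RATE = 0.004        # Twilio / Agora / Daily / 100ms category standard
CHIME_RATE = 0.0017       # Amazon Chime SDK

cpaas_monthly = PARTICIPANT_MINUTES * CPAAS_RATE   # video minutes only
chime_monthly = PARTICIPANT_MINUTES * CHIME_RATE

# Pencil Spaces entry tier: $6/month base + $0.12 per user-hour overage.
pencil_monthly = 6 + USER_HOURS * 0.12

print(f"CPaaS (video minutes only): ${cpaas_monthly:,.0f}/mo (${cpaas_monthly * 12:,.0f}/yr)")
print(f"Chime SDK (minutes only):   ${chime_monthly:,.0f}/mo")
print(f"Pencil entry tier:          ${pencil_monthly:,.0f}/mo")
```

Note what the first two lines buy versus the third: the CPaaS figures cover cluster 01 only; the Pencil figure covers the full stack.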

Shortcut #2 · Run BigBlueButton (or any open-source classroom)

BBB's license is free. The operations are not.

BigBlueButton is a credible, mature open-source virtual classroom. We use it in tests. We respect the project. It's a good answer to a specific kind of question: “can we run small-scale online classes on a budget without paying a per-seat fee?” Yes. You can. We're not going to pretend otherwise.

It's a worse answer to the question this page is actually about, which is: “can we operate a serious tutoring program at 99.99% reliability across every browser, every device, every school network, with the compliance posture a district can sign off on, without funding a build?” The honest cost stack at moderate scale (200 concurrent users, modest customization, K-12 use case):

Infrastructure. AWS c5.2xlarge or equivalent + 500 GB storage + bandwidth. Public list rates.
~$400 / mo
DevOps engineer (50% allocation). Someone has to run BBB upgrades, patch security, watch the pager. Loaded SF comp.
~$155K / yr
Compliance posture. SOC 2, FERPA, COPPA, state student-privacy laws — your responsibility now, not BBB's. Year-one ramp; ongoing maintenance after.
~$200K Y1
Customization engineering. The moment your needs diverge from BBB defaults, you fork. Once you fork, every upstream upgrade is a test-merge-deploy operation against your changes — forever.
~$100–200K / yr
Native mobile. BBB's mobile experience is functional but not at the bar a tutoring program needs. Building native iOS/Android is its own project — see §02 cluster 05.
~$300K Y1
Trust & safety, integrations, reporting, support tooling. Same surface area as a build, because BBB doesn't ship these.
~$200K Y1
Year-one all-in — the well-run version
~$1M
Ongoing, year two onward
~$500K / yr
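The line items sum to the year-one figure. A quick sketch — the customization midpoint is our assumption; every other number is taken from the table as listed:

```python
# Year-one BBB cost stack from the table above (USD).
year_one = {
    "infrastructure":      400 * 12,   # ~$400/mo AWS + storage + bandwidth
    "devops_half_fte":     155_000,    # 50% allocation, loaded SF comp
    "compliance_ramp":     200_000,    # SOC 2 / FERPA / COPPA, year one
    "customization":       150_000,    # midpoint of the $100–200K/yr range
    "native_mobile":       300_000,    # iOS/Android, year-one project
    "trust_safety_etc":    200_000,    # integrations, reporting, support tooling
}
total = sum(year_one.values())
print(f"Year-one all-in: ${total:,}")  # ~$1M, matching the table
```

Year two drops the one-time ramps but keeps the infrastructure, DevOps, fork maintenance, and ongoing compliance lines — which is where the ~$500K/yr figure comes from.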

BBB is “free” in the same way a sailboat is free if a friend hands you the keys: the boat is real, the wind is real, and so is everything you didn't budget for — the slip, the insurance, the haul-out, the surveyor, the diesel, the time. Most teams that try this path underestimate the DevOps line and the compliance line. Both are load-bearing. Neither is negotiable.

The fork problem. If BBB's defaults are 100% what you need, this is a viable path — many universities run it well. If BBB is 90% there and you customize the rest, you fork, and the fork is yours to maintain forever. Every upstream BBB release becomes a merge-and-test operation. By year two, most teams discover they're moving at BBB's roadmap speed, not their own.

Pencil Spaces Scale starts at $36K/year for this workload (custom-quoted at higher volumes), full-stack, modern UX, mobile apps shipped and maintained, compliance handled, no fork to merge against, and a roadmap that moves at your pace because the platform is purpose-built. Our entry tier — Pro at $6/month + $0.12 per user-hour overage — runs smaller programs for less than $15K/year, full-stack. The “free license” framing is one of those headlines that doesn't survive a CFO's spreadsheet.

We're not pretending these shortcuts don't exist or that they never work. CPaaS works for teams whose differentiation is exclusively in the application UI, who are happy to treat video as commodity infrastructure, and who can absorb vendor-strategy risk. BBB works for teams with a DevOps function to spare, whose UX needs map closely to BBB defaults. For everything else — especially K-12 programs that need to clear SOC 2, FERPA, native mobile, and district procurement — the math doesn't favor either path. We've watched it not favor them, repeatedly, for six years.

§ 13 · Objections we hear, answered

"But our case is different."

It usually isn't. Here are the most common reasons buyers tell us they think the math above doesn't apply to them.

We have a strong eng team — couldn't we do this for half?
What if we use a managed SFU vendor (Twilio, Agora, Daily, 100ms, Chime SDK)?
We're not in K-12 — we don't need FERPA / SOC 2.
Won't AI dramatically reduce our team size?
Aren't you incentivized to talk us out of this?

Spend the three years on what only you can build.

We'll handle the video, the whiteboard, the compliance, the on-call.
You build the thing nobody else can.

Questions about your specific situation? hello@pencilspaces.com