Build vs. buy · A field report from year six

What it actually costs to build your own virtual classroom.

We've been building one since 2020. Ten million sessions later, here's the math nobody runs before they start — and the long tail of work nobody warns you about until you've already hired the team.

No marketing tricks. No fake totals. Numbers below are sourced and dated; assumptions are disclosed.

TL;DR
Updated May 1, 2026 · By Ayush Agrawal, CEO & Co-founder, Pencil Learning Technologies

Should you build or buy a virtual classroom platform in 2026? Almost always: buy.

Building a production-grade virtual classroom — one that runs at 99.99% reliability across every browser, every device, and every school network, with the compliance posture a district will sign off on — costs ~$30M over three years with a 50-person engineering team, across 80 distinct production components. The hard parts aren't the WebRTC or the SFU; they're SOC 2 Type II evidence (which has to age in real time), native iOS and Android apps, the testing matrix across browsers and devices, and 24×7 on-call. Even Class Technologies, after raising $160M+ in venture funding, built their UX on top of Zoom rather than build their own real-time video.

Pencil Spaces, by comparison, handles this from $3,000/month on Scale (and from $6/month on our entry tier) — for the same SFU, whiteboard, recording, transcription, AI summaries, native mobile, scheduling, identity, and compliance posture. We've been operating it in production since 2020, with 10M+ sessions delivered across customers in 60+ countries at 99.95% measured uptime.

Read the full math: the $30M anchor · the 80-component stack · the 50-person team · infrastructure economics · the CPaaS / BBB shortcut math · the interactive calculator.

Total cost of build · Three-year horizon
Bay Area · 99.99% reliability · 2026
$30M
USD 30,000,000 · estimated, methodology disclosed below

To stand up a virtual classroom platform that's roughly comparable to what you can buy from us today — running at 99.99% reliability across every browser, every device, every school network. Three years. Fifty people you'll need to hire and retain. Infrastructure that bills by the second. Compliance that takes longer to pass than your cap table takes to close.

Time
36 mo
From kickoff to a production-grade platform
Headcount
50 FTE
Engineering, design, security, support — loaded by Year 3
Surface area
80
Components in the production stack you can't skip
Annual burn (Y3+)
$16M
Ongoing burn from year three onward, every year
Source: Pencil Learning Technologies operational data and field analysis, 2020–2026. Compensation bands per Levels.fyi (SF Bay Area, accessed May 2026). Full methodology in §02–§04.
First · the question we get most

"But with Cursor and APIs, can't I just vibe-code this?"

We get this email a couple of times a month. Almost always from a smart engineer who just shipped a polished prototype over a weekend — Cursor in one window, a managed video SDK, a transcription API, a CRDT library, an auth provider. It works. They feel ~80% done.

We've been that engineer. So have a lot of our customers, before they became our customers. So this isn't a lecture — it's the thing we wish someone had said to us in 2020.


What AI & APIs actually shorten

The prototype. A working four-person room with a shared whiteboard is genuinely a weekend.
The boilerplate. Auth flows, settings, billing UI, marketing site — fast.
The integration glue. Wiring four services together is something Claude does well.
The first drafts. Tests, docs, runbooks. Quality is good.

What AI & APIs cannot shorten

The compliance posture. APIs are SOC 2 vendors; you are still the auditee. Your auditor signs off after evidence operates for nine months. AI doesn't compress time.
The mobile binaries. Two app stores, two audio stacks, two device-fragmentation taxes. There's no API that ships native apps for you.
The composition tax. Every API you use is a vendor whose SLA, pricing, and roadmap you don't control. At scale you accept the lock-in or you rebuild.
The integration surface. Five APIs is twenty integration edges — error states, version drift, retry storms, billing reconciliation when one is down.
The on-call. When an SFU pod thrashes at 4pm Eastern, the pager doesn't get answered by Claude. A human reads logs.
The procurement cycle. A district CTO accepts the auditor's letter, not your confidence. Your prototype isn't in the procurement packet.
You can prototype this in a weekend. You can't ship it in a year.
You can compose the parts. You can't compose the operations underneath them.

If you've read that and still feel certain your team is the exception, run the calculator below with your numbers. We'll happily lose the argument if your model produces a smaller number than ours — and if it does, we'd genuinely like to hear about it. We learn from the engineers who pull this off; they're rare, but they exist, and they're not the buyers we're talking to here.

§ 01 · The estimation gap

What founders think they need vs. what ships.

The first — and largest — cost overrun on every internal video build is the gap between the Friday-afternoon whiteboard estimate and the actual surface area of a production system.

The whiteboard estimate

"How hard could it be?"

We'll grab an open-source video SDK, drop in a whiteboard, plug it into our LMS, and ship it in a quarter. Two engineers and a designer. Maybe $300K?

A · WebRTC video calls (4–8 participants)
B · A whiteboard component (we'll use tldraw)
C · Auth and a Postgres database
D · A scheduling page on top of Cal.com
E · Call it a v1 and iterate from feedback

Not wrong about the components. Wrong about everything else — the percentage of work each one represents, the operational tax of running them, and the second-derivative cost of maintaining them once you have real users.

The shipping reality

What actually has to exist.

You can ship a working prototype in a quarter. You cannot ship a platform a tutoring program will sign a multi-year contract for, in less than three years.

01 · Real-time media: SFU cluster, signaling, TURN/STUN, fallbacks for restrictive networks
02 · Whiteboard sync: CRDT engine, conflict resolution, persistence, version history
03 · Cloud recording: compositor, encryption, multi-region storage, retention policy
04 · Live transcription, AI summaries, action item extraction — per call, at scale
05 · Mobile: native iOS, native Android, App Store review, OS-version regression
06 · SSO, SAML, FERPA contracts, SOC 2 Type II, COPPA review, GDPR, pen tests
07 · Observability, on-call rotation, status page, customer support tooling
08 · The seventy-two other things on the next page

The naive list is roughly 15% of the work. The other 85% is where startups die — quietly, six quarters in, after the founders have already told the board it's “almost done.”

§ 02 · The architecture you can't avoid

Eighty components. None of them optional.

These are the actual systems we run in production. Skip any one of them and you ship a prototype, not a platform. Most teams discover them after their first outage.

01 · Real-time media (8)
SFU cluster (managed or self-hosted)
Signaling server with reconnect semantics
TURN servers, geo-distributed
STUN, ICE, NAT traversal logic
Adaptive bitrate, simulcast, SVC
Echo cancellation, AGC, noise suppression
Active speaker detection
Bandwidth estimation, congestion control
02 · Whiteboard & co-creation (7)
Vector canvas engine, hardware-accelerated
CRDT-based real-time sync (Yjs, Automerge)
Conflict resolution & offline merge
Persistence, snapshots, time travel
Pressure-sensitive ink, palm rejection
PDF / image / equation embeds
Export pipeline (PDF, PNG, SVG)
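The conflict-resolution line in cluster 02 is the one most teams underestimate. The core obligation — a merge that is commutative, associative, and idempotent, so every peer converges on the same state regardless of sync order — can be illustrated with the simplest possible CRDT, a last-writer-wins register. This is a teaching sketch only; Yjs and Automerge use far richer structures, and all names and values below are illustrative:

```typescript
// Minimal last-writer-wins (LWW) register — the simplest possible CRDT.
// The merge must be commutative, associative, and idempotent so that
// every peer converges to the same value no matter the sync order.
type LwwRegister<T> = { value: T; timestamp: number; peerId: string };

function merge<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  // Later timestamp wins; ties break deterministically on peer id,
  // so two peers merging in either order reach the same result.
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.peerId > b.peerId ? a : b;
}

// Two peers edit the same stroke color while offline, then sync.
const fromTablet: LwwRegister<string> = { value: "red", timestamp: 1001, peerId: "tablet" };
const fromLaptop: LwwRegister<string> = { value: "blue", timestamp: 1002, peerId: "laptop" };

// Merge order must not matter: both directions converge on "blue".
const mergedAB = merge(fromTablet, fromLaptop);
const mergedBA = merge(fromLaptop, fromTablet);
```

Even this toy version surfaces the real design questions — clock skew, tombstones, offline merge — that a production whiteboard engine has to answer for every shape on the canvas.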
03 · Recording & playback (6)
Server-side compositor (FFmpeg pipeline)
Multi-track audio mixing
Layout switching, screenshare overlay
Encrypted storage, signed playback URLs
Adaptive streaming player (HLS / DASH)
Retention, deletion, FERPA-compliant audit
04 · AI & transcription (6)
Streaming speech-to-text pipeline
Speaker diarization, multi-channel
Session summary & action items
Searchable transcript index
Real-time captions, multi-language
Coaching analysis, talk-time ratios
05 · Native mobile (5)
iOS app, Swift / SwiftUI, WebRTC bindings
Android app, Kotlin, libwebrtc, Pixel + Samsung + Xiaomi parity
App Store / Play Store review & appeals
Push notifications, deep links, calendar handoff
Background audio, CallKit, ConnectionService
06 · Identity & access (5)
OAuth, SAML, OIDC, WorkOS integration
SCIM provisioning, deprovisioning
Roles, permissions, multi-tenancy
Magic links, MFA, session management
Audit logs, exportable, immutable
07 · Scheduling & calendars (5)
Booking engine, availability, recurring rules
Two-way Google / Outlook / Apple calendar sync
Timezone, DST, RRULE correctness
Reminders: email, SMS, push, with delivery proof
No-show, reschedule, cancellation flows
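The "timezone, DST, RRULE correctness" line in cluster 07 hides the classic recurrence bug: store a weekly lesson as a UTC timestamp, add 604,800 seconds per week, and the lesson drifts an hour the week DST flips. A sketch of the failure under assumed offsets (the offset function below is illustrative, not a real tz database):

```typescript
// Naive recurrence adds whole weeks to a UTC timestamp. Correct recurrence
// re-resolves the LOCAL wall time against the zone's UTC offset for every
// occurrence. Offsets here are illustrative (UTC-5 standard / UTC-4
// daylight, with a hypothetical DST switch in week 2), not a tzdb.
const HOUR = 3600;
const WEEK = 7 * 24 * HOUR;

function offsetForWeek(week: number): number {
  return week < 2 ? -5 * HOUR : -4 * HOUR; // hypothetical DST boundary
}

// A lesson at 16:00 local wall time, as seconds since local midnight.
const localLesson = 16 * HOUR;

// Week 0: UTC = local time minus the UTC offset (16:00 − (−5h) = 21:00 UTC).
const week0Utc = localLesson - offsetForWeek(0);

// Naive schedule: keep adding whole weeks to the UTC timestamp.
const naiveWeek3Utc = week0Utc + 3 * WEEK;

// Correct schedule: re-derive UTC from the local wall time each week.
const correctWeek3Utc = 3 * WEEK + localLesson - offsetForWeek(3);

// After the DST flip, the naive occurrence is a full hour late locally.
const driftSeconds = naiveWeek3Utc - correctWeek3Utc; // 3600
```

The fix — persist the local wall time plus the zone, and resolve to UTC per occurrence — is exactly what RRULE semantics require, and exactly what naive epoch math gets wrong.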
08 · Compliance & security (5)
SOC 2 Type II audit, evidence collection
FERPA contracts & data flows
COPPA, GDPR, CCPA, state-specific student privacy
Encryption at rest, in transit, key rotation
Pen testing, vulnerability scanning, bug bounty
09 · Ops & observability (4)
Logs, metrics, traces (managed or self-hosted)
On-call rotation, runbooks, postmortems
Public status page, incident comms
Multi-region failover, DR drills
10 · Billing & admin (3)
Usage metering (per-user-hour, per-recording)
Stripe integration, taxes, invoices, dunning
Org admin: seats, billing, exports
11 · Support tooling (3)
Help center, in-app help, AI support agent
Session replay, debug bundle export
Customer-facing dashboards, exportable reports
12 · i18n & accessibility (3)
Localization framework, RTL, plurals
WCAG 2.1 AA, screen reader, keyboard parity
Live captioning surfaced through UI
13 · Trust & safety (5)
Recording disclosure & consent capture (parental, where required)
In-session reporting flows: report a participant, post-session reports
Content moderation: file-upload scanning, link safety, screenshare guardrails
Abuse detection: keyword filters, ML escalation, human review queue
Background-check integration for tutors (Checkr or equivalent)
14 · Integrations & rostering (5)
LTI 1.3: deep linking, Names & Roles, Assignment & Grade Service
Clever, ClassLink, Google Classroom rostering
OneRoster CSV + REST sync, district-by-district edge cases
Public API: rate limits, OAuth client management, SDK upkeep
Webhooks: signing, retries, dead-letter queue, replay
15 · Reporting & analytics (5)
Per-student & per-cohort engagement reporting
Parent reports: weekly recap, attendance, time-on-task
Admin dashboards: program health, tutor utilization, no-show rates
Scheduled report delivery (email, CSV, PDF)
Event pipeline → warehouse → dbt → BI tool
16 · Internal tooling & ops (5)
Feature flags, gradual rollouts, kill switches per tenant
Tenant impersonation (with audit log) for support
Migration tools: import from prior platform, rollback safety
Internal admin console: billing exceptions, refunds, manual provisioning
Tenant-level config: branding, retention policy, allowed integrations
16
distinct subsystems, each owned by an engineer who knows it deeply
80
components, none of which a tutoring program will pay for if you skip it
3–5x
average underestimation of timeline by founders we've talked to about building
100%
of teams who skipped #08 (compliance) at v1 told us they regretted it
§ 03 · The team you'll need to hire

Fifty people by year three. None of them junior.

What it actually takes to run a system at 99.99% reliability across every browser, every device, every school network. The asterisks aren't on whether you'll hire these roles — they're on whether you'll be able to retain them in a market where every one of these humans is being recruited weekly by every other company that's serious about real-time.

Loaded compensation reflects SF Bay Area senior IC ranges (Levels.fyi P75) plus ~1.4× loaded multiplier (employer taxes, benefits, equity, equipment, recruiting). We've left specific dollar figures off the table on purpose — they're public on Levels.fyi if you want them. Three-year payroll alone: ~$23M. And no, this isn't the team that ships your differentiated product. This is the team that just ships the plumbing underneath it.
§ 04 · The infrastructure bill

Per-hour economics, before you've added any feature.

Public list pricing from each vendor as of 2026. Assumes a modest scale of 10,000 user-hours per month after Year 1 — the rough size of a 200-tutor program running five hours a day. Bigger operations get worse ratios.

TURN bandwidth (relay fallback) · ~$0.40 / GB · managed TURN
When direct peer connections fail — school networks, corporate firewalls, symmetric NATs — media routes through TURN servers. In our experience ~15% of traffic relays, and TURN bills both legs of every relayed stream, so a four-person user-hour generates roughly 5.4 GB of billable relay traffic when it falls back.
10,000 user-hr/mo × 5.4 GB × 15% × $0.40/GB
=~$3,200 / mo
SFU compute & bandwidth · ~$0.004 / participant-min · managed SFU
Either you operate your own media-server cluster — which means an SRE on-call — or you buy a managed service. At 10K user-hours that's 600K participant-minutes minimum, and rises with multi-party.
10,000 user-hr × 60 min × $0.004
=~$2,400 / mo
Live transcription · ~$0.0043 / sec · streaming STT
Streaming speech-to-text for live captions and post-session transcripts. Billed per second of audio per channel. This is the line item that surprises every buyer.
10,000 hr × 3,600 sec × $0.0043
=~$155,000 / mo
Recording compute & storage · ~$0.50 / hr compute + $0.023 / GB / mo
Compositor needs dedicated CPU. Output averages ~600 MB per recorded hour. Storage compounds because most programs keep recordings for the academic year minimum.
10,000 hr × $0.50 + (rolling 60 TB × $0.023)
=~$6,400 / mo
AI session summaries · ~$0.05 / session · LLM API
Per-session summary, action items, talk-time analysis. Cheap per call. Adds up at volume, and the prompt engineering is real product work.
~3,300 sessions/mo × $0.05
=~$165 / mo
Whiteboard sync & persistence · ~$2K / mo · managed CRDT
CRDT sync as a managed service, or self-hosted on your own infrastructure (which costs an engineer instead). At scale, both options converge in price.
Pro tier + connection-based overage
=~$2,500 / mo
General cloud (AWS / GCP) · Postgres, Redis, CDN, queues, KMS
Application servers, database (with read replica for compliance reporting), Redis, S3, CloudFront, WAF, Secrets Manager, Lambda, queues. Nothing fancy.
Conservative production budget at this scale
=~$15,000 / mo
Vendors & tooling · email, SMS, observability, error tracking, …
Transactional email, SMS, push notifications, observability, error tracking, status page, secrets vault, security scanning, CI/CD. Each one is small. Together they're a salary.
Combined SaaS line items
=~$8,000 / mo
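Summing the line items above gives the monthly figure — a sketch using this page's stated results and assumed 2026 rates, handy for substituting your own scale:

```typescript
// Monthly infrastructure estimate at 10,000 user-hours/month, summing the
// §04 line items. Per-unit rates are this page's assumptions, not quotes.
const monthly = {
  turnBandwidth: 3_200,                     // relay fallback, as estimated above
  sfu: 10_000 * 60 * 0.004,                 // 600K participant-min → ~2,400
  transcription: 10_000 * 3_600 * 0.0043,   // streaming STT → ~154,800
  recording: 10_000 * 0.5 + 60_000 * 0.023, // compute + rolling 60 TB → ~6,380
  aiSummaries: 3_300 * 0.05,                // per-session LLM calls → ~165
  whiteboardSync: 2_500,
  generalCloud: 15_000,
  vendorsTooling: 8_000,
};

const totalPerMonth = Object.values(monthly).reduce((a, b) => a + b, 0);
const totalPerYear = totalPerMonth * 12;
// totalPerMonth ≈ $192,000; totalPerYear ≈ $2.3M
```

Swap in your own user-hours and rates and the shape of the answer doesn't change: transcription dominates, and every line scales up with usage.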
~$192,000 / month
in infrastructure alone before you ship a single differentiated feature. Roughly $2.3M/year — rising with usage, not falling. And this is at modest scale. Multiply by 10x when you sell to a school district.
$2.3M
infrastructure / yr at 10K hrs/mo
§ 05 · The compliance tax

Things you can't ship without. Things that take a year.

Edtech buyers will not sign a contract without these. School districts have a procurement checklist; tutoring programs are inheriting it. Each line below has a real audit, a real lead time, and a real invoice.

Requirement
Why it exists, and what passing it actually means
Lead time
Year-1 cost
SOC 2 Type II
Controls audit covering security, availability, confidentiality. Type II requires a six- to twelve-month observation window, evidence of every control operating continuously. Every enterprise buyer asks for it.
9–12 months
$170,000
FERPA framework
Education records privacy law. Requires data flow mapping, retention policies, a Data Processing Agreement template, and counsel review. Districts will not sign without a FERPA-compliant DPA on letterhead.
6–8 weeks
$15,000
COPPA review
Children's online privacy. Required if any user is under 13, which in tutoring is most of them. Requires verifiable parental consent flows and counsel sign-off on data practices.
4–6 weeks
$10,000
GDPR + UK / EU DPA
Standard contractual clauses, transfer impact assessment, DPO appointment if you sell into Europe. Once you do, a single non-conforming export can cost you the account.
8–12 weeks
$25,000
State student-data laws
California SOPIPA, Illinois SOPPA, NY Ed Law 2-d, Texas SB 820, plus 30+ others. Each state has its own DPA template; vendors maintain a contract library.
Rolling
$30,000
External penetration test
Third-party security firm attempts to break into your system, files a report, you remediate, they retest. Many enterprise buyers want one within the last 12 months.
6 weeks + remediation
$35,000
Cyber insurance
Required by most enterprise contracts. Premiums depend on your data volumes and SOC 2 status. A real claim is rare; the certificate is the product.
2 weeks
$25,000
Privacy counsel retainer
Outside counsel for DPA negotiations, breach response readiness, contract red-lines, regulatory updates. You will need them more than you think.
Ongoing
$30,000
Year-one compliance cost
Plus ~400 engineering hours implementing controls, evidence pipelines, and audit automation.
Concurrent
~$340,000
§ 06 · Hidden costs & war stories

The bugs nobody warns you about.

Every one of these is a specific incident we, or peers running production WebRTC, have personally debugged. They are not in the architecture diagram. They are why real-time video has been called the hardest commodity-grade software to ship.

Safari, 2024

The getUserMedia race condition that only fired on iOS 17.4 with non-Apple Bluetooth headsets.

Audio device selection nondeterministically picked the wrong input. Reproduced in production for ~3% of mobile users. Took two senior engineers six weeks to root-cause — the bug was in WebKit, not our code. Workaround required redesigning our audio capture flow.

You don't fix this with a Stack Overflow answer. You fix it by building enough logging infrastructure to see one bad call out of every thirty.

Engineering cost: ~480 hours · Discovered: 14 months in
Android, ongoing

Echo cancellation behaves differently on every Samsung, Pixel, and Xiaomi device.

Hardware AEC quality varies by chipset. Some devices ship aggressive noise gates that cut quiet voices entirely. Some Xiaomi MIUI builds bypass standard audio APIs. Mid-range Samsung devices in India have tested differently than Samsung devices in Korea.

You ship a software AEC fallback. You build a device-specific config table. You buy twenty cheap Android phones and keep them on a shelf. You hire someone whose job is this table.

Hardware lab budget: $40K/yr · Permanent ownership: 0.5 FTE
Network edges

The codec negotiation that crashes when one user is on Firefox and another is on Safari 14.

H.264 vs. VP8 vs. VP9 vs. AV1 negotiation across browsers, OS versions, and hardware acceleration paths is a maze of partial support. Older Safari only does H.264 baseline. Some Firefox builds require an AV1 fallback. The spec says one thing; implementations do another.

You write a compatibility matrix. You maintain it. You re-test it every Chrome release. The matrix lives in your codebase and silently grows.

Browsers in test matrix: 23 · OS combinations: 41
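The negotiation logic behind that matrix ultimately reduces to intersecting two capability sets under a preference order — and failing hard when the intersection is empty. A toy sketch (the capability lists are illustrative, not an accurate browser matrix):

```typescript
// Toy codec negotiation: intersect two peers' supported codecs and pick
// the most preferred mutual option. Capability lists are illustrative;
// the real matrix varies per browser version, OS, and hardware path.
const PREFERENCE = ["av1", "vp9", "vp8", "h264-baseline"] as const;
type Codec = (typeof PREFERENCE)[number];

function negotiate(a: Codec[], b: Codec[]): Codec | null {
  const mutual = new Set(a.filter((c) => b.includes(c)));
  for (const codec of PREFERENCE) if (mutual.has(codec)) return codec;
  return null; // no mutual codec: the call cannot be established
}

// Hypothetical pairing: a modern Chrome peer meets an older Safari that
// only does H.264 baseline — the session silently downgrades.
const chromePeer: Codec[] = ["av1", "vp9", "vp8", "h264-baseline"];
const oldSafariPeer: Codec[] = ["h264-baseline"];
const chosen = negotiate(chromePeer, oldSafariPeer); // "h264-baseline"
```

The hard part isn't this ten-line function — it's keeping the capability lists truthful across every browser release, which is what the 23 × 41 matrix above actually pays for.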
Production, 4pm Eastern

The 4pm Eastern spike that exposes every concurrency bug at once.

Tutoring is a 4–7pm business. Every weekday, your concurrent session count jumps 8× in 20 minutes. You discover that your SFU autoscale group warms slower than the surge, that your Postgres write queue has a hot row, that your recording cluster runs out of file descriptors.

None of this is visible in dev. It is visible at 4:08pm Eastern when a tutor can't get into a session and emails the founder.

Concurrent sessions, peak: ~1,400 · Off-peak: ~80
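The arithmetic of that surge is worth making explicit: any demand arriving within one pod-warmup window has to be absorbed by capacity that is already warm, because no autoscaler can help you inside its own lag. A back-of-envelope sketch with illustrative numbers:

```typescript
// Back-of-envelope for the 4pm problem: if demand ramps from `base` to
// `base * surge` concurrent sessions over `rampMin` minutes, and a new
// media pod takes `warmupMin` minutes to become usable, the sessions that
// arrive inside one warmup window must land on capacity that is ALREADY
// warm — the autoscaler is blind inside its own lag.
function prewarmNeeded(
  base: number,
  surge: number,
  rampMin: number,
  warmupMin: number
): number {
  const arrivalsPerMin = (base * (surge - 1)) / rampMin; // new sessions/min
  return Math.ceil(arrivalsPerMin * warmupMin);          // must be pre-warmed
}

// Illustrative numbers echoing the pattern above: 80 → 640 concurrent
// sessions over 20 minutes, pods that take 5 minutes to warm.
const headroom = prewarmNeeded(80, 8, 20, 5); // 140 sessions of warm headroom
```

That headroom is paid for every weekday, whether or not the surge arrives — which is why surge economics never show up in a dev-environment cost estimate.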
Cellular & jitter

The student in São Paulo on cellular data with 28% packet loss.

Real student. Real session. The connection survives because of bandwidth estimation, adaptive bitrate, retransmission, FEC, and a custom jitter buffer that we tuned for six months on a Brazilian cellular profile we'd never seen before.

On a vibe-coded MVP, this session drops at the 90-second mark. The student gives up. The tutoring program loses the contract. You never know.

Network conditions tested: 18 profiles · Tuning iterations: 40+
School networks

The school district that blocks UDP. All of it.

Many K-12 networks restrict UDP entirely or rate-limit aggressively, which kills standard WebRTC. You need TCP fallback through TURN-over-TLS on port 443 — the same port as HTTPS — or your sessions don't connect at all.

This isn't optional. It's an entire customer segment. You discover it the first time you do a district pilot.

% of K-12 networks affected: ~22% · Engineering effort: 6 weeks
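Concretely, the fallback is an ICE configuration that forces every candidate through a TURN relay speaking TLS on port 443. A sketch of the shape such a config takes — hostname and credentials are placeholders, and a production config would also keep UDP and STUN paths for networks that allow them:

```typescript
// ICE configuration for UDP-hostile school networks: route all media
// through a TURN relay on TLS port 443, which a district firewall sees
// as ordinary outbound HTTPS. Hostname and credentials are placeholders.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

function restrictiveNetworkConfig(turnHost: string, user: string, secret: string) {
  const relay: IceServer = {
    // turns: (TURN over TLS) on 443 with TCP transport — the combination
    // that traverses firewalls that block or throttle UDP entirely.
    urls: [`turns:${turnHost}:443?transport=tcp`],
    username: user,
    credential: secret,
  };
  return {
    iceServers: [relay],
    // Skip host / server-reflexive candidates; go straight to relay.
    iceTransportPolicy: "relay" as const,
  };
}

// In the browser, this object is what you pass to `new RTCPeerConnection(...)`.
const config = restrictiveNetworkConfig("turn.example.com", "tutor", "s3cret");
```

Detecting *when* to apply this config — per network, per session, without burning relay bandwidth for everyone — is the part that takes the six weeks.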
Sunday night, 11:14pm

The email from a tutoring director who can't get into her session and has 4,000 students booked tomorrow.

AI cannot answer this email. AI cannot SSH into the affected pod, read the logs, notice that a transcoder worker has wedged on a malformed audio packet, restart it, and send a reassuring reply at 11:23pm.

That requires a human. That human is on your payroll. You did not put them on your spreadsheet because you didn't know yet that this email exists.

Avg P1 response time we hold ourselves to: <15 min · On-call rotation: 4 engineers
Chrome release notes

The Chrome 121 update that silently changed how MediaStreamTrack constraints behave.

Chrome ships a major release every four weeks. Most are quiet. Some break your code. You don't see this in your CI; you see it on a Wednesday morning when your error rate goes from 0.4% to 9%.

You read every Chrome release note. You subscribe to webrtc-discuss. You file bugs upstream. This is permanent overhead and it never reduces.

Major browser releases / yr: 32 · Average breaking changes: 2–4
§ 07 · Run the math

Your numbers. Your scale. The same answer.

Adjust the sliders to your situation. The model uses Bay Area senior comp, public list pricing for infrastructure, and 2026 vendor rates. Try lowering everything as far as the model allows — the number doesn't get small.

Total billable user-hours across all your sessions per month. A 200-tutor program running 5 hours/day lands around 10,000.

10,000 user-hours / month

Most build programs reach a serviceable v1 in year three. We're modeling cumulative cost across the buildout.

3 years

Lean = wrap a vendor SDK and skip mobile / compliance for v1. Standard = the team breakdown above. Enterprise = adds redundancy, security depth, and dedicated support.

SOC 2 + FERPA is the floor for any K-12 buyer. Anything less means you can't sign your first real contract.

Total cost of build
Over 3 years, at 10,000 hrs/mo
$30M
Engineering payroll, loaded $23M
Infrastructure & vendors $6.9M
Compliance & legal $0.78M
vs. buy · Pencil Spaces Pro tier
Same workload on Pencil Spaces Pro: ~$43K total over the same horizon.
You'd save $30M — and three years.
Pro tier: $6/month base + $0.12 per user-hour. Scale tier (the right fit for production programs) starts at $3,000/month, custom-quoted at higher volumes.
Model assumes Bay Area loaded comp, modest mobile scope, and standard ed-tech compliance. Real builds tend to overrun this by 20–40%; this is the optimistic case.
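The totals above reduce to a small function — useful if you want to rerun the model with your own assumptions. All rates are this page's assumptions (Bay Area loaded comp, 2026 list pricing), not quotes:

```typescript
// The §07 totals, reduced to functions so the model can be rerun.
function buildCostThreeYears(): number {
  const payroll = 23_000_000;    // loaded engineering payroll over 3 years (§03)
  const infra = 2_300_000 * 3;   // §04 run rate at 10K user-hours/month
  const compliance = 780_000;    // §05 year one plus renewals
  return payroll + infra + compliance;
}

function buyCost(years: number, userHoursPerMonth: number): number {
  // Pro tier as quoted above: $6/month base + $0.12 per user-hour.
  const months = years * 12;
  return 6 * months + 0.12 * userHoursPerMonth * months;
}

const build = buildCostThreeYears(); // ≈ $30.7M — the page's ~$30M anchor
const buy = buyCost(3, 10_000);      // ≈ $43K — the page's Pro-tier comparison
```

Lower every input as far as you plausibly can and the ratio between the two numbers barely moves — which is the point of the calculator.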
§ 08 · The vibe-coding reckoning

"But I'll just have AI write it."

We use AI every day. We ship faster because of it. And we are the team most qualified to tell you what AI cannot do for a production virtual classroom — not because it's incapable, but because the work it can't do is the work that matters.


What AI does shorten

Scaffolding the prototype. A working video + whiteboard prototype is a weekend with Claude or Cursor. You'll feel like you're 80% done. You're 5%.
Boilerplate components. Auth flows, settings pages, billing UI, marketing site, dashboards — AI is genuinely fast at these. So fast that boilerplate is no longer a moat.
API glue and CRUD. Wiring services together, writing migrations, drafting endpoints. Routine. AI excels at routine.
First drafts of tests. Claude can generate a full test scaffold for code it just wrote. The fixtures are usually fine.
Documentation, runbooks, internal docs. Hours of work compressed into minutes. Quality is genuinely good.

What AI cannot shorten

The Safari bug only 3% of users hit. Reproducing a non-deterministic WebRTC bug in WebKit isn't a coding problem. It's an instrumentation problem you didn't know you had.
The 4pm Eastern surge. Autoscaling tuned to a real traffic pattern, with a real cost ceiling, is a months-long observability project. Not a prompt.
SOC 2 Type II. An auditor needs evidence of controls operating for nine months. AI cannot fast-forward time. You wait, or you don't ship to enterprise.
The relationship with the school district. The first sale to a real customer is months of conversations, redlines, security questionnaires, and trust. Not a prompt.
The 11pm email from a panicked director. The reply requires reading the logs, knowing the system, knowing the person. None of that is something you can outsource to an LLM in 2026.
The taste decisions. What goes in the product. What doesn't. Where the line is between “feature” and “noise.” This is the part the founder has to do, and it doesn't scale with model size.

AI takes the prototype from weeks to weekends. It does not take the platform from years to months.

— What we tell every founder who asks
§ 09 · The opportunity cost

The thing your competitor builds, while you build this.

Every dollar and engineer-month you sink into rebuilding undifferentiated infrastructure is a dollar and engineer-month not going to your actual product. Same starting line. Two different finishes.

Year three, building it yourself

Where most teams who chose to build are, on month thirty-six.

Month 06 · Prototype works in the demo. Cracks in the first real pilot.
Month 12 · First mobile app submitted. Apple rejects. Twice.
Month 18 · First district pilot. UDP-blocked network. Sessions don't connect.
Month 24 · SOC 2 audit kicks off. Six months of evidence collection ahead.
Month 30 · First real outage. Public postmortem. Two engineers leave.
Month 36 · Platform is roughly comparable to Pencil Spaces. Differentiated product: not yet started.

Year three, buying it

Where teams who bought infrastructure and built differentiation are, on month thirty-six.

Month 01 · Pencil Spaces or Carbon API live in production. Real sessions running.
Month 03 · First differentiated workflow shipped — the thing only your team could build.
Month 09 · First school district contract closed. Compliance inherited from us.
Month 15 · Product-market fit visible in retention curves. Hiring around the product.
Month 24 · Second product line. Engineering bandwidth available because nobody is on call for an SFU.
Month 36 · Category leader in your vertical. Three years of compounding focus on what only you can do.
§ 10 · Why we know

We've spent six years so you don't have to.

These aren't projections. They're the operating envelope of the platform you can buy from us today — running in production, every day, since 2020.

10M+
virtual classroom sessions delivered, across customers in 60+ countries
99.95%
platform uptime measured against an external synthetic monitor
~6 yrs
of compounding investment in the real-time stack you'd be re-deriving
60
full-time team members building this — engineers, designers, support, ops — distributed across multiple countries

I've been the founder on the other side of this conversation. In 2020 my co-founder Amogh and I walked away from senior tech-leadership roles at companies like Meta and Google to make the bet on building a real-time, virtual-classroom-grade platform from first principles. The decision wasn't wrong — we had a thesis about what tutoring needed that nothing on the market did. The cost — in years, in headcount, in the long tail you read above — was real.

The thing nobody told us — and the thing I now tell every founder who asks — is that the infrastructure isn't the product. It's the part of the product you're forced to ship before you get to ship the part you actually care about.

Six years later, that infrastructure is what we sell. As a full-stack platform, called Pencil Spaces. As an embeddable API, called Carbon. We built it once, at unsustainable cost, because we had to. You don't.

If you want to build the differentiated product on top of it — the curriculum, the matching engine, the reporting layer, the AI workflow — we'd love to hear about it. That's where the actual moat is. The video plumbing isn't.

And lest this sound like just our story —

What the well-funded teams who tried this actually had to raise.

We're not the only company that's tried to build a virtual classroom platform from scratch. Two of the most prominent venture-backed attempts of the last five years, and what they had to raise to ship something:

Engageli
$47.5M
total raised, two rounds
Series A: $33M led by Maveron, Corner Ventures, et al.
Founded by veteran ed-tech operators with deep institutional backing. Focused on higher-ed and enterprise; built much of the real-time stack from first principles.
Class Technologies (class.com)
$160–$169M
total raised, four-to-five rounds
Series B: $105M led by SoftBank Vision Fund 2 (July 2021).
Built virtual-classroom UX on top of Zoom's Meeting APIs — outsourced the hardest layer of the stack to a third party, and still had to raise nine figures to ship around it.

Read that second card carefully. Class raised more than five times this page's build-cost anchor — and didn't even build their own real-time video. They wrapped Zoom and built UX on top. Even that was a $160M+ undertaking. Engageli, building closer to the metal, raised $47.5M and runs leaner. Neither story is a knock on the founders — both companies have credible operators. It's a knock on the assumption that this category is cheap to ship. It isn't. Our receipts are above; these are the receipts the rest of the industry has filed.

Sources: public funding announcements, Crunchbase, Tracxn, PRNewswire (Class Series B, July 2021). Verify before relying.

Authorship & method
Ayush Agrawal — CEO and co-founder, Pencil Learning Technologies. Numbers above are sourced from public list pricing for managed real-time infrastructure, Levels.fyi P75 senior IC compensation for SF, and our own operating experience running Pencil Spaces and Carbon.dev. The model is conservative and intentionally favors the build case where assumptions are ambiguous — because we've watched real builds overrun even the optimistic version of this math.
§ 11 · Two ways to skip the build

If you're not building it, buy the right version.

Two paths, depending on whether you want a finished product or the infrastructure underneath one. We sell both, because we built both.

For tutoring programs & districts

Pencil Spaces

The full virtual classroom, branded for your program, ready in a week. Everything in the stack diagram — we run it; you teach.

The full real-time stack — video, whiteboard, recording, transcription, AI summary
Native iOS and Android apps
SSO, SCIM, FERPA-compliant DPAs, SOC 2 Type II
Custom branding, your domain, your colors, your name on the door
Unlimited seats, billed by the user-hour
From $6/month for solo tutors, custom for Scale
For developers building their own product

Carbon

The same infrastructure, exposed as an API. Embed real-time video, whiteboard, and persistence into your own app. Skip the build.

WebRTC video, persistent whiteboard, co-browsing, chat
Drop-in components, or compose your own UI on the SDK
Cloud recording, transcription, AI summary — opt-in
Single integration replaces Zoom Meeting APIs, Whereby, Hyperbeam, self-hosted tldraw
Same proven infrastructure, billed by the participant-minute
Pencil Spaces is built on Carbon — that's the proof point
§ 12 · The shortcuts you'll ask about

"But what if we just buy minutes... or use BigBlueButton?"

Two paths come up on every Scale call. Each looks dramatically cheaper than building from scratch — and it is, for the ~5% of the stack it covers. Here's where the other 95% reappears.

Shortcut #1 · Buy minutes from a CPaaS

Twilio. Agora. Daily. 100ms. Chime SDK. Same answer for all of them.

The pitch is reasonable: instead of operating your own real-time media stack, you pay a managed vendor by the participant-minute and they handle the SFU, the TURN, the scaling, the codec wars, the surge load on the third Tuesday in October. Public list pricing across the category, as of 2026:

Vendor
HD video, $ / participant-min
Worth knowing
Twilio Programmable Video
$0.004
Twilio announced End-of-Life (December 2023) effective December 2024, extended the deadline to December 2026 (March 2024), then reversed course in October 2024 and reinstated the product. Three years of strategic uncertainty for any program built on it.
Agora HD
$0.004
Per-resolution pricing — HD is $3.99/1000 min, Full HD is $8.99/1000, UltraHD higher still. Audio billed separately at $0.99/1000.
Daily.co
$0.004
Flat per-participant-minute, audio-only at $0.00099. Recording adds $0.01349/min, storage another $0.003/min.
100ms
$0.004
Acquired by Razorpay; 10K free participant-minutes/mo; recording $0.0135/min, transcription $0.017/min on top.
Amazon Chime SDK
$0.0017
Cheapest in the category. Note: parent service Amazon Chime was shut down February 2026; the SDK survives, for now. Media capture and replication billed separately.
Public list pricing as of early 2026. Standard volume rates; enterprise discounts apply at scale. Verify current rates before relying.

At our reference scale of 10,000 user-hours per month — that's 600,000 participant-minutes — the CPaaS line item lands around $2,000–$2,400/month at the category-standard $0.004/min, or ~$1,000/month on Chime SDK. Year one: ~$25K. Cheap. Cheaper than running your own SFU cluster, by a meaningful margin.

Now what that doesn't include. Recording is separate. Transcription is separate. The whiteboard is separate. Mobile SDKs are separate. Compliance is separate. Scheduling, billing, identity, support tooling, reporting, integrations — all the other 79 components in §02 — are separate. The CPaaS replaces cluster 01 of sixteen. You still owe the other fifteen, and you still owe the engineering team in §03 to build them, integrate them, and operate them.

The vendor-strategy tax. Twilio's path is the cautionary tale. December 2023: EOL announced for December 2024. Customers begin migrating, mid-stride, to Zoom Video SDK (Twilio's recommended replacement). March 2024: extended to December 2026. Customers roll back planning. October 2024: reversed entirely — Twilio Video stays. Three years of strategic uncertainty hanging over every tutoring program built on the API. Picking a CPaaS isn't picking software; it's picking which vendor's strategic mood your roadmap will track. Chime's parent product, Amazon Chime, was shut down in February 2026. The SDK is still there. For now.

Pencil Spaces handles this workload from $1,200/month on our entry tier ($6/month base + $0.12 per user-hour overage) — less than the CPaaS would charge for just the video minutes. Scale tier, the right fit for production programs, starts at $3,000/month (custom-quoted at higher volumes) and includes the SFU plus the other 79 components: whiteboard, recording, transcription, AI summaries, mobile apps, scheduling, identity, compliance posture, on-call, and the integration glue between all of it. The arithmetic is plain: same money or less than the CPaaS for ~80× the surface area, and the strategic-direction risk is ours, not yours.
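The arithmetic above is simple enough to check in a few lines. A quick sketch of the reference workload, using the public list rates quoted in the table (illustrative only — verify current rates; the Pencil figure is simplified to treat every user-hour as overage-billed):

```python
# Reference workload from this page: 10,000 user-hours per month.
USER_HOURS = 10_000
PARTICIPANT_MINUTES = USER_HOURS * 60          # 600,000 participant-minutes

# Public list rates quoted above (USD per participant-minute).
CPAAS_RATE = 0.004        # Twilio / Agora / Daily / 100ms category standard
CHIME_RATE = 0.0017       # Amazon Chime SDK

cpaas_monthly = PARTICIPANT_MINUTES * CPAAS_RATE   # video minutes only
chime_monthly = PARTICIPANT_MINUTES * CHIME_RATE

# Pencil Spaces entry tier: $6/month base + $0.12 per user-hour overage.
pencil_monthly = 6 + USER_HOURS * 0.12

print(f"CPaaS (video minutes only): ${cpaas_monthly:,.0f}/mo (${cpaas_monthly * 12:,.0f}/yr)")
print(f"Chime SDK (minutes only):   ${chime_monthly:,.0f}/mo")
print(f"Pencil entry tier:          ${pencil_monthly:,.0f}/mo")
```

Note what the first two lines buy versus the third: the CPaaS figures cover cluster 01 only; the Pencil figure covers the full stack.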

Shortcut #2 · Run BigBlueButton (or any open-source classroom)

BBB's license is free. The operations are not.

BigBlueButton is a credible, mature open-source virtual classroom. We use it in tests. We respect the project. It's a good answer to a specific kind of question: “can we run small-scale online classes on a budget without paying a per-seat fee?” Yes. You can. We're not going to pretend otherwise.

It's a worse answer to the question this page is actually about, which is: “can we operate a serious tutoring program at 99.99% reliability across every browser, every device, every school network, with the compliance posture a district can sign off on, without funding a build?” The honest cost stack at moderate scale (200 concurrent users, modest customization, K-12 use case):

Infrastructure. AWS c5.2xlarge or equivalent + 500 GB storage + bandwidth. Public list rates.
~$400 / mo
DevOps engineer (50% allocation). Someone has to run BBB upgrades, patch security, watch the pager. Loaded SF comp.
~$155K / yr
Compliance posture. SOC 2, FERPA, COPPA, state student-privacy laws — your responsibility now, not BBB's. Year-one ramp; ongoing maintenance after.
~$200K Y1
Customization engineering. The moment your needs diverge from BBB defaults, you fork. Once you fork, every upstream upgrade is a test-merge-deploy operation against your changes — forever.
~$100–200K / yr
Native mobile. BBB's mobile experience is functional but not at the bar a tutoring program needs. Building native iOS/Android is its own project — see §02 cluster 05.
~$300K Y1
Trust & safety, integrations, reporting, support tooling. Same surface area as a build, because BBB doesn't ship these.
~$200K Y1
Year-one all-in — the well-run version
~$1M
Ongoing, year two onward
~$500K / yr
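The line items sum to the year-one figure. A quick sketch — the customization midpoint is our assumption; every other number is taken from the table as listed:

```python
# Year-one BBB cost stack from the table above (USD).
year_one = {
    "infrastructure":      400 * 12,   # ~$400/mo AWS + storage + bandwidth
    "devops_half_fte":     155_000,    # 50% allocation, loaded SF comp
    "compliance_ramp":     200_000,    # SOC 2 / FERPA / COPPA, year one
    "customization":       150_000,    # midpoint of the $100–200K/yr range
    "native_mobile":       300_000,    # iOS/Android, year-one project
    "trust_safety_etc":    200_000,    # integrations, reporting, support tooling
}
total = sum(year_one.values())
print(f"Year-one all-in: ${total:,}")  # ~$1M, matching the table
```

Year two drops the one-time ramps but keeps the infrastructure, DevOps, fork maintenance, and ongoing compliance lines — which is where the ~$500K/yr figure comes from.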

BBB is “free” in the same way a sailboat is free if a friend hands you the keys: the boat is real, the wind is real, and so is everything you didn't budget for — the slip, the insurance, the haul-out, the surveyor, the diesel, the time. Most teams that try this path underestimate the DevOps line and the compliance line. Both are load-bearing. Neither is negotiable.

The fork problem. If BBB's defaults are 100% what you need, this is a viable path — many universities run it well. If BBB is 90% there and you customize the rest, you fork, and the fork is yours to maintain forever. Every upstream BBB release becomes a merge-and-test operation. By year two, most teams discover they're moving at BBB's roadmap speed, not their own.

Pencil Spaces Scale starts at $36K/year for this workload (custom-quoted at higher volumes), full-stack, modern UX, mobile apps shipped and maintained, compliance handled, no fork to merge against, and a roadmap that moves at your pace because the platform is purpose-built. Our entry tier — Pro at $6/month + $0.12 per user-hour overage — runs smaller programs for less than $15K/year, full-stack. The “free license” framing is one of those headlines that doesn't survive a CFO's spreadsheet.

We're not pretending these shortcuts don't exist or that they never work. CPaaS works for teams whose differentiation is exclusively in the application UI, who are happy to treat video as commodity infrastructure, and who can absorb vendor-strategy risk. BBB works for teams with a DevOps function to spare, whose UX needs map closely to BBB defaults. For everything else — especially K-12 programs that need to clear SOC 2, FERPA, native mobile, and district procurement — the math doesn't favor either path. We've watched it not favor them, repeatedly, for six years.

§ 13 · Objections we hear, answered

"But our case is different."

It usually isn't. Here are the most common reasons buyers tell us they think the math above doesn't apply to them.

We have a strong eng team — couldn't we do this for half?
What if we use a managed SFU vendor (Twilio, Agora, Daily, 100ms, Chime SDK)?
We're not in K-12 — we don't need FERPA / SOC 2.
Won't AI dramatically reduce our team size?
Aren't you incentivized to talk us out of this?

Spend the three years on what only you can build.

We'll handle the video, the whiteboard, the compliance, the on-call.
You build the thing nobody else can.

Questions about your specific situation? hello@pencilspaces.com