Won't this just be obsolete in six months when AI gets smarter?

No. We recalibrate every fixture each quarter against the latest Claude and GPT models. If AI alone clears 60% on a template, we retire it. The bar moves with the models, and we publish the calibration data.
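As a rough illustration of the retirement rule above, here is a minimal sketch. It assumes each template gets a single AI-only score between 0 and 1 per quarter; the function name, template names, and numbers are hypothetical, and only the 60% threshold comes from the answer above.

```python
RETIRE_THRESHOLD = 0.60  # from the answer above: AI alone clearing 60% retires a template


def templates_to_retire(ai_only_scores: dict[str, float]) -> list[str]:
    """Return the templates whose AI-only score meets or exceeds the threshold."""
    return sorted(
        name for name, score in ai_only_scores.items() if score >= RETIRE_THRESHOLD
    )


# Hypothetical quarterly calibration run (made-up template names and scores).
quarterly = {"pr_review_python": 0.41, "debug_go": 0.63, "decision_node": 0.58}
print(templates_to_retire(quarterly))
```

Treating exactly 60% as "clears" is an assumption; a stricter rule would use `>` instead of `>=`.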

Is this biased? How do you stop the cultural-fit dials from becoming a culture-cloning machine?

Cultural fit is capped at 10% of the composite. The other 90% is observable technical behavior: did they catch the bug, did they prioritize the right one, did they verify the AI output. We score behaviors, not vibes, and the weights are public.
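One way to picture the cap described above, as a sketch under stated assumptions: axis scores run 0 to 100, weights are normalized to sum to 1, and any cultural-fit weight above 10% is redistributed proportionally across the other four axes. The axis keys, function names, and redistribution rule are hypothetical; only the five axes and the 10% cap come from this page.

```python
AXES = ("technical", "judgment", "communication", "ai_fluency", "cultural_fit")
CULTURAL_FIT_CAP = 0.10  # from the page: cultural fit is capped at 10% of the composite


def effective_weights(weights: dict[str, float]) -> dict[str, float]:
    """Normalize weights to sum to 1, then clamp cultural fit to the cap."""
    total = sum(weights[a] for a in AXES)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = {a: weights[a] / total for a in AXES}
    if norm["cultural_fit"] <= CULTURAL_FIT_CAP:
        return norm
    # Assumption: excess cultural-fit weight is spread proportionally over the rest.
    others = [a for a in AXES if a != "cultural_fit"]
    other_total = sum(norm[a] for a in others)
    if other_total == 0:
        raise ValueError("at least one non-cultural axis needs a nonzero weight")
    scale = (1 - CULTURAL_FIT_CAP) / other_total
    capped = {a: norm[a] * scale for a in others}
    capped["cultural_fit"] = CULTURAL_FIT_CAP
    return capped


def composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of 0-100 axis scores using the capped weights."""
    w = effective_weights(weights)
    return sum(w[a] * scores[a] for a in AXES)
```

With this rule, even a team that tries to put half the weight on cultural fit ends up with it contributing at most a tenth of the final number.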

Won't candidates hate this? Ninety minutes feels like unpaid labor.

Every candidate gets a written report within 48 hours, pass or fail: strengths, growth areas, and AI-usage analysis. In early testing, completion rates were 3x those of take-homes.

How is this different from HackerRank, CodeSignal, or Karat?

Those test production speed and algorithm recall, both of which AI now handles on its own. We test code review, debugging paths, decisions under ambiguity, and AI-usage quality. A different signal.

Can I customize templates for my codebase?

Yes, on enterprise plans. Pre-calibrated templates by stack (Python, Go, Node, ML) cover 90% of role variation; we'll build a custom fixture during onboarding if you need one.

What if my candidate is great with AI but not great solo?

That's the thing we measure most directly. The AI fluency axis is a positive signal. Top candidates lean on AI and lean well: they prompt with care, they verify outputs, they catch hallucinations. The 'independent vs AI-leveraged' dial lets you weight it for your team.

Recalibrated against the latest models every quarter

Hire engineers who can think.

Everyone has AI now, so resumes mean nothing. Crackd drops candidates into a real codebase for 90 minutes, AI included, and shows who can ship.

Start screening · See it work

AI allowed and logged · 90-minute screens · Pay per screen, no seats

The signals you used to trust are gone.

Resume, take-home, timed round: one open browser tab clears all three. So you advance people on borrowed signal, and a wrong senior hire still costs six figures.

The question worth asking is whether they can tell when AI is about to walk them off a cliff.

What AI already does for them

A resume that clears every keyword filter

A clean, well-commented take-home

Most LeetCode-style problems, instantly

Confident, fluent answers in a screen share

What it can't do for them

Own a real problem under pressure, push back on a bad spec, and catch the bug the model swore wasn't there.

Take the screen yourself

Here's a real PR. See what you catch.

Click any line you'd comment on, then submit. We'll line you up against a top-decile candidate and an AI reviewing cold. About thirty seconds.

· Look at where the data comes from
· Think about two requests at once
· Not every ugly line is a bug

PR #347: add favorites endpoint

routes/favorites.py


Your review

Top candidates leave three to five meaningful comments. There are deliberate red herrings in the diff; flag one and it costs you.

One outright security hole, one concurrency bug, one smaller correctness issue. Go find them.

How it runs

Four steps, and almost none of them are yours.

Set up the screen

Pick a role preset, nudge a few sliders, add any hard requirements. Five minutes.

Send one link

Drop in emails or share one invite URL. No account for them to create.

They work, you don’t

A real PR, a live bug, and a judgment call. AI is on, and every keystroke is logged.

Read the report

A composite score, a five-axis breakdown, and the moments that mattered. Ten-minute call.

Good engineers were never the ones who typed the fastest.

Interviews rewarded speed, recall, and a tolerance for unpaid take-homes. None of it was the job, and all of it is now free from a model.

Judgment is what moves your roadmap: knowing what's worth building, and catching the confident answer that's wrong. AI doesn't have judgment. It borrows yours.

So we hand a candidate a real problem with AI switched on, and grade the decisions they make. Everything else was noise.

Find the cracked ones.

Interactive · live scoring

One answer. Score it against your team, not a generic rubric.

Drag the dials to match how your team works. The same answer gets re-scored live, because a response that fits a ship-fast team is wrong for one that gates every deploy.

Your team's style

Communication style: Terse ● Thorough

“Looks good. Two concerns: (1)..., (2)..., Tradeoffs OK if [...].”

Disagreement style: Blunt ● Diplomatic

“I’m not convinced this approach holds at scale. My reasoning:”

Documentation habit: Minimal ● Extensive

“Short summary + tests mentioned.”

Risk posture: Ship-fast ● Cautious

“Behind a flag with a rollback plan.”

AI collaboration style: Independent ● AI-leveraged

“Pair-programs with AI; accepts ~50%; drives high-level decisions.”

Candidate response

Looks solid overall. Two concerns: (1) `user_id` comes from query params, which is an auth bypass; it should come from the session/JWT. (2) The favorites fetch is going to N+1 at scale, a JOIN would fix it. Otherwise good. Happy to walk through alternatives if helpful.

Style match

90 / 100

Strong fit

Cultural fit is one axis among five, capped at 10% of the composite, so it never overrules technical signal.
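To make the live re-scoring concrete, here is a minimal sketch of one possible style-match calculation. It assumes each dial is a position from 0.0 (left label, e.g. Terse) to 1.0 (right label, e.g. Thorough) and that a response has been profiled on the same five dials. The distance formula, dial keys, and numbers are illustrative assumptions, not the scoring method used on this page.

```python
DIALS = ("communication", "disagreement", "documentation", "risk", "ai_collaboration")


def style_match(team: dict[str, float], response: dict[str, float]) -> float:
    """Score 0-100: 100 minus the mean absolute gap between dial positions."""
    for d in DIALS:
        if not (0.0 <= team[d] <= 1.0 and 0.0 <= response[d] <= 1.0):
            raise ValueError(f"dial {d!r} must be between 0.0 and 1.0")
    gap = sum(abs(team[d] - response[d]) for d in DIALS) / len(DIALS)
    return round(100 * (1 - gap), 1)


# Hypothetical profile of one response, held fixed while the team dials move.
response = {d: 0.5 for d in DIALS}

ship_fast_team = dict(response, risk=0.0)  # same as the response except risk posture
matched_team = dict(response)              # identical dial positions

print(style_match(ship_fast_team, response))
print(style_match(matched_team, response))
```

The response never changes here; only the team profile does, which is why the same answer can score differently for a ship-fast team and a cautious one.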

What you get back

A report you can decide from in ten minutes.

One composite score with a confidence read, a five-axis breakdown, and the few moments that moved it. Each one is timestamped to the replay, and the whole thing fits on a page.

Candidate Report · 2026-05-22 14:23

Sarah K. · Senior Backend Engineer

Strong yes · High confidence

Composite: score out of 100

Technical competence

Judgment

Communication

AI fluency

Cultural fit

Standout moments

3:12
“Spotted the auth bypass on line 8 within 3:12 of starting”
Top 5% of historical candidates
18:47
“Asked the right clarifying question before approving”
“Is rate limiting in scope for this PR?” Top 15%.
24:30
“Kept an AI suggestion for the race condition without verifying”
AI hallucinated; candidate didn’t catch it. Bottom 20% on this fixture.

Replay timeline · 89 min · Open full replay →

PR Review · 34m | Debugging · 29m | Decision · 26m

Pricing

Pay per screen. Not per seat.

One number. No tiers, no per-seat upsell. Run it on every candidate or run it on one. The price and the signal are the same.

$100 / screen

Volume pricing starts at 25 screens. No subscription, no lock-in, cancel any time.

Start screening

Every screen includes

PR Review + Decision Comm modules
Live Debugging + Code Tracing (Wave 2)
Configurable cultural dials
Full 5-axis scored report with replay
Standout moments analysis
Personalized candidate feedback
AI usage telemetry per session
Stripe billing, magic-link candidate flow
Unlimited workspace members

Enterprise — per-position pricing. Hiring several roles at once? Pay per open position, with unlimited candidates per role.

Talk to us →

FAQ

The skeptical questions.

The ones founders and hiring managers ask before they buy.

Have one we didn't answer? Email us.