Career Vision

I'm at a deliberate inflection point — transitioning from platform engineering into AI safety. Three paths are in view. Each has a different theory of change and a different ask of my background.

AI Safety Research

Transitioning into technical AI safety research with a focus on ML security, robustness, and the security of agentic AI systems. Likely pathway: DPhil at Oxford AI Security or an intensive fellowship programme. My statistics training, 10+ years of engineering, and hands-on agentic AI work give me a concrete foundation — and my DevSecOps background brings a security-first lens that is underrepresented in ML research.

DPhil / Oxford · ML Security · Robustness · Alignment

AI Safety Engineering

Specialising in AI security, ML infrastructure security, or building safety evaluation infrastructure — at labs like Anthropic, DeepMind, or dedicated safety organisations. This path offers faster time-to-impact and fills a genuine talent gap: there are far fewer safety-focused engineers than researchers in the ecosystem. My agentic AI technical leadership translates directly.

ML Infrastructure · Security Engineering · Safety Evals · AI Labs

AI Governance

Combining technical depth with policy and governance work — either as a researcher at GovAI-type organisations, through policy fellowships, or in grantmaking roles that require technical evaluation capacity. My linguistics background and international social enterprise experience are comparative advantages for cross-jurisdictional AI governance work.

Policy · GovAI · Standards · International

Current thinking leans toward Path 1 or 2 — staying deeply technical while pivoting to safety. The core open question: is the marginal safety contribution greater from an engineer who ships immediately, or from one who invests 2–4 years to become a researcher? If you have a perspective, I want to hear it.

Learning Path

Structured upskilling in AI engineering and safety — building from foundations to production systems.

Active

AI Engineering

6 weeks · 45+ hours

Structured to take you from foundations to production. Build real RAG pipelines, evaluation frameworks, and agentic systems.

1
LLM Fundamentals & RAG Foundations Build a working RAG pipeline that answers questions from real documentation
  • How LLMs work under the hood: pre-training, tokenization, attention, post-training
  • The RAG paradigm: when to use it, how to scope your project
  • Data ingestion: handling different document formats for your corpus
  • Your first end-to-end pipeline: index, retrieve, generate, test
  • Framework: RAG Project Scoping Framework

You build

Interactive Q&A system on MCP documentation

Qdrant · Haystack · FastEmbed · Gemini
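The index → retrieve → generate loop described above can be sketched in plain Python. This is a toy, stdlib-only illustration: a term-frequency Counter stands in for a real embedding model (FastEmbed in the course stack), the two corpus strings are invented, and generation is reduced to assembling the prompt an LLM would receive.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Index: embed every document in the corpus.
corpus = [
    "MCP servers expose tools over a JSON-RPC transport",
    "Haystack pipelines connect retrievers and generators",
]
index = [(doc, embed(doc)) for doc in corpus]

# 2. Retrieve: rank documents by similarity to the query.
def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 3. Generate: in a real pipeline an LLM answers from the retrieved
#    context; here we only show the prompt that would be sent to it.
context = retrieve("What transport do MCP servers use?")
prompt = f"Answer from context:\n{context[0]}"
```

Swapping the toy pieces for Qdrant (index), FastEmbed (embed), and Gemini (generate) gives the shape of the module's actual build.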
2
Chunking & Embeddings Test 7 chunking strategies on your data and find the winner
  • Why chunking is the most important decision in your RAG pipeline
  • 7 strategies compared, from naive and sentence splitting to recursive, semantic, and hybrid content-aware chunking
  • Embeddings deep dive: how text becomes vectors, FastEmbed vs Voyage
  • LLM-as-Judge evaluation with side-by-side comparison dashboards
  • Framework: Chunking Decision Framework

You build

Ranked chunking strategy backed by your own evaluation evidence

7 Strategies · Voyage AI · Streamlit · LLM-as-Judge
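To make the naive-vs-sentence comparison concrete, here is a minimal stdlib sketch of two of the strategies. The 80-character budget and the regex sentence splitter are illustrative choices, not the course's implementation.

```python
import re

def naive_chunks(text: str, size: int = 80) -> list[str]:
    # Fixed-width character windows: simple, but cuts sentences mid-word.
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text: str, max_chars: int = 80) -> list[str]:
    # Greedily pack whole sentences into chunks of up to max_chars.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

text = ("Chunking decides what the retriever can see. "
        "Small chunks are precise but lose context. "
        "Large chunks keep context but dilute relevance.")
```

Running both on the same corpus and scoring the retrieval results, as the module does with an LLM judge, is what turns the choice of strategy into an evidence-backed decision.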
3
Advanced Retrieval Optimise retrieval accuracy from 70% to 90%+
  • Vector DB internals: how HNSW and approximate nearest-neighbour search work
  • Hybrid retrieval: combining dense and sparse search with Reciprocal Rank Fusion
  • Reranking architectures: when and why cross-encoders beat bi-encoders
  • Search space narrowing: metadata filtering and LLM-based routing
  • Framework: Retrieval Strategy Selection Framework

You build

Evidence-based retrieval strategy with 4 techniques evaluated head-to-head

Hybrid Search · BM25 · Voyage Reranker · LLM Routing
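Reciprocal Rank Fusion, mentioned above, is small enough to show in full. This is a generic sketch (k = 60 follows the original RRF paper's default), merging a hypothetical dense ranking with a sparse, BM25-style one:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank).
    # The constant k dampens the advantage of top positions, so a document
    # ranked well by both retrievers beats one ranked first by only one.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense and sparse retrievers often disagree; RRF merges their rankings
# without needing their raw scores to be comparable.
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_b", "doc_c", "doc_a"]
fused = rrf([dense, sparse])
```

Here doc_b wins the fused ranking because it places near the top of both lists, which is exactly the behaviour hybrid retrieval is after.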
4
Mastering Evaluation Build your own evaluation system with golden datasets
  • The evaluation challenge: why measuring RAG quality is harder than building it
  • Synthetic test generation, LLM-as-Judge, deterministic semantic metrics
  • Building golden datasets from scratch when no ground truth exists
  • Cross-validating 3 independent evaluation methods to find where your system breaks
  • Framework: RAG Evaluation Strategy Framework

You build

Golden dataset + multi-method evaluation framework

RAGAS · DeepEval · Custom Judges · Triangulation
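One of the deterministic semantic metrics this module points at can be as simple as SQuAD-style token F1: a cheap, reproducible signal to triangulate against LLM-as-Judge scores. A minimal sketch, with an invented golden pair for illustration:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    # Token-overlap F1: deterministic, fast, and fully reproducible,
    # unlike an LLM judge whose scores can drift between runs.
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

golden = [  # tiny golden dataset: (question, reference answer)
    ("What is RRF?", "reciprocal rank fusion merges rankings"),
]
score = token_f1("rrf is reciprocal rank fusion", golden[0][1])
```

Cross-validating a metric like this against LLM-as-Judge and synthetic tests is the triangulation step: where the three disagree is usually where the system breaks.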
5
Production Engineering Deploy a production chatbot with caching, memory, and observability
  • Production architecture: the real tradeoffs between latency, cost, and accuracy
  • Semantic caching with Redis for sub-50ms response times on repeated queries
  • Conversation memory, query rewriting, and intent-based routing
  • Observability, user feedback loops, Docker deployment
  • Framework: Production RAG Architecture

You build

Deployed production chatbot serving real requests

FastAPI · Redis · Streamlit · Opik · Docker
6
Agentic AI & Security Build a self-correcting RAG agent with adaptive routing
  • The intelligence spectrum: from single API calls to fully autonomous agents
  • Corrective RAG: grading retrieval quality and self-correcting when it fails
  • Adaptive agents: confidence-based tool selection across multiple sources
  • RAG security essentials: injection detection, retrieval validation, output sanitization
  • Framework: Intelligence Spectrum Framework

You build

Self-correcting CRAG system + adaptive multi-tool agent

CRAG · Haystack Agents · Tavily · GitHub MCP
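The corrective-RAG loop (grade the retrieval, self-correct when it is weak) can be sketched as below. The grader is a toy term-overlap heuristic where a real CRAG system would use an LLM grader, and both sources are hypothetical stand-ins for the corpus retriever and Tavily web search:

```python
def grade(query: str, chunk: str) -> float:
    # Toy relevance grader: fraction of query terms present in the chunk.
    terms = set(query.lower().split())
    found = sum(1 for t in terms if t in chunk.lower())
    return found / len(terms) if terms else 0.0

def corrective_answer(query: str, retrieve, web_search,
                      threshold: float = 0.5) -> str:
    # Corrective RAG: grade the retrieved context; if it scores below
    # the threshold, self-correct by routing to a fallback source.
    chunk = retrieve(query)
    if grade(query, chunk) >= threshold:
        return f"answer from corpus: {chunk}"
    return f"answer from web: {web_search(query)}"

def corpus_retriever(query: str) -> str:
    return "MCP servers expose tools over JSON-RPC"

def web(query: str) -> str:
    return "latest docs found online"

good = corrective_answer("what do MCP servers expose", corpus_retriever, web)
bad = corrective_answer("pricing of voyage embeddings", corpus_retriever, web)
```

The adaptive-agent version of the module generalises the same pattern: confidence scores decide which of several tools handles the query, rather than a single binary fallback.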

Your Mission

The goal is to create legible output — to practice the craft, get feedback, find collaborators, test your fit, and improve understanding. This is the operating framework for everything that follows.

The Cheap Tests Ladder

Cheap tests reduce your uncertainty for the least effort, time, and money. Start very short (<1 hour each), progress to short (1–10 hours), then long (10–100 hours), then very long. Each rung gives a stronger signal that you're a good fit — without sunk-cost commitment.

Very short (<1h each)

  • Talk to people further along the path
  • Read abstracts, blog posts, newsletters
  • Watch YouTube videos on technical content
  • Run a GitHub repo, reproduce math proofs

Short (1–10h each)

  • Read research papers and agendas
  • Reproduce a toy version of a paper
  • Write a short post; estimate timelines
  • Attend an ML conference or workshop

Long (10–100h each)

  • Read a book; complete an online course
  • Replicate and extend a paper
  • Do an Apart Sprint hackathon
  • Form an inside view on timelines

Very long (100–1000h each)

  • Internships and residencies
  • Research fellowships (MATS, AISC)
  • Masters programme or DPhil
  • Independent research project

Next Steps Framework

Read / Listen / Watch

Follow your nose through the resources below. Prioritise things that build intuition before depth.

Do Stuff — Create Legible Output

Summarise what you read. Write opinions. Code and do maths. Add to GitHub. Post to EA Forum or LessWrong to get feedback.

Network — Learn, Don't Sell

Reach out to people doing the work you want to do. Learn about their path, get feedback on your understanding, build relationships before you need them.

Apply — Even for Feedback

Every application is a cheap test. A rejection with feedback is worth the effort. Use jobs.80000hours.org and the EA Opportunities board.

Roadmap

Concrete next steps toward AI safety — structured by timeframe and grounded in 80,000 Hours career advising.

Opportunity Roles

Specific roles identified through 80,000 Hours advising as strong matches for this career transition.