Category 03 / HITL AI Operations
Human-in-the-loop operations for LLMs and AI products. Preference ranking for RLHF, prompt & response evaluation, adversarial red-teaming, agent & tool-use review, fine-tuning data curation, and policy moderation in the loop. Delivered from Nairobi, against your written rubrics.
Operating model
The six pillars
Every AI product has places where a model ships a decision and a human has to arbitrate it. These are the six places we do that, at production cadence.
Preference ranking (RLHF)
Pairwise and N-way preference data for reward-model training. Reviewers rank completions against a written rubric — helpfulness, honesty, tone, task fit — not against personal taste. A sketch of the delivered record follows the six pillars.
Prompt & response evaluation
Rubric-scored evaluation of model outputs — accuracy, safety, instruction-following, refusal quality. Runs on release candidates, production samples, and A/B comparisons.
Adversarial red-teaming
Targeted adversarial prompts against safety policies — jailbreaks, policy bypass, persona attacks, prompt injection in tool contexts. Findings delivered with reproducer prompts and proposed mitigations.
Agent & tool-use review
Trajectory review for agents that call tools and take actions. Reviewers grade the full trace — plan, tool selection, argument construction, recovery — not just the final answer.
Fine-tuning data curation
Instruction-tuning and SFT dataset construction. Prompts, ideal completions, rejected completions, schema-tagged domain coverage. Built to be the dataset you’d choose, not the one you had time to make.
Policy moderation in the loop
Human review on the escalation path for model outputs flagged by an automated moderator. Policy-tagged decisions, written rationale, and feedback that trains the next version of the classifier.
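To make the deliverable concrete: a minimal sketch of a pairwise preference record as it might leave the training-data pipeline. Field names and the 1-5 rubric scale are illustrative assumptions, not a fixed schema; the dimensions mirror the rubric named above.

```python
# Illustrative sketch of a delivered preference record. Field names and
# the 1-5 scale are assumptions, not a fixed schema; rubric dimensions
# mirror the written rubric (helpfulness, honesty, tone, task fit).
from dataclasses import dataclass


@dataclass
class Completion:
    text: str
    model_id: str                  # which model produced this completion


@dataclass
class RubricScores:
    helpfulness: int               # each dimension scored 1-5 (illustrative scale)
    honesty: int
    tone: int
    task_fit: int


@dataclass
class PreferenceRecord:
    prompt: str
    chosen: Completion             # ranked higher under the written rubric
    rejected: Completion           # ranked lower under the written rubric
    chosen_scores: RubricScores
    rejected_scores: RubricScores
    reviewer_id: str               # per-operator provenance
    arbiter_id: str | None = None  # set only when QA arbitration resolved a disagreement
    rationale: str = ""            # arbiter's rationale, tied to a rubric clause
```

Per-operator provenance (reviewer_id, arbiter_id) travels with every record; that is what makes downstream agreement and drift reporting possible.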
How the loop runs
HITL work doesn’t have one shape. We operate against three, depending on whether you’re training, releasing, or running in production.
Training
For reward-model training or SFT construction: prompt pool -> reviewer pairs -> QA arbitration -> delivered dataset with per-operator provenance.
Release
Candidate model scored against rubric dimensions before ship. Diff vs. prior version, regression flags, go/no-go signal to your release team; a minimal diff sketch follows this list.
Production
Production traffic sampled into the review queue on a schedule. Escalations from your moderator classifier route to human arbitration and feed back into training.
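A minimal sketch of that release diff, assuming per-dimension mean rubric scores on a 1-5 scale for the candidate and the prior shipped version; the tolerance threshold is an illustrative assumption, not a contractual gate.

```python
# Minimal release-gate sketch. REGRESSION_TOLERANCE is an illustrative
# assumption; real gates come from your written rubric.
REGRESSION_TOLERANCE = 0.25   # max allowed drop per rubric dimension (1-5 scale)


def release_diff(candidate: dict[str, float], prior: dict[str, float]) -> dict:
    """Diff candidate vs. prior scores; flag regressions and emit go/no-go."""
    regressions = {
        dim: round(candidate.get(dim, 0.0) - prior[dim], 2)
        for dim in prior
        if candidate.get(dim, 0.0) < prior[dim] - REGRESSION_TOLERANCE
    }
    return {
        "regressions": regressions,   # dimension -> score delta
        "go": not regressions,        # go/no-go signal to the release team
    }


# Example: refusal quality slipped past tolerance, so the gate says no-go.
print(release_diff(
    candidate={"accuracy": 4.1, "safety": 4.6, "refusal_quality": 3.4},
    prior={"accuracy": 4.0, "safety": 4.5, "refusal_quality": 3.8},
))
# -> {'regressions': {'refusal_quality': -0.4}, 'go': False}
```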
Infrastructure & security
Reviewing model outputs means reviewing data that is often sensitive, pre-release, or policy-adjacent. The security envelope isn’t an add-on — it’s where the engagement starts.
Who’s in the loop
"HITL" is a misleading noun. There isn’t one human — there are four, each doing a different job, each accountable for a different artifact.
Reviewer
Applies the rubric to individual items. Trained against the gold set before production, scored against IAA in production; the calibration gate is sketched after this list.
Arbiter
Resolves reviewer disagreement. Writes a rationale tied to a rubric clause — not a preference.
QA lead
Independent from delivery. Owns the rubric’s calibration, reports drift, proposes schema amendments back to your policy team.
Policy liaison
Single point of contact with your policy or trust-and-safety team. Escalates rubric gaps, routes exceptions, closes the loop on amendments.
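A sketch of that pre-production calibration gate, assuming single-label gold items; the 90% pass rate is an illustrative assumption, not a contractual number.

```python
# Pre-production calibration gate. GOLD_PASS_RATE is an illustrative
# assumption; single-label gold items are assumed for simplicity.
GOLD_PASS_RATE = 0.90


def calibration_gate(reviewer_labels: dict[str, str],
                     gold_labels: dict[str, str]) -> tuple[float, bool]:
    """Score a reviewer against the gold set; return (accuracy, passed)."""
    hits = [reviewer_labels.get(item) == label for item, label in gold_labels.items()]
    accuracy = sum(hits) / len(hits)
    return accuracy, accuracy >= GOLD_PASS_RATE


accuracy, passed = calibration_gate(
    reviewer_labels={"item-1": "pass", "item-2": "fail", "item-3": "pass"},
    gold_labels={"item-1": "pass", "item-2": "fail", "item-3": "fail"},
)
# accuracy ~= 0.67, below the 0.90 gate: the reviewer recalibrates
# before touching production items.
```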
What the record looks like
RLHF, evaluation, red-teaming, agent, curation, moderation — from one trained reviewer pool.
Reviewer, arbiter, QA lead, policy liaison — each accountable for a distinct artifact.
Nairobi-based, English-fluent, domain-trained. Not crowdsourced, not anonymous.
Inter-annotator agreement carried over from our data-labelling and QA work, reported weekly; the metric is sketched below.
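The weekly IAA figure can be any chance-corrected agreement statistic. Here is a two-rater Cohen's kappa as a concrete stand-in; Krippendorff's alpha is the usual generalization to N raters, and the labels below are invented for illustration.

```python
# Two-rater Cohen's kappa as a stand-in for the weekly IAA number.
# Labels are invented for illustration.
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two reviewers over the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each reviewer's label marginals.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in counts_a.keys() | counts_b.keys()
    )
    return (observed - expected) / (1 - expected)


# Two reviewers agree on 4 of 5 safety labels -> kappa ~= 0.62.
print(cohens_kappa(["safe", "safe", "unsafe", "safe", "unsafe"],
                   ["safe", "safe", "unsafe", "unsafe", "unsafe"]))
```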
Scope with us
We scope pool size, training time, and calibration together — then run a paid pilot against a 200-item gold set before you commit to steady-state volume.