Track 02 / NLP & Text
Entities, spans, intent, sentiment, moderation, preference labels. Domain-trained reviewers for legal, medical, and financial text. Classical NLP and LLM-era workflows under the same discipline.
Techniques
From NER and intent for classical pipelines to preference ranking and red-team prompts for LLM post-training — under the same review discipline.
Span-level entity tagging under domain-specific ontologies. Off-the-shelf (PERSON, ORG, LOC, DATE) or bespoke — drug mentions, legal citations, financial instruments, threat categories.
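To make the ontology constraint concrete, here is a minimal sketch of a span-level entity record. The field names and the example ontology are illustrative assumptions, not a fixed deliverable format.

```python
# Sketch of a span-level entity record under a bespoke ontology.
# The ontology labels and field names are illustrative, not a fixed schema.
from dataclasses import dataclass

ONTOLOGY = {"DRUG", "LEGAL_CITATION", "FINANCIAL_INSTRUMENT", "THREAT_CATEGORY"}

@dataclass(frozen=True)
class EntitySpan:
    doc_id: str
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # must come from the agreed ontology
    text: str    # surface form, kept for audit

    def __post_init__(self):
        if self.label not in ONTOLOGY:
            raise ValueError(f"label {self.label!r} is outside the ontology")
        if self.start >= self.end:
            raise ValueError("empty or inverted span")

span = EntitySpan("note-001", 42, 50, "DRUG", "apixaban")
```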
Intent labels for CX routing, sentiment for feedback analytics. Multi-label support and schema-defined confidence bands rather than forced binary decisions.
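A hedged sketch of what a multi-label intent record with confidence bands can look like. The band names and intent set are assumptions for illustration, not a production schema.

```python
# Multi-label intent with schema-defined confidence bands instead of a
# forced binary call. Band and intent names are illustrative assumptions.
from enum import Enum

class Band(Enum):
    CERTAIN = "certain"  # operator would defend this label in review
    LIKELY = "likely"    # plausible, minor ambiguity in the text
    UNSURE = "unsure"    # routed to a second reviewer by default

INTENTS = {"refund_request", "delivery_status", "complaint", "churn_risk"}

def make_record(text: str, labels: dict[str, Band]) -> dict:
    """Attach a confidence band to every applied intent label."""
    unknown = set(labels) - INTENTS
    if unknown:
        raise ValueError(f"labels outside schema: {unknown}")
    return {"text": text, "labels": {k: v.value for k, v in labels.items()}}

record = make_record(
    "Where is my order? If it's late again I'm cancelling.",
    {"delivery_status": Band.CERTAIN, "churn_risk": Band.LIKELY},
)
```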
Beyond flat entities — spans connected by typed relations for knowledge-graph construction, claim extraction, contract clause mapping. Tooling: Label Studio, Prodigy, Brat, V7 text.
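As a sketch of the relation layer, assuming span ids produced by a prior entity pass; the relation types shown are illustrative assumptions.

```python
# Typed relations over previously tagged entity spans, the kind of record a
# knowledge-graph or clause-mapping pipeline would consume. Relation types
# here are assumptions for illustration.
from dataclasses import dataclass

RELATION_TYPES = {"cites", "obligates", "party_to", "supports_claim"}

@dataclass(frozen=True)
class Relation:
    head_span_id: str  # id of the source entity span
    tail_span_id: str  # id of the target entity span
    rel_type: str

    def __post_init__(self):
        if self.rel_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type {self.rel_type!r}")

# "Clause 4.2 obligates Party B": two tagged spans, one typed edge.
edge = Relation(head_span_id="span-17", tail_span_id="span-23", rel_type="obligates")
```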
Toxicity, harm categories, platform-specific content rules. Operators train on your policy document, not on a generic "what feels bad" heuristic. Escalation path for high-severity calls.
Response-pair preference labelling for LLM post-training — helpfulness, harmlessness, factuality. Every label carries a rubric-tied rationale so post-hoc drift is explicable.
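A minimal sketch of a preference record carrying a rubric-tied rationale code; the axis names and rationale codes are invented for illustration.

```python
# Response-pair preference label with a rationale code tied to a rubric
# rather than free text, so drift stays explicable. Axis and code names
# are illustrative assumptions, not a fixed schema.
PREFERENCE_AXES = {"helpfulness", "harmlessness", "factuality"}
RATIONALE_CODES = {"H1_ignores_constraint", "H2_incomplete",
                   "F1_unsupported_claim", "S1_unsafe_advice"}

def preference_label(prompt_id: str, chosen: str, rejected: str,
                     axis: str, rationale_code: str) -> dict:
    if axis not in PREFERENCE_AXES:
        raise ValueError(f"unknown axis {axis!r}")
    if rationale_code not in RATIONALE_CODES:
        raise ValueError(f"rationale {rationale_code!r} is not in the rubric")
    return {"prompt_id": prompt_id, "chosen": chosen, "rejected": rejected,
            "axis": axis, "rationale": rationale_code}

label = preference_label("p-0091", chosen="resp-a", rejected="resp-b",
                         axis="factuality", rationale_code="F1_unsupported_claim")
```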
Query-document relevance on graded scales. Calibrated against your relevance rubric so rerank model training isn’t chasing an operator’s personal sense of "relevant".
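One way a graded scale with rubric anchors can be encoded, as a sketch; the 0-3 scale and the anchor wording are assumptions.

```python
# Graded query-document relevance on a 0-3 scale, each grade anchored to a
# rubric sentence so operators calibrate to the same definition. Anchor
# wording is illustrative.
GRADE_ANCHORS = {
    0: "off-topic: does not address the query",
    1: "marginal: mentions the topic, does not answer it",
    2: "relevant: answers the query with gaps",
    3: "exact: fully answers the query as asked",
}

def relevance_judgment(query_id: str, doc_id: str, grade: int) -> dict:
    if grade not in GRADE_ANCHORS:
        raise ValueError(f"grade must be one of {sorted(GRADE_ANCHORS)}")
    return {"query_id": query_id, "doc_id": doc_id, "grade": grade,
            "anchor": GRADE_ANCHORS[grade]}

judgment = relevance_judgment("q-342", "doc-7781", grade=2)
```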
Where we’ve delivered
A legal span annotator is not labelling medical notes. Domain-trained reviewer pools keep judgment anchored.
Contract clause tagging, case-citation extraction, obligation & risk span labelling. Reviewer pool with legal training.
Clinical notes, drug mentions, condition/procedure tagging under reviewer oversight. PHI-safe environment.
Earnings transcripts, instrument extraction, regulatory filings. Multi-language support for EMEA & African exchanges.
Preference labels, red-team prompts, instruction quality, content policy. For foundation-model labs and LLM-app teams.
Schema specifics
Text schemas fail in predictable places. These are the surfaces most often worth tightening before a pilot runs clean; a combined sketch follows the list.
Rules for when entities can nest (PERSON inside ORG) vs. when they must be flat.
Punctuation, possessives, hyphenation. Fix one boundary rule up front, or drift is inevitable.
When multiple classes apply, which wins for single-label downstream tasks.
Explicit abstain class rather than forcing a decision — carries more signal.
Severity and reviewer confidence as separate axes for moderation workflows.
Constrained rationale codes so preference-label rationales cluster cleanly for analysis.
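The sketch below pulls several of these surfaces together: one fixed boundary rule, an explicit abstain class, and severity and reviewer confidence as separate axes. All names and thresholds are illustrative assumptions, not a committed schema.

```python
# Combined sketch of the schema surfaces above. The boundary rule, class
# set, severity levels, and escalation trigger are illustrative assumptions.
import re

def normalize_boundaries(text: str, start: int, end: int) -> tuple[int, int]:
    """Apply one fixed boundary rule: strip trailing punctuation and 's."""
    span = text[start:end]
    trimmed = re.sub(r"('s|[.,;:!?])+$", "", span)
    return start, start + len(trimmed)

MODERATION_CLASSES = {"hate", "self_harm", "spam", "ABSTAIN"}  # abstain carries signal
SEVERITY = {1, 2, 3, 4}        # policy-defined severity; 4 escalates
CONFIDENCE = {"low", "high"}   # reviewer confidence, kept as a separate axis

def moderation_label(item_id: str, cls: str, severity: int, confidence: str) -> dict:
    if cls not in MODERATION_CLASSES:
        raise ValueError(f"class {cls!r} outside schema")
    if cls != "ABSTAIN" and severity not in SEVERITY:
        raise ValueError("severity required for non-abstain labels")
    if confidence not in CONFIDENCE:
        raise ValueError(f"confidence {confidence!r} outside schema")
    return {"item_id": item_id, "class": cls, "severity": severity,
            "confidence": confidence, "escalate": severity == 4}

text = "Contact John Smith's office."
print(normalize_boundaries(text, 8, 20))   # "John Smith's" -> "John Smith"
label = moderation_label("post-5512", "hate", severity=4, confidence="high")
```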
Questions we get
Yes — English plus selected African and European partner languages. The operator pool varies by language; we scope language coverage at kick-off.
Yes — pairwise and k-way preference, red-team prompts, instruction quality. Rubric-tied rationales rather than free-text so labels cluster for reward-model training.
Domain-trained reviewer pools with specific qualifications per engagement, verified at kick-off. Medical work runs in a PHI-safe environment under supervision.
Yours. Label Studio, Prodigy, Brat, V7 text, Labelbox, Scale, or internal tools. We train to the platform you already run.
Yes — policy-document calibration, dual-axis severity/confidence, explicit escalation path for high-severity calls. Scope is set per engagement.
Other annotation tracks
NLP & text sits alongside two other tracks under the same QA discipline and the same operating model.
Scope with us
We scope operators, rubric work, and a written inter-annotator agreement threshold together, then run a pilot against a co-drafted rubric before you commit to steady-state volume.
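As one concrete reading of an agreement threshold: pairwise Cohen's kappa on a pilot batch, checked against an agreed floor. The 0.8 figure and the labels below are illustrative assumptions, not a standing commitment.

```python
# Sketch: pairwise Cohen's kappa between two operators on pilot labels,
# compared against an assumed agreement floor of 0.8.
from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    assert len(a) == len(b) and a, "need two equal-length, non-empty label lists"
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)     # chance agreement
    return (p_o - p_e) / (1 - p_e)

pilot_a = ["DRUG", "O", "DRUG", "CONDITION", "O", "O"]
pilot_b = ["DRUG", "O", "O", "CONDITION", "O", "O"]
kappa = cohens_kappa(pilot_a, pilot_b)
print(f"kappa={kappa:.2f}, passes 0.8 floor: {kappa >= 0.8}")
```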