Track 01 / Computer Vision
Bounding boxes, polygons, keypoints, segmentation, video tracking — the highest-volume track. Operators specialised by domain: autonomy, retail analytics, medical imaging, agri-tech.
Techniques
Every technique runs through the same multi-tier review pipeline. The schema changes. The discipline doesn't.
Bounding boxes: Class-labelled bounding boxes with occlusion and truncation flags as part of the schema. The workhorse of object detection pipelines — ADAS, retail shelf, agri-pest, surveillance.
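A minimal sketch of what one such record can look like. The field names here are illustrative, not a fixed export format; your schema defines the real contract:

```python
from dataclasses import dataclass

@dataclass
class BoxAnnotation:
    # Illustrative fields only; the client schema is the source of truth.
    label: str        # class from the agreed taxonomy
    x: float          # top-left corner, pixels
    y: float
    w: float          # box width, pixels
    h: float          # box height, pixels
    occluded: bool    # partially hidden behind another object
    truncated: bool   # cut off by the frame boundary

ann = BoxAnnotation("pedestrian", x=412.0, y=188.5, w=34.0, h=96.0,
                    occluded=True, truncated=False)
```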
Polygons: Vertex-level polygons for irregular shapes where a bounding box would sweep in too much background. Retail product shelving, medical lesions, agri-crop rows, defect detection on manufactured goods.
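Why polygons over boxes, in one number: the fraction of the enclosing box a shape actually fills. A sketch using the shoelace formula, with made-up vertex coordinates:

```python
def polygon_area(vertices):
    """Shoelace formula over a list of (x, y) pixel coordinates."""
    area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

lesion = [(102, 40), (118, 44), (125, 61), (110, 72), (98, 58)]
xs, ys = zip(*lesion)
box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
# Fraction of the enclosing box the shape really fills (~0.63 here);
# the rest would be mislabelled background under a box-only schema.
print(polygon_area(lesion) / box_area)
```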
Semantic segmentation: Dense pixel-level labels for scene understanding — drivable area, vegetation, person, vehicle, infrastructure. Multi-class masks stitched so no pixel is left unlabelled between classes.
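That "no pixel left unlabelled" claim is checkable by machine. A sketch, assuming uint8 class-ID masks with 255 as the void value; both are assumptions a real schema would fix:

```python
import numpy as np

VOID = 255  # assumed "not yet labelled" sentinel in a uint8 class-ID mask

def unlabelled_fraction(mask: np.ndarray) -> float:
    """Fraction of pixels still carrying the void value."""
    return float(np.mean(mask == VOID))

mask = np.full((1080, 1920), VOID, dtype=np.uint8)
mask[540:, :] = 3      # e.g. drivable area stitched in
mask[:540, :] = 7      # e.g. vegetation / infrastructure
assert unlabelled_fraction(mask) == 0.0   # the review gate before sign-off
```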
Keypoints: Skeletal landmark annotation for body, face, and hand. Standard topologies (COCO, OpenPose) or client-defined skeletons. Occlusion visibility flags per keypoint.
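For the COCO topology specifically, each keypoint is stored as an (x, y, v) triplet, and v is the visibility flag: 0 not labelled, 1 labelled but occluded, 2 labelled and visible. A short sketch:

```python
# COCO keypoints: a flat [x, y, v] list per person.
# v = 0 -> not labelled, v = 1 -> labelled but occluded, v = 2 -> labelled and visible.
person = [
    312, 104, 2,   # nose
    320,  98, 2,   # left eye
    304,  98, 1,   # right eye (occluded)
    # ... the remaining 14 joints of the 17-point COCO skeleton
]

def visible_count(keypoints):
    """Count joints flagged fully visible (v == 2)."""
    return sum(1 for v in keypoints[2::3] if v == 2)

print(visible_count(person))  # -> 2
```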
Video tracking: Multi-object tracking with stable identity across frames, re-entry handling, and keyframe interpolation. Reviewers trained to resolve re-identification after occlusion, not just the first frame.
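Keyframe interpolation, concretely: operators key a subset of frames, the tool fills the frames between, and reviewers correct the fill. A linear-interpolation sketch with illustrative boxes and frame numbers:

```python
def interpolate_box(kf_a, kf_b, frame):
    """Linearly interpolate an (x, y, w, h) box between two keyframes.
    kf_a, kf_b: (frame_index, box); the track ID rides along unchanged."""
    fa, box_a = kf_a
    fb, box_b = kf_b
    t = (frame - fa) / (fb - fa)
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

# Operator keys frames 10 and 20; frames 11-19 are machine-filled, then reviewed.
print(interpolate_box((10, (100, 50, 40, 80)), (20, (140, 50, 40, 80)), 15))
# -> (120.0, 50.0, 40.0, 80.0)
```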
3D cuboids: 2D-projected 3D cuboids for AV stacks without LiDAR. Encode heading and object size from monocular frames. Often used as a bridge to full 3D labelling.
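A sketch of what one monocular cuboid record can carry; every field name here is illustrative, and real AV schemas vary:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MonocularCuboid:
    # Illustrative fields; the actual contract comes from the client schema.
    label: str
    length_m: float                     # estimated physical dimensions, metres
    width_m: float
    height_m: float
    heading_rad: float                  # yaw relative to the camera axis
    base_quad: List[Tuple[int, int]]    # image-plane corners of the ground face

car = MonocularCuboid("car", 4.5, 1.8, 1.5, heading_rad=0.35,
                      base_quad=[(410, 620), (560, 610), (590, 680), (430, 700)])
```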
Where we’ve delivered
CV operators are pooled by domain — the same reviewer who labels medical lesions isn't also labelling AV dashcam footage. Domain trains judgment.
Autonomy: Dashcam, surround-view, and sensor-fused datasets for perception training. Night / rain / edge-case prioritised.
Retail analytics: Planogram compliance, product recognition, stock-out detection. High SKU density, long-tail classes.
Medical imaging: Lesion, organ, and anatomical landmark annotation under reviewer oversight. PHI-safe environment.
Agri-tech: Crop row, weed, pest, and yield-estimation labels from drone and satellite imagery. Multi-season.
Schema specifics
Your schema is the source of truth. These are the schema surfaces that most often need tightening before a pilot runs clean; a consolidated sketch follows the list.
Class taxonomy: Multi-level class trees with inheritance. Resolved before calibration, versioned through the engagement.
Occlusion flags: Tri-state or percent bands rather than binary flags — bands carry more useful signal downstream.
Crowd regions: Rules for when a crowd becomes a single "crowd" region rather than N instances.
Truncation policy: How to label objects cut by the frame — extend-the-box, clip-to-frame, or skip.
Vertex density: Minimum vertex density tied to object type. Pedestrians ≠ trucks ≠ medical lesions.
Track identity: When a disappearing-then-reappearing object keeps its ID vs. takes a new one.
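Pinned down together, those six surfaces fit in a single fragment. An illustrative sketch, where every key and number is a placeholder to be replaced by your schema document:

```python
# Illustrative schema fragment covering the six surfaces above.
# Every name and number is an assumption; the schema document is the contract.
SCHEMA = {
    "taxonomy": {                                # multi-level tree with inheritance
        "vehicle": ["car", "truck", "bus"],      # children inherit vehicle attributes
        "person":  ["pedestrian", "cyclist"],
    },
    "occlusion": ["0-25%", "25-50%", "50-75%", "75-100%"],  # bands, not a binary flag
    "crowd_rule": {"merge_above": 12, "region_class": "crowd"},
    "truncation_policy": "clip-to-frame",        # vs. extend-the-box or skip
    "min_vertices": {"pedestrian": 12, "truck": 8, "lesion": 20},
    "track_identity": {"same_id_if_reentry_within_frames": 30},
}
```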
Questions we get
Which annotation tool do you work in? Yours. We train operators to the platform you already run — CVAT, Label Studio, V7, Labelbox, Encord, Scale, SuperAnnotate, Roboflow, or an internal tool. No migration required.
How long from scoping to steady state? Typically two to four weeks, depending on schema complexity. The pilot runs against a co-drafted gold set; the schema tightens before headcount scales.
What happens to ambiguous frames? Route-to-client, not guess-and-log. Ambiguous frames are surfaced to your data lead with a proposed resolution rather than silently labelled wrong.
Can you work inside our secure environment? Yes, for medical, defence, and regulated financial datasets. Operators work inside the secured environment with audited access. Specific envelope confirmed at scope.
What accuracy do you commit to? Written per engagement: an inter-annotator agreement (IAA) threshold against a gold reference co-drafted with your team — miss it and rework is on us.
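Mechanically, a box-level check against the gold set can look like this: greedy IoU matching with floors on both overlap and recall. A sketch; the 0.9 and 0.98 floors are placeholders, since the real numbers are written per engagement:

```python
def iou(a, b):
    """IoU of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def passes_gold(pred, gold, iou_floor=0.9, recall_floor=0.98):
    """Greedy one-to-one matching; floors are illustrative, set per engagement."""
    pool = list(pred)
    hits = 0
    for g in gold:
        best = max(pool, key=lambda p: iou(p, g), default=None)
        if best is not None and iou(best, g) >= iou_floor:
            hits += 1
            pool.remove(best)
    return hits / len(gold) >= recall_floor

print(passes_gold([(10, 10, 50, 80)], [(11, 11, 50, 80)]))  # True: near-perfect overlap
```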
Other annotation tracks
Computer vision sits alongside two other tracks under the same QA discipline and the same operating model.
Scope with us
We scope operators, schema work, and a written accuracy threshold together — and run a pilot against a co-drafted gold set before you commit to steady-state volume.