Experimental tasks

What's next

The detection and segmentation paths are the validated core. This page documents the new task heads and training tricks we are actively building on top of them: classification, oriented boxes, pose, and parameter-efficient fine-tuning.

Overview

LibreYOLO is a multi-task framework: the same model family can wear different heads. Alongside the validated detect and segment paths, several new tasks are landing for the two flagship families, YOLO9 and RF-DETR. They all plug into the same LibreYOLO(...) factory and the same Results container, so once you know the core API these are small additions.

  • Classification for YOLO9 and RF-DETR. Whole-image labels with top-1 / top-5 probabilities.
  • Oriented bounding boxes (OBB) for YOLO9 and RF-DETR. Rotated boxes for aerial and document imagery.
  • Keypoints / pose for YOLO9 and RF-DETR. COCO-17 person keypoints.
  • LoRA / DoRA fine-tuning for RF-DETR. Adapt the transformer backbone with a fraction of the memory.

Read this first

Everything on this page is experimental and some of it is still in flight on feature branches. APIs, defaults, and label formats can change before they are promoted into the validated core. The Stability section tracks exactly where each feature stands.

Selecting a Task

Every family defaults to detection. You opt into another task in one of three ways; they are resolved in the order of precedence below, with the family default as the final fallback:

| Priority | Mechanism | Example |
| --- | --- | --- |
| 1 | Explicit argument | task="obb" |
| 2 | Checkpoint metadata | task recorded inside a trained .pt |
| 3 | Filename suffix | -cls, -obb, -pose |
| 4 | Family default | detect |

Because the public LibreYOLO(...) factory expects a real weights file, the cleanest way to start one of these tasks from scratch is to construct the family class directly and pass task=. Trained checkpoints load back through the unified factory and auto-detect their task.

python
from libreyolo import LibreYOLO, LibreYOLO9, LibreRFDETR

# Start a task from scratch via the family class
m = LibreYOLO9(None, size="t", task="classify", nb_classes=10)

# Load a trained checkpoint via the unified factory (task auto-detected)
m = LibreYOLO("LibreYOLO9t-obb.pt")
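
The precedence table above can be sketched as a small standalone function. This is only an illustration of the resolution order, not LibreYOLO's actual code: `resolve_task`, its parameters, and the suffix-to-task mapping are hypothetical names chosen to mirror the table and the `task=` values used on this page.

```python
from pathlib import Path
from typing import Optional

# Suffixes listed in the precedence table. The mapping to task names is an
# assumption based on the task= values used elsewhere on this page.
SUFFIX_TO_TASK = {"-cls": "classify", "-obb": "obb", "-pose": "pose"}


def resolve_task(
    explicit: Optional[str] = None,
    checkpoint_task: Optional[str] = None,
    filename: Optional[str] = None,
    family_default: str = "detect",
) -> str:
    """Return the first task found, walking the precedence order top to bottom."""
    if explicit is not None:          # 1. explicit task= argument
        return explicit
    if checkpoint_task is not None:   # 2. task recorded in checkpoint metadata
        return checkpoint_task
    if filename is not None:          # 3. filename suffix such as -obb
        stem = Path(filename).stem
        for suffix, task in SUFFIX_TO_TASK.items():
            if stem.endswith(suffix):
                return task
    return family_default             # 4. family default


print(resolve_task(filename="LibreYOLO9t-obb.pt"))                   # obb
print(resolve_task(explicit="pose", filename="LibreYOLO9t-obb.pt"))  # pose
print(resolve_task(filename="LibreYOLO9t.pt"))                       # detect
```

The explicit argument wins even when the filename suggests another task, which is the behavior the table describes.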

Image Classification

YOLO9: t, s, m, c | RF-DETR: n, s, m, l

Classification gives a single label for a whole image. YOLO9 keeps its backbone and bolts on a lightweight classification head; RF-DETR reuses its DINOv2 encoder and adds a pooled linear head. Both run at a 224 x 224 input resolution.

Inference and the Probs result

Prediction returns a Results object whose probs field carries a softmax over the classes.

python
from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9t-cls.pt")
r = model.predict("cat.jpg")

print(r.probs.top1)               # class id of the argmax
print(r.probs.top1conf)           # its probability
print(r.probs.top5)               # [id, id, id, id, id]
print(model.names[r.probs.top1])  # human-readable label

| Field | Type | Meaning |
| --- | --- | --- |
| probs.top1 | int | Argmax class id. |
| probs.top5 | list[int] | Top-5 class ids, descending. |
| probs.top1conf | float | Probability of the top-1 class. |
| probs.top5conf | tensor | Probabilities of the top-5 classes. |
| probs.data | tensor | Full softmax vector. |
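
To make the relationship between these fields concrete, here is a minimal pure-Python sketch that derives them from raw logits. It only mirrors the field names in the table; `softmax` and `summarize_probs` are hypothetical helpers, and the real Probs object holds tensors rather than lists.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize_probs(logits, k=5):
    """Build a dict with the same field names as the table above."""
    data = softmax(logits)
    # Class ids sorted by probability, highest first.
    order = sorted(range(len(data)), key=lambda i: data[i], reverse=True)
    topk = order[:k]
    return {
        "data": data,
        "top1": topk[0],
        "top1conf": data[topk[0]],
        "top5": topk,
        "top5conf": [data[i] for i in topk],
    }


summary = summarize_probs([1.0, 3.0, 2.0, 0.5, 0.1, -1.0])
print(summary["top1"])  # 1
print(summary["top5"])  # [1, 2, 0, 3, 4]
```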

Dataset format and training

Classification uses an ImageFolder layout, not a YAML. Class names are the sorted subfolder names, taken from the train split.

dataset/
  train/
    cat/  img001.jpg ...
    dog/  img104.jpg ...
  val/
    cat/  ...
    dog/  ...
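
The class-name rule can be sketched with the standard library alone. The helper name `class_names_from_imagefolder` is hypothetical, and the snippet builds a throwaway folder tree so it runs anywhere; it is meant to show the rule (sorted subfolders of train/), not the loader LibreYOLO ships.

```python
import tempfile
from pathlib import Path


def class_names_from_imagefolder(root):
    """Return sorted class names taken from the train split's subfolders."""
    train_dir = Path(root) / "train"
    return sorted(p.name for p in train_dir.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    # Create folders in a deliberately unsorted order.
    for split, cls in [("train", "dog"), ("train", "cat"), ("val", "cat"), ("val", "dog")]:
        (Path(tmp) / split / cls).mkdir(parents=True)
    names = class_names_from_imagefolder(tmp)
    class_to_idx = {name: idx for idx, name in enumerate(names)}

print(names)         # ['cat', 'dog']
print(class_to_idx)  # {'cat': 0, 'dog': 1}
```

Sorting makes the class ids deterministic, so the same dataset always yields the same id for each label.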

The data= argument accepts a folder, a .zip URL, or a known auto-download name (imagenette160 and imagenet10). The head is rebuilt to match the dataset's class count automatically.

python
from libreyolo import LibreYOLO9

model = LibreYOLO9(None, size="t", task="classify", nb_classes=10)
result = model.train(
    data="imagenette160",  # folder, .zip URL, or known name
    epochs=10, batch=64, imgsz=224,
    optimizer="adamw", lr0=1e-3,
)
# Validation reports metrics/accuracy_top1 and metrics/accuracy_top5
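
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of top-k accuracy on plain Python lists. `topk_accuracy` is a hypothetical helper written for illustration; the validator's own implementation may differ in details such as tie handling.

```python
def topk_accuracy(prob_rows, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    if not labels:
        raise ValueError("labels must not be empty")
    hits = 0
    for probs, label in zip(prob_rows, labels):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Four samples over three classes, with their true labels.
probs = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
labels = [0, 0, 1, 0]

print(topk_accuracy(probs, labels, k=1))  # 0.25
print(topk_accuracy(probs, labels, k=2))  # 0.75
```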

Reference runs

Quick sanity checks from development: YOLO9-t reached top-1 0.79 / top-5 0.975 on imagenette160 (10 epochs), and RF-DETR-n reached top-1 0.69 / top-5 0.96 (6 epochs). RF-DETR benefits from internet access on first run to fetch its DINOv2 backbone; offline it falls back to random init.

Oriented Bounding Boxes (OBB)

YOLO9: t, s, m, c | RF-DETR: n, s, m, l

Oriented boxes carry a rotation angle, which is what aerial imagery, documents, and densely packed scenes need. YOLO9 adds an angle branch to its detect head; RF-DETR adds a learnable angle embedding to its decoder.

Inference and the OBB result

Results expose an obb field. Angles are in radians.

python
from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9t-obb.pt")
r = model.predict("aerial.jpg")

for i in range(len(r.obb.cls)):
    cx, cy, w, h, angle = r.obb.xywhr[i]  # angle in radians
    corners = r.obb.xyxyxyxy[i]           # 4 (x, y) corner points
    conf, cls = r.obb.conf[i], r.obb.cls[i]

| Field | Shape | Meaning |
| --- | --- | --- |
| obb.xywhr | N x 5 | [cx, cy, w, h, angle], angle in radians. |
| obb.xyxyxyxy | N x 4 x 2 | Four corner points per box. |
| obb.conf | N | Confidence per box. |
| obb.cls | N | Class id per box. |
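
The two box layouts carry the same information. As a rough sketch, the conversion from [cx, cy, w, h, angle] to four corners looks like this. The rotation direction and corner ordering are assumptions made for the example (a standard 2D rotation about the box center); LibreYOLO's own convention may differ, so check real outputs before relying on corner order.

```python
import math


def xywhr_to_corners(cx, cy, w, h, angle):
    """Rotate the four half-extent offsets by `angle` (radians) around the center."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


# With angle 0 the result is the ordinary axis-aligned box.
print(xywhr_to_corners(100.0, 50.0, 40.0, 20.0, 0.0))
# [(80.0, 40.0), (120.0, 40.0), (120.0, 60.0), (80.0, 60.0)]

# A quarter turn swaps the horizontal and vertical extents.
quarter = xywhr_to_corners(100.0, 50.0, 40.0, 20.0, math.pi / 2)
xs = [x for x, _ in quarter]
ys = [y for _, y in quarter]
print(round(max(xs) - min(xs), 6), round(max(ys) - min(ys), 6))  # 20.0 40.0
```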

Dataset format and training

OBB uses a standard detect-style data YAML, but labels are YOLO-OBB text files with exactly nine fields per row: a class id followed by four normalized corner points. The angle is derived from the corners, not stored.

labels/aerial_001.txt
# class_id x1 y1 x2 y2 x3 y3 x4 y4 (all normalized to [0, 1])
0 0.51 0.32 0.66 0.38 0.62 0.55 0.47 0.49
2 0.10 0.71 0.18 0.69 0.20 0.80 0.12 0.82
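
A label row in this format can be parsed and checked with a few lines of Python. The sketch below is illustrative only: `parse_obb_row` and `angle_from_corners` are hypothetical helpers, and taking the angle from the first edge (corner 1 to corner 2) after scaling to pixels is an assumption, since the page does not say which edge the loader uses.

```python
import math


def parse_obb_row(line):
    """Parse 'class_id x1 y1 x2 y2 x3 y3 x4 y4' into (class_id, corners)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    class_id = int(fields[0])
    coords = [float(v) for v in fields[1:]]
    if not all(0.0 <= v <= 1.0 for v in coords):
        raise ValueError("corner coordinates must be normalized to [0, 1]")
    corners = list(zip(coords[0::2], coords[1::2]))
    return class_id, corners


def angle_from_corners(corners, img_w, img_h):
    """Angle of the first edge in radians, measured in pixel space (assumed convention)."""
    (x1, y1), (x2, y2) = corners[0], corners[1]
    return math.atan2((y2 - y1) * img_h, (x2 - x1) * img_w)


class_id, corners = parse_obb_row("0 0.51 0.32 0.66 0.38 0.62 0.55 0.47 0.49")
angle = angle_from_corners(corners, img_w=1024, img_h=1024)
print(class_id, corners[0], round(angle, 4))
```

Scaling to pixels before taking the angle matters on non-square images, where normalized x and y units have different lengths.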

A plain detection checkpoint cannot be loaded directly into an OBB model. Going from detect to OBB is only allowed as a training warm-start: pass pretrained=True (YOLO9) or the explicit transfer flag on RF-DETR. Mosaic and mixup are disabled for OBB until corner-aware augmentation lands, and tiled inference is not supported.

python
from libreyolo import LibreYOLO9

model = LibreYOLO9(None, size="t", task="obb")
# Warm-start the backbone from a same-family detect checkpoint
result = model.train(data="dota8.yaml", pretrained=True, epochs=100, imgsz=640)

# CLI equivalent
# libreyolo train model=LibreYOLO9t.pt data=dota8.yaml --task obb

Validation uses rotated-IoU AP, reported as mAP50 and mAP50-95 under the OBB metric group.
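
Rotated IoU differs from the axis-aligned version because the overlap region is a general convex polygon. One common way to compute it, sketched here in pure Python, is to clip one box against the other (Sutherland-Hodgman) and measure the result with the shoelace formula. This is an illustration of the idea, not the validator's implementation, and all function names are hypothetical.

```python
def signed_area(poly):
    """Shoelace formula; positive when vertices are in counter-clockwise order."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` against a convex `clipper`."""
    if signed_area(clipper) < 0:
        clipper = clipper[::-1]  # normalize orientation so "inside" is side >= 0
    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break
        ax, ay = clipper[i]
        bx, by = clipper[(i + 1) % len(clipper)]
        ex, ey = bx - ax, by - ay

        def side(p):
            # Cross product of the edge with (p - a); >= 0 means inside.
            return ex * (p[1] - ay) - ey * (p[0] - ax)

        clipped = []
        prev = output[-1]
        for cur in output:
            s_prev, s_cur = side(prev), side(cur)
            if (s_cur >= 0) != (s_prev >= 0):
                # The segment crosses the edge line: add the crossing point.
                t = s_prev / (s_prev - s_cur)
                clipped.append((prev[0] + t * (cur[0] - prev[0]),
                                prev[1] + t * (cur[1] - prev[1])))
            if s_cur >= 0:
                clipped.append(cur)
            prev = cur
        output = clipped
    return output


def rotated_iou(box_a, box_b):
    """IoU of two convex quadrilaterals given as lists of four (x, y) corners."""
    inter = abs(signed_area(clip_convex(box_a, box_b)))
    union = abs(signed_area(box_a)) + abs(signed_area(box_b)) - inter
    return inter / union if union > 0 else 0.0


a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
b = [(0.5, 0.0), (1.5, 0.0), (1.5, 1.0), (0.5, 1.0)]
print(round(rotated_iou(a, b), 4))  # 0.3333
```

A prediction then counts as a match at mAP50 when this value is at least 0.5 against a ground-truth box of the same class.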

Keypoints / Pose

YOLO9 + RF-DETR: landing soon | YOLO-NAS, EdgeCrafter: available

Pose estimation predicts keypoints per detected instance. The default layout is COCO-17 person keypoints. YOLO9 and RF-DETR pose are person-only single-class in their first version; YOLO-NAS and EdgeCrafter pose are already available in the tree.

Inference and the Keypoints result

Results expose a keypoints field of shape (N, K, 3). The first two channels are x and y in original-image pixel coordinates; the last channel is visibility or confidence.

python
from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9t-pose.pt")
r = model.predict("athletes.jpg")

kp = r.keypoints
print(kp.xy.shape)   # (N, 17, 2) pixel coordinates
print(kp.conf)       # (N, 17) per-keypoint visibility / confidence
print(kp.xyn)        # normalized coordinates
print(r.boxes.xyxy)  # person boxes still come along

| Field | Shape | Meaning |
| --- | --- | --- |
| keypoints.xy | N x K x 2 | Pixel keypoint coordinates. |
| keypoints.xyn | N x K x 2 | Normalized keypoint coordinates. |
| keypoints.conf | N x K | Per-keypoint visibility / confidence. |
| keypoints.has_visible | N x K | Boolean visible mask. |
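
The four fields are different views of the same (N, K, 3) data. The sketch below shows one plausible way to derive them using nested lists. `split_keypoints` is a hypothetical helper, and the 0.5 visibility threshold is an assumption made for the example; the page does not state how has_visible is computed.

```python
def split_keypoints(raw, img_w, img_h, vis_threshold=0.5):
    """Split (N, K, 3) rows of [x_px, y_px, score] into the four views in the table."""
    xy = [[(x, y) for x, y, _ in inst] for inst in raw]
    conf = [[score for _, _, score in inst] for inst in raw]
    xyn = [[(x / img_w, y / img_h) for x, y in inst] for inst in xy]
    # Assumed rule: a keypoint counts as visible when its score exceeds the threshold.
    has_visible = [[score > vis_threshold for score in inst] for inst in conf]
    return xy, conf, xyn, has_visible


# One instance with two keypoints on a 640 x 480 image.
raw = [[[320.0, 240.0, 0.9], [0.0, 0.0, 0.1]]]
xy, conf, xyn, has_visible = split_keypoints(raw, img_w=640, img_h=480)

print(xyn[0][0])       # (0.5, 0.5)
print(has_visible[0])  # [True, False]
```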

Dataset format and training

Pose uses a data YAML that must declare kpt_shape: [K, 2|3] and, for horizontal-flip augmentation, a flip_idx. Labels are YOLO-pose text rows: a class id, a normalized box, then K keypoint triplets (x, y, v) with visibility v in {0, 1, 2}.

coco8-pose.yaml
path: coco8-pose
train: images/train
val: images/val
nc: 1
names:
  0: person
kpt_shape: [17, 3]
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]

python
from libreyolo import LibreYOLO9

# Warm-start from a detection checkpoint; the keypoint head is reinitialized
model = LibreYOLO9("LibreYOLO9t.pt", size="t", task="pose")
model.train(data="coco8-pose.yaml", epochs=100, imgsz=640)

# Validation reports OKS-based AP via the pose validator
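
To show why flip_idx is needed, here is a minimal sketch that parses one pose label row and applies a horizontal flip. Mirroring x alone would leave a person's left eye stored in the left-eye slot while it now appears on the other side of the image, so the left/right slots must be swapped as well. The helper names are hypothetical, and the box is assumed to be normalized (cx, cy, w, h), which the page does not spell out.

```python
# COCO-17 left/right swap table, copied from the YAML above.
FLIP_IDX = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]


def parse_pose_row(line, num_kpts=17, dims=3):
    """Split one label row into (class_id, box, keypoints)."""
    fields = line.split()
    expected = 1 + 4 + num_kpts * dims
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    class_id = int(fields[0])
    box = tuple(float(v) for v in fields[1:5])  # assumed (cx, cy, w, h)
    flat = [float(v) for v in fields[5:]]
    kpts = [tuple(flat[i:i + dims]) for i in range(0, len(flat), dims)]
    return class_id, box, kpts


def hflip(box, kpts, flip_idx):
    """Mirror x around the image center, then swap left/right keypoint slots."""
    cx, cy, w, h = box
    mirrored = [(1.0 - k[0],) + k[1:] for k in kpts]
    return (1.0 - cx, cy, w, h), [mirrored[j] for j in flip_idx]


# Synthetic row: keypoint i sits at x = i / 32, y = 0.5, visibility 2.
row = "0 0.5 0.5 0.25 0.5 " + " ".join(f"{i / 32} 0.5 2" for i in range(17))
class_id, box, kpts = parse_pose_row(row)
flipped_box, flipped = hflip(box, kpts, FLIP_IDX)

print(len(kpts))   # 17
print(flipped[1])  # (0.9375, 0.5, 2.0): slot 1 now holds the mirrored former slot 2
```

Because flip_idx only swaps pairs, applying the flip twice returns the original row.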

In active development

YOLO9 and RF-DETR pose are on a feature branch and have not been merged yet; treat the API above as the intended contract rather than a frozen one. YOLO-NAS pose weights are linked from upstream rather than mirrored and must be staged manually.

LoRA / DoRA Fine-Tuning

RF-DETR: n, s, m, l

LoRA-style adapters let you fine-tune RF-DETR's transformer backbone by training a small set of low-rank matrices while the base weights stay frozen. That cuts optimizer and gradient memory, which is ideal for adapting a strong checkpoint to a new domain on modest hardware.

Enabling it

The whole public API is a single flag on train(). There are no rank, alpha, or target-module knobs to tune; the recipe is fixed to a well-tested configuration. Under the hood the implementation uses DoRA (weight-decomposed LoRA, rank 16) applied to the DINOv2 attention query, key, and value projections.

python
from libreyolo import LibreYOLO

model = LibreYOLO("rf-detr-nano.pth")  # sizes n, s, m, l
result = model.train(
    data="data.yaml",
    lora=True,  # DoRA on the frozen DINOv2 backbone
    epochs=100, batch_size=4, lr=1e-4,
)

# Resume: LoRA is auto-detected from the checkpoint, no need to repeat the flag
model.train(data="data.yaml", resume=True)

bash
# CLI equivalent
libreyolo train --model rf-detr-nano.pth --data data.yaml --lora
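
A back-of-the-envelope count shows where the memory saving comes from. The sketch below compares the trainable parameters of rank-16 adapters on the query, key, and value projections of one attention layer with the dense weights they sit beside. The hidden width of 384 is a hypothetical value picked for illustration, not a statement about any specific RF-DETR size, and `adapter_param_count` is not a LibreYOLO function.

```python
def adapter_param_count(d_in, d_out, rank=16, dora=True):
    """Trainable parameters added to one linear layer by a low-rank adapter."""
    low_rank = rank * (d_in + d_out)   # A is rank x d_in, B is d_out x rank
    magnitude = d_out if dora else 0   # DoRA adds one learned magnitude per output row
    return low_rank + magnitude


HIDDEN = 384  # hypothetical attention width, for illustration only

dense = 3 * HIDDEN * HIDDEN                           # frozen q, k, v weights
adapters = 3 * adapter_param_count(HIDDEN, HIDDEN)    # trainable adapter tensors

print(dense)     # 442368
print(adapters)  # 38016
print(f"{adapters / dense:.1%}")  # 8.6%
```

Only the adapter tensors (plus the detection head) need gradients and optimizer state, which is why the training footprint shrinks even though the full backbone still runs in the forward pass.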

Checkpoints and export

  • Training checkpoints keep the adapter tensors, and the config records that LoRA was used, so loading and resuming rebuild the adapter graph automatically.
  • The detection head always stays trainable, so you can still adapt to a new class count.
  • export() merges the adapters back into dense weights. Exported models are plain and carry no peft dependency.
  • LoRA is RF-DETR only; passing lora=True to other families raises a clear error.
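
The merge that export() performs can be pictured with a tiny pure-Python example. Under the usual DoRA formulation, the dense weight is rebuilt as a learned magnitude times the normalized direction of (W + scale * B A). Everything here is a sketch under assumptions: `merge_dora` is a hypothetical helper, the norm is taken per output row (the PyTorch weight layout), and `scale` stands in for the fixed alpha / rank factor that the page does not expose.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_dora(w, a, b, magnitude, scale=1.0):
    """Fold adapter tensors into a dense weight.

    w: d_out x d_in frozen weight, b: d_out x r, a: r x d_in,
    magnitude: one learned scalar per output row.
    """
    delta = matmul(b, a)
    merged = []
    for row_w, row_d, m in zip(w, delta, magnitude):
        direction = [x + scale * d for x, d in zip(row_w, row_d)]
        norm = math.sqrt(sum(v * v for v in direction))
        merged.append([m * v / norm for v in direction])
    return merged


w = [[3.0, 4.0], [0.0, 2.0]]
a = [[1.0, 0.0]]        # rank 1 for brevity
magnitude = [5.0, 2.0]  # initialized to the row norms of w

# With B at its zero initialization the merged weight equals the original.
untrained = merge_dora(w, a, [[0.0], [0.0]], magnitude)
print(untrained)  # [[3.0, 4.0], [0.0, 2.0]]

# After training, B is non-zero: the direction changes, the row norm stays at m.
trained = merge_dora(w, a, [[1.0], [0.0]], magnitude)
print([round(v, 4) for v in trained[0]])  # [3.5355, 3.5355]
```

The result is an ordinary dense matrix, which is why an exported model has no adapter dependency at inference time.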

Install extra

LoRA training needs the adapter dependency: pip install "libreyolo[lora]", which pulls in the RF-DETR stack and peft. Exported (merged) models do not need it at inference time.

Stability

Where each feature stands today. Everything here is experimental; this table is the honest map.

| Feature | Families | State |
| --- | --- | --- |
| Classification | YOLO9, RF-DETR | PR open |
| Oriented boxes (OBB) | YOLO9, RF-DETR | Experimental |
| Keypoints / pose | YOLO9, RF-DETR | Landing soon |
| Keypoints / pose | YOLO-NAS, EdgeCrafter | Available |
| LoRA / DoRA | RF-DETR | Reviewed |

Looking for the stable path?

For production work, the validated core is YOLO9 detection and RF-DETR detection and segmentation. See the core documentation for those, and LibreVLM for open-vocabulary detection.

Track progress and source on GitHub