What's next
The detection and segmentation paths are the validated core. This page documents the new task heads and training tricks we are actively building on top of them: classification, oriented boxes, pose, and parameter-efficient fine-tuning.
Overview
LibreYOLO is a multi-task framework: the same model family can wear different heads. Alongside the validated detect and segment paths, several new tasks are landing for the two flagship families, YOLO9 and RF-DETR. They all plug into the same LibreYOLO(...) factory and the same Results container, so once you know the core API these are small additions.
- Classification for YOLO9 and RF-DETR. Whole-image labels with top-1 / top-5 probabilities.
- Oriented bounding boxes (OBB) for YOLO9 and RF-DETR. Rotated boxes for aerial and document imagery.
- Keypoints / pose for YOLO9 and RF-DETR. COCO-17 person keypoints.
- LoRA / DoRA fine-tuning for RF-DETR. Adapt the transformer backbone with a fraction of the memory.
Read this first
Everything on this page is experimental and some of it is still in flight on feature branches. APIs, defaults, and label formats can change before they are promoted into the validated core. The Stability section tracks exactly where each feature stands.
Selecting a Task
Every family defaults to detection. You opt into another task in one of three ways, and the family default applies only when none of them does. They are resolved in this order of precedence:
| Priority | Mechanism | Example |
|---|---|---|
| 1 | Explicit argument | task="obb" |
| 2 | Checkpoint metadata | task recorded inside a trained .pt |
| 3 | Filename suffix | -cls, -obb, -pose |
| 4 | Family default | detect |
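The precedence rules amount to a short chain of fallbacks. The sketch below is a hypothetical stand-in, not LibreYOLO's resolver: `resolve_task` and `SUFFIX_TASKS` are illustrative names, and the checkpoint metadata is passed in as a plain string instead of being read from a `.pt` file.

```python
from pathlib import Path
from typing import Optional

# Hypothetical suffix table mirroring the documented -cls / -obb / -pose convention.
SUFFIX_TASKS = {"-cls": "classify", "-obb": "obb", "-pose": "pose"}


def resolve_task(
    weights: Optional[str] = None,
    task: Optional[str] = None,
    checkpoint_task: Optional[str] = None,
    default: str = "detect",
) -> str:
    """Illustrative task resolution following the precedence table."""
    # 1. An explicit task= argument always wins.
    if task is not None:
        return task
    # 2. Task metadata recorded inside a trained checkpoint.
    if checkpoint_task is not None:
        return checkpoint_task
    # 3. A filename suffix such as LibreYOLO9t-obb.pt.
    if weights is not None:
        stem = Path(weights).stem  # drops the .pt / .pth extension
        for suffix, suffix_task in SUFFIX_TASKS.items():
            if stem.endswith(suffix):
                return suffix_task
    # 4. Fall back to the family default.
    return default


print(resolve_task("LibreYOLO9t-obb.pt"))                          # obb
print(resolve_task("LibreYOLO9t-obb.pt", task="pose"))             # pose
print(resolve_task("LibreYOLO9t.pt", checkpoint_task="classify"))  # classify
print(resolve_task("LibreYOLO9t.pt"))                              # detect
```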
Because the public LibreYOLO(...) factory expects a real weights file, the cleanest way to start one of these tasks from scratch is to construct the family class directly and pass task=. Trained checkpoints load back through the unified factory and auto-detect their task.
```python
from libreyolo import LibreYOLO, LibreYOLO9, LibreRFDETR

# Start a task from scratch via the family class
m = LibreYOLO9(None, size="t", task="classify", nb_classes=10)

# Load a trained checkpoint via the unified factory (task auto-detected)
m = LibreYOLO("LibreYOLO9t-obb.pt")
```
Image Classification
Classification gives a single label for a whole image. YOLO9 keeps its backbone and bolts on a lightweight classification head; RF-DETR reuses its DINOv2 encoder and adds a pooled linear head. Both run at 224 by 224.
Inference and the Probs result
Prediction returns a Results object whose probs field carries a softmax over the classes.
```python
from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9t-cls.pt")
r = model.predict("cat.jpg")

print(r.probs.top1)               # class id of the argmax
print(r.probs.top1conf)           # its probability
print(r.probs.top5)               # [id, id, id, id, id]
print(model.names[r.probs.top1])  # human-readable label
```
| Field | Type | Meaning |
|---|---|---|
| probs.top1 | int | Argmax class id. |
| probs.top5 | list[int] | Top-5 class ids, descending. |
| probs.top1conf | float | Probability of the top-1 class. |
| probs.top5conf | tensor | Probabilities of the top-5 classes. |
| probs.data | tensor | Full softmax vector. |
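To make the relationship between these fields concrete, here is a plain-Python sketch that derives them from raw logits. `summarize` is a hypothetical helper that returns a dict keyed like the table; it is not LibreYOLO's `Probs` class, which holds tensors.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits: Sequence[float], k: int = 5) -> dict:
    """Hypothetical helper: a dict mirroring the Probs fields."""
    probs = softmax(logits)
    # Class ids sorted by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top5 = order[:k]
    return {
        "top1": order[0],
        "top1conf": probs[order[0]],
        "top5": top5,
        "top5conf": [probs[i] for i in top5],
        "data": probs,
    }


r = summarize([0.1, 2.0, -1.0, 0.5, 3.0, 0.0, 1.2])
print(r["top1"], r["top5"])  # 4 [4, 1, 6, 3, 0]
```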
Dataset format and training
Classification uses an ImageFolder layout, not a YAML. Class names are the sorted subfolder names, pinned to the train split.
```text
dataset/
  train/
    cat/  img001.jpg ...
    dog/  img104.jpg ...
  val/
    cat/  ...
    dog/  ...
```
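Because class names are the sorted subfolder names of the train split, class ids follow alphabetical order. The sketch below shows that mapping with `pathlib`; `class_names` is illustrative, and `check_val_split` is a hypothetical extra check (not a documented LibreYOLO behavior) that flags val folders missing from train.

```python
import tempfile
from pathlib import Path
from typing import List


def class_names(root: str) -> List[str]:
    """Sorted subfolder names of <root>/train (illustrative, not LibreYOLO's loader)."""
    train_dir = Path(root) / "train"
    return sorted(p.name for p in train_dir.iterdir() if p.is_dir())


def check_val_split(root: str, names: List[str]) -> None:
    # Hypothetical check: ids are pinned to train, so val should not add classes.
    val_dir = Path(root) / "val"
    extra = {p.name for p in val_dir.iterdir() if p.is_dir()} - set(names)
    if extra:
        raise ValueError(f"val has classes missing from train: {sorted(extra)}")


with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "val"):
        for cls in ("dog", "cat"):
            (Path(tmp) / split / cls).mkdir(parents=True)
    names = class_names(tmp)
    check_val_split(tmp, names)
    print(names)  # ['cat', 'dog'] -> cat is id 0, dog is id 1
```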
The data= argument accepts a folder, a .zip URL, or a known auto-download name (imagenette160 or imagenet10). The head is automatically rebuilt to match the dataset's class count.
```python
from libreyolo import LibreYOLO9

model = LibreYOLO9(None, size="t", task="classify", nb_classes=10)
result = model.train(
    data="imagenette160",  # folder, .zip URL, or known name
    epochs=10, batch=64, imgsz=224,
    optimizer="adamw", lr0=1e-3,
)
# Validation reports metrics/accuracy_top1 and metrics/accuracy_top5
```
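The accuracy_top1 and accuracy_top5 metrics are the usual top-k accuracies: a sample counts as correct when its true label is among the k highest-scoring classes. A minimal sketch of that computation follows; `topk_accuracy` is an illustrative name, not LibreYOLO's validator.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in topk:
            hits += 1
    return hits / len(labels)


scores = [
    [0.7, 0.2, 0.1],  # predicted 0, label 0
    [0.1, 0.3, 0.6],  # predicted 2, label 1 is second
    [0.5, 0.4, 0.1],  # predicted 0, label 1 is second
]
labels = [0, 1, 1]
print(topk_accuracy(scores, labels, k=1))  # 0.3333333333333333
print(topk_accuracy(scores, labels, k=2))  # 1.0
```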
Reference runs
Quick sanity checks from development: YOLO9-t reached top-1 0.79 / top-5 0.975 on imagenette160 (10 epochs), and RF-DETR-n reached top-1 0.69 / top-5 0.96 (6 epochs). RF-DETR benefits from internet access on its first run so it can fetch its DINOv2 backbone weights; offline, it falls back to random initialization.
Oriented Bounding Boxes (OBB)
Oriented boxes carry a rotation angle, which is what aerial imagery, documents, and densely packed scenes need. YOLO9 adds an angle branch to its detect head; RF-DETR adds a learnable angle embedding to its decoder.
Inference and the OBB result
Results expose an obb field. Angles are in radians.
```python
from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9t-obb.pt")
r = model.predict("aerial.jpg")

for i in range(len(r.obb.cls)):
    cx, cy, w, h, angle = r.obb.xywhr[i]  # angle in radians
    corners = r.obb.xyxyxyxy[i]           # 4 (x, y) corner points
    conf, cls = r.obb.conf[i], r.obb.cls[i]
```
| Field | Shape | Meaning |
|---|---|---|
| obb.xywhr | N x 5 | [cx, cy, w, h, angle], angle in radians. |
| obb.xyxyxyxy | N x 4 x 2 | Four corner points per box. |
| obb.conf | N | Confidence per box. |
| obb.cls | N | Class id per box. |
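The two box representations carry the same information. The sketch below converts one xywhr row into four corners by rotating the half-width and half-height vectors around the center. The rotation convention is an assumption: it uses the standard rotation matrix, so with image y pointing down a positive angle turns the box clockwise on screen, and LibreYOLO's own convention and corner order may differ.

```python
import math
from typing import List, Tuple


def xywhr_to_corners(cx: float, cy: float, w: float, h: float,
                     angle: float) -> List[Tuple[float, float]]:
    """Four corners of a rotated box; angle in radians (convention assumed)."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Half-extent vectors along the box's own width and height axes.
    wx, wy = (w / 2) * cos_a, (w / 2) * sin_a
    hx, hy = -(h / 2) * sin_a, (h / 2) * cos_a
    return [
        (cx - wx - hx, cy - wy - hy),
        (cx + wx - hx, cy + wy - hy),
        (cx + wx + hx, cy + wy + hy),
        (cx - wx + hx, cy - wy + hy),
    ]


# An axis-aligned 4 x 2 box centered at (10, 5):
print(xywhr_to_corners(10, 5, 4, 2, 0.0))
# [(8.0, 4.0), (12.0, 4.0), (12.0, 6.0), (8.0, 6.0)]
```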
Dataset format and training
OBB uses a standard detect-style data YAML, but labels are YOLO-OBB text files with exactly nine fields per row: a class id followed by four normalized corner points. The angle is derived from the corners, not stored.
```text
# class_id x1 y1 x2 y2 x3 y3 x4 y4 (all normalized to [0, 1])
0 0.51 0.32 0.66 0.38 0.62 0.55 0.47 0.49
2 0.10 0.71 0.18 0.69 0.20 0.80 0.12 0.82
```
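Since the angle is derived rather than stored, a loader has to recover it from the corners. The sketch below shows one simple way under stated assumptions: corners are given in order around the rectangle, the first edge defines the width and angle, and coordinates are scaled to pixels first because angles measured in normalized space are distorted on non-square images. `parse_obb_row` is hypothetical; a real loader may instead fit a minimum-area rectangle and normalize the angle range.

```python
import math
from typing import Tuple


def parse_obb_row(line: str, img_w: int, img_h: int) -> Tuple[int, Tuple[float, float, float, float, float]]:
    """Hypothetical: 'cls x1 y1 ... x4 y4' (normalized) -> (cls, (cx, cy, w, h, angle))."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    cls = int(fields[0])
    coords = [float(v) for v in fields[1:]]
    # Denormalize to pixels before measuring lengths and angles.
    pts = [(coords[i] * img_w, coords[i + 1] * img_h) for i in range(0, 8, 2)]
    cx = sum(x for x, _ in pts) / 4
    cy = sum(y for _, y in pts) / 4
    (x1, y1), (x2, y2), (x3, y3) = pts[0], pts[1], pts[2]
    w = math.hypot(x2 - x1, y2 - y1)      # edge p1 -> p2
    h = math.hypot(x3 - x2, y3 - y2)      # edge p2 -> p3
    angle = math.atan2(y2 - y1, x2 - x1)  # direction of the first edge, radians
    return cls, (cx, cy, w, h, angle)


cls, (cx, cy, w, h, angle) = parse_obb_row("0 0.1 0.1 0.3 0.1 0.3 0.2 0.1 0.2", 100, 100)
print(cls, round(cx, 6), round(cy, 6), round(w, 6), round(h, 6), angle)  # 0 20.0 15.0 20.0 10.0 0.0
print(parse_obb_row("2 0.10 0.71 0.18 0.69 0.20 0.80 0.12 0.82", 1024, 1024))
```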
A plain detection checkpoint cannot be loaded directly into an OBB model. Going from detect to OBB is only allowed as a training warm-start: pass pretrained=True (YOLO9) or the explicit transfer flag on RF-DETR. Mosaic and mixup are disabled for OBB until corner-aware augmentation lands, and tiled inference is not supported.
```python
from libreyolo import LibreYOLO9

model = LibreYOLO9(None, size="t", task="obb")
# Warm-start the backbone from a same-family detect checkpoint
result = model.train(data="dota8.yaml", pretrained=True, epochs=100, imgsz=640)

# CLI equivalent
# libreyolo train model=LibreYOLO9t.pt data=dota8.yaml --task obb
```
Validation uses rotated-IoU AP, reported as mAP50 and mAP50-95 under the OBB metric group.
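Rotated IoU replaces the axis-aligned overlap with the intersection area of two convex quadrilaterals. One standard way to get that area is Sutherland-Hodgman polygon clipping followed by the shoelace formula, sketched below. This is a generic illustration of the technique, not LibreYOLO's validator, and all function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(poly: List[Point]) -> float:
    # Shoelace formula: positive for counter-clockwise vertex order.
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))


def ccw(poly: List[Point]) -> List[Point]:
    return list(poly) if signed_area(poly) >= 0 else list(poly)[::-1]


def _side(a: Point, b: Point, p: Point) -> float:
    # Cross product: > 0 when p lies left of a->b (inside a CCW polygon).
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: intersect a polygon with a convex, CCW clipper."""
    out = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        inp, out = out, []
        for p, q in zip(inp, inp[1:] + inp[:1]):
            sp, sq = _side(a, b, p), _side(a, b, q)
            if sp >= 0:
                out.append(p)
            if (sp >= 0) != (sq >= 0):
                t = sp / (sp - sq)  # fraction along p->q where it crosses the edge line
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
        if not out:
            break
    return out


def rotated_iou(a: List[Point], b: List[Point]) -> float:
    a, b = ccw(a), ccw(b)
    inter_poly = clip(a, b)
    inter = abs(signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(signed_area(a)) + abs(signed_area(b)) - inter
    return inter / union if union > 0 else 0.0


sq_a = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
sq_b = [(1.0, 0.0), (3.0, 0.0), (3.0, 2.0), (1.0, 2.0)]
print(rotated_iou(sq_a, sq_b))  # 0.3333333333333333
```

The same function handles rotated boxes directly, since it only needs corner points such as those in obb.xyxyxyxy.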
Keypoints / Pose
Pose estimation predicts keypoints per detected instance. The default layout is COCO-17 person keypoints. YOLO9 and RF-DETR pose are person-only single-class in their first version; YOLO-NAS and EdgeCrafter pose are already available in the tree.
Inference and the Keypoints result
Results expose a keypoints field of shape (N, K, 3) in original-image pixel coordinates: the first two channels are x and y, and the last channel is visibility or confidence.
```python
from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9t-pose.pt")
r = model.predict("athletes.jpg")

kp = r.keypoints
print(kp.xy.shape)   # (N, 17, 2) pixel coordinates
print(kp.conf)       # (N, 17) per-keypoint visibility / confidence
print(kp.xyn)        # normalized coordinates
print(r.boxes.xyxy)  # person boxes still come along
```
| Field | Shape | Meaning |
|---|---|---|
| keypoints.xy | N x K x 2 | Pixel keypoint coordinates. |
| keypoints.xyn | N x K x 2 | Normalized keypoint coordinates. |
| keypoints.conf | N x K | Per-keypoint visibility / confidence. |
| keypoints.has_visible | N x K | Boolean visible mask. |
Dataset format and training
Pose uses a data YAML that must declare kpt_shape: [K, 2|3] and, for horizontal-flip augmentation, a flip_idx. Labels are YOLO-pose text rows: a class id, a normalized box, then K keypoint triplets (x, y, v) with visibility v in {0, 1, 2}.
```yaml
path: coco8-pose
train: images/train
val: images/val
nc: 1
names:
  0: person
kpt_shape: [17, 3]
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
```
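flip_idx matters because a horizontal flip turns a left eye into a right eye: after mirroring x, each keypoint slot must take the point from its mirrored partner. The sketch below parses one YOLO-pose row for kpt_shape [K, 3] and applies that flip. `parse_pose_row` and `hflip` are hypothetical helpers; leaving unlabeled (v = 0) points unmirrored and the new[i] = old[flip_idx[i]] indexing are assumptions (for COCO-17 the permutation is its own inverse, so both directions agree).

```python
from typing import List, Sequence, Tuple

# flip_idx copied from the coco8-pose YAML (COCO-17 left/right pairs).
FLIP_IDX = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]

Box = Tuple[float, float, float, float]  # normalized cx, cy, w, h
Keypoint = Tuple[float, float, int]      # normalized x, y, visibility


def parse_pose_row(line: str, k: int = 17) -> Tuple[int, Box, List[Keypoint]]:
    """Hypothetical: parse 'cls cx cy w h x1 y1 v1 ... xK yK vK' (kpt_shape [K, 3])."""
    fields = line.split()
    if len(fields) != 5 + 3 * k:
        raise ValueError(f"expected {5 + 3 * k} fields, got {len(fields)}")
    cx, cy, w, h = (float(v) for v in fields[1:5])
    kpts: List[Keypoint] = []
    for i in range(k):
        x, y, v = fields[5 + 3 * i: 8 + 3 * i]
        vis = int(float(v))
        if vis not in (0, 1, 2):
            raise ValueError(f"keypoint {i}: visibility must be 0, 1 or 2, got {v}")
        kpts.append((float(x), float(y), vis))
    return int(fields[0]), (cx, cy, w, h), kpts


def hflip(box: Box, kpts: Sequence[Keypoint],
          flip_idx: Sequence[int] = FLIP_IDX) -> Tuple[Box, List[Keypoint]]:
    """Hypothetical: mirror a normalized pose label left-to-right."""
    cx, cy, w, h = box
    # Mirror x for labeled points; unlabeled (v == 0) points are left untouched.
    mirrored = [(1.0 - x, y, v) if v > 0 else (x, y, v) for x, y, v in kpts]
    # A mirrored left eye now sits on the right, so slot i takes point flip_idx[i].
    return (1.0 - cx, cy, w, h), [mirrored[j] for j in flip_idx]


row = "0 0.5 0.4 0.2 0.6 " + " ".join(f"{0.40 + i / 100:.2f} 0.30 2" for i in range(17))
cls, box, kpts = parse_pose_row(row)
flipped_box, flipped = hflip(box, kpts)
print(kpts[1][0], flipped[1][0])  # left-eye x before vs. after the flip
```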
```python
from libreyolo import LibreYOLO9

# Warm-start from a detection checkpoint; the keypoint head is reinitialized
model = LibreYOLO9("LibreYOLO9t.pt", size="t", task="pose")
model.train(data="coco8-pose.yaml", epochs=100, imgsz=640)

# Validation reports OKS-based AP via the pose validator
```
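OKS (Object Keypoint Similarity) plays the role that IoU plays for boxes: AP is computed by thresholding it. The sketch below follows the standard COCO definition, OKS = mean over labeled keypoints of exp(-d² / (2·s²·k²)), with s² the object area and k = 2σ from the COCO per-keypoint constants. It is a simplified illustration (no special case for ground truths without labeled keypoints), not LibreYOLO's pose validator.

```python
import math
from typing import Sequence, Tuple

# COCO per-keypoint sigmas: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles.
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks(pred: Sequence[Tuple[float, float]],
        gt: Sequence[Tuple[float, float, int]],
        area: float,
        sigmas: Sequence[float] = COCO_SIGMAS) -> float:
    """Object Keypoint Similarity between one prediction and one ground truth (pixels)."""
    total, labeled = 0.0, 0
    for (px, py), (gx, gy, v), s in zip(pred, gt, sigmas):
        if v == 0:
            continue  # unlabeled ground-truth keypoints are ignored
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2 * s) ** 2
        total += math.exp(-d2 / (2 * area * k2))
        labeled += 1
    return total / labeled if labeled else 0.0


gt = [(100.0 + 5 * i, 200.0, 2) for i in range(17)]
perfect = [(x, y) for x, y, _ in gt]
shifted = [(x + 3.0, y) for x, y, _ in gt]
print(oks(perfect, gt, area=5000.0))  # 1.0
print(oks(shifted, gt, area=5000.0))
```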
In active development
YOLO9 and RF-DETR pose are on a feature branch and have not been merged yet; treat the API above as the intended contract rather than a frozen one. YOLO-NAS pose weights are linked from upstream rather than mirrored and must be staged manually.
LoRA / DoRA Fine-Tuning
LoRA-style adapters let you fine-tune RF-DETR's transformer backbone by training a small set of low-rank matrices while the base weights stay frozen. That cuts optimizer and gradient memory, which is ideal for adapting a strong checkpoint to a new domain on modest hardware.
Enabling it
The whole public API is a single flag on train(). There are no rank, alpha, or target-module knobs to tune; the recipe is fixed to a well-tested configuration. Under the hood the implementation uses DoRA (weight-decomposed LoRA, rank 16) applied to the DINOv2 attention query, key, and value projections.
```python
from libreyolo import LibreYOLO

model = LibreYOLO("rf-detr-nano.pth")  # sizes n, s, m, l
result = model.train(
    data="data.yaml",
    lora=True,  # DoRA on the frozen DINOv2 backbone
    epochs=100, batch_size=4, lr=1e-4,
)

# Resume: LoRA is auto-detected from the checkpoint, no need to repeat the flag
model.train(data="data.yaml", resume=True)
```
```shell
# CLI equivalent
libreyolo train --model rf-detr-nano.pth --data data.yaml --lora
```
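The memory saving comes from how few parameters are trainable, since gradients and optimizer state (two moment buffers per parameter for AdamW) are only kept for those. The arithmetic below counts the adapter parameters for the documented recipe, rank 16 on the q, k, and v projections. The hidden width of 384 is a hypothetical value chosen for illustration, not a statement about which DINOv2 variant RF-DETR uses, and biases are ignored.

```python
def adapter_params(d_in: int, d_out: int, rank: int = 16, dora: bool = True) -> int:
    """Trainable parameters a (Do)LoRA adapter adds to one d_out x d_in linear layer."""
    lora = rank * (d_in + d_out)      # A: rank x d_in, B: d_out x rank
    magnitude = d_out if dora else 0  # DoRA: one magnitude scalar per output unit
    return lora + magnitude


hidden = 384                         # hypothetical backbone width, for illustration only
full = 3 * hidden * hidden           # frozen q, k, v weights of one attention block
extra = 3 * adapter_params(hidden, hidden)
print(full, extra, f"{extra / full:.1%}")  # 442368 38016 8.6%
```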
Checkpoints and export
- Training checkpoints keep the adapter tensors, and the config records that LoRA was used, so loading and resuming rebuild the adapter graph automatically.
- The detection head always stays trainable, so you can still adapt to a new class count.
- export() merges the adapters back into dense weights. Exported models are plain and carry no peft dependency.
- LoRA is RF-DETR only; passing lora=True to other families raises a clear error.
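Merging works because a DoRA layer is still just a reparameterized dense weight: W = m · V / ‖V‖ with V = W0 + scale · B·A, where the sketch below takes the norm over each output unit's weight row. It is a plain-Python illustration of the generic DoRA formula, not LibreYOLO's export code; `merge_dora` and its `scale` argument are hypothetical. The usage shows the familiar sanity check: with B at its zero initialization and m set to the row norms of W0, the merged weight equals W0.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_dora(w0: Matrix, a: Matrix, b: Matrix,
               magnitude: List[float], scale: float = 1.0) -> Matrix:
    """Fold a DoRA adapter into one dense d_out x d_in weight (hypothetical helper)."""
    ba = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in low-rank update
    v = [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, ba)]
    merged = []
    for m, row in zip(magnitude, v):
        norm = math.sqrt(sum(x * x for x in row))  # norm of this output unit's weights
        merged.append([m * x / norm for x in row])  # direction from V, length from m
    return merged


w0 = [[3.0, 4.0], [1.0, 0.0]]  # frozen base weight, d_out x d_in
a = [[0.5, -0.5]]              # rank 1: r x d_in
b = [[0.0], [0.0]]             # d_out x r, zero at initialization
m = [5.0, 1.0]                 # initialized to the row norms of w0
merged = merge_dora(w0, a, b, m)
print(merged)  # [[3.0, 4.0], [1.0, 0.0]]
```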
Install extra
LoRA training needs the adapter dependency: pip install "libreyolo[lora]", which pulls in the RF-DETR stack and peft. Exported (merged) models do not need it at inference time.
Stability
Where each feature stands today. Everything here is experimental; this table is the honest map.
| Feature | Families | State |
|---|---|---|
| Classification | YOLO9, RF-DETR | PR open |
| Oriented boxes (OBB) | YOLO9, RF-DETR | Experimental |
| Keypoints / pose | YOLO9, RF-DETR | Landing soon |
| Keypoints / pose | YOLO-NAS, EdgeCrafter | Available |
| LoRA / DoRA | RF-DETR | Reviewed |
Looking for the stable path?
For production work, the validated core is YOLO9 detection and RF-DETR detection and segmentation. See the core documentation for those, and LibreVLM for open-vocabulary detection.