Experimental tasks

What's next

The detection and segmentation paths are the validated core. This page documents the new task heads and training tricks we are actively building on top of them: classification, oriented boxes, pose, and parameter-efficient fine-tuning.

Overview

LibreYOLO is a multi-task framework: the same model family can wear different heads. Alongside the validated detect and segment paths, several new tasks are landing for the two flagship families, YOLO9 and RF-DETR. They all plug into the same LibreYOLO(...) factory and the same Results container, so once you know the core API these are small additions.

Classification for YOLO9 and RF-DETR. Whole-image labels with top-1 / top-5 probabilities.
Oriented bounding boxes (OBB) for YOLO9 and RF-DETR. Rotated boxes for aerial and document imagery.
Keypoints / pose for YOLO9 and RF-DETR. COCO-17 person keypoints.
Small-object detection with YOLO9-P2, a YOLOv9 variant with a stride-4 scale for the 4-16 px objects of aerial and drone imagery, including a VisDrone research-preview checkpoint.
LoRA / DoRA fine-tuning for RF-DETR. Adapt the transformer backbone with a fraction of the memory.

Read this first

Everything on this page is experimental and some of it is still in flight on feature branches. APIs, defaults, and label formats can change before they are promoted into the validated core. The Stability section tracks exactly where each feature stands.

Selecting a Task

Every family defaults to detection. You opt into another task in one of three ways. They are resolved in the order of precedence below, with the family default as the final fallback:

Priority	Mechanism	Example
1	Explicit argument	`task="obb"`
2	Checkpoint metadata	task recorded inside a trained .pt
3	Filename suffix	`-cls`, `-obb`, `-pose`
4	Family default	detect

Because the public LibreYOLO(...) factory expects a real weights file, the cleanest way to start one of these tasks from scratch is to construct the family class directly and pass task=. Trained checkpoints load back through the unified factory and auto-detect their task.

python

from libreyolo import LibreYOLO, LibreYOLO9, LibreRFDETR

# Start a task from scratch via the family class
m = LibreYOLO9(None, size="t", task="classify", nb_classes=10)

# Load a trained checkpoint via the unified factory (task auto-detected)
m = LibreYOLO("LibreYOLO9t-obb.pt")
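
The precedence table above can be sketched in a few lines. This is an illustration of the documented order, not LibreYOLO's actual resolver: the function name `resolve_task`, its parameters, and the suffix-to-task mapping are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping for illustration: suffixes come from the table above,
# task strings from the examples on this page.
SUFFIX_TO_TASK = {"-cls": "classify", "-obb": "obb", "-pose": "pose"}


def resolve_task(
    explicit: Optional[str] = None,
    checkpoint_task: Optional[str] = None,
    filename: Optional[str] = None,
    family_default: str = "detect",
) -> str:
    """Pick a task using the documented order of precedence (sketch only)."""
    if explicit is not None:          # 1. explicit task= argument
        return explicit
    if checkpoint_task is not None:   # 2. task recorded in checkpoint metadata
        return checkpoint_task
    if filename is not None:          # 3. filename suffix such as "-obb"
        stem = Path(filename).stem
        for suffix, task in SUFFIX_TO_TASK.items():
            if stem.endswith(suffix):
                return task
    return family_default             # 4. family default


print(resolve_task(filename="LibreYOLO9t-obb.pt"))
print(resolve_task(explicit="pose", filename="LibreYOLO9t-obb.pt"))
print(resolve_task(filename="LibreYOLO9t.pt"))
```

The second call shows why an explicit argument is useful: it overrides whatever the filename suggests.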

Image Classification

YOLO9: t, s, m, c
RF-DETR: n, s, m, l

Classification gives a single label for a whole image. YOLO9 keeps its backbone and bolts on a lightweight classification head; RF-DETR reuses its DINOv2 encoder and adds a pooled linear head. Both run at 224 by 224.

Inference and the Probs result

Prediction returns a Results object whose probs field carries a softmax over the classes.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9t-cls.pt")
r = model.predict("cat.jpg")

print(r.probs.top1)        # class id of the argmax
print(r.probs.top1conf)    # its probability
print(r.probs.top5)        # [id, id, id, id, id]
print(model.names[r.probs.top1])  # human-readable label

Field	Type	Meaning
`probs.top1`	int	Argmax class id.
`probs.top5`	list[int]	Top-5 class ids, descending.
`probs.top1conf`	float	Probability of the top-1 class.
`probs.top5conf`	tensor	Probabilities of the top-5 classes.
`probs.data`	tensor	Full softmax vector.
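
To make the relationship between these fields concrete, here is a small pure-Python sketch that derives them from a vector of raw class scores. The helper `summarize_probs` and the dictionary it returns are hypothetical; they mirror the field names in the table but are not the library's `Probs` implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize_probs(logits, k=5):
    """Return top-1 / top-k ids and confidences (illustrative field names)."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    topk = order[:k]
    return {
        "top1": topk[0],
        "top1conf": probs[topk[0]],
        "top5": topk,
        "top5conf": [probs[i] for i in topk],
        "data": probs,
    }


summary = summarize_probs([0.1, 2.0, -1.0, 0.5, 3.0, 0.0])
print(summary["top1"], summary["top5"])
```

Because `top5` is sorted by descending probability, its first entry always equals `top1`.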

Dataset format and training

Classification uses an ImageFolder layout, not a YAML. Class names are the sorted subfolder names, pinned to the train split.

dataset/

dataset/
  train/
    cat/   img001.jpg ...
    dog/   img104.jpg ...
  val/
    cat/   ...
    dog/   ...
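
The rule "class names are the sorted subfolder names of the train split" can be sketched with the standard library. The helper name `class_names_from_imagefolder` is an assumption for this example; the real loader may apply extra filtering.

```python
import tempfile
from pathlib import Path


def class_names_from_imagefolder(root):
    """Sorted subfolder names of <root>/train define the class order."""
    train_dir = Path(root) / "train"
    return sorted(p.name for p in train_dir.iterdir() if p.is_dir())


# Build a throwaway dataset tree to demonstrate the lookup.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "val"):
        for name in ("dog", "cat"):
            (Path(tmp) / split / name).mkdir(parents=True)
    names = class_names_from_imagefolder(tmp)
    class_to_idx = {name: i for i, name in enumerate(names)}

print(names)
print(class_to_idx)
```

Pinning the order to the train split means a class folder missing from val does not shift the ids of the remaining classes.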

The data= argument accepts a folder, a .zip URL, or a known auto-download name (imagenette160 and imagenet10). The head is rebuilt to match the dataset's class count automatically.

python

from libreyolo import LibreYOLO9

model = LibreYOLO9(None, size="t", task="classify", nb_classes=10)
result = model.train(
    data="imagenette160",   # folder, .zip URL, or known name
    epochs=10, batch=64, imgsz=224,
    optimizer="adamw", lr0=1e-3,
)
# Validation reports metrics/accuracy_top1 and metrics/accuracy_top5

Reference runs

Quick sanity checks from development: YOLO9-t reached top-1 0.79 / top-5 0.975 on imagenette160 (10 epochs), and RF-DETR-n reached top-1 0.69 / top-5 0.96 (6 epochs). RF-DETR benefits from internet access on first run to fetch its DINOv2 backbone; offline it falls back to random init.

Oriented Bounding Boxes (OBB)

YOLO9: t, s, m, c
RF-DETR: n, s, m, l

Oriented boxes carry a rotation angle, which is what aerial imagery, documents, and densely packed scenes need. YOLO9 adds an angle branch to its detect head; RF-DETR adds a learnable angle embedding to its decoder.

Inference and the OBB result

Results expose an obb field. Angles are in radians.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9t-obb.pt")
r = model.predict("aerial.jpg")

for i in range(len(r.obb.cls)):
    cx, cy, w, h, angle = r.obb.xywhr[i]  # angle in radians
    corners = r.obb.xyxyxyxy[i]           # 4 (x, y) corner points
    conf, cls = r.obb.conf[i], r.obb.cls[i]

Field	Shape	Meaning
`obb.xywhr`	N x 5	[cx, cy, w, h, angle], angle in radians.
`obb.xyxyxyxy`	N x 4 x 2	Four corner points per box.
`obb.conf`	N	Confidence per box.
`obb.cls`	N	Class id per box.
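
The two box encodings describe the same rectangle. As a rough sketch of how `xywhr` relates to `xyxyxyxy`, the function below rotates the half-extents around the centre. The corner ordering and the sign of the rotation are assumptions for illustration; check them against real output before relying on them.

```python
import math


def xywhr_to_corners(cx, cy, w, h, angle):
    """Return four (x, y) corners of a box rotated by `angle` radians."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Half-length vectors along the box's width and height axes.
    wx, wy = 0.5 * w * cos_a, 0.5 * w * sin_a
    hx, hy = -0.5 * h * sin_a, 0.5 * h * cos_a
    return [
        (cx - wx - hx, cy - wy - hy),
        (cx + wx - hx, cy + wy - hy),
        (cx + wx + hx, cy + wy + hy),
        (cx - wx + hx, cy - wy + hy),
    ]


corners = xywhr_to_corners(100.0, 50.0, 40.0, 20.0, 0.0)
print(corners)
```

With a zero angle the result is the ordinary axis-aligned box, which makes a convenient sanity check.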

Dataset format and training

OBB uses a standard detect-style data YAML, but labels are YOLO-OBB text files with exactly nine fields per row: a class id followed by four normalized corner points. The angle is derived from the corners, not stored.

labels/aerial_001.txt

# class_id  x1 y1  x2 y2  x3 y3  x4 y4   (all normalized to [0, 1])
0  0.51 0.32  0.66 0.38  0.62 0.55  0.47 0.49
2  0.10 0.71  0.18 0.69  0.20 0.80  0.12 0.82
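
A minimal sketch of reading one such row and deriving an angle from its corners follows. The helper names are hypothetical, and the angle convention (direction of the edge from the first to the second corner, measured in pixel space) is an assumption; the library may normalize angles differently.

```python
import math


def parse_obb_row(row):
    """Split a YOLO-OBB label row into a class id and four (x, y) corners."""
    fields = row.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    class_id = int(fields[0])
    coords = [float(v) for v in fields[1:]]
    if not all(0.0 <= v <= 1.0 for v in coords):
        raise ValueError("corner coordinates must be normalized to [0, 1]")
    return class_id, list(zip(coords[0::2], coords[1::2]))


def angle_from_corners(corners, img_w, img_h):
    """Angle in radians of the first edge (assumed convention)."""
    (x1, y1), (x2, y2) = corners[0], corners[1]
    return math.atan2((y2 - y1) * img_h, (x2 - x1) * img_w)


class_id, corners = parse_obb_row("0  0.51 0.32  0.66 0.38  0.62 0.55  0.47 0.49")
angle = angle_from_corners(corners, img_w=1024, img_h=1024)
print(class_id, corners[0], round(angle, 3))
```

Scaling by the image size before calling `atan2` matters for non-square images, where normalized coordinates would distort the angle.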

A plain detection checkpoint cannot be loaded directly into an OBB model. Going from detect to OBB is only allowed as a training warm-start: pass pretrained=True (YOLO9) or the explicit transfer flag on RF-DETR. Mosaic and mixup are disabled for OBB until corner-aware augmentation lands, and tiled inference is not supported.

python

from libreyolo import LibreYOLO9

model = LibreYOLO9(None, size="t", task="obb")
# Warm-start the backbone from a same-family detect checkpoint
result = model.train(data="dota8.yaml", pretrained=True, epochs=100, imgsz=640)

# CLI equivalent
# libreyolo train model=LibreYOLO9t.pt data=dota8.yaml --task obb

Validation uses rotated-IoU AP, reported as mAP50 and mAP50-95 under the OBB metric group.

Keypoints / Pose

YOLO9 + RF-DETR: landing soon
YOLO-NAS, EdgeCrafter: available

Pose estimation predicts keypoints per detected instance. The default layout is COCO-17 person keypoints. YOLO9 and RF-DETR pose are person-only single-class in their first version; YOLO-NAS and EdgeCrafter pose are already available in the tree.

Inference and the Keypoints result

Results expose a keypoints field of shape (N, K, 3): x and y in original-image pixel coordinates, plus a third channel that holds visibility or confidence.

python

from libreyolo import LibreYOLO

model = LibreYOLO("LibreYOLO9t-pose.pt")
r = model.predict("athletes.jpg")

kp = r.keypoints
print(kp.xy.shape)   # (N, 17, 2) pixel coordinates
print(kp.conf)       # (N, 17)    per-keypoint visibility / confidence
print(kp.xyn)        # normalized coordinates
print(r.boxes.xyxy)  # person boxes still come along

Field	Shape	Meaning
`keypoints.xy`	N x K x 2	Pixel keypoint coordinates.
`keypoints.xyn`	N x K x 2	Normalized keypoint coordinates.
`keypoints.conf`	N x K	Per-keypoint visibility / confidence.
`keypoints.has_visible`	N x K	Boolean visible mask.

Dataset format and training

Pose uses a data YAML that must declare kpt_shape: [K, 2|3] and, for horizontal-flip augmentation, a flip_idx. Labels are YOLO-pose text rows: a class id, a normalized box, then K keypoint triplets (x, y, v) with visibility v in {0, 1, 2}.

coco8-pose.yaml

path: coco8-pose
train: images/train
val: images/val
nc: 1
names:
  0: person
kpt_shape: [17, 3]
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
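
Why horizontal-flip augmentation needs flip_idx: mirroring an image turns a left eye into a right eye, so the keypoint slots have to be swapped as well as mirrored. The sketch below shows the idea on normalized (x, y, v) triplets. The helper `hflip_keypoints` is hypothetical, and leaving unlabeled points (v = 0) untouched is an assumption of this example.

```python
FLIP_IDX = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]


def hflip_keypoints(keypoints, flip_idx):
    """Mirror normalized (x, y, v) keypoints and swap left/right slots."""
    mirrored = [
        (1.0 - x, y, v) if v > 0 else (x, y, v)  # leave unlabeled points as-is
        for (x, y, v) in keypoints
    ]
    # Slot i of the flipped pose takes the mirrored keypoint from slot flip_idx[i].
    return [mirrored[j] for j in flip_idx]


pose = [(0.0, 0.0, 0)] * 17
pose[1] = (0.25, 0.5, 2)   # COCO slot 1 is the left eye
flipped = hflip_keypoints(pose, FLIP_IDX)
print(flipped[2])          # the point now sits in the right-eye slot
```

Applying the flip twice returns the original pose, which is a quick way to validate a custom flip_idx for a non-COCO skeleton.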

python

from libreyolo import LibreYOLO9

# Warm-start from a detection checkpoint; the keypoint head is reinitialized
model = LibreYOLO9("LibreYOLO9t.pt", size="t", task="pose")
model.train(data="coco8-pose.yaml", epochs=100, imgsz=640)

# Validation reports OKS-based AP via the pose validator

In active development

YOLO9 and RF-DETR pose are on a feature branch and have not been merged yet; treat the API above as the intended contract rather than a frozen one. YOLO-NAS pose weights are linked from upstream rather than mirrored and must be staged manually.

Small-Object Detection (YOLO9-P2)

YOLO9-P2: t, s
VisDrone research preview

YOLO9-P2 is YOLOv9 with a fourth detection scale at stride 4. Stock YOLOv9 detects at strides 8/16/32, so objects below ~16 px fall under its finest grid; the P2 head catches the 4-16 px range that dominates aerial and drone footage.
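
To get a feel for what a stride-4 scale adds, the sketch below counts grid cells per detection scale. It assumes a square input and one prediction location per cell; the helper `grid_cells` is made up for this example.

```python
def grid_cells(imgsz, strides):
    """Number of grid cells per stride for a square imgsz x imgsz input."""
    return {s: (imgsz // s) ** 2 for s in strides}


stock = grid_cells(640, (8, 16, 32))
p2 = grid_cells(640, (4, 8, 16, 32))

print(stock, sum(stock.values()))
print(p2, sum(p2.values()))
```

At 640 px the stride-4 grid alone has 160 x 160 cells, more than the three stock scales combined, which is where both the finer localization and the extra cost come from.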

In a controlled A/B on VisDrone (same recipe, same resolution, same init; the only change was the P2 head), small-object AP improved by 49% relative to stock YOLOv9 of the same size (0.047 to 0.070). Adding higher training resolution and the bigger s size roughly doubled small-object AP again (0.070 to 0.141):

Model	AP	AP50	AP_small
Stock YOLO9-t @640 (control)	0.123	0.220	0.047
YOLO9-P2-t @640 (same-recipe A/B)	0.138	0.254	0.070
YOLO9-P2-s @768 (released preview)	0.226	0.385	0.141

VisDrone2019-DET val (548 images), pycocotools, single seed; treat ±1 point as noise.
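
The relative figures quoted for this table can be checked directly from the AP_small column. The dictionary keys below are labels invented for this snippet; the values are copied from the table.

```python
ap_small = {
    "stock_t_640": 0.047,  # Stock YOLO9-t @640 (control)
    "p2_t_640": 0.070,     # YOLO9-P2-t @640 (same-recipe A/B)
    "p2_s_768": 0.141,     # YOLO9-P2-s @768 (released preview)
}


def relative_gain(baseline, value):
    """Relative improvement of `value` over `baseline`."""
    return (value - baseline) / baseline


ab_gain = relative_gain(ap_small["stock_t_640"], ap_small["p2_t_640"])
preview_ratio = ap_small["p2_s_768"] / ap_small["p2_t_640"]

print(f"{ab_gain:+.0%}")         # +49%
print(f"{preview_ratio:.2f}x")   # 2.01x
```

Keep the single-seed caveat in mind: a ±1 point swing on the 0.047 baseline would move the relative gain noticeably.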

The VisDrone research preview

A trained checkpoint is published as LibreYOLO9P2s-visdrone. The family is merged on dev but not yet in a PyPI release, so install from source until the next release.

python

from libreyolo import LibreYOLO

# Auto-downloads from the LibreYOLO Hugging Face org
model = LibreYOLO("LibreYOLO9P2s-visdrone.pt")

# Evaluate/predict at 768 - the resolution it was trained at
results = model.predict("aerial.jpg", imgsz=768, conf=0.25)

Non-commercial license

The preview checkpoint is trained on VisDrone2019-DET (AISKYEYE, Tianjin University), licensed CC BY-NC-SA 3.0: non-commercial use only, unlike LibreYOLO's MIT code and COCO-default weights. It detects the 10 VisDrone aerial classes, not COCO. The model card ships the exact training recipe, the per-epoch metrics, and a clean-room dataset converter so you can reproduce it or retrain on your own data.

When (not) to use it

Match the architecture to the arena. On COCO-like data (where "small" means 16-32 px) the P2 head does not help; stock YOLOv9 is the better pick there. Reach for YOLO9-P2 when your objects live under ~16 px: drone and aerial footage, distant CCTV, satellite tiles. The extra scale roughly doubles compute and about quadruples the number of prediction locations (8,400 to 34,000 grid cells at 640 px). That is the price of the stride-4 grid.

Training your own

YOLO9-P2 transfer-initializes from stock YOLOv9 detect checkpoints: the backbone, shared neck, and existing head towers load; the new P2 modules start fresh. The recipe below encodes what we learned the hard way on tiny-object data:

python

from libreyolo import LibreYOLO9P2

model = LibreYOLO9P2(None, size="s")
model.train(
    data="/abs/path/tiny_objects.yaml",
    imgsz=768,                # resolution is the biggest lever for tiny objects
    lr0=0.005,                # the family default 0.01 diverges on transfer init
    mosaic_prob=0.0,          # mosaic tiling shrinks tiny objects below detectability
    mixup_prob=0.0,
    hsv_prob=1.0, flip_prob=0.5,
    max_labels=600,           # dense aerial frames exceed the default 100-box cap
    pretrained="LibreYOLO9s.pt",  # transfer init from stock YOLOv9
    epochs=60,
)
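
Conceptually, the transfer initialization described above loads every tensor that exists in both models with the same shape and leaves the rest at its fresh initialization. The sketch below shows that partition on plain dictionaries. The tensor names and shapes are invented for illustration and do not reflect the real module layout.

```python
def transfer_init(target_shapes, source_shapes):
    """Split target tensor names into loadable and freshly initialized ones."""
    loaded, fresh = [], []
    for name, shape in target_shapes.items():
        if source_shapes.get(name) == shape:
            loaded.append(name)   # same name and shape: copy from the source
        else:
            fresh.append(name)    # missing or mismatched: keep the fresh init
    return loaded, fresh


# Hypothetical tensor names and shapes, for illustration only.
stock = {
    "backbone.conv1.weight": (32, 3, 3, 3),
    "neck.up1.weight": (64, 32, 1, 1),
    "head.p3.weight": (80, 64, 1, 1),
}
p2 = {
    **stock,
    "neck.p2.weight": (32, 32, 3, 3),
    "head.p2.weight": (80, 32, 1, 1),
}

loaded, fresh = transfer_init(p2, stock)
print("loaded:", loaded)
print("fresh: ", fresh)
```

The freshly initialized P2 modules sit next to pretrained ones, which is consistent with the lower lr0 used in the recipe above.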

LoRA / DoRA Fine-Tuning

RF-DETR: n, s, m, l

LoRA-style adapters let you fine-tune RF-DETR's transformer backbone by training a small set of low-rank matrices while the base weights stay frozen. That cuts optimizer and gradient memory, which is ideal for adapting a strong checkpoint to a new domain on modest hardware.

Enabling it

The whole public API is a single flag on train(). There are no rank, alpha, or target-module knobs to tune; the recipe is fixed to a well-tested configuration. Under the hood the implementation uses DoRA (weight-decomposed LoRA, rank 16) applied to the DINOv2 attention query, key, and value projections.

python

from libreyolo import LibreYOLO

model = LibreYOLO("rf-detr-nano.pth")   # sizes n, s, m, l
result = model.train(
    data="data.yaml",
    lora=True,        # DoRA on the frozen DINOv2 backbone
    epochs=100, batch_size=4, lr=1e-4,
)

# Resume: LoRA is auto-detected from the checkpoint, no need to repeat the flag
model.train(data="data.yaml", resume=True)

bash

# CLI equivalent
libreyolo train --model rf-detr-nano.pth --data data.yaml --lora
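
A back-of-the-envelope count shows why adapters cut optimizer and gradient memory: only the low-rank factors and the DoRA magnitude vector are trainable. The rank of 16 comes from this page. The hidden size of 384 is a hypothetical value chosen for illustration, and one magnitude value per output feature is an assumed layout.

```python
def dense_params(d_in, d_out):
    """Weights in a full d_out x d_in projection."""
    return d_in * d_out


def dora_adapter_params(d_in, d_out, rank):
    """LoRA factors A (rank x d_in) and B (d_out x rank) plus a magnitude vector."""
    return rank * (d_in + d_out) + d_out


d = 384     # hypothetical hidden size, not taken from RF-DETR
rank = 16   # rank stated on this page

dense = dense_params(d, d)
adapter = dora_adapter_params(d, d, rank)
print(dense, adapter, f"{adapter / dense:.1%}")
```

Under these assumptions each adapted projection trains under a tenth of the weights it would in full fine-tuning, and the optimizer state shrinks accordingly.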

Checkpoints and export

Training checkpoints keep the adapter tensors, and the config records that LoRA was used, so loading and resuming rebuild the adapter graph automatically.
The detection head always stays trainable, so you can still adapt to a new class count.
export() merges the adapters back into dense weights. Exported models are plain and carry no peft dependency.
LoRA is RF-DETR only; passing lora=True to other families raises a clear error.
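
What "merging adapters back into dense weights" means can be sketched for a single projection. DoRA reparameterizes a weight as a magnitude times a normalized direction, where the direction is the frozen weight plus the low-rank update. The pure-Python version below normalizes per output feature (per row), which is an assumption of this sketch; it is not the export code itself.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_norms(w):
    """L2 norm of each row (one value per output feature)."""
    return [math.sqrt(sum(v * v for v in row)) for row in w]


def merge_dora(w0, lora_b, lora_a, magnitude, scale=1.0):
    """Return the dense weight m * (W0 + scale * B @ A) / ||W0 + scale * B @ A||."""
    delta = matmul(lora_b, lora_a)
    directed = [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]
    norms = row_norms(directed)
    return [[m * v / n for v in row] for row, m, n in zip(directed, magnitude, norms)]


w0 = [[3.0, 4.0], [0.0, 2.0]]   # frozen base weight, d_out x d_in
lora_a = [[0.1, -0.2]]          # rank x d_in
lora_b = [[0.0], [0.0]]         # d_out x rank, zero at initialization
magnitude = row_norms(w0)       # DoRA starts the magnitude at ||W0||

merged = merge_dora(w0, lora_b, lora_a, magnitude)
print(merged)
```

With B at zero the merged weight equals the base weight, so an untrained adapter is a no-op. After training, the same formula yields one plain matrix, which is why an exported model needs no adapter library.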

Install extra

LoRA training needs the adapter dependency: pip install "libreyolo[lora]", which pulls in the RF-DETR stack and peft. Exported (merged) models do not need it at inference time.

Stability

Where each feature stands today. Everything here is experimental; this table is the honest map.

Feature	Families	State
Classification	YOLO9, RF-DETR	PR open
Oriented boxes (OBB)	YOLO9, RF-DETR	Experimental
Keypoints / pose	YOLO9, RF-DETR	Landing soon
Keypoints / pose	YOLO-NAS, EdgeCrafter	Available
Small-object detection	YOLO9-P2	Research preview
LoRA / DoRA	RF-DETR	Reviewed

Looking for the stable path?

For production work, the validated core is YOLO9 detection and RF-DETR detection and segmentation. See the core documentation for those, and LibreVLM for open-vocabulary detection.

Track progress and source on GitHub
