Testing Agentic Units (AUs)
Version: 1.0 Status: AU Testing Guide Last Updated: 2025-11-17
This document provides comprehensive testing guidelines for Agentic Units in the AGEniX ecosystem.
Table of Contents
- Overview
- Test Strategy
- Unit Testing
- Integration Testing
- Contract Testing
- Security Testing
- Performance Testing
- Test Organization
- CI/CD Integration
- Testing Checklist
1. Overview
Why Test AUs?
AUs execute in production worker environments with:
- Untrusted user inputs
- Zero-trust security model
- Limited resources
- Strict contracts
Comprehensive testing ensures:
- Contract compliance (--describe, stdin/stdout, JSON)
- Security (no command injection, path traversal)
- Reliability (handles malformed input gracefully)
- Performance (bounded memory, reasonable speed)
Testing Philosophy
- Test-Driven Development (TDD): Write tests first
- 100% Contract Coverage: All required behaviors tested
- Security-First: Test attack scenarios explicitly
- Realistic Inputs: Use real-world examples
2. Test Strategy
2.1 Test Pyramid
            ┌──────────────────┐
            │  E2E Tests (Few) │
            │    (via AGX)     │
          ┌─┴──────────────────┴─┐
          │  Integration Tests   │
          │       (Some)         │
        ┌─┴──────────────────────┴─┐
        │   Contract Tests (Many)  │
        │ (--describe, stdin/stdout)│
      ┌─┴──────────────────────────┴─┐
      │      Unit Tests (Most)       │
      │        (Core logic)          │
      └──────────────────────────────┘
2.2 Test Categories
| Test Type | Purpose | Count | Speed |
|---|---|---|---|
| Unit | Test individual functions | 100s | Milliseconds |
| Contract | Verify AU contract compliance | 10s | Seconds |
| Integration | Test with real inputs | 10s | Seconds |
| Security | Test attack vectors | 10s | Seconds |
| Performance | Benchmark speed/memory | Few | Minutes |
| E2E | Test via AGX planner | Few | Minutes |
3. Unit Testing
3.1 Core Logic Tests
Test pure functions independently:
// src/ocr.rs
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_regions() {
let engine_output = vec![
EngineRegion {
text: "Hello".to_string(),
confidence: 0.98,
bbox: [10.0, 20.0, 100.0, 50.0],
}
];
let regions = convert_to_au_regions(&engine_output);
assert_eq!(regions.len(), 1);
assert_eq!(regions[0].text, "Hello");
assert_eq!(regions[0].confidence, 0.98);
}
#[test]
fn test_combine_text() {
let regions = vec![
OcrRegion { text: "Hello".into(), confidence: 0.98, bbox: [...] },
OcrRegion { text: "World".into(), confidence: 0.95, bbox: [...] },
];
let combined = combine_region_text(&regions);
assert_eq!(combined, "Hello World");
}
}
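For reference, the function under test might look like the following sketch. The struct shape is taken from the tests above; joining regions in input order with single spaces is an assumption, and real reading-order handling would be more involved:

```rust
/// Region shape matching the unit tests above; field names follow the tests.
pub struct OcrRegion {
    pub text: String,
    pub confidence: f32,
    pub bbox: [f32; 4],
}

/// Join region text in input order, separated by single spaces.
pub fn combine_region_text(regions: &[OcrRegion]) -> String {
    regions
        .iter()
        .map(|r| r.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}
```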
3.2 Model Configuration Tests
// src/model.rs
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_model_config_from_path() {
let config = ModelConfig::from_cli(Some(PathBuf::from("/models/test.gguf")));
assert!(config.is_ok());
}
#[test]
fn test_model_config_requires_path() {
let config = ModelConfig::from_cli(None);
assert!(config.is_err());
assert!(config.unwrap_err().to_string().contains("MODEL_PATH"));
}
#[test]
fn test_model_config_rejects_nonexistent() {
let config = ModelConfig::from_cli(Some(PathBuf::from("/nonexistent.gguf")));
assert!(config.is_err());
}
}
4. Integration Testing
4.1 End-to-End Binary Execution
Test the AU as a black box:
// tests/integration_test.rs
use std::process::{Command, Stdio};
use std::io::Write;
#[test]
fn test_ocr_basic_png() {
// Load test image
let image_bytes = include_bytes!("fixtures/hello_world.png");
// Run AU
let mut child = Command::new("cargo")
.args(&["run", "--", "--model-path", "/models/test.gguf"])
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
.unwrap();
// Write image to stdin
child.stdin.as_mut().unwrap().write_all(image_bytes).unwrap();
// Get output
let output = child.wait_with_output().unwrap();
// Verify success
assert!(output.status.success());
// Parse JSON output
let result: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
assert_eq!(result["text"], "Hello World");
}
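Later tests in this guide call a `run_au` helper that is not shown. A minimal sketch, generalized here over the program name so it can be exercised with any command (the AU tests would pass the prebuilt binary path and the `--model-path` flag):

```rust
use std::io::Write;
use std::process::{Command, Output, Stdio};

/// Pipe `input` into `program`'s stdin and collect status, stdout, stderr.
pub fn run_piped(program: &str, args: &[&str], input: &[u8]) -> Output {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .expect("failed to spawn process");
    // take() moves stdin out so it is dropped after the write,
    // closing the pipe and letting the child see EOF.
    child.stdin.take().unwrap().write_all(input).unwrap();
    child.wait_with_output().unwrap()
}
```

Note that writing a very large payload while the child's stdout pipe fills up can deadlock; for test-sized fixtures this sequential write-then-wait is fine.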
4.2 Testing with Real Files
#[test]
fn test_ocr_invoice() {
let image = std::fs::read("tests/fixtures/invoice.png").unwrap();
// Keep the child handle so we can wait for output after writing stdin
let mut child = Command::new("cargo")
.args(&["run", "--", "--model-path", "/models/ocr.gguf"])
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()
.unwrap();
child.stdin.as_mut().unwrap().write_all(&image).unwrap();
let output = child.wait_with_output().unwrap();
let result: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
// Verify invoice fields extracted
assert!(result["text"].as_str().unwrap().contains("INVOICE"));
assert!(result["text"].as_str().unwrap().contains("$"));
}
4.3 Test Fixtures
Create tests/fixtures/ directory with sample inputs:
tests/
├── fixtures/
│   ├── hello_world.png   # Simple text
│   ├── multi_line.png    # Multiple lines
│   ├── rotated.png       # Rotated text
│   ├── low_quality.jpg   # Low resolution
│   ├── corrupt.png       # Malformed PNG
│   └── empty.png         # Blank image
└── integration_test.rs
5. Contract Testing
5.1 --describe Output
#[test]
fn test_describe_output_valid_json() {
let output = Command::new("cargo")
.args(&["run", "--", "--describe"])
.output()
.unwrap();
assert!(output.status.success());
// Must be valid JSON
let card: serde_json::Value = serde_json::from_slice(&output.stdout)
.expect("--describe output must be valid JSON");
// Required fields
assert!(card.get("name").is_some());
assert!(card.get("version").is_some());
assert!(card.get("description").is_some());
assert!(card.get("capabilities").is_some());
}
5.2 Schema Validation
use jsonschema::JSONSchema;
#[test]
fn test_describe_matches_schema() {
let output = Command::new("cargo")
.args(&["run", "--", "--describe"])
.output()
.unwrap();
let card: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
// Load schema
let schema_json = std::fs::read_to_string("../../agenix/specs/describe.schema.json").unwrap();
let schema: serde_json::Value = serde_json::from_str(&schema_json).unwrap();
let compiled = JSONSchema::compile(&schema).unwrap();
// Validate
let result = compiled.validate(&card);
if let Err(errors) = result {
for error in errors {
eprintln!("Validation error: {}", error);
}
panic!("--describe output does not match schema");
}
}
5.3 stdin/stdout Contract
#[test]
fn test_stdout_is_valid_json() {
let image = include_bytes!("fixtures/hello_world.png");
let output = run_au(image);
// stdout must be parseable JSON
let json: serde_json::Value = serde_json::from_slice(&output.stdout)
.expect("stdout must be valid JSON");
// Must have expected fields
assert!(json.get("text").is_some());
assert!(json.get("regions").is_some());
}
#[test]
fn test_errors_go_to_stderr() {
let corrupt_image = b"not an image";
let output = run_au(corrupt_image);
// Should fail
assert!(!output.status.success());
// stdout should be empty or invalid JSON
assert!(serde_json::from_slice::<serde_json::Value>(&output.stdout).is_err());
// stderr should contain error message
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(stderr.contains("Error") || stderr.contains("error"));
}
5.4 Exit Codes
#[test]
fn test_exit_code_success() {
let output = run_au(include_bytes!("fixtures/hello_world.png"));
assert_eq!(output.status.code(), Some(0));
}
#[test]
fn test_exit_code_invalid_input() {
let output = run_au(b"not an image");
assert_eq!(output.status.code(), Some(2)); // Invalid input
}
#[test]
fn test_exit_code_missing_model() {
let output = Command::new("cargo")
.args(&["run"]) // No --model-path
.stdin(Stdio::piped())
.output()
.unwrap();
assert_ne!(output.status.code(), Some(0));
}
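The exit-code convention these tests assume (0 for success, 2 for invalid input, nonzero otherwise) is easiest to keep consistent when centralized in one mapping. A sketch, with a hypothetical error enum:

```rust
/// Hypothetical AU error categories.
pub enum AuError {
    InvalidInput(String), // malformed or oversized stdin payload
    Config(String),       // missing or unusable --model-path
    Internal(String),     // engine failure
}

/// Single source of truth for the contract's exit codes:
/// 0 = success, 2 = invalid input, 1 = everything else.
pub fn exit_code(result: &Result<(), AuError>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(AuError::InvalidInput(_)) => 2,
        Err(AuError::Config(_)) | Err(AuError::Internal(_)) => 1,
    }
}
```

`main` would then call `std::process::exit(exit_code(&result))` after printing any error to stderr.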
6. Security Testing
6.1 Input Validation
#[test]
fn test_rejects_oversized_input() {
// Create 100MB input
let huge_input = vec![0u8; 100 * 1024 * 1024];
let output = run_au(&huge_input);
// Should fail gracefully
assert!(!output.status.success());
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(stderr.contains("too large") || stderr.contains("size"));
}
#[test]
fn test_rejects_malformed_png() {
let bad_png = b"\x89PNG\x00\x00\x00\x00"; // Invalid PNG
let output = run_au(bad_png);
assert!(!output.status.success());
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(stderr.to_lowercase().contains("invalid") ||
stderr.to_lowercase().contains("corrupt"));
}
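On the implementation side, the size limit is cheapest to enforce while reading stdin, before any decoding. A sketch; the 50 MB cap is an assumed value, not from this guide:

```rust
use std::io::Read;

/// Assumed cap; pick a limit appropriate to the AU's input format.
pub const MAX_INPUT_BYTES: u64 = 50 * 1024 * 1024;

/// Read at most `max` bytes; fail instead of buffering an oversized payload.
pub fn read_bounded(reader: impl Read, max: u64) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    // Reading up to max + 1 bytes lets us detect (not silently truncate)
    // input that exceeds the cap.
    reader
        .take(max + 1)
        .read_to_end(&mut buf)
        .map_err(|e| format!("read error: {}", e))?;
    if buf.len() as u64 > max {
        return Err(format!("input too large: exceeds {} bytes", max));
    }
    Ok(buf)
}
```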
6.2 Path Traversal
#[test]
fn test_rejects_path_traversal() {
let output = Command::new("cargo")
.args(&["run", "--", "--model-path", "../../../etc/passwd"])
.stdin(Stdio::piped())
.output()
.unwrap();
assert!(!output.status.success());
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(stderr.contains("not found") || stderr.contains("traversal"));
}
#[test]
fn test_rejects_absolute_paths_outside_allowed() {
let output = Command::new("cargo")
.args(&["run", "--", "--model-path", "/etc/passwd"])
.stdin(Stdio::piped())
.output()
.unwrap();
assert!(!output.status.success());
}
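A lightweight check that could back the tests above: reject any parent-directory component lexically, before touching the filesystem. In production this would be one layer of two, with canonicalization against an allowlisted models root as the stronger second check:

```rust
use std::path::{Component, Path};

/// True if the path contains a `..` component. Lexical check only;
/// combine with fs::canonicalize + starts_with(allowed_root) in production.
pub fn has_parent_traversal(path: &Path) -> bool {
    path.components().any(|c| matches!(c, Component::ParentDir))
}
```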
6.3 Command Injection
#[test]
fn test_no_shell_execution() {
// Attempt command injection via model path
let output = Command::new("cargo")
.args(&["run", "--", "--model-path", "model.gguf; rm -rf /"])
.stdin(Stdio::piped())
.output()
.unwrap();
// Should fail (model not found), NOT execute shell command
assert!(!output.status.success());
// Ensure no shell execution occurred
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(!stderr.contains("rm") || stderr.contains("not found"));
}
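The reason this class of attack fails structurally: `std::process::Command` passes each argument directly to the OS (execve-style on Unix) and never routes them through a shell, so metacharacters stay inert. A quick demonstration using `echo` as a stand-in program:

```rust
use std::process::Command;

/// Run `echo` with a metacharacter-laden argument and return its stdout.
/// Because Command does not invoke a shell, the `; ls` arrives at echo
/// as literal text rather than being interpreted as a second command.
pub fn echo_literal(arg: &str) -> String {
    let out = Command::new("echo")
        .arg(arg)
        .output()
        .expect("failed to run echo");
    String::from_utf8_lossy(&out.stdout).into_owned()
}
```

The same guarantee protects the AU as long as it never builds shell strings (`sh -c` with interpolated input) itself.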
7. Performance Testing
7.1 Benchmarks
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn bench_ocr(c: &mut Criterion) {
let image = include_bytes!("../tests/fixtures/hello_world.png");
let config = ModelConfig::from_cli(Some(PathBuf::from("/models/test.gguf"))).unwrap();
c.bench_function("ocr_hello_world", |b| {
b.iter(|| {
run_ocr(black_box(image), &config, None)
});
});
}
criterion_group!(benches, bench_ocr);
criterion_main!(benches);
7.2 Memory Profiling
# Run with memory profiling
cargo build --release
valgrind --tool=massif --massif-out-file=massif.out \
./target/release/agx-ocr --model-path /models/test.gguf < test.png
# Analyze memory usage
ms_print massif.out
7.3 Load Testing
#[test]
#[ignore] // Long-running test
fn test_1000_images_no_leak() {
let image = include_bytes!("fixtures/hello_world.png");
for i in 0..1000 {
let output = run_au(image);
assert!(output.status.success(), "Failed on iteration {}", i);
}
// If we get here, no memory leak or panic
}
8. Test Organization
8.1 Directory Structure
agx-ocr/
├── src/
│   ├── main.rs
│   ├── ocr.rs             # contains #[cfg(test)] mod tests { ... }
│   ├── model.rs           # contains #[cfg(test)] mod tests { ... }
│   └── types.rs
├── tests/
│   ├── fixtures/
│   │   ├── hello_world.png
│   │   ├── invoice.png
│   │   └── corrupt.png
│   ├── integration_test.rs
│   ├── contract_test.rs
│   └── security_test.rs
└── benches/
    └── ocr_benchmark.rs
8.2 Test Naming
// Unit tests: test_<function>_<scenario>
#[test]
fn test_parse_regions_empty() { }
#[test]
fn test_parse_regions_multiple() { }
// Integration tests: test_<feature>_<case>
#[test]
fn test_ocr_hello_world() { }
#[test]
fn test_ocr_rotated_text() { }
// Security tests: test_<attack>_<defense>
#[test]
fn test_command_injection_rejected() { }
#[test]
fn test_path_traversal_blocked() { }
9. CI/CD Integration
9.1 GitHub Actions
name: Test AU
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install Rust
uses: actions-rs/toolchain@v1
with:
toolchain: stable
- name: Run unit tests
run: cargo test --lib
- name: Run integration tests
run: cargo test --test '*'
- name: Run contract tests
run: |
cargo build --release
./tests/validate_contract.sh
- name: Check --describe schema
run: |
cargo run -- --describe > describe.json
jsonschema -i describe.json ../agenix/specs/describe.schema.json
- name: Security audit
run: cargo audit
- name: Coverage
run: |
cargo install cargo-tarpaulin
cargo tarpaulin --fail-under 80
9.2 Contract Validation Script
#!/bin/bash
# tests/validate_contract.sh
set -e
AU_BIN="./target/release/agx-ocr"
echo "Testing --describe..."
$AU_BIN --describe > /tmp/describe.json
jq empty /tmp/describe.json || (echo "Invalid JSON" && exit 1)
echo "Testing stdin/stdout..."
cat tests/fixtures/hello_world.png | $AU_BIN --model-path /models/test.gguf > /tmp/out.json
jq empty /tmp/out.json || (echo "Invalid output JSON" && exit 1)
echo "Testing exit codes..."
set +e
echo "invalid" | $AU_BIN --model-path /models/test.gguf > /dev/null 2>&1
code=$?
set -e
if [ "$code" -ne 2 ]; then echo "Expected exit code 2, got $code"; exit 1; fi
echo "All contract tests passed!"
10. Testing Checklist
Before merging AU code, verify:
Core Contract
- --describe outputs valid JSON matching schema
- stdin → JSON stdout works for valid input
- Errors go to stderr only
- Exit codes correct (0, 1, 2)
- Binary stdin handled correctly
Unit Tests
- All core functions have unit tests
- Edge cases tested (empty, null, max values)
- Error paths tested
Integration Tests
- At least 3 real-world test fixtures
- Happy path: valid input → expected output
- Sad path: invalid input → graceful error
Security Tests
- Oversized input rejected
- Malformed input handled gracefully
- Path traversal blocked
- No command injection vulnerabilities
- No secrets in logs
Performance Tests
- Typical input completes in < 10s
- Memory usage bounded
- No memory leaks (1000 iterations)
CI/CD
- Tests run automatically on push
- Contract validation script passes
- Coverage > 80%
- Security audit clean
Related Documentation
- AU Specification - AU contract requirements
- Testing Strategy - General testing guidelines
- Security Guidelines - Security testing details
- Example AU - Reference implementation with tests
Maintained by: AGX Core Team Review cycle: Quarterly Questions? Open an issue in the AU repository