Testing Agentic Units (AUs)
Version: 1.0 Status: AU Testing Guide Last Updated: 2025-11-17
This document provides comprehensive testing guidelines for Agentic Units in the AGEniX ecosystem.
Table of Contents
- Overview
- Test Strategy
- Unit Testing
- Integration Testing
- Contract Testing
- Security Testing
- Performance Testing
- Test Organization
- CI/CD Integration
- Testing Checklist
1. Overview
Why Test AUs?
AUs execute in production worker environments with:
- Untrusted user inputs
- Zero-trust security model
- Limited resources
- Strict contracts
Comprehensive testing ensures:
- Contract compliance (--describe, stdin/stdout, JSON)
- Security (no command injection, path traversal)
- Reliability (handles malformed input gracefully)
- Performance (bounded memory, reasonable speed)
Testing Philosophy
- Test-Driven Development (TDD): Write tests first
- 100% Contract Coverage: All required behaviors tested
- Security-First: Test attack scenarios explicitly
- Realistic Inputs: Use real-world examples
2. Test Strategy
2.1 Test Pyramid
            ┌──────────────────┐
            │  E2E Tests (Few) │
            │    (via AGX)     │
          ┌─┴──────────────────┴─┐
          │  Integration Tests   │
          │       (Some)         │
        ┌─┴──────────────────────┴─┐
        │   Contract Tests (Many)  │
        │ (--describe, stdin/stdout)│
      ┌─┴──────────────────────────┴─┐
      │      Unit Tests (Most)       │
      │        (Core logic)          │
      └──────────────────────────────┘
2.2 Test Categories
| Test Type | Purpose | Count | Speed |
|---|---|---|---|
| Unit | Test individual functions | 100s | Milliseconds |
| Contract | Verify AU contract compliance | 10s | Seconds |
| Integration | Test with real inputs | 10s | Seconds |
| Security | Test attack vectors | 10s | Seconds |
| Performance | Benchmark speed/memory | Few | Minutes |
| E2E | Test via AGX planner | Few | Minutes |
3. Unit Testing
3.1 Core Logic Tests
Test pure functions independently:
// src/ocr.rs
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_regions() {
let engine_output = vec![
EngineRegion {
text: "Hello".to_string(),
confidence: 0.98,
bbox: [10.0, 20.0, 100.0, 50.0],
}
];
let regions = convert_to_au_regions(&engine_output);
assert_eq!(regions.len(), 1);
assert_eq!(regions[0].text, "Hello");
assert_eq!(regions[0].confidence, 0.98);
}
#[test]
fn test_combine_text() {
let regions = vec![
OcrRegion { text: "Hello".into(), confidence: 0.98, bbox: [...] },
OcrRegion { text: "World".into(), confidence: 0.95, bbox: [...] },
];
let combined = combine_region_text(&regions);
assert_eq!(combined, "Hello World");
}
}
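For reference, the function under test might look like the following sketch. The struct shape is taken from the tests above; joining regions in input order with single spaces is an assumption, and real reading-order handling would be more involved:

```rust
/// Region shape matching the unit tests above; field names follow the tests.
pub struct OcrRegion {
    pub text: String,
    pub confidence: f32,
    pub bbox: [f32; 4],
}

/// Join region text in input order, separated by single spaces.
pub fn combine_region_text(regions: &[OcrRegion]) -> String {
    regions
        .iter()
        .map(|r| r.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}
```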
3.2 Model Configuration Tests
// src/model.rs
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_model_config_from_path() {
let config = ModelConfig::from_cli(Some(PathBuf::from("/models/test.gguf")));
assert!(config.is_ok());
}
#[test]
fn test_model_config_requires_path() {
let config = ModelConfig::from_cli(None);
assert!(config.is_err());
assert!(config.unwrap_err().to_string().contains("MODEL_PATH"));
}
#[test]
fn test_model_config_rejects_nonexistent() {
let config = ModelConfig::from_cli(Some(PathBuf::from("/nonexistent.gguf")));
assert!(config.is_err());
}
}
4. Integration Testing
4.1 End-to-End Binary Execution
Test the AU as a black box:
// tests/integration_test.rs
use std::process::{Command, Stdio};
use std::io::Write;
#[test]
fn test_ocr_basic_png() {
// Load test image
let image_bytes = include_bytes!("fixtures/hello_world.png");
// Run AU
let mut child = Command::new("cargo")
.args(&["run", "--", "--model-path", "/models/test.gguf"])
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
.unwrap();
// Write image to stdin
child.stdin.as_mut().unwrap().write_all(image_bytes).unwrap();
// Get output
let output = child.wait_with_output().unwrap();
// Verify success
assert!(output.status.success());
// Parse JSON output
let result: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
assert_eq!(result["text"], "Hello World");
}
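Later tests in this guide call a `run_au` helper that is not shown. A minimal sketch, generalized here over the program name so it can be exercised with any command (the AU tests would pass the prebuilt binary path and the `--model-path` flag):

```rust
use std::io::Write;
use std::process::{Command, Output, Stdio};

/// Pipe `input` into `program`'s stdin and collect status, stdout, stderr.
pub fn run_piped(program: &str, args: &[&str], input: &[u8]) -> Output {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .expect("failed to spawn process");
    // take() moves stdin out so it is dropped after the write,
    // closing the pipe and letting the child see EOF.
    child.stdin.take().unwrap().write_all(input).unwrap();
    child.wait_with_output().unwrap()
}
```

Note that writing a very large payload while the child's stdout pipe fills up can deadlock; for test-sized fixtures this sequential write-then-wait is fine.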
4.2 Testing with Real Files
#[test]
fn test_ocr_invoice() {
let image = std::fs::read("tests/fixtures/invoice.png").unwrap();
// Keep the child handle so we can wait for output after writing stdin
let mut child = Command::new("cargo")
.args(&["run", "--", "--model-path", "/models/ocr.gguf"])
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()
.unwrap();
child.stdin.as_mut().unwrap().write_all(&image).unwrap();
let output = child.wait_with_output().unwrap();
let result: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
// Verify invoice fields extracted
assert!(result["text"].as_str().unwrap().contains("INVOICE"));
assert!(result["text"].as_str().unwrap().contains("$"));
}
4.3 Test Fixtures
Create tests/fixtures/ directory with sample inputs:
tests/
├── fixtures/
│   ├── hello_world.png   # Simple text
│   ├── multi_line.png    # Multiple lines
│   ├── rotated.png       # Rotated text
│   ├── low_quality.jpg   # Low resolution
│   ├── corrupt.png       # Malformed PNG
│   └── empty.png         # Blank image
└── integration_test.rs
5. Contract Testing
5.1 --describe Output
#[test]
fn test_describe_output_valid_json() {
let output = Command::new("cargo")
.args(&["run", "--", "--describe"])
.output()
.unwrap();
assert!(output.status.success());
// Must be valid JSON
let card: serde_json::Value = serde_json::from_slice(&output.stdout)
.expect("--describe output must be valid JSON");
// Required fields
assert!(card.get("name").is_some());
assert!(card.get("version").is_some());
assert!(card.get("description").is_some());
assert!(card.get("capabilities").is_some());
}
5.2 Schema Validation
use jsonschema::JSONSchema;
#[test]
fn test_describe_matches_schema() {
let output = Command::new("cargo")
.args(&["run", "--", "--describe"])
.output()
.unwrap();
let card: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
// Load schema
let schema_json = std::fs::read_to_string("../../agenix/specs/describe.schema.json").unwrap();
let schema: serde_json::Value = serde_json::from_str(&schema_json).unwrap();
let compiled = JSONSchema::compile(&schema).unwrap();
// Validate
let result = compiled.validate(&card);
if let Err(errors) = result {
for error in errors {
eprintln!("Validation error: {}", error);
}
panic!("--describe output does not match schema");
}
}
5.3 stdin/stdout Contract
#[test]
fn test_stdout_is_valid_json() {
let image = include_bytes!("fixtures/hello_world.png");
let output = run_au(image);
// stdout must be parseable JSON
let json: serde_json::Value = serde_json::from_slice(&output.stdout)
.expect("stdout must be valid JSON");
// Must have expected fields
assert!(json.get("text").is_some());
assert!(json.get("regions").is_some());
}
#[test]
fn test_errors_go_to_stderr() {
let corrupt_image = b"not an image";
let output = run_au(corrupt_image);
// Should fail
assert!(!output.status.success());
// stdout should be empty or invalid JSON
assert!(serde_json::from_slice::<serde_json::Value>(&output.stdout).is_err());
// stderr should contain error message
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(stderr.contains("Error") || stderr.contains("error"));
}
5.4 Exit Codes
#[test]
fn test_exit_code_success() {
let output = run_au(include_bytes!("fixtures/hello_world.png"));
assert_eq!(output.status.code(), Some(0));
}
#[test]
fn test_exit_code_invalid_input() {
let output = run_au(b"not an image");
assert_eq!(output.status.code(), Some(2)); // Invalid input
}
#[test]
fn test_exit_code_missing_model() {
let output = Command::new("cargo")
.args(&["run"]) // No --model-path
.stdin(Stdio::piped())
.output()
.unwrap();
assert_ne!(output.status.code(), Some(0));
}
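The exit-code convention these tests assume (0 for success, 2 for invalid input, nonzero otherwise) is easiest to keep consistent when centralized in one mapping. A sketch, with a hypothetical error enum:

```rust
/// Hypothetical AU error categories.
pub enum AuError {
    InvalidInput(String), // malformed or oversized stdin payload
    Config(String),       // missing or unusable --model-path
    Internal(String),     // engine failure
}

/// Single source of truth for the contract's exit codes:
/// 0 = success, 2 = invalid input, 1 = everything else.
pub fn exit_code(result: &Result<(), AuError>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(AuError::InvalidInput(_)) => 2,
        Err(AuError::Config(_)) | Err(AuError::Internal(_)) => 1,
    }
}
```

`main` would then call `std::process::exit(exit_code(&result))` after printing any error to stderr.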
6. Security Testing
6.1 Input Validation
#[test]
fn test_rejects_oversized_input() {
// Create 100MB input
let huge_input = vec![0u8; 100 * 1024 * 1024];
let output = run_au(&huge_input);
// Should fail gracefully
assert!(!output.status.success());
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(stderr.contains("too large") || stderr.contains("size"));
}
#[test]
fn test_rejects_malformed_png() {
let bad_png = b"\x89PNG\x00\x00\x00\x00"; // Invalid PNG
let output = run_au(bad_png);
assert!(!output.status.success());
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(stderr.to_lowercase().contains("invalid") ||
stderr.to_lowercase().contains("corrupt"));
}
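On the implementation side, the size limit is cheapest to enforce while reading stdin, before any decoding. A sketch; the 50 MB cap is an assumed value, not from this guide:

```rust
use std::io::Read;

/// Assumed cap; pick a limit appropriate to the AU's input format.
pub const MAX_INPUT_BYTES: u64 = 50 * 1024 * 1024;

/// Read at most `max` bytes; fail instead of buffering an oversized payload.
pub fn read_bounded(reader: impl Read, max: u64) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    // Reading up to max + 1 bytes lets us detect (not silently truncate)
    // input that exceeds the cap.
    reader
        .take(max + 1)
        .read_to_end(&mut buf)
        .map_err(|e| format!("read error: {}", e))?;
    if buf.len() as u64 > max {
        return Err(format!("input too large: exceeds {} bytes", max));
    }
    Ok(buf)
}
```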
6.2 Path Traversal
#[test]
fn test_rejects_path_traversal() {
let output = Command::new("cargo")
.args(&["run", "--", "--model-path", "../../../etc/passwd"])
.stdin(Stdio::piped())
.output()
.unwrap();
assert!(!output.status.success());
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(stderr.contains("not found") || stderr.contains("traversal"));
}
#[test]
fn test_rejects_absolute_paths_outside_allowed() {
let output = Command::new("cargo")
.args(&["run", "--", "--model-path", "/etc/passwd"])
.stdin(Stdio::piped())
.output()
.unwrap();
assert!(!output.status.success());
}
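A lightweight check that could back the tests above: reject any parent-directory component lexically, before touching the filesystem. In production this would be one layer of two, with canonicalization against an allowlisted models root as the stronger second check:

```rust
use std::path::{Component, Path};

/// True if the path contains a `..` component. Lexical check only;
/// combine with fs::canonicalize + starts_with(allowed_root) in production.
pub fn has_parent_traversal(path: &Path) -> bool {
    path.components().any(|c| matches!(c, Component::ParentDir))
}
```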
6.3 Command Injection
#[test]
fn test_no_shell_execution() {
// Attempt command injection via model path
let output = Command::new("cargo")
.args(&["run", "--", "--model-path", "model.gguf; rm -rf /"])
.stdin(Stdio::piped())
.output()
.unwrap();
// Should fail (model not found), NOT execute shell command
assert!(!output.status.success());
// Ensure no shell execution occurred
let stderr = String::from_utf8_lossy(&output.stderr);
assert!(!stderr.contains("rm") || stderr.contains("not found"));
}
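The reason this class of attack fails structurally: `std::process::Command` passes each argument directly to the OS (execve-style on Unix) and never routes them through a shell, so metacharacters stay inert. A quick demonstration using `echo` as a stand-in program:

```rust
use std::process::Command;

/// Run `echo` with a metacharacter-laden argument and return its stdout.
/// Because Command does not invoke a shell, the `; ls` arrives at echo
/// as literal text rather than being interpreted as a second command.
pub fn echo_literal(arg: &str) -> String {
    let out = Command::new("echo")
        .arg(arg)
        .output()
        .expect("failed to run echo");
    String::from_utf8_lossy(&out.stdout).into_owned()
}
```

The same guarantee protects the AU as long as it never builds shell strings (`sh -c` with interpolated input) itself.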
7. Performance Testing
7.1 Benchmarks
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn bench_ocr(c: &mut Criterion) {
let image = include_bytes!("../tests/fixtures/hello_world.png");
let config = ModelConfig::from_cli(Some(PathBuf::from("/models/test.gguf"))).unwrap();
c.bench_function("ocr_hello_world", |b| {
b.iter(|| {
run_ocr(black_box(image), &config, None)
});
});
}
criterion_group!(benches, bench_ocr);
criterion_main!(benches);
7.2 Memory Profiling
# Run with memory profiling
cargo build --release
valgrind --tool=massif --massif-out-file=massif.out \
./target/release/agx-ocr --model-path /models/test.gguf < test.png
# Analyze memory usage
ms_print massif.out
7.3 Load Testing
#[test]
#[ignore] // Long-running test
fn test_1000_images_no_leak() {
let image = include_bytes!("fixtures/hello_world.png");
for i in 0..1000 {
let output = run_au(image);
assert!(output.status.success(), "Failed on iteration {}", i);
}
// If we get here, no memory leak or panic
}
8. Test Organization
8.1 Directory Structure
agx-ocr/
├── src/
│   ├── main.rs
│   ├── ocr.rs             # contains #[cfg(test)] mod tests { ... }
│   ├── model.rs           # contains #[cfg(test)] mod tests { ... }
│   └── types.rs
├── tests/
│   ├── fixtures/
│   │   ├── hello_world.png
│   │   ├── invoice.png
│   │   └── corrupt.png
│   ├── integration_test.rs
│   ├── contract_test.rs
│   └── security_test.rs
└── benches/
    └── ocr_benchmark.rs
8.2 Test Naming
// Unit tests: test_<function>_<scenario>
#[test]
fn test_parse_regions_empty() { }
#[test]
fn test_parse_regions_multiple() { }
// Integration tests: test_<feature>_<case>
#[test]
fn test_ocr_hello_world() { }
#[test]
fn test_ocr_rotated_text() { }
// Security tests: test_<attack>_<defense>
#[test]
fn test_command_injection_rejected() { }
#[test]
fn test_path_traversal_blocked() { }
9. CI/CD Integration
9.1 GitHub Actions
name: Test AU
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install Rust
uses: actions-rs/toolchain@v1
with:
toolchain: stable
- name: Run unit tests
run: cargo test --lib
- name: Run integration tests
run: cargo test --test '*'
- name: Run contract tests
run: |
cargo build --release
./tests/validate_contract.sh
- name: Check --describe schema
run: |
cargo run -- --describe > describe.json
jsonschema -i describe.json ../agenix/specs/describe.schema.json
- name: Security audit
run: cargo audit
- name: Coverage
run: |
cargo install cargo-tarpaulin
cargo tarpaulin --fail-under 80
9.2 Contract Validation Script
#!/bin/bash
# tests/validate_contract.sh
set -e
AU_BIN="./target/release/agx-ocr"
echo "Testing --describe..."
$AU_BIN --describe > /tmp/describe.json
jq empty /tmp/describe.json || (echo "Invalid JSON" && exit 1)
echo "Testing stdin/stdout..."
cat tests/fixtures/hello_world.png | $AU_BIN --model-path /models/test.gguf > /tmp/out.json
jq empty /tmp/out.json || (echo "Invalid output JSON" && exit 1)
echo "Testing exit codes..."
set +e
echo "invalid" | $AU_BIN --model-path /models/test.gguf > /dev/null 2>&1
code=$?
set -e
if [ "$code" -ne 2 ]; then echo "Expected exit code 2, got $code"; exit 1; fi
echo "All contract tests passed!"
10. Testing Checklist
Before merging AU code, verify:
Core Contract
- --describe outputs valid JSON matching schema
- stdin → JSON stdout works for valid input
- Errors go to stderr only
- Exit codes correct (0, 1, 2)
- Binary stdin handled correctly
Unit Tests
- All core functions have unit tests
- Edge cases tested (empty, null, max values)
- Error paths tested
Integration Tests
- At least 3 real-world test fixtures
- Happy path: valid input → expected output
- Sad path: invalid input → graceful error
Security Tests
- Oversized input rejected
- Malformed input handled gracefully
- Path traversal blocked
- No command injection vulnerabilities
- No secrets in logs
Performance Tests
- Typical input completes in < 10s
- Memory usage bounded
- No memory leaks (1000 iterations)
CI/CD
- Tests run automatically on push
- Contract validation script passes
- Coverage > 80%
- Security audit clean
Related Documentation
- AU Specification - AU contract requirements
- Testing Strategy - General testing guidelines
- Security Guidelines - Security testing details
- Example AU - Reference implementation with tests
Maintained by: AGX Core Team Review cycle: Quarterly Questions? Open an issue in the AU repository