Data Structures Reference
This section documents the data structures returned by various API calls in the OmniEmbodied Framework.
Evaluation Results
EvaluationInterface.run_evaluation() Response
The EvaluationInterface.run_evaluation() method returns a structured dictionary with the following format:
{
"runinfo": {
"run_id": str, # Unique run identifier
"model_name": str, # Model name (e.g., "vllm:Qwen2.5-7B-Instruct")
"agent_type": str, # Agent type ("single" or "multi")
"task_mode": str, # Task mode ("independent", "sequential", "combined")
"start_time": str, # ISO format timestamp
"end_time": str, # ISO format timestamp
"total_scenarios": int, # Total number of scenarios evaluated
"config_file": str, # Configuration file used
"status": str, # Status ("completed", "failed", "interrupted")
"duration_seconds": float, # Total evaluation duration
"note": str # Optional additional notes
},
"overall_summary": {
"success_rate": float, # Overall success rate (0.0 to 1.0)
"total_scenarios": int, # Total scenarios processed
"successful_scenarios": int, # Number of successful scenarios
"failed_scenarios": int, # Number of failed scenarios
"average_steps": float, # Average steps per scenario
"average_duration": float, # Average duration per scenario (seconds)
"total_llm_calls": int, # Total LLM API calls made
"error_distribution": { # Error type distribution
"timeout": int,
"max_steps": int,
"action_failed": int,
"other": int
}
},
"task_category_statistics": { # Performance by task category
"direct_command": {
"success_rate": float,
"scenario_count": int,
"average_steps": float
},
"attribute_reasoning": {
"success_rate": float,
"scenario_count": int,
"average_steps": float
},
# ... other task categories
}
}
Usage Example:
result = EvaluationInterface.run_evaluation(
config_file="single_agent_config",
agent_type="single",
task_type="independent",
scenario_selection={"scenario_range": {"start": "00001", "end": "00010"}}
)
# Access run information
print(f"Run ID: {result['runinfo']['run_id']}")
print(f"Success Rate: {result['overall_summary']['success_rate']:.2%}")
# Access task-specific performance
for category, stats in result['task_category_statistics'].items():
print(f"{category}: {stats['success_rate']:.2%} success rate")
Scenario Selection Configuration
The scenario_selection parameter accepts the following structure:
{
"dataset_type": str, # "single", "multi", or "mixed"
"mode": str, # "all", "range", or "list"
# For range mode
"scenario_range": {
"start": str, # Starting scenario ID (e.g., "00001")
"end": str # Ending scenario ID (e.g., "00100")
},
# For list mode
"scenario_list": [str], # List of specific scenario IDs
# Task filtering options
"task_filter": {
"categories": [str], # Task categories to include
"difficulty_levels": [str], # Difficulty levels to include
"exclude_scenarios": [str] # Specific scenarios to exclude
}
}
Available Task Categories:
"direct_command"- Simple command following"attribute_reasoning"- Object attribute-based reasoning"tool_use"- Tool manipulation tasks"spatial_reasoning"- Spatial relationship understanding"multi_step_reasoning"- Complex multi-step tasks"explicit_collaboration"- Direct multi-agent collaboration"implicit_collaboration"- Indirect multi-agent coordination"compound_collaboration"- Complex collaborative scenarios
Agent Configuration Structures
Single Agent Configuration
{
"agent_config": {
"agent_class": str, # "modes.single_agent.llm_agent.LLMAgent"
"max_history": int, # Maximum conversation history length
"max_steps_per_task": int, # Maximum steps per task
"timeout_per_action": int, # Timeout per action (seconds)
"retry_failed_actions": bool, # Whether to retry failed actions
},
"llm_config": {
"mode": str, # "api" or "vllm"
"api": { # For API mode
"provider": str, # "openai", "anthropic", etc.
"model": str, # Model name
"temperature": float, # Generation temperature
"max_tokens": int, # Maximum tokens per response
"api_key": str # API key
},
"vllm": { # For vLLM mode
"model": str, # Model name or path
"endpoint": str # vLLM server endpoint
}
},
"execution": {
"max_total_steps": int, # Maximum total steps per evaluation
"max_steps_per_task": int, # Maximum steps per individual task
"step_timeout": int # Timeout per step (seconds)
}
}
Multi-Agent Configuration
{
"agent_config": {
"agent_class": str, # "modes.centralized.centralized_agent.CentralizedAgent"
"coordination_mode": str, # "centralized" or "decentralized"
"agent_count": int, # Number of agents
"communication_enabled": bool, # Enable inter-agent communication
"shared_memory": bool # Enable shared memory between agents
},
# Same LLM and execution configurations as single agent
"llm_config": { ... },
"execution": { ... }
}
Error Handling
Exception Types
The framework raises the following custom exceptions:
ConfigurationError
class ConfigurationError(Exception):
"""Raised when configuration is invalid or incomplete."""
pass
EvaluationError
class EvaluationError(Exception):
"""Raised when evaluation fails."""
pass
ScenarioNotFoundError
class ScenarioNotFoundError(Exception):
"""Raised when a specified scenario cannot be found."""
pass
Error Response Format
When an error occurs during evaluation, the response includes error information:
{
"runinfo": {
"status": "failed",
"error_type": str, # Type of error that occurred
"error_message": str, # Human-readable error message
"failed_scenario": str, # Scenario that caused failure (if applicable)
# ... other runinfo fields
},
"error_details": {
"traceback": str, # Full error traceback (if debug enabled)
"context": dict # Additional context information
}
}
Common Error Types:
"configuration_error"- Invalid or missing configuration"scenario_not_found"- Specified scenario file not found"llm_connection_error"- Cannot connect to LLM service"timeout_error"- Evaluation timed out"resource_error"- Insufficient system resources
Trajectory Data Structures
Individual Step Data
Each step in a trajectory contains:
{
"step_number": int, # Step index (starting from 1)
"timestamp": str, # ISO format timestamp
"agent_id": str, # Agent identifier
"action": {
"action_type": str, # Action name (e.g., "MOVE", "GRAB")
"parameters": dict, # Action parameters
"raw_command": str # Original command string
},
"observation": {
"current_room": str, # Current room name
"visible_objects": [str], # List of visible objects
"inventory": [str], # Agent's inventory
"status_message": str # Status description
},
"result": {
"success": bool, # Whether action succeeded
"message": str, # Result message
"error": str # Error message (if failed)
},
"llm_interaction": { # LLM-specific data
"prompt": str, # Full prompt sent to LLM
"response": str, # LLM response
"token_count": int, # Tokens used
"processing_time": float # LLM response time
}
}
Complete Trajectory Structure
{
"scenario_id": str, # Scenario identifier
"agent_type": str, # Agent type used
"task_description": str, # Task description
"start_time": str, # Evaluation start time
"end_time": str, # Evaluation end time
"final_status": str, # "success", "failure", "timeout"
"total_steps": int, # Total steps taken
"total_llm_calls": int, # Total LLM API calls
"steps": [ # List of step data
# ... step objects as described above
],
"summary": {
"task_completed": bool, # Whether task was completed
"efficiency_score": float, # Efficiency metric (0.0 to 1.0)
"error_count": int, # Number of failed actions
"unique_actions_used": [str] # List of unique actions performed
}
}
Usage Examples
Working with Evaluation Results
# Run evaluation and process results
result = EvaluationInterface.run_evaluation(
config_file="single_agent_config",
agent_type="single",
task_type="independent",
scenario_selection={
"scenario_range": {"start": "00001", "end": "00050"}
}
)
# Extract key metrics
runinfo = result['runinfo']
summary = result['overall_summary']
print(f"Evaluation: {runinfo['run_id']}")
print(f"Model: {runinfo['model_name']}")
print(f"Duration: {runinfo['duration_seconds']:.1f}s")
print(f"Success Rate: {summary['success_rate']:.2%}")
print(f"Average Steps: {summary['average_steps']:.1f}")
# Analyze performance by task category
for category, stats in result['task_category_statistics'].items():
print(f"\n{category.replace('_', ' ').title()}:")
print(f" Success Rate: {stats['success_rate']:.2%}")
print(f" Scenarios: {stats['scenario_count']}")
print(f" Avg Steps: {stats['average_steps']:.1f}")
Error Handling Best Practices
try:
result = EvaluationInterface.run_evaluation(
config_file="single_agent_config",
agent_type="single",
task_type="independent",
scenario_selection={"scenario_range": {"start": "00001", "end": "00010"}}
)
# Check if evaluation completed successfully
if result['runinfo']['status'] == 'completed':
print(f"Success! {result['overall_summary']['success_rate']:.2%} success rate")
else:
print(f"Evaluation ended with status: {result['runinfo']['status']}")
except ConfigurationError as e:
print(f"Configuration error: {e}")
# Handle configuration issues
except ScenarioNotFoundError as e:
print(f"Scenario not found: {e}")
# Handle missing scenarios
except EvaluationError as e:
print(f"Evaluation failed: {e}")
# Handle evaluation failures
except Exception as e:
print(f"Unexpected error: {e}")
# Handle other errors
See Also
OmniEmbodied Framework API - Framework API reference
Evaluation System - Evaluation system guide
Basic Simulation Examples - Usage examples