Task Types and Categories
OmniEmbodied includes a taxonomy of tasks designed to evaluate different aspects of embodied AI capabilities. This guide explains each task category, its characteristics, and its evaluation criteria.
Task Taxonomy Overview
Tasks are organized into two main groups:
Single-Agent Tasks: Tasks that can be completed by a single agent working independently.
Multi-Agent Tasks: Tasks that require coordination between multiple agents.
Task Categories
├── Single-Agent Tasks
│   ├── Direct Command Following
│   ├── Attribute-Based Reasoning
│   ├── Tool Use and Manipulation
│   ├── Spatial Reasoning
│   └── Compound Multi-Step Reasoning
└── Multi-Agent Tasks
    ├── Explicit Collaboration
    ├── Implicit Collaboration
    └── Compound Collaboration
Single-Agent Task Categories
Direct Command Following
Description: Basic command execution tasks that require agents to follow explicit instructions without complex reasoning.
Characteristics:
- Clear, unambiguous instructions
- Single-step or simple multi-step actions
- Minimal environmental reasoning required
- Direct mapping from instruction to action
Example Tasks:
- “Go to the kitchen”
- “Take the red apple from the table”
- “Open the refrigerator door”
- “Turn on the living room lights”
Evaluation Criteria:
- Task completion accuracy
- Instruction following precision
- Error handling for impossible actions
Sample Scenario:
{
  "task_id": "direct_001",
  "category": "direct_command",
  "description": "Take the blue book from the shelf",
  "initial_state": {
    "agent_location": "study_room",
    "target_object": "blue_book",
    "object_location": "bookshelf"
  },
  "success_criteria": {
    "agent_has_object": "blue_book"
  }
}
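A criterion such as `agent_has_object` can be checked against the final state with a few lines of Python. This is a minimal sketch, not OmniEmbodied's verifier: the `inventory` key on the final state and the function name `check_agent_has_object` are assumptions made for illustration.

```python
def check_agent_has_object(task: dict, final_state: dict) -> bool:
    """Return True if the agent holds the object named in success_criteria."""
    wanted = task["success_criteria"]["agent_has_object"]
    # Assumption: the final state lists held objects under "inventory".
    return wanted in final_state.get("inventory", [])


task = {
    "task_id": "direct_001",
    "success_criteria": {"agent_has_object": "blue_book"},
}

print(check_agent_has_object(task, {"inventory": ["blue_book"]}))  # True
print(check_agent_has_object(task, {"inventory": []}))             # False
```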
Difficulty Levels:
- Basic: Single action commands (“take apple”)
- Intermediate: Multi-step commands (“go to kitchen, take apple”)
- Advanced: Commands with implicit steps (“prepare the apple” → wash, cut, etc.)
Attribute-Based Reasoning
Description: Tasks requiring agents to reason about object properties and select items based on specific attributes.
Characteristics:
- Object selection based on properties
- Comparison and filtering operations
- Understanding of attribute relationships
- Context-aware decision making
Example Tasks:
- “Find the heaviest object in the room”
- “Take the red item that can hold liquids”
- “Get the smallest electronic device”
- “Bring me something soft and warm”
Evaluation Criteria:
- Correct attribute identification
- Accurate object selection
- Reasoning about attribute relationships
Sample Scenario:
{
  "task_id": "attr_001",
  "category": "attribute_reasoning",
  "description": "Find the largest container that is currently empty",
  "initial_state": {
    "objects": [
      {"id": "bowl_small", "size": "small", "type": "container", "contents": []},
      {"id": "pot_large", "size": "large", "type": "container", "contents": ["soup"]},
      {"id": "bucket_medium", "size": "medium", "type": "container", "contents": []}
    ]
  },
  "success_criteria": {
    "selected_object": "bucket_medium"
  }
}
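The reasoning this scenario expects is a filter followed by a comparison: discard containers that hold something, then take the largest of the rest. The sketch below reproduces that logic on the scenario's objects. The `SIZE_ORDER` ranking and the function name are illustrative assumptions, since the guide does not say how size labels are ordered.

```python
# Assumed ordering of the size labels used in the scenario.
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}


def largest_empty_container(objects):
    """Return the id of the largest container with no contents, or None."""
    empty = [o for o in objects if o["type"] == "container" and not o["contents"]]
    if not empty:
        return None
    return max(empty, key=lambda o: SIZE_ORDER[o["size"]])["id"]


objects = [
    {"id": "bowl_small", "size": "small", "type": "container", "contents": []},
    {"id": "pot_large", "size": "large", "type": "container", "contents": ["soup"]},
    {"id": "bucket_medium", "size": "medium", "type": "container", "contents": []},
]

print(largest_empty_container(objects))  # bucket_medium
```

The large pot is excluded because it contains soup, which is why the expected answer is the medium bucket rather than the largest object overall.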
Key Attributes:
- Physical: size, weight, color, material, temperature
- Functional: can_open, can_contain, is_electronic, is_fragile
- State-based: is_clean, is_on, is_open, contents
Tool Use and Manipulation
Description: Tasks involving the use of tools and objects to accomplish goals, requiring understanding of object affordances and tool functionality.
Characteristics:
- Tool selection and usage
- Understanding object affordances
- Sequential manipulation actions
- Cause-and-effect reasoning
Example Tasks:
- “Use the can opener to open the can”
- “Cut the vegetables with the knife”
- “Clean the table with the cloth”
- “Measure the liquid with the measuring cup”
Evaluation Criteria:
- Appropriate tool selection
- Correct tool usage sequence
- Goal achievement through tool use
Sample Scenario:
{
  "task_id": "tool_001",
  "category": "tool_use",
  "description": "Open the can of soup using available tools",
  "initial_state": {
    "target_object": "soup_can",
    "available_tools": ["can_opener", "knife", "spoon"],
    "object_states": {
      "soup_can": {"is_open": false}
    }
  },
  "success_criteria": {
    "object_states": {
      "soup_can": {"is_open": true}
    }
  }
}
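Success criteria of the `object_states` form can be read as "every listed property must have the listed value in the final state". One way to sketch that comparison is shown below; the function name `states_match` is hypothetical, and properties not mentioned in the criteria are deliberately ignored.

```python
def states_match(required: dict, actual: dict) -> bool:
    """Return True if every required object property has the required value."""
    for obj_id, props in required.items():
        current = actual.get(obj_id, {})
        if any(current.get(key) != value for key, value in props.items()):
            return False
    return True


required = {"soup_can": {"is_open": True}}

# Extra properties in the final state do not affect the result.
print(states_match(required, {"soup_can": {"is_open": True, "is_clean": False}}))  # True
print(states_match(required, {"soup_can": {"is_open": False}}))                    # False
```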
Tool Categories:
- Kitchen Tools: knives, can openers, measuring cups, mixers
- Cleaning Tools: cloths, brushes, vacuum cleaners, mops
- Maintenance Tools: screwdrivers, hammers, wrenches
- Electronic Tools: remote controls, computers, phones
Spatial Reasoning
Description: Tasks requiring understanding of spatial relationships, navigation, and positional reasoning.
Characteristics:
- Understanding spatial relationships
- Navigation planning
- Positional reasoning
- 3D spatial understanding
Example Tasks:
- “Put the book between the lamp and the clock”
- “Find the object that is behind the chair”
- “Move the table to create more space”
- “Arrange objects in order of height”
Evaluation Criteria:
- Accurate spatial understanding
- Correct positional placement
- Efficient navigation paths
Sample Scenario:
{
  "task_id": "spatial_001",
  "category": "spatial_reasoning",
  "description": "Place the vase in the center of the dining table",
  "initial_state": {
    "agent_location": "dining_room",
    "target_object": "vase",
    "target_location": "dining_table_center"
  },
  "success_criteria": {
    "object_location": {
      "vase": "dining_table_center"
    }
  }
}
Spatial Concepts:
- Relationships: on, in, under, behind, between, next to
- Directions: north, south, left, right, forward, back
- Distances: near, far, close, adjacent, opposite
- Arrangements: center, corner, edge, middle, side
Compound Multi-Step Reasoning
Description: Complex tasks requiring multiple reasoning steps, planning, and integration of various cognitive abilities.
Characteristics:
- Multi-step planning required
- Integration of multiple task types
- Complex goal decomposition
- Long-horizon reasoning
Example Tasks:
- “Prepare a simple sandwich for lunch”
- “Clean and organize the living room”
- “Set up the dining table for two people”
- “Find and repair the broken lamp”
Evaluation Criteria:
- Correct task decomposition
- Logical step sequencing
- Successful completion of all sub-goals
Sample Scenario:
{
  "task_id": "compound_001",
  "category": "compound_reasoning",
  "description": "Prepare the kitchen for cooking dinner",
  "subtasks": [
    {
      "id": "clean_counter",
      "description": "Clean the kitchen counter",
      "type": "tool_use"
    },
    {
      "id": "gather_utensils",
      "description": "Get cooking utensils from drawer",
      "type": "direct_command"
    },
    {
      "id": "preheat_oven",
      "description": "Set oven to 350°F",
      "type": "tool_use"
    }
  ]
}
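For a task broken into subtasks like this, a natural progress score is the fraction of subtasks completed, which fits the 0.0 to 1.0 partial-success range used in this guide. The sketch below assumes equal weighting and a set of completed subtask ids; neither is confirmed as the framework's scoring rule.

```python
def partial_success(subtasks, completed_ids) -> float:
    """Fraction of subtasks whose id appears in completed_ids (0.0 to 1.0)."""
    if not subtasks:
        return 0.0
    done = sum(1 for sub in subtasks if sub["id"] in completed_ids)
    return done / len(subtasks)


subtasks = [
    {"id": "clean_counter", "type": "tool_use"},
    {"id": "gather_utensils", "type": "direct_command"},
    {"id": "preheat_oven", "type": "tool_use"},
]

score = partial_success(subtasks, {"clean_counter", "gather_utensils"})
print(round(score, 2))  # 0.67
```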
Multi-Agent Task Categories
Explicit Collaboration
Description: Tasks requiring direct communication and coordination between agents with clearly defined roles.
Characteristics:
- Direct inter-agent communication
- Clearly defined roles and responsibilities
- Coordinated action sequences
- Shared goal achievement
Example Tasks:
- “Agent A: Get ingredients. Agent B: Prepare cooking area”
- “One agent holds the ladder while the other climbs”
- “Coordinate to move the heavy table together”
- “Take turns using the shared tool”
Evaluation Criteria:
- Successful role coordination
- Effective communication
- Synchronized actions
- Shared goal achievement
Sample Scenario:
{
  "task_id": "collab_001",
  "category": "explicit_collaboration",
  "description": "Move the heavy sofa from living room to bedroom",
  "agents": {
    "agent_1": {"role": "lifter_front", "initial_location": "living_room"},
    "agent_2": {"role": "lifter_back", "initial_location": "living_room"}
  },
  "coordination_required": {
    "synchronized_lifting": true,
    "coordinated_movement": true
  }
}
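A requirement like `synchronized_lifting` can be checked after the fact by looking for a step at which every agent performed the lift. The sketch below assumes a hypothetical action log of `{"step", "agent", "action"}` entries and an action named `lift`; the real trace format may differ.

```python
def lifted_in_sync(action_log, agents, action="lift") -> bool:
    """Return True if all agents performed `action` at some common step."""
    if not agents:
        return False
    steps_per_agent = [
        {e["step"] for e in action_log if e["agent"] == agent and e["action"] == action}
        for agent in agents
    ]
    return bool(set.intersection(*steps_per_agent))


# Hypothetical action trace for the sofa scenario.
log = [
    {"step": 1, "agent": "agent_1", "action": "goto"},
    {"step": 2, "agent": "agent_1", "action": "lift"},
    {"step": 2, "agent": "agent_2", "action": "lift"},
]

print(lifted_in_sync(log, ["agent_1", "agent_2"]))  # True
```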
Implicit Collaboration
Description: Tasks where agents must infer collaboration needs and coordinate without explicit communication.
Characteristics:
- Implicit coordination cues
- Shared situational awareness
- Emergent cooperation patterns
- Inference-based collaboration
Example Tasks:
- “Both agents clean different rooms simultaneously”
- “Prepare different parts of the same meal”
- “Search different areas for the same lost item”
- “Organize items while avoiding interference”
Evaluation Criteria:
- Effective implicit coordination
- Minimal interference between agents
- Complementary actions
- Efficient task distribution
Sample Scenario:
{
  "task_id": "implicit_001",
  "category": "implicit_collaboration",
  "description": "Clean the entire house efficiently",
  "global_goal": "all_rooms_clean",
  "coordination_style": "implicit",
  "success_criteria": {
    "all_rooms_clean": true,
    "minimal_redundancy": true,
    "efficient_coverage": true
  }
}
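Criteria such as `all_rooms_clean` and `minimal_redundancy` can be made concrete by comparing which rooms each agent cleaned. The following sketch is one possible reading, under the assumption that a room cleaned by more than one agent counts as redundant work; the function name and input layout are illustrative.

```python
from collections import Counter


def coverage_report(all_rooms, cleaned_by):
    """Summarise coverage and overlap from a mapping of agent -> rooms cleaned."""
    # Count how many distinct agents cleaned each room.
    counts = Counter(room for rooms in cleaned_by.values() for room in set(rooms))
    missed = sorted(set(all_rooms) - set(counts))
    redundant = sorted(room for room, n in counts.items() if n > 1)
    return {"all_rooms_clean": not missed, "missed": missed, "redundant": redundant}


report = coverage_report(
    ["kitchen", "bedroom", "bathroom"],
    {"agent_1": ["kitchen", "bedroom"], "agent_2": ["bedroom", "bathroom"]},
)
print(report)
# {'all_rooms_clean': True, 'missed': [], 'redundant': ['bedroom']}
```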
Compound Collaboration
Description: Complex multi-agent tasks combining explicit and implicit coordination with sophisticated planning.
Characteristics:
- Mixed coordination modes
- Complex multi-step planning
- Dynamic role assignment
- Adaptive collaboration strategies
Example Tasks:
- “Plan and execute a dinner party for guests”
- “Reorganize the entire living space”
- “Collaborate to complete a complex assembly task”
- “Coordinate emergency response procedures”
Sample Scenario:
{
  "task_id": "compound_collab_001",
  "category": "compound_collaboration",
  "description": "Prepare and serve a three-course meal",
  "phases": [
    {"phase": "planning", "type": "explicit_coordination"},
    {"phase": "preparation", "type": "implicit_collaboration"},
    {"phase": "execution", "type": "explicit_coordination"}
  ]
}
Task Configuration and Filtering
Task Selection
You can filter tasks by category in your configuration:
scenario_selection:
  task_filter:
    categories:
      - "direct_command"
      - "attribute_reasoning"
      - "tool_use"
    # Additional filters
    agent_count: "single"   # single, multi, all
    difficulty: "medium"    # basic, medium, advanced
    max_steps: 20           # Maximum steps allowed
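To illustrate how such a filter narrows a task set, the sketch below applies the category, agent-count, and difficulty fields to a list of task dictionaries. It is not the framework's loader: the per-task `agent_count` field and the `matches_filter` function are assumptions, difficulty is read from `metadata.difficulty` as in the custom task format, and `max_steps` is left out because it reads as a runtime limit rather than a selection rule.

```python
def matches_filter(task: dict, task_filter: dict) -> bool:
    """Return True if a task passes the category, agent-count, and difficulty filters."""
    categories = task_filter.get("categories")
    if categories and task["category"] not in categories:
        return False
    agent_count = task_filter.get("agent_count", "all")
    # Assumed task field: "agent_count" is either "single" or "multi".
    if agent_count != "all" and task.get("agent_count") != agent_count:
        return False
    difficulty = task_filter.get("difficulty")
    if difficulty and task.get("metadata", {}).get("difficulty") != difficulty:
        return False
    return True


task_filter = {
    "categories": ["direct_command", "attribute_reasoning", "tool_use"],
    "agent_count": "single",
    "difficulty": "medium",
}
tasks = [
    {"task_id": "direct_001", "category": "direct_command",
     "agent_count": "single", "metadata": {"difficulty": "medium"}},
    {"task_id": "collab_001", "category": "explicit_collaboration",
     "agent_count": "multi", "metadata": {"difficulty": "medium"}},
    {"task_id": "tool_001", "category": "tool_use",
     "agent_count": "single", "metadata": {"difficulty": "advanced"}},
]

selected = [t["task_id"] for t in tasks if matches_filter(t, task_filter)]
print(selected)  # ['direct_001']
```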
Task Difficulty Levels
Each task category has multiple difficulty levels:
Basic Level:
- Simple, single-step tasks
- Clear success criteria
- Minimal environmental complexity
Intermediate Level:
- Multi-step tasks
- Some environmental reasoning required
- Multiple possible solution paths
Advanced Level:
- Complex, long-horizon tasks
- Significant planning required
- Multiple interconnected sub-goals
Evaluation Metrics
Success Metrics
Binary Success: Task completed successfully (True/False)
Partial Success: Progress towards completion (0.0 - 1.0)
Efficiency Metrics:
- Steps taken vs. optimal path
- Time to completion
- Resource utilization
Quality Metrics:
- Action appropriateness
- Error recovery capability
- Solution elegance
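The "steps taken vs. optimal path" metric can be expressed as a ratio in which 1.0 means the agent matched the optimal step count. This is one plausible formulation, not necessarily the one OmniEmbodied reports; the function name and the cap at 1.0 are assumptions.

```python
def step_efficiency(steps_taken: int, optimal_steps: int) -> float:
    """Ratio of optimal to actual steps, capped at 1.0 (higher is better)."""
    if steps_taken <= 0:
        raise ValueError("steps_taken must be positive")
    return min(1.0, optimal_steps / steps_taken)


print(step_efficiency(steps_taken=20, optimal_steps=15))  # 0.75
print(step_efficiency(steps_taken=15, optimal_steps=15))  # 1.0
```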
Error Analysis
Error Categories:
- Planning Errors: Incorrect task decomposition
- Execution Errors: Failed action attempts
- Reasoning Errors: Incorrect object/attribute identification
- Coordination Errors: Failed multi-agent communication
Error Recovery:
- Agent’s ability to recognize failures
- Adaptive replanning capabilities
- Learning from mistakes
Benchmarking and Comparison
Standard Evaluation Protocol
Scenario Selection: Representative sample from each category
Multiple Runs: Average over multiple trials so that reported results are statistically reliable
Consistent Configuration: Same parameters across different agents
Detailed Logging: Complete action traces for analysis
Reporting Format:
Task Category Performance Report
================================
Direct Command:          92.5% (185/200)
Attribute Reasoning:     78.5% (157/200)
Tool Use:                71.0% (142/200)
Spatial Reasoning:       83.5% (167/200)
Compound Reasoning:      62.0% (124/200)
Overall Single-Agent:    77.5% (775/1000)
Multi-Agent Performance:
Explicit Collaboration:  65.5% (131/200)
Implicit Collaboration:  58.5% (117/200)
Compound Collaboration:  42.0% (84/200)
Overall Multi-Agent:     55.3% (332/600)
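Recomputing each percentage from the raw success counts is a cheap consistency check when producing a report in this format. The sketch below does so for the single-agent counts shown above; the dictionary layout and the `success_rate` helper are illustrative, not part of the framework.

```python
def success_rate(successes: int, total: int) -> float:
    """Success rate as a percentage."""
    return 100.0 * successes / total


# (successes, total) per category, taken from the sample report.
single_agent = {
    "Direct Command": (185, 200),
    "Attribute Reasoning": (157, 200),
    "Tool Use": (142, 200),
    "Spatial Reasoning": (167, 200),
    "Compound Reasoning": (124, 200),
}

for name, (ok, total) in single_agent.items():
    print(f"{name + ':':<24}{success_rate(ok, total):5.1f}% ({ok}/{total})")

# The overall rate is computed from summed counts, not by averaging percentages.
overall_ok = sum(ok for ok, _ in single_agent.values())
overall_total = sum(total for _, total in single_agent.values())
print(f"{'Overall Single-Agent:':<24}{success_rate(overall_ok, overall_total):5.1f}% "
      f"({overall_ok}/{overall_total})")
```

Summing counts before dividing keeps the overall figure correct even when categories contain different numbers of tasks.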
Creating Custom Task Types
Task Definition Format
{
  "task_id": "custom_001",
  "category": "custom_category",
  "description": "Human-readable task description",
  "initial_state": {
    "agent_locations": {},
    "object_states": {},
    "environment_conditions": {}
  },
  "success_criteria": {
    "primary_goals": [],
    "secondary_goals": [],
    "failure_conditions": []
  },
  "metadata": {
    "difficulty": "medium",
    "estimated_steps": 15,
    "required_skills": ["reasoning", "manipulation"]
  }
}
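Before running a custom task, it can help to confirm that the definition carries the fields shown in this format. The check below is a minimal sketch: treating these five top-level keys as required is an assumption, and `missing_keys` is a hypothetical helper rather than a framework function.

```python
# Assumed set of required top-level keys, based on the format above.
REQUIRED_KEYS = {"task_id", "category", "description", "initial_state", "success_criteria"}


def missing_keys(task: dict) -> list:
    """Return the required top-level keys absent from a task definition."""
    return sorted(REQUIRED_KEYS - task.keys())


task = {
    "task_id": "custom_001",
    "category": "custom_category",
    "description": "Human-readable task description",
    "initial_state": {},
}

print(missing_keys(task))  # ['success_criteria']
```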
Custom Evaluation Criteria
You can define custom success criteria:
def custom_task_verifier(task_definition, final_state):
    """Custom verification logic for specific task types."""
    success_conditions = task_definition['success_criteria']
    # Implement custom logic here. The helper functions called below are
    # placeholders that you must define for your own task type.
    primary_complete = check_primary_goals(success_conditions, final_state)
    secondary_complete = check_secondary_goals(success_conditions, final_state)
    return {
        'success': primary_complete,
        'partial_success': calculate_partial_completion(final_state),
        'quality_score': evaluate_solution_quality(final_state)
    }
Best Practices
For Researchers
Task Selection:
- Choose diverse tasks that cover your research interests
- Include both basic and advanced difficulty levels
- Ensure statistical significance with adequate sample sizes
Evaluation Protocol:
- Use consistent evaluation procedures
- Report both aggregate and per-category results
- Include error analysis and failure modes
Reproducibility:
- Document exact configurations used
- Share custom task definitions
- Provide complete experimental details
For Developers
Agent Design:
- Test on diverse task categories to identify limitations
- Implement robust error handling for action failures
- Consider task-specific optimization strategies
Performance Optimization:
- Profile performance on computationally intensive tasks
- Optimize for common task patterns
- Balance speed vs. accuracy trade-offs
Next Steps
To learn more about using tasks in OmniEmbodied:
- ../examples/task_filtering_examples - Filtering and selecting tasks
- evaluation_framework - Setting up evaluations
- OmniEmbodied Framework API - API reference for task handling
- ../developer/extending - Creating custom task types
For practical examples:
- ../examples/evaluation_workflows - Complete evaluation examples
- Browse the data/ directory for example task definitions
- See config/ directory for task filtering configurations