Troubleshooting

This guide helps you diagnose and resolve common issues with OmniEmbodied. Issues are organized by category with detailed solutions and prevention tips.

Installation Issues

ImportError: No module named ‘OmniSimulator’

Symptoms:
- Python can’t find the OmniSimulator module
- Import statements fail

Causes:
- OmniSimulator package not installed correctly
- Python path issues
- Virtual environment problems

Solutions:

Reinstall OmniSimulator:

cd OmniEmbodied/OmniSimulator
pip install -e .

Check Python path:

import sys
print(sys.path)
# Ensure OmniEmbodied directory is in the path

Verify virtual environment:

which python
pip list | grep -i omnisimulator
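
The path and virtual environment checks above can also be run from inside Python, which confirms that you are inspecting the same interpreter that fails to import the module. This is a minimal sketch using only the standard library; the helper name `diagnose_import` is hypothetical and not part of OmniEmbodied.

```python
import importlib.util
import sys


def diagnose_import(module_name):
    """Report whether (and from where) this interpreter can import a module."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised for dotted names whose parent package is missing
        spec = None
    return {
        "module": module_name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
        "interpreter": sys.executable,
        # True inside an environment created with `python -m venv`
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in diagnose_import("OmniSimulator").items():
        print(f"{key}: {value}")
```

If `found` is False while `pip list` shows the package, the `interpreter` path usually reveals that `pip` and `python` belong to different environments.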

Permission Denied Errors

Symptoms:
- Can’t write to directories
- Installation fails with permission errors

Solutions:

Use virtual environment (recommended):

python -m venv omniembodied-env
source omniembodied-env/bin/activate
pip install -e .

Install for current user only:
```
pip install --user -e .
```

Check directory permissions:

ls -la
# Ensure you have write permissions

YAML Configuration Errors

Symptoms:
- “yaml.scanner.ScannerError” messages
- Configuration not loading

Solutions:

Validate YAML syntax:

python -c "import yaml; yaml.safe_load(open('config.yaml'))"

Check indentation (use spaces, not tabs):

# Correct
dataset:
  default: "eval_single"

# Incorrect (the "default" line is indented with a tab character, which YAML rejects)
dataset:
	default: "eval_single"

Escape special characters:

# For strings with special characters
message: "Task: \"find the key\""
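
Tab characters are hard to spot in most editors, so it can help to search for them directly. The following is a small sketch that relies only on the standard library; `find_tab_indentation` is a hypothetical helper name, and it only checks leading whitespace, not full YAML validity.

```python
from typing import List


def find_tab_indentation(text: str) -> List[int]:
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-whitespace character
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


if __name__ == "__main__":
    sample = 'dataset:\n\tdefault: "eval_single"\n'
    print(find_tab_indentation(sample))  # → [2]
```

To check a real file, pass its contents, for example `find_tab_indentation(open('config.yaml').read())`.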

Runtime Errors

Simulation Hangs or Times Out

Symptoms:
- Simulation appears stuck
- No progress for extended periods
- Timeout errors

Diagnostic Steps:

Enable debug logging:

import logging
logging.basicConfig(level=logging.DEBUG)

Check LLM connectivity:

curl -I https://api.openai.com/v1/models
# Or test your LLM endpoint

Monitor system resources:

top        # Linux/Mac
htop       # Enhanced version
# Check CPU, memory usage
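
When the process looks stuck, it also helps to see where the Python code is currently executing. One option, sketched here with the standard-library `faulthandler` module, is to dump the stack of every thread at a fixed interval while the simulation runs. The wrapper name `watch_for_hang` is hypothetical; how you start the simulation is up to your own script.

```python
import faulthandler
import sys


def watch_for_hang(interval_seconds, log_file=None):
    """Dump all thread stack traces every `interval_seconds` until cancelled.

    `log_file` must be a real file object (it needs a file descriptor);
    it defaults to stderr. Returns a function that stops the dumps.
    """
    target = sys.stderr if log_file is None else log_file
    faulthandler.dump_traceback_later(interval_seconds, repeat=True, file=target)
    return faulthandler.cancel_dump_traceback_later


# Usage sketch:
#   stop = watch_for_hang(60)   # one dump per minute
#   ... run the simulation ...
#   stop()
```

If consecutive dumps show the same stack frame, for example a blocking network read, that frame is the likely cause of the hang.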

Solutions:

Set reasonable timeouts:

execution:
  max_steps_per_task: 35
  timeout_seconds: 300

Limit LLM request time and retries so that a stalled or rate-limited call fails fast:

llm_config:
  timeout: 30
  max_retries: 3

Use faster models for testing:

llm_config:
  model_name: "gpt-3.5-turbo"  # Faster than GPT-4

Invalid Action Errors

Symptoms:
- Agent attempts impossible actions
- Action validation failures
- “Action not allowed” messages

Diagnostic Steps:

Check action logs:
```
grep -i "action" simulation.log
```

Verify environment state:

# Add debug prints in your agent
print(f"Current room: {agent.current_room}")
print(f"Available objects: {environment.get_objects()}")

Solutions:

Improve agent prompting:

agent_config:
  environment_description:
    detail_level: 'full'
    show_object_properties: true

Add action validation:

# In custom agent code
if not self.can_execute_action(action, target):
    return self.fallback_action()

Enable step-by-step verification:

task_verification:
  enabled: true
  mode: "step_by_step"

LLM API Issues

Authentication Errors

Symptoms:
- “Invalid API key” errors
- “Authentication failed” messages
- HTTP 401 responses

Solutions:

Verify API key:

echo $OPENAI_API_KEY
# Should print your actual API key; empty output means the variable is not set in this shell

Test API access:

curl -H "Authorization: Bearer $OPENAI_API_KEY" \
     https://api.openai.com/v1/models

Check key permissions:
- Ensure the API key has the required permissions
- Check account billing status
- Verify the key hasn’t expired
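
Printing the full key with `echo` can leak it into terminal history or shared logs. As an alternative, this sketch checks that the variable is set and well-formed while showing only a masked form. `describe_api_key` is a hypothetical helper, and the variable name `OPENAI_API_KEY` is taken from the commands above.

```python
import os


def describe_api_key(env=None, name="OPENAI_API_KEY"):
    """Summarize the state of an API key variable without revealing the key."""
    env = os.environ if env is None else env
    key = env.get(name, "")
    if not key:
        return f"{name} is not set"
    if key != key.strip():
        # A stray newline or space is a common cause of HTTP 401 responses
        return f"{name} has leading or trailing whitespace"
    masked = key[:3] + "..." + key[-4:] if len(key) > 8 else "***"
    return f"{name} is set ({len(key)} characters, {masked})"


if __name__ == "__main__":
    print(describe_api_key())
```

The masked output is safe to paste into a bug report.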

Rate Limit Errors

Symptoms:
- “Rate limit exceeded” messages
- HTTP 429 responses
- Slow or failed requests

Solutions:

Reduce request frequency:

parallel_evaluation:
  scenario_parallelism:
    max_parallel_scenarios: 2  # Reduce from default

Add request delays:

import time
time.sleep(1)  # Add delay between requests

Upgrade API plan:
- Consider a higher tier for increased limits
- Monitor usage in the API provider dashboard
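
A fixed `time.sleep(1)` slows every request, including the ones that would have succeeded. A common alternative is exponential backoff, which waits only after a rate-limit failure and doubles the wait on each retry. This is a sketch under stated assumptions: `RateLimitError` is a stand-in for whatever exception your LLM client raises on HTTP 429, and `call_with_backoff` is a hypothetical helper, not an OmniEmbodied API.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the exception your LLM client raises on HTTP 429."""


def call_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); after a RateLimitError wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # Out of retries: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            # Up to 10% random jitter so parallel workers do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the defaults, the waits are roughly 1, 2, and 4 seconds before the final attempt. The `sleep` parameter exists so the function can be tested without real delays.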

Model Not Found Errors

Symptoms:
- “Model not found” errors
- Invalid model name responses

Solutions:

Check available models:

curl -H "Authorization: Bearer $OPENAI_API_KEY" \
     https://api.openai.com/v1/models

Use correct model names:

llm_config:
  model_name: "gpt-4-turbo-preview"  # Check exact name

Verify model access:
- Some models require special access
- Check account eligibility

Performance Issues

Slow Simulation Speed

Symptoms:
- Simulations take much longer than expected
- High CPU or memory usage
- System becomes unresponsive

Diagnostic Tools:

Profile execution:

import cProfile
pr = cProfile.Profile()
pr.enable()
# Run simulation
pr.disable()
pr.print_stats()

Monitor resources:

# Memory usage
ps aux | grep python

# Disk I/O
iotop

# Network activity
netstat -i
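
The raw `print_stats()` output from the profiling step above is unsorted and can run to thousands of lines. Sorting by cumulative time and limiting the output usually makes the slow call path obvious. The sketch below uses the standard-library `cProfile` and `pstats` modules; `profile_call` is a hypothetical wrapper, and the function you pass in stands for whatever starts your simulation.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return its result and a report of the `top` slowest entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "cumulative" ranks functions by time spent in them and everything they call
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_call(sorted, range(100000), key=lambda x: -x)
    print(report)
```

Entries dominated by network or `sleep` calls point to LLM latency rather than simulator CPU time.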

Solutions:

Optimize configuration:

agent_config:
  max_history: 10  # Reduce from default 20

execution:
  max_steps_per_task: 25  # Reduce if appropriate

Use parallel processing wisely:

parallel_evaluation:
  scenario_parallelism:
    max_parallel_scenarios: 4  # Based on your CPU cores

Clean up regularly:

# Remove old logs
find . -name "*.log" -mtime +7 -delete

# Clear temporary files
rm -rf /tmp/omniembodied_*

Memory Issues

Symptoms:
- “Out of memory” errors
- System swapping excessively
- Process killed by OS

Solutions:

Reduce memory usage:

agent_config:
  max_history: 5  # Smaller history

logging:
  level: "WARNING"  # Less verbose logging

Process scenarios in batches:

# Instead of processing all at once
scenarios = get_all_scenarios()
batch_size = 10
for i in range(0, len(scenarios), batch_size):
    batch = scenarios[i:i+batch_size]
    process_batch(batch)

Monitor memory usage:

import psutil
process = psutil.Process()
print(f"Memory usage: {process.memory_info().rss / 1024 / 1024:.2f} MB")
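
The batching loop above needs the full scenario list in memory so it can be sliced. If scenarios are produced lazily (for example, read from disk one at a time), a generator-based variant avoids that. The sketch below also shows a standard-library way to measure peak Python memory per batch with `tracemalloc`, for cases where `psutil` is not installed. Both helper names are hypothetical, and `tracemalloc` counts only allocations made through Python, not the total process memory reported by `psutil`.

```python
import tracemalloc
from itertools import islice


def batched(items, batch_size):
    """Yield lists of at most `batch_size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def peak_memory_mb(func, *args, **kwargs):
    """Run func and return (result, peak Python-allocated memory in MB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


if __name__ == "__main__":
    for batch in batched(range(25), 10):
        _, peak = peak_memory_mb(lambda b=batch: [x * x for x in b])
        print(f"{len(batch)} items, peak {peak:.3f} MB")
```

If the peak grows from one batch to the next, something is holding references to earlier results.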

Data and File Issues

Missing Dataset Files

Symptoms:
- “File not found” errors for scenarios
- Empty evaluation results

Solutions:

Verify data directory structure:

ls -la data/
# Should contain eval/, sft/, data-all/ directories

Check file paths in configuration:

dataset:
  default: "eval_single"  # Must match directory structure

Download missing data:

# If data is in separate repository
git submodule update --init --recursive

Corrupted JSON Files

Symptoms:
- JSON parsing errors
- “Invalid JSON” messages
- Partial data loading

Diagnostic Steps:

Validate JSON files:

python -m json.tool scenario.json > /dev/null
echo $?  # Should be 0 for valid JSON

Find corrupted files:

find data/ -name "*.json" -exec sh -c 'python -m json.tool "$1" > /dev/null || echo "Invalid: $1"' _ {} \;
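
The `find` command above starts one Python process per file and reports only the file name. For large datasets, or on systems without a POSIX shell, a single Python script is faster and can also report where each file breaks. This is a sketch using only the standard library; `find_invalid_json` is a hypothetical helper, and `data/` is the directory used in the commands above.

```python
import json
from pathlib import Path


def find_invalid_json(root):
    """Return (path, error message) for every *.json file under root that fails to parse."""
    problems = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            with open(path, encoding="utf-8") as handle:
                json.load(handle)
        except (json.JSONDecodeError, UnicodeDecodeError) as error:
            # JSONDecodeError messages include the line and column of the failure
            problems.append((str(path), str(error)))
    return problems


if __name__ == "__main__":
    for path, message in find_invalid_json("data"):
        print(f"Invalid: {path}: {message}")
```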

Solutions:

Restore from backup:

git checkout HEAD -- data/corrupted_file.json

Fix manually:
- Use a JSON validator to identify issues
- Common problems: missing commas, unescaped quotes

Logging and Debugging

Enable Detailed Logging

For general debugging:

import logging

# Enable debug for all modules
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

For specific components:

# Simulator core
logging.getLogger("OmniSimulator.core").setLevel(logging.DEBUG)

# Agent decisions
logging.getLogger("modes.single_agent").setLevel(logging.DEBUG)

# LLM interactions
logging.getLogger("llm").setLevel(logging.DEBUG)

In configuration file:

logging:
  level: "DEBUG"
  show_llm_details: true

Save Debug Information

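# Save detailed state
import json

# Assumes `agent`, `env`, and a caught `exception` are in scope,
# for example inside an `except Exception as exception:` block
debug_info = {
    'agent_state': agent.get_state(),
    'environment_state': env.get_state(),
    'action_history': agent.get_history(),
    'error_context': str(exception)
}

with open('debug_output.json', 'w') as f:
    # default=str keeps the dump from failing on values JSON cannot serialize
    json.dump(debug_info, f, indent=2, default=str)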
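
The error message alone rarely shows where a failure started, so it is worth saving the full traceback with the state. The sketch below is one way to do that with the standard library. `save_debug_info` is a hypothetical helper, and the keyword arguments stand for whatever state your agent and environment expose.

```python
import json
import traceback


def save_debug_info(path, error, **state):
    """Write the exception, its traceback, and any extra state to a JSON file."""
    debug_info = {
        "error_type": type(error).__name__,
        "error_message": str(error),
        "traceback": traceback.format_exception(type(error), error, error.__traceback__),
        "state": state,
    }
    with open(path, "w", encoding="utf-8") as handle:
        # default=repr keeps the dump from failing on objects JSON cannot serialize
        json.dump(debug_info, handle, indent=2, default=repr)
    return debug_info


# Usage sketch:
#   try:
#       run_simulation()            # hypothetical entry point
#   except Exception as error:
#       save_debug_info("debug_output.json", error, step=current_step)
#       raise
```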

Getting Help

Before asking for help, collect:

System information:

python --version
pip list | grep -iE "(omni|llm|yaml)"  # -i so mixed-case names such as PyYAML match
uname -a  # Linux/Mac
# Windows: systeminfo

Error details:
- Complete error messages
- Stack traces
- Configuration files (remove sensitive data)
- Steps to reproduce

Log files:
- Enable debug logging
- Include relevant log excerpts
- Timestamp information

Where to get help:

Check this troubleshooting guide first
Search existing GitHub issues
Create new issue with detailed information
Ask in GitHub Issues for usage questions

Creating effective bug reports:

Clear title: Describe the problem concisely
Environment: System details, versions
Steps to reproduce: Exact sequence of actions
Expected vs actual: What should happen vs what does
Logs and errors: Relevant error messages
Minimal example: Simplest case that shows the problem

Common Error Patterns

Pattern: “Attribute ‘X’ not found”
- Usually indicates missing configuration
- Check spelling and indentation in YAML
- Verify all required fields are present

Pattern: “Connection refused” or “Timeout”
- Network connectivity issues
- API endpoint problems
- Firewall or proxy blocking requests

Pattern: “Permission denied”
- File system permissions
- Virtual environment not activated
- Trying to modify read-only files

Pattern: “Module not found”
- Installation incomplete
- Python path issues
- Wrong virtual environment

Remember: most issues have been encountered before. Take time to search existing solutions before creating new issues.