Architecture
Operating Modes
The system has two operating modes that share the same perception interface (/perception/cones_3d), so a controller node cannot tell whether it is running in simulation or on real hardware.
Simulation Mode
┌──────────────────────────────────────────────────────────────┐
│ Gazebo Fortress (headless) │
│ │
│ Physics (ODE) ──► Ackermann steering ──► Odometry │
│ ▲ │ │
│ │ │ │
│ RGBD Camera (10Hz) ────│────────────────────│─────────── │
└─────────────────────────│────────────────────│───────────────┘
│ ros_gz_bridge
/kart/cmd_vel │
│ ▼
│ /model/kart/odometry
│ │
│ ▼
│ ┌─────────────────────┐
│ │ perfect_perception │
│ │ (reads SDF + odom) │
│ └──────────┬──────────┘
│ │
│ /perception/cones_3d
│ │
│ ▼
│ ┌─────────────────────┐
└─────────│ cone_follower │
│ (midpoint steering) │
└─────────────────────┘
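The midpoint steering used by cone_follower can be sketched in pure Python. This is an illustrative sketch, not the node's actual code: the nearest-cone pairing rule and the proportional gain are assumptions.

```python
import math

def midpoint_steering(blue_cones, yellow_cones, gain=1.0):
    """Steer toward the midpoint between the nearest blue (left) and
    yellow (right) cones. Cones are (x, y) positions in the robot
    frame (x forward, y left); returns an angular-rate command."""
    if not blue_cones or not yellow_cones:
        return 0.0  # no complete gate visible: hold course
    nearest = lambda cones: min(cones, key=lambda c: math.hypot(*c))
    bx, by = nearest(blue_cones)
    yx, yy = nearest(yellow_cones)
    mx, my = (bx + yx) / 2.0, (by + yy) / 2.0  # gate midpoint
    heading_error = math.atan2(my, mx)          # angle to midpoint
    return gain * heading_error                 # e.g. Twist angular.z
```

A symmetric gate straight ahead yields zero steering; a gate shifted left yields a positive (left) command.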
Real Hardware Mode
┌──────────┐ ┌──────────────────────────────────────┐
│ Gamepad │──/joy──►│ joy_to_cmd_vel │
└──────────┘ └───────────┬──────────────────────────┘
│
/actuation_cmd
│
▼
┌───────────────────────┐ UART ┌────────────┐
│ msgs_to_micro │──────────────►│ ESP32 │
│ (Ackermann → 4 bytes) │ 115200 baud │ (Medulla) │
└───────────────────────┘ └────────────┘
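One plausible way msgs_to_micro could pack an Ackermann command into the 4-byte UART frame is two scaled big-endian int16s. The field layout, byte order, and limits here are hypothetical; only the 4-byte size and 115200 baud come from the design above.

```python
import struct

def pack_actuation(steering_rad, speed_mps,
                   max_steer=0.5, max_speed=5.0):
    """Hypothetical encoding: clamp each value to its limit, scale to
    the int16 range, and pack both as big-endian -> 4 bytes total."""
    def to_i16(value, limit):
        value = max(-limit, min(limit, value))
        return int(round(value / limit * 32767))
    return struct.pack('>hh', to_i16(steering_rad, max_steer),
                              to_i16(speed_mps, max_speed))
```

A fixed-size binary frame keeps the ESP32 parser trivial: read exactly 4 bytes, unpack, done.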
┌──────────┐
│ ZED │── RGB + Depth + CameraInfo
│ Camera │
└──────┬───┘
│
▼
┌──────────────────┐ /perception/ ┌──────────────────────┐
│ yolo_detector │──── cones_2d ──────►│ cone_depth_localizer │
│ (YOLOv5, PyTorch)│ │ (2D → 3D projection) │
└──────────────────┘ └──────────┬───────────┘
│
/perception/cones_3d
│
▼
┌─────────────────────┐
│ controller node │
│ (to be developed) │
└─────────────────────┘
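The 2D → 3D step in cone_depth_localizer is a standard pinhole back-projection: take the bounding-box center pixel, look up its metric depth, and invert the camera intrinsics from CameraInfo. A minimal sketch (frame conventions and any filtering in the real node are not shown):

```python
def project_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into the camera
    frame using the pinhole model. fx, fy are focal lengths and
    cx, cy the principal point from the CameraInfo K matrix."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

A pixel at the principal point maps straight down the optical axis; pixels right of center get positive x.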
Topic Map
Simulation Topics
| Topic | Message Type | Publisher | Subscriber |
|---|---|---|---|
| /clock | rosgraph_msgs/Clock | Gazebo (bridged) | All nodes (use_sim_time) |
| /model/kart/odometry | nav_msgs/Odometry | Gazebo (bridged) | perfect_perception |
| /kart/cmd_vel | geometry_msgs/Twist | cone_follower | Gazebo (bridged) |
| /zed/.../rgb/image_rect_color | sensor_msgs/Image | Gazebo camera (bridged + remapped) | yolo_detector |
| /zed/.../depth/depth_registered | sensor_msgs/Image | Gazebo camera (bridged + remapped) | cone_depth_localizer |
| /zed/.../rgb/camera_info | sensor_msgs/CameraInfo | Gazebo camera (bridged + remapped) | cone_depth_localizer |
Perception Topics (shared)
| Topic | Message Type | Publisher | Subscriber |
|---|---|---|---|
| /perception/cones_2d | vision_msgs/Detection2DArray | yolo_detector | cone_depth_localizer |
| /perception/cones_3d | vision_msgs/Detection3DArray | cone_depth_localizer or perfect_perception | Controller, cone_marker_viz_3d |
| /perception/cones_3d_markers | visualization_msgs/MarkerArray | cone_marker_viz_3d | RViz |
| /perception/yolo/annotated | sensor_msgs/Image | yolo_detector | RViz (debug) |
Real Hardware Topics
| Topic | Message Type | Publisher | Subscriber |
|---|---|---|---|
| /joy | sensor_msgs/Joy | joy_node | joy_to_cmd_vel |
| /actuation_cmd | ackermann_msgs/AckermannDriveStamped | joy_to_cmd_vel | comms_micro |
TF Frame Tree
odom
└── base_link (kart chassis center)
└── camera_link (front-mounted camera)
- In simulation: perfect_perception_node broadcasts all transforms from odometry
- On real hardware: the ZED ROS wrapper and robot_state_publisher handle TF
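For a kart on flat ground, the TF lookup a controller performs (expressing a cone seen in camera_link in the odom frame) reduces to composing planar transforms. A minimal sketch; the 0.3 m camera offset and poses below are illustrative numbers, not measured values:

```python
import math

def compose_2d(parent_pose, point_in_child):
    """parent_pose = (x, y, yaw) of the child frame in the parent
    frame; point_in_child = (x, y). Returns the point expressed in
    the parent frame (rotate by yaw, then translate)."""
    px, py, yaw = parent_pose
    ox, oy = point_in_child
    return (px + ox * math.cos(yaw) - oy * math.sin(yaw),
            py + ox * math.sin(yaw) + oy * math.cos(yaw))

# Cone 3 m ahead of the camera; camera_link 0.3 m ahead of base_link
cone_in_base = (0.3 + 3.0, 0.0)
# Kart at (10, 5) in odom, heading +90 degrees
cone_in_odom = compose_2d((10.0, 5.0, math.pi / 2), cone_in_base)
# cone_in_odom is approximately (10.0, 8.3)
```

The full 3D case is the same idea with quaternions, which tf2 handles for you.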
Message Flow
The autonomous driving loop, whether in simulation or on real hardware, follows these steps:
- Sense — Camera (real ZED or simulated Gazebo) produces RGB + depth images
- Detect — YOLO finds 2D cone bounding boxes, or perfect_perception uses ground truth
- Localize — Depth image projects 2D boxes into 3D positions in the camera frame
- Decide — Controller separates blue (left) and yellow (right) cones, computes a steering target
- Act — Twist sent to Gazebo's Ackermann plugin (sim) or AckermannDriveStamped sent to the ESP32 (real)
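The two Act paths carry the same intent in different units: a Twist's (v, omega) maps to an equivalent Ackermann steering angle via the kinematic bicycle model, delta = atan(L * omega / v). A sketch of that conversion; the wheelbase value is an assumption:

```python
import math

def twist_to_ackermann(linear_x, angular_z, wheelbase=1.0):
    """Convert a Twist command (forward speed v, yaw rate omega)
    into a bicycle-model steering angle: delta = atan(L * omega / v).
    wheelbase is a placeholder, not the kart's measured value."""
    if abs(linear_x) < 1e-6:
        return 0.0  # steering undefined at standstill; command straight
    return math.atan(wheelbase * angular_z / linear_x)
```

This is why the controller can stay mode-agnostic: whichever message type leaves the node, the steering geometry underneath is identical.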