Windy-NavRL: Wind-resilient RL Framework for UAV Navigation
Reinforcement Learning, Deep Learning, UAV, Robotics, LSTM-PPO
Introduction
Autonomous UAV navigation in outdoor environments faces significant challenges from wind disturbances, which can cause trajectory deviations, increased energy consumption, and mission failure. This project develops a reinforcement learning framework that enables UAVs to learn wind-aware navigation policies.
Project Objective: Implement an LSTM-PPO architecture that improves UAV navigation robustness under challenging wind conditions.
Impact: Contributes to safer autonomous drone operations for infrastructure inspection, search and rescue, and package delivery applications.
Methods
System Architecture
Developed an enhanced LSTM-PPO architecture built on the NavRL framework:
LSTM-PPO Agent:
- Observation Space: Position, velocity, goal vector, estimated wind velocity
- LSTM Layer: Temporal memory to capture wind pattern history
- Policy Network: Actor-critic architecture with shared feature extraction
- Wind Estimator: Real-time wind velocity estimation from UAV dynamics
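The agent described above could be sketched roughly as follows in PyTorch. This is a minimal illustrative sketch, not the project's actual implementation: the observation layout, layer sizes, and a Gaussian policy head are all assumptions.

```python
import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    """Recurrent actor-critic: shared encoder -> LSTM -> policy/value heads.

    Assumed observation layout (illustrative only): position (3) +
    velocity (3) + goal vector (3) + estimated wind velocity (3) = 12 dims.
    """

    def __init__(self, obs_dim=12, act_dim=3, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # LSTM hidden state carries the recent wind-pattern history
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.actor_mean = nn.Linear(hidden, act_dim)   # policy head
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.critic = nn.Linear(hidden, 1)             # value head

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim)
        feat = self.encoder(obs_seq)
        feat, hidden_state = self.lstm(feat, hidden_state)
        mean = self.actor_mean(feat)
        value = self.critic(feat).squeeze(-1)
        return mean, self.log_std.exp(), value, hidden_state
```

During rollouts the hidden state is carried across timesteps, which is what lets the policy condition on wind history rather than on the instantaneous observation alone.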
Training Infrastructure:
- Multi-environment distributed training in parallel simulations
- Curriculum learning with gradually increasing wind intensity
- Domain randomization for varying wind field characteristics
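The curriculum and domain-randomization loop could look something like the sketch below. The success-rate threshold, step size, speed cap, and parameter ranges are illustrative assumptions, not the tuned values used in training.

```python
import random

def curriculum_wind_speed(success_rate, current_max, step=0.5,
                          threshold=0.8, cap=12.0):
    """Raise the wind-speed ceiling (m/s) once the policy succeeds reliably."""
    if success_rate >= threshold:
        return min(current_max + step, cap)
    return current_max

def randomize_wind_episode(max_speed, rng=random):
    """Domain randomization: sample one episode's wind-field parameters."""
    return {
        "field_type": rng.choice(["constant", "turbulent", "gust", "vortex"]),
        "mean_speed": rng.uniform(0.0, max_speed),   # bounded by curriculum
        "heading_rad": rng.uniform(0.0, 6.283),
        "gust_prob": rng.uniform(0.0, 0.2),          # chance of a gust event
    }
```

Each parallel environment would draw its own configuration per episode, so the policy never overfits to one wind field.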
Simulation Environments
Implemented realistic wind simulation in both Gazebo and Isaac Sim:
Wind Field Types:
- Constant Wind: Uniform wind field (baseline testing)
- Turbulent Wind: Spatially varying wind patterns
- Gust Events: Sudden wind direction and speed changes
- Vortex Fields: Rotational wind patterns near obstacles
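The four field types above can be captured by simple analytic models. The sketch below is a toy approximation for illustration; the actual Gazebo/Isaac Sim plugins would use richer physics (e.g., stochastic turbulence rather than a fixed sinusoidal perturbation).

```python
import math

def wind_at(p, t, field):
    """Evaluate a simple analytic wind field at position p=(x, y, z), time t.

    `field` is a dict such as {"type": "constant", "v": (3.0, 0.0, 0.0)};
    all four models below are illustrative assumptions.
    """
    ftype = field["type"]
    if ftype == "constant":                      # uniform baseline field
        return field["v"]
    if ftype == "gust":                          # sudden speed/direction step
        return field["gust_v"] if t >= field["t_start"] else field["v"]
    if ftype == "turbulent":                     # spatially varying perturbation
        vx, vy, vz = field["v"]
        s = field.get("sigma", 1.0)
        return (vx + s * math.sin(0.3 * p[0] + 0.7 * p[1]),
                vy + s * math.cos(0.5 * p[1] + 0.2 * p[2]),
                vz)
    if ftype == "vortex":                        # rotation about a vertical axis
        cx, cy = field["center"]
        dx, dy = p[0] - cx, p[1] - cy
        r = math.hypot(dx, dy) + 1e-6
        w = field.get("strength", 2.0) / r       # tangential speed falls as 1/r
        return (-w * dy / r, w * dx / r, 0.0)
    raise ValueError(f"unknown wind field type: {ftype}")
```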
Results
The wind-aware RL policy demonstrates improved navigation performance compared to traditional PID control and standard PPO approaches across various wind conditions. The LSTM component enables the policy to learn temporal wind patterns and adapt control strategies accordingly.
Key Observations:
- Improved success rates in reaching navigation goals under wind disturbances
- More stable trajectories with reduced deviation from planned paths
- Better energy efficiency through wind-aware trajectory planning
Policy Behaviors:
- Proactive compensation for anticipated wind effects
- Efficient path planning that considers wind direction
- Quick stabilization after unexpected wind gusts
Discussion
Why LSTM Architecture
The LSTM memory component provides critical advantages:
- Captures temporal patterns in wind disturbances
- Distinguishes between sustained wind and transient gusts
- Enables predictive control based on recent observations
Real-World Deployment Considerations
Current Work:
- Physics-based simulation for sim-to-real transfer
- Integration with CERLAB UAV autonomy stack
- Real-world testing with custom LiDAR-equipped UAV platform
- Wind estimation from IMU and visual cues
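One common way to estimate wind from onboard dynamics, sketched below under stated assumptions, is to attribute the residual between commanded and IMU-measured acceleration to aerodynamic drag from relative airspeed, then low-pass filter the result. The mass, drag coefficient, and filter gain here are placeholders, not flight-tested values, and this is not necessarily the project's estimator.

```python
class ResidualWindEstimator:
    """Estimate wind from the commanded-vs-measured acceleration residual.

    Assumed model: unmodeled acceleration comes from linear drag on the
    relative airspeed, a_res ≈ (k_drag / m) * (v_wind - v_uav), so each
    residual sample yields a raw wind guess that is then smoothed.
    """

    def __init__(self, mass=1.5, k_drag=0.3, alpha=0.1):
        self.mass, self.k_drag, self.alpha = mass, k_drag, alpha
        self.wind = [0.0, 0.0, 0.0]

    def update(self, a_cmd, a_meas, v_uav):
        for i in range(3):
            residual = a_meas[i] - a_cmd[i]              # unmodeled accel
            raw = v_uav[i] + residual * self.mass / self.k_drag
            # exponential smoothing rejects IMU noise and transient spikes
            self.wind[i] += self.alpha * (raw - self.wind[i])
        return tuple(self.wind)
```

Fed into the observation vector, such an estimate gives the LSTM policy an explicit wind signal to reason over, rather than forcing it to infer wind purely from trajectory error.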
My Role & Contributions
As the lead developer on this research project under Prof. Kenji Shimada’s supervision, I:
✓ Designed and implemented LSTM-PPO architecture with wind state estimation
✓ Developed realistic wind field models in Gazebo and Isaac Sim
✓ Built distributed training infrastructure for parallel policy learning
✓ Conducted experiments comparing different policy architectures
✓ Analyzed navigation behaviors and wind-response strategies
✓ Contributed to CERLAB UAV autonomy stack integration
Technical Skills Demonstrated: Reinforcement Learning, ROS, Gazebo, Isaac Sim, PyTorch, UAV Control, Python, C++
Conclusion
This ongoing research project implements a wind-resilient RL framework for UAV navigation using LSTM-PPO architecture. The approach demonstrates improved robustness under wind disturbances through temporal modeling and wind-aware policy learning.
Key Achievements:
- LSTM-PPO architecture for wind-resilient navigation
- Comprehensive wind simulation framework in Gazebo and Isaac Sim
- Integration with CERLAB autonomy stack
- Ongoing real-world deployment and validation
