Evaluating PPO and SAC Algorithms for Continuous Control

Role & Contributions

Lead developer for SAC and DDPG implementations
Conducted comprehensive hyperparameter studies
Performed comparative analysis of algorithms
Generated detailed performance metrics and visualizations

Date

October 2024 - December 2024

    Overview
    A comparative study implementing and analyzing three state-of-the-art reinforcement learning algorithms - Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Deep Deterministic Policy Gradient (DDPG) - for solving challenging continuous control tasks in the CarRacing-v2 environment.
    Project Goals

    Implement and evaluate multiple deep RL algorithms
    Compare algorithms on sample efficiency, training stability, and final performance
    Analyze hyperparameter sensitivity and optimization
    Develop practical insights for algorithm selection in continuous control tasks

    Technical Details
    Environments

    CarRacing-v2: Vision-based racing environment with:

    96x96 RGB image observations
    Continuous action space for steering, acceleration, and braking
    Procedurally generated tracks



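    As an illustration, a minimal environment setup might look like the sketch below. It assumes gym >= 0.26 (or gymnasium, which shares the same API); the grayscale and frame-stacking wrappers are standard library utilities shown for context, not necessarily the preprocessing used in this project.

        import gym
        import numpy as np
        from gym.wrappers import GrayScaleObservation, FrameStack

        # CarRacing-v2: 96x96 RGB observations and a 3-dimensional continuous
        # action space (steering in [-1, 1], gas in [0, 1], brake in [0, 1]).
        env = gym.make("CarRacing-v2", continuous=True)

        # Common preprocessing for pixel-based control: convert frames to
        # grayscale and stack the last 4 frames so the agent can infer
        # velocity from consecutive images.
        env = GrayScaleObservation(env, keep_dim=False)
        env = FrameStack(env, num_stack=4)

        obs, info = env.reset(seed=0)
        print(np.asarray(obs).shape)  # (4, 96, 96) after grayscale + stacking
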
    Algorithms Implemented

    SAC Implementation (My Focus):

    Dual critic networks for reduced overestimation bias
    Automatic entropy tuning for exploration
    Experience replay buffer for sample efficiency
    State-of-the-art performance in continuous control

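    As a rough illustration of the two ideas above, the sketch below shows the twin-critic target (taking the minimum of two Q estimates to reduce overestimation) and the automatic temperature update toward a target entropy. This is not the project's actual code: the actor, critics, replay batch, and hyperparameter names are illustrative placeholders.

        import torch
        import torch.nn.functional as F

        # Placeholders: actor, critic_1/2 and target_critic_1/2 are torch.nn.Module
        # networks; batch is a sample from the replay buffer; log_alpha is a
        # learnable scalar tensor.
        def sac_critic_and_alpha_losses(batch, actor, critic_1, critic_2,
                                        target_critic_1, target_critic_2,
                                        log_alpha, target_entropy, gamma=0.99):
            state, action, reward, next_state, done = batch
            alpha = log_alpha.exp()

            # Dual critics: use the minimum of the two target Q-values to reduce
            # overestimation bias, minus the entropy bonus of the next action.
            with torch.no_grad():
                next_action, next_log_prob = actor.sample(next_state)
                target_q = torch.min(
                    target_critic_1(next_state, next_action),
                    target_critic_2(next_state, next_action),
                ) - alpha * next_log_prob
                td_target = reward + gamma * (1.0 - done) * target_q

            critic_loss = (F.mse_loss(critic_1(state, action), td_target)
                           + F.mse_loss(critic_2(state, action), td_target))

            # Automatic entropy tuning: adjust alpha so the policy's entropy is
            # pushed toward a fixed target (commonly -action_dim).
            _, log_prob = actor.sample(state)
            alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
            return critic_loss, alpha_loss
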

    DDPG Implementation (My Focus):

    Deterministic policy gradient approach
    Actor-critic architecture with target networks
    Ornstein-Uhlenbeck process for exploration
    Specialized for continuous action spaces

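    The Ornstein-Uhlenbeck exploration noise mentioned above can be written in a few lines. The sketch below uses commonly quoted defaults (theta = 0.15, sigma = 0.2), which are not necessarily the values tuned for this project.

        import numpy as np

        class OUNoise:
            """Temporally correlated noise added to DDPG's deterministic
            actions to drive exploration in continuous action spaces."""

            def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
                self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
                self.state = np.full(action_dim, mu, dtype=np.float64)

            def reset(self):
                self.state[:] = self.mu

            def sample(self):
                # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
                dx = (self.theta * (self.mu - self.state) * self.dt
                      + self.sigma * np.sqrt(self.dt) * np.random.randn(len(self.state)))
                self.state = self.state + dx
                return self.state.copy()

        # Usage (illustrative): perturb the actor's output, then clip to the
        # environment's action bounds.
        # action = np.clip(actor_output + noise.sample(), action_low, action_high)
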

    PPO Implementation (Collaborative Work):

    Clipped surrogate objective function
    On-policy learning with improved stability
    Value function and policy optimization
    Robust performance across tasks

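    The clipped surrogate objective at the core of PPO can be sketched as follows. Tensor names are illustrative, and the clip range of 0.2 is the commonly used default rather than a value confirmed by this project.

        import torch

        def ppo_clipped_policy_loss(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
            # Probability ratio between the updated policy and the policy that
            # collected the data (on-policy learning).
            ratio = torch.exp(new_log_prob - old_log_prob)

            # Clipping the ratio keeps each update step close to the old policy,
            # which is the source of PPO's training stability.
            unclipped = ratio * advantage
            clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
            return -torch.min(unclipped, clipped).mean()
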


    Key Results

    SAC demonstrated superior sample efficiency
    PPO showed better training stability
    DDPG provided effective learning in continuous spaces
    Successful navigation of complex racing scenarios

    Technologies Used

    Python
    PyTorch/TensorFlow
    OpenAI Gym
    NumPy/Pandas
    Matplotlib for visualization