projects


maddpg: multi-agent deep deterministic policy gradient (github)

jul 2024 -- aug 2024 | tools: python, pytorch, pettingzoo, numpy, multi-agent rl

  • built a multi-agent reinforcement learning system in which agents learn cooperative and competitive behaviors across pettingzoo environments.
  • implemented shared and per-agent policies, letting agents act independently at execution time while learning jointly during training.
  • added per-agent experience replay buffers so each agent stores and resamples past transitions during training.
  • evaluated on benchmark tasks such as simple_adversary and simple_spread, tuning learning rates and batch sizes for better performance.
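the per-agent replay buffers above can be sketched as follows. this is a minimal illustration with hypothetical names, not the repo's actual api; a full maddpg implementation samples the same indices across agents so joint transitions stay aligned, which this sketch simplifies away:

```python
import random
from collections import deque

class AgentReplayBuffer:
    """per-agent buffer of (obs, action, reward, next_obs, done) transitions."""

    def __init__(self, capacity=10_000):
        # deque with maxlen drops the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # uniform sampling of past transitions for an off-policy update
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# one independent buffer per agent
buffers = {name: AgentReplayBuffer() for name in ["agent_0", "agent_1"]}
for t in range(100):
    for name, buf in buffers.items():
        buf.add(obs=[0.0, t], action=0, reward=1.0, next_obs=[0.0, t + 1], done=False)

batch = buffers["agent_0"].sample(32)
print(len(batch))  # 32
```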

rl2-enhanced: advanced reinforcement learning for llms (github)

jul 2025 | tools: python, pytorch, hydra, mlflow, weights & biases, optuna, numpy, distributed rl

  • extended the original rl2 repository by chenmien tan, adding advanced features for scalable and robust language model training.
  • implemented adaptive kl penalty mechanisms (exponential, pid, scheduled controllers) for stable ppo optimization.
  • developed multi-objective optimization with pareto frontier tracking, supporting reward, entropy, and kl constraints.
  • integrated alternative advantage estimation methods (v-trace, retrace(λ), td(λ), multi-step returns) for improved sample efficiency.
  • automated hyperparameter tuning using optuna and hyperopt, enabling bayesian and grid search strategies.
  • added advanced memory optimization: gradient checkpointing, cpu offloading, adaptive batch sizing, and detailed profiling.
  • enabled experiment tracking and model versioning with mlflow and weights & biases for reproducible research and mlops workflows.
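the exponential kl controller mentioned above roughly follows the adaptive-kl rule from the ppo paper: double the penalty coefficient when the measured kl overshoots a target band, halve it when it undershoots. a minimal sketch (class and parameter names are hypothetical, not the repo's api):

```python
class ExponentialKLController:
    """adaptive kl-penalty coefficient for penalty-based ppo (sketch)."""

    def __init__(self, init_beta=0.1, target_kl=0.01, tolerance=1.5):
        self.beta = init_beta          # current penalty coefficient
        self.target_kl = target_kl     # desired kl between old/new policy
        self.tolerance = tolerance     # width of the acceptable kl band

    def update(self, measured_kl):
        if measured_kl > self.target_kl * self.tolerance:
            self.beta *= 2.0           # policy moved too far: penalize harder
        elif measured_kl < self.target_kl / self.tolerance:
            self.beta /= 2.0           # policy barely moved: relax the penalty
        return self.beta

ctrl = ExponentialKLController()
print(ctrl.update(0.05))   # kl above the band -> beta doubles to 0.2
print(ctrl.update(0.001))  # kl below the band -> beta halves back to 0.1
```

the pid and scheduled controllers from the bullet above would replace the doubling/halving rule with a pid update on the kl error or a fixed beta schedule, respectively.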

rl protokit: rapid prototyping toolkit for reinforcement learning (github)

jul 2024 -- present | tools: python, pytorch, gymnasium, pettingzoo, numpy, reinforcement learning

  • developed a modular cli tool for rl prototyping that generates custom gym wrappers, tunes hyperparameters, debugs policies, and assembles end-to-end pipelines, cutting experiment setup time.
  • implemented prioritized replay buffers and rnn policy networks for efficient off-policy learning and for handling temporal dependencies in partially observable environments, with support for discrete and continuous action spaces.
  • integrated an intrinsic curiosity module and pettingzoo multi-agent wrappers for exploration in sparse-reward settings and cooperative scenarios, along with frame stacking, grayscale transforms, and atari-specific preprocessing.
  • added ppo clip annealing, kl-divergence logging, and sac temperature auto-tuning for stable policy optimization and entropy regularization across algorithms and environments.
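ppo clip annealing from the last bullet can be as simple as a linear schedule on the clip range: start with a wide clip epsilon and shrink it toward a floor as training progresses, so late updates stay conservative. a minimal sketch (function name and default values are hypothetical):

```python
def annealed_clip(step, total_steps, eps_start=0.2, eps_end=0.05):
    """linearly decay the ppo clip range from eps_start to eps_end."""
    frac = min(step / total_steps, 1.0)  # clamp so eps never drops below the floor
    return eps_start + frac * (eps_end - eps_start)

for step in (0, 50_000, 100_000):
    print(f"step {step}: clip eps = {annealed_clip(step, 100_000):.3f}")
```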

isaac gym humanoid robot: teaching robots to walk (github)

nov 2023 | tools: python, pytorch, isaac gym, nvidia gpu

  • trained virtual humanoid robots to walk and balance using reinforcement learning (trial-and-error from simulated experience).
  • used nvidia's isaac gym simulator to run thousands of training environments in parallel on a single gpu for faster learning.
  • designed shaped reward functions that reinforce stable forward walking and penalize falls and wasteful movement.
  • applied proximal policy optimization (ppo) so the robot's walking policy improves gradually over repeated rollouts.
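the reward system described above combines positive terms for progress and posture with penalties for energy use and falling. an illustrative sketch with made-up weights (the actual isaac gym humanoid task defines its own reward terms):

```python
def walking_reward(forward_vel, torso_height, action_cost, fallen,
                   vel_w=1.0, upright_w=0.5, energy_w=0.01, fall_penalty=10.0):
    """toy shaped reward for locomotion; all weights are illustrative."""
    reward = vel_w * forward_vel        # reward forward progress
    reward += upright_w * torso_height  # reward keeping the torso up
    reward -= energy_w * action_cost    # penalize large torques
    if fallen:
        reward -= fall_penalty          # strong penalty for falling over
    return reward

print(walking_reward(1.5, 1.3, 2.0, fallen=False))  # good step: positive reward
print(walking_reward(0.0, 0.4, 5.0, fallen=True))   # fall: large negative reward
```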

tinygrad rlcv: lightweight rl for computer vision (github)

oct 2023 -- dec 2023 | tools: python, tinygrad, opencv, numpy, reinforcement learning

  • created ultra-lightweight neural network operations with tinygrad, achieving a small memory footprint (<10mb) for deployment on resource-limited edge devices (mobile cpus and arm).
  • built a memory-optimized dqn agent with prioritized experience replay and flexible policy networks, enabling stable learning and real-time inference (>30 fps) on target hardware.
  • developed an optimized opencv pipeline for efficient, real-time feature extraction from webcam streams, enabling robust object tracking on low-power systems.
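the opencv preprocessing step typically reduces camera frames to small grayscale observations before they reach the dqn. a numpy-only sketch of the idea (parameters are hypothetical; a real pipeline would use cv2.cvtColor and cv2.resize with proper interpolation):

```python
import numpy as np

def preprocess(frame, stride=4):
    """shrink an rgb frame to a small normalized grayscale observation."""
    # luminosity grayscale: weighted sum over the rgb channels
    gray = frame @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # crude downsample: keep every `stride`-th pixel in each dimension
    small = gray[::stride, ::stride]
    # normalize to [0, 1] for the network input
    return small / 255.0

frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
obs = preprocess(frame)
print(obs.shape)  # (120, 160)
```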