projects


maddpg: multi-agent deep deterministic policy gradient (github)

jul 2024 -- aug 2024 | tools: python, pytorch, pettingzoo, numpy, multi-agent rl

  • built a multi-agent reinforcement learning system in which agents learn cooperative and competitive behaviors across pettingzoo environments.
  • implemented shared and per-agent policies, letting agents act independently at execution time while learning jointly during training.
  • added per-agent experience replay buffers so each agent stores and resamples past transitions during training.
  • evaluated on benchmark tasks such as simple_adversary and simple_spread, tuning learning rates and batch sizes for better performance.
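the per-agent replay buffers above can be sketched as follows. this is a minimal illustration with hypothetical names, not the repo's actual api; a full maddpg implementation samples the same indices across agents so joint transitions stay aligned, which this sketch simplifies away:

```python
import random
from collections import deque

class AgentReplayBuffer:
    """per-agent buffer of (obs, action, reward, next_obs, done) transitions."""

    def __init__(self, capacity=10_000):
        # deque with maxlen drops the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # uniform sampling of past transitions for an off-policy update
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# one independent buffer per agent
buffers = {name: AgentReplayBuffer() for name in ["agent_0", "agent_1"]}
for t in range(100):
    for name, buf in buffers.items():
        buf.add(obs=[0.0, t], action=0, reward=1.0, next_obs=[0.0, t + 1], done=False)

batch = buffers["agent_0"].sample(32)
print(len(batch))  # 32
```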

rl2-enhanced: advanced reinforcement learning for llms (github)

jul 2025 | tools: python, pytorch, hydra, mlflow, weights & biases, optuna, numpy, distributed rl

  • extended the original rl2 repository by chenmien tan, adding advanced features for scalable and robust language model training.
  • implemented adaptive kl penalty mechanisms (exponential, pid, scheduled controllers) for stable ppo optimization.
  • developed multi-objective optimization with pareto frontier tracking, supporting reward, entropy, and kl constraints.
  • integrated alternative advantage estimation methods (v-trace, retrace(λ), td(λ), multi-step returns) for improved sample efficiency.
  • automated hyperparameter tuning using optuna and hyperopt, enabling bayesian and grid search strategies.
  • added advanced memory optimization: gradient checkpointing, cpu offloading, adaptive batch sizing, and detailed profiling.
  • enabled experiment tracking and model versioning with mlflow and weights & biases for reproducible research and mlops workflows.
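the exponential kl controller mentioned above roughly follows the adaptive-kl rule from the ppo paper: double the penalty coefficient when the measured kl overshoots a target band, halve it when it undershoots. a minimal sketch (class and parameter names are hypothetical, not the repo's api):

```python
class ExponentialKLController:
    """adaptive kl-penalty coefficient for penalty-based ppo (sketch)."""

    def __init__(self, init_beta=0.1, target_kl=0.01, tolerance=1.5):
        self.beta = init_beta          # current penalty coefficient
        self.target_kl = target_kl     # desired kl between old/new policy
        self.tolerance = tolerance     # width of the acceptable kl band

    def update(self, measured_kl):
        if measured_kl > self.target_kl * self.tolerance:
            self.beta *= 2.0           # policy moved too far: penalize harder
        elif measured_kl < self.target_kl / self.tolerance:
            self.beta /= 2.0           # policy barely moved: relax the penalty
        return self.beta

ctrl = ExponentialKLController()
print(ctrl.update(0.05))   # kl above the band -> beta doubles to 0.2
print(ctrl.update(0.001))  # kl below the band -> beta halves back to 0.1
```

the pid and scheduled controllers from the bullet above would replace the doubling/halving rule with a pid update on the kl error or a fixed beta schedule, respectively.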

rl protokit: rapid prototyping toolkit for reinforcement learning (github)

jul 2024 -- present | tools: python, pytorch, gymnasium, pettingzoo, numpy, reinforcement learning

  • developed a modular cli tool for rl prototyping that generates custom gym wrappers, tunes hyperparameters, debugs policies, and assembles end-to-end pipelines, cutting experiment setup time.
  • implemented prioritized replay buffers and rnn policy networks for efficient off-policy learning and for handling temporal dependencies in partially observable environments, with support for discrete and continuous action spaces.
  • integrated an intrinsic curiosity module and pettingzoo multi-agent wrappers for exploration in sparse-reward settings and cooperative scenarios, along with frame stacking, grayscale transforms, and atari-specific preprocessing.
  • added ppo clip annealing, kl-divergence logging, and sac temperature auto-tuning for stable policy optimization and entropy regularization across algorithms and environments.
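ppo clip annealing from the last bullet can be as simple as a linear schedule on the clip range: start with a wide clip epsilon and shrink it toward a floor as training progresses, so late updates stay conservative. a minimal sketch (function name and default values are hypothetical):

```python
def annealed_clip(step, total_steps, eps_start=0.2, eps_end=0.05):
    """linearly decay the ppo clip range from eps_start to eps_end."""
    frac = min(step / total_steps, 1.0)  # clamp so eps never drops below the floor
    return eps_start + frac * (eps_end - eps_start)

for step in (0, 50_000, 100_000):
    print(f"step {step}: clip eps = {annealed_clip(step, 100_000):.3f}")
```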

isaac gym humanoid robot: teaching robots to walk (github)

nov 2023 | tools: python, pytorch, isaac gym, nvidia gpu

  • trained virtual humanoid robots to walk and balance using reinforcement learning (trial-and-error from simulated experience).
  • used nvidia's isaac gym simulator to run thousands of training environments in parallel on a single gpu for faster learning.
  • designed shaped reward functions that reinforce stable forward walking and penalize falls and wasteful movement.
  • applied proximal policy optimization (ppo) so the robot's walking policy improves gradually over repeated rollouts.
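the reward system described above combines positive terms for progress and posture with penalties for energy use and falling. an illustrative sketch with made-up weights (the actual isaac gym humanoid task defines its own reward terms):

```python
def walking_reward(forward_vel, torso_height, action_cost, fallen,
                   vel_w=1.0, upright_w=0.5, energy_w=0.01, fall_penalty=10.0):
    """toy shaped reward for locomotion; all weights are illustrative."""
    reward = vel_w * forward_vel        # reward forward progress
    reward += upright_w * torso_height  # reward keeping the torso up
    reward -= energy_w * action_cost    # penalize large torques
    if fallen:
        reward -= fall_penalty          # strong penalty for falling over
    return reward

print(walking_reward(1.5, 1.3, 2.0, fallen=False))  # good step: positive reward
print(walking_reward(0.0, 0.4, 5.0, fallen=True))   # fall: large negative reward
```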

tinygrad rlcv: lightweight rl for computer vision (github)

oct 2023 -- dec 2023 | tools: python, tinygrad, opencv, numpy, reinforcement learning

  • created ultra-lightweight neural network operations with tinygrad, achieving a small memory footprint (<10mb) for deployment on resource-limited edge devices (mobile cpus and arm).
  • built a memory-optimized dqn agent with prioritized experience replay and flexible policy networks, enabling stable learning and real-time inference (>30 fps) on target hardware.
  • developed an optimized opencv pipeline for efficient, real-time feature extraction from webcam streams, enabling robust object tracking on low-power systems.
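the opencv preprocessing step typically reduces camera frames to small grayscale observations before they reach the dqn. a numpy-only sketch of the idea (parameters are hypothetical; a real pipeline would use cv2.cvtColor and cv2.resize with proper interpolation):

```python
import numpy as np

def preprocess(frame, stride=4):
    """shrink an rgb frame to a small normalized grayscale observation."""
    # luminosity grayscale: weighted sum over the rgb channels
    gray = frame @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # crude downsample: keep every `stride`-th pixel in each dimension
    small = gray[::stride, ::stride]
    # normalize to [0, 1] for the network input
    return small / 255.0

frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
obs = preprocess(frame)
print(obs.shape)  # (120, 160)
```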