Rouzbeh Meshkinnejad

Senior AI Researcher & Machine Learning Engineer

I work on representation learning, continual learning, and applied ML systems, translating research ideas into reliable, production-grade AI.

What I Work On

Selected Research

Look-Ahead Selective Plasticity for Continual Learning

NeurIPS (UniReps Workshop) · 2024

We propose a look-ahead selective plasticity mechanism that combines contrastive learning and distillation to reduce catastrophic forgetting in continual visual learning. By valuing features that both preserve past knowledge and transfer to new tasks, the method achieves state-of-the-art performance on CIFAR-10 and TinyImageNet.
Continual learning systems often suffer from catastrophic forgetting because parameter plasticity is applied uniformly across tasks. Our method instead models learning as a sequence of events and introduces a look-ahead evaluation step to guide plasticity decisions at task transitions.
Many existing approaches estimate parameter importance solely based on past tasks, leading to overly stable solutions that gradually lose the ability to learn. Our key motivation is that features valuable for continual learning should be assessed not only by what they retain, but also by how well they transfer to future tasks.
To capture this, we introduce a look-ahead selective plasticity mechanism that uses a small number of samples from a new task to estimate the transferability of learned features. At the level of embedding neurons, we evaluate performance jointly on data from previous tasks and the first batch of the new task. Neurons that preserve performance under this joint evaluation are treated as transferable and selectively protected, while the rest of the network remains plastic.
This process is implemented through two mechanisms: selective plasticity, realized as a distillation loss applied to the chosen subset of embeddings, and gradient modulation, which dampens updates to salient parameters that contribute to both transfer and past-task performance. This design allows the model to adapt to new tasks with a larger set of free parameters while preserving transferable features learned earlier.
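A minimal PyTorch sketch of the two mechanisms; the `transferable_idx` and `saliency` names are illustrative stand-ins for the outputs of the look-ahead evaluation described above:

```python
import torch
import torch.nn.functional as F

def selective_distillation_loss(z_new, z_old, transferable_idx):
    """Distill only the embedding dimensions judged transferable."""
    return F.mse_loss(z_new[:, transferable_idx],
                      z_old[:, transferable_idx].detach())

def dampen_gradients(model, saliency, strength=0.9):
    """Scale down gradients of parameters with high saliency scores."""
    for name, p in model.named_parameters():
        if p.grad is not None and name in saliency:
            # saliency[name] in [0, 1]; higher saliency -> smaller update
            p.grad.mul_(1.0 - strength * saliency[name])

# Schematic training step:
#   loss = task_loss + lam * selective_distillation_loss(z, z_frozen, idx)
#   loss.backward(); dampen_gradients(model, saliency); optimizer.step()
```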
Experiments on CIFAR-10 and TinyImageNet show higher average accuracy, reduced forgetting, and improved forward transfer compared to regularization- and distillation-based baselines, while maintaining competitive performance on new tasks.

Effects of Neuromodulation-Inspired Mechanisms on the Performance of Deep Neural Networks in a Spatial Learning Task

iScience · 2023

We build on prior work on neuromodulation-inspired deep neural networks by examining how neuromodulatory components influence both learning behavior and single-unit activity in a spatial learning task. Within a multiscale neuromodulatory framework, plastic components, dropout probability modulation, and learning-rate decay were introduced at the single-unit, layer, and whole-network levels, respectively.
These additions led to measurable behavioral benefits, including faster learning and reduced ambulation error. Our results show that neuromodulatory components shape learning trajectories, final performance, and single-unit responses in a manner that depends on both the specific mechanism and its hyperparameters.
Rather than altering the core network architecture, neuromodulation-inspired processes were embedded as learning-time controls operating at different spatial scales. Local plasticity governed individual weight updates, adaptive dropout dynamically regulated layer-level regularization, and network-level learning-rate scheduling adjusted global optimization over time. Across both TensorFlow and PyTorch implementations, these mechanisms consistently improved convergence and task performance in an open-field spatial environment.
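A sketch of two of the three scales, layer-level dropout modulation and network-level learning-rate decay; the schedule shapes and the performance signal here are illustrative rather than the paper's exact forms:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(128, 2),
)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# Whole-network scale: exponential learning-rate decay per episode.
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

def modulate_dropout(model, performance):
    """Layer scale: relax dropout as task performance improves."""
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = max(0.1, 0.5 * (1.0 - performance))

# After each training episode:
#   modulate_dropout(model, performance); scheduler.step()
```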
Analysis of fully connected layer activations revealed units with diverse spatial tuning properties, including place-like and grid-like response patterns. The emergence and expression of these patterns varied with the presence and configuration of neuromodulatory components, demonstrating that biologically inspired learning mechanisms can systematically influence internal network dynamics.

Beyond Stationary Simulation: Modern Approaches to Stochastic Modelling

Stochastic Environmental Research and Risk Assessment · 2023

We study the limitations of stationary assumptions in classical stochastic simulation and explore modern alternatives that model non-stationary, evolving processes. In particular, we examine how Generative Adversarial Networks (GANs) can be used as flexible data-driven simulators for complex stochastic systems.
Traditional stochastic modelling frameworks often assume stationarity, which simplifies analysis but fails to capture the dynamics of many real-world systems whose underlying distributions evolve over time. This mismatch leads to inaccurate simulations and unreliable uncertainty estimates in practice.
Our objective is to move beyond stationary formulations by leveraging modern stochastic modelling techniques that adapt to time-varying dynamics. We analyze GAN-based approaches as a powerful alternative, where generative models learn the underlying data distribution directly and can implicitly capture non-stationary behavior without explicit parametric assumptions.
We compare GAN-based simulation with classical stochastic methods, highlighting differences in expressiveness, stability, and their ability to model regime shifts and distributional drift. The study discusses practical considerations for training and evaluating generative models in stochastic simulation settings.
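For concreteness, a minimal PyTorch sketch of the GAN-as-simulator setup, with an arbitrary fixed window length; once trained, sampling the generator replaces the stationary simulator:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=32, seq_len=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, seq_len),
        )
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, seq_len=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(seq_len, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1),
        )
    def forward(self, x):
        return self.net(x)

G, D = Generator(), Discriminator()
loss_fn = nn.BCEWithLogitsLoss()
# Standard alternating updates: D on real windows vs. G(z), then G to fool D.
# After training, G(torch.randn(n, 32)) acts as the data-driven simulator.
```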
Experimental results show that GAN-based approaches achieve lower distributional error and improved sample realism compared to stationary simulations, particularly in scenarios with evolving dynamics, demonstrating their effectiveness as modern tools for non-stationary stochastic modelling.

Applied Systems & Industry Work

Deep Search & Reasoning Agent

ALS Geoanalytics · 2025

Designed and deployed a multi-step LLM-based agent capable of iterative retrieval, reasoning, and knowledge updates over external web sources.
The system was built for reliability, cost control, and extensibility, and integrated into internal workflows for domain experts.
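A minimal LangGraph sketch of the iterative retrieve-and-reason loop; the node bodies are placeholders for the real search tool and LLM calls:

```python
from typing import List, TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    notes: List[str]
    done: bool

def retrieve(state: AgentState) -> dict:
    # Placeholder: call a web-search tool and append the findings.
    return {"notes": state["notes"] + ["<search results>"]}

def reason(state: AgentState) -> dict:
    # Placeholder: ask the LLM whether the notes answer the question.
    return {"done": len(state["notes"]) >= 3}

def route(state: AgentState) -> str:
    return END if state["done"] else "retrieve"

builder = StateGraph(AgentState)
builder.add_node("retrieve", retrieve)
builder.add_node("reason", reason)
builder.set_entry_point("retrieve")
builder.add_edge("retrieve", "reason")
builder.add_conditional_edges("reason", route)
graph = builder.compile()
# graph.invoke({"question": "...", "notes": [], "done": False})
```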

  • LangChain
  • LangGraph
  • Azure Serverless LLM

Kubernetes Backend System for Real-Time ML Inference

ALS Geoanalytics · 2025

Designed a production-grade Kubernetes backend for low-latency, real-time ML inference, targeting large vision transformer models. The work focused on system architecture, scalability, and operational correctness, and was formalized in a detailed design document covering infrastructure, deployment, and runtime considerations.
The system was designed around Azure Kubernetes Service (AKS) and provisioned using Infrastructure as Code (Bicep), with a strict separation between infrastructure and application layers. The architecture emphasized reproducibility, environment parity, and safe iteration, enabling model teams to deploy and update inference services independently of cluster operations.
Key design elements included autoscaling strategies for bursty inference workloads, GPU-aware scheduling, health checks and rollout policies for zero-downtime updates, and CI/CD pipelines for containerized model services. The design also addressed observability, cost control, and security boundaries between services. While the system was not deployed to production, it was reviewed and validated as a deployment-ready architecture.

  • Kubernetes (AKS)
  • Azure Container Registry
  • Infrastructure as Code (Bicep)
  • CI/CD Pipelines
  • GPU Scheduling
  • Autoscaling

Life Science Image Classification

ALS Geoanalytics · 2023-2025

Led the end-to-end design, implementation, and deployment of a large-scale image classification system for life science data, transforming a largely manual inspection workflow into an automated, ML-driven pipeline. The system scaled from a small unlabeled dataset to 2+ million high-resolution images while maintaining production-level performance and reliability.
The project began with limited labeled data, where manual annotation was prohibitively expensive. The task was to design a pipeline that could bootstrap supervision, scale efficiently, and deliver high-accuracy predictions suitable for downstream operational use. To address this, we introduced a pseudo-labeling strategy using classical image processing techniques (OpenCV) to generate initial labels from raw imagery. A custom web application was designed and deployed to visualize the pseudo-labeling process and enable domain experts to validate and refine annotations. All data and annotations were versioned and stored in Azure Blob Storage, enabling reproducible training at scale.
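A sketch of the kind of OpenCV heuristic used for pseudo-labeling; the threshold scheme and area cutoff are illustrative, with the real rules validated and refined by domain experts through the annotation app:

```python
import cv2

def pseudo_label(path: str, min_area: float = 500.0) -> int:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    blur = cv2.GaussianBlur(img, (5, 5), 0)
    # Otsu thresholding separates foreground objects from background.
    _, mask = cv2.threshold(blur, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Label 1 if any sufficiently large object is present, else 0.
    return int(any(cv2.contourArea(c) > min_area for c in contours))
```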
Model development initially relied on ResNet-based architectures and later transitioned to DINOv2 as dataset scale and diversity increased. Training was optimized with Distributed Data Parallel (DDP) on Azure Machine Learning, using mixed-precision training and experimenting with torch.compile backends and CUDA graphs; together these reduced epoch time from ~100 hours to ~7 hours on 16 T4 GPUs (≈14x speedup).
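Schematically, the training-loop optimizations looked like the following; this assumes torch.distributed is already initialized and that `model`, `loader`, `optimizer`, and `local_rank` are defined by the surrounding script:

```python
import torch
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

model = DDP(model.cuda(local_rank), device_ids=[local_rank])
model = torch.compile(model)          # kernel fusion where supported
scaler = torch.cuda.amp.GradScaler()  # loss scaling for fp16 stability

for images, labels in loader:
    images, labels = images.cuda(local_rank), labels.cuda(local_rank)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():   # fp16/fp32 mixed precision
        loss = F.cross_entropy(model(images), labels)
    scaler.scale(loss).backward()     # DDP syncs gradients here
    scaler.step(optimizer)
    scaler.update()
```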
After approximately two weeks of training, the system achieved >93% accuracy on high-resolution images and >99% accuracy on small tile images. The model was deployed in a validation environment using Azure Batch, Logic Apps, and on-premises upload services, enabling seamless integration with existing workflows. This solution replaced a costly manual review process, resulting in over $100K in annual operational savings.

  • PyTorch
  • Distributed Data Parallel (DDP)
  • Mixed Precision Training
  • torch.compile
  • CUDA Graphs
  • Azure Machine Learning
  • MLflow
  • Azure Blob Storage
  • Azure Batch
  • Logic Apps
  • OpenCV
  • DINOv2
  • ResNet
  • Streamlit

Chemistry Process Optimization with Machine Learning

ALS Geoanalytics · 2023-2025

Enhanced a mature industrial chemical process by introducing a machine-learning layer to improve consistency and efficiency. The work focused on extracting incremental gains from large-scale operational data without altering core process mechanics.
While the existing process performed reliably, measurable variability remained across environments and operating conditions. The objective was to design a data-driven system that could identify subtle patterns in historical data and support improved outcomes at scale.
Performed extensive exploratory analysis on 10M+ tabular records, engineered features, and trained deep learning models using PyTorch. Established reproducible experimentation and model tracking with Azure Machine Learning and MLflow, and integrated the models into a semi-automated workflow suitable for global laboratory environments.
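A minimal sketch of the experiment-tracking pattern; parameter names and values are illustrative, and `train_one_epoch` stands in for the real training loop:

```python
import mlflow

with mlflow.start_run(run_name="process-optimization"):
    mlflow.log_params({"lr": 1e-3, "hidden_dim": 256, "batch_size": 4096})
    for epoch in range(50):
        train_loss, val_acc = train_one_epoch()  # defined elsewhere
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_acc", val_acc, step=epoch)
```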
Achieved >90% predictive accuracy in production use, contributing to $500K+ in annualized savings through improved success rates and operational efficiency across deployed sites.

  • PyTorch
  • Azure Machine Learning
  • MLflow
  • Dask
  • Pandas

Optimization of Stochastic Physical Inversion

GoldSpot Discoveries Ltd. · 2022-2023

Redesigned and accelerated large-scale stochastic inversion workflows for geophysical signals by re-architecting computation to run efficiently on GPUs. The system preserves physically accurate forward modeling while enabling fast, practical exploration of ill-posed inverse problems.
In geophysical modeling, the forward process (i.e., computing observable signals such as gravity or electromagnetic responses from a known 3D subsurface property mesh) is deterministic and well-defined. The inverse problem, however, is inherently ill-posed: multiple subsurface configurations can explain the same 2D observations, requiring stochastic and iterative optimization over high-dimensional parameter spaces. Existing inversion workflows relied on legacy CPU-based software, making large-scale experimentation prohibitively slow.
I optimized the inversion pipeline by implementing GPU-native execution for the stochastic components, introducing lazy evaluation for high-dimensional tensor operations, and restructuring numerical kernels to maximize parallelism and memory efficiency without modifying the underlying physical models. Custom infrastructure supported scalable experimentation across different inversion setups while ensuring numerical stability and reproducibility.
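A minimal example of the lazy-evaluation pattern with KeOps: a Gaussian kernel over large point sets is expressed symbolically and reduced on the GPU without ever materializing the full kernel matrix. The point-set semantics here are illustrative:

```python
import torch
from pykeops.torch import LazyTensor

x = torch.randn(100_000, 3, device="cuda")  # e.g. mesh cell centers
y = torch.randn(5_000, 3, device="cuda")    # e.g. observation points
b = torch.randn(5_000, 1, device="cuda")

x_i = LazyTensor(x[:, None, :])             # (N, 1, 3), symbolic
y_j = LazyTensor(y[None, :, :])             # (1, M, 3), symbolic
D_ij = ((x_i - y_j) ** 2).sum(-1)           # symbolic squared distances
K_ij = (-D_ij / 2).exp()                    # symbolic Gaussian kernel
out = K_ij @ b                              # reduction runs on the GPU
```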
The resulting system reduced end-to-end inversion runtime by ~12x, from ~8 hours to minutes, enabling domain experts to iterate rapidly over inversion configurations and significantly improving the practicality of advanced inversion techniques in real-world research workflows.

  • PyTorch
  • KeOps
  • GPyTorch

3D Subsurface Modeling with Deep Learning and Physical Simulations

GoldSpot Discoveries Ltd. · 2022-2023

Developed deep learning models to predict 3D subsurface structures from 2D geophysical observations. Combined physical modeling with data-driven approaches to enhance mineral exploration accuracy.
Leveraging a hybrid of CNN-LSTM, pure LSTM, and Transformer architectures, alongside traditional ML baselines (Random Forests, XGBoost), we addressed the inverse problem of reconstructing subsurface properties from gravity, electromagnetic, and volumetric datasets. Data was generated both from the forward physical process and via FastGAN to augment limited observations. The models were trained to map 2D signals to 3D representations, enabling accurate and scalable predictions of mineral deposits. This work integrated physical insight, generative modeling, and deep learning to improve predictive performance and efficiency in real-world exploration applications.
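A schematic of the CNN-LSTM variant, with illustrative grid and depth sizes: a CNN encodes the 2D surface signal and an LSTM decodes the volume one depth slice at a time:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, depth=16, hidden=256):
        super().__init__()
        self.depth = depth
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(),                         # 32 * 16 * 16 = 8192
        )
        self.lstm = nn.LSTM(8192, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 32 * 32)    # one depth slice

    def forward(self, x):                         # x: (B, 1, 32, 32)
        feat = self.encoder(x)                    # (B, 8192)
        seq = feat.unsqueeze(1).repeat(1, self.depth, 1)
        out, _ = self.lstm(seq)                   # (B, depth, hidden)
        return self.head(out).view(-1, self.depth, 32, 32)
```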

  • Long Short Term Memory (LSTM)
  • Seq2Seq with Attention
  • CNN-LSTM
  • Inverse Modelling
  • Random Forest
  • XGBoost

Semi-Supervised Deep Learning for Satellite Imagery Segmentation

GoldSpot Discoveries Ltd. · 2022-2023

Developed a semi-supervised deep learning pipeline to segment high-resolution satellite imagery with limited human-annotated labels. Demonstrated a scalable approach for integrating automated segmentation into geoscientific workflows.
Using ResNet-18 as the backbone, I applied a combination of semi-supervised methods—including consistency regularization and pseudo-labeling—to maximize performance under scarce annotations. Large raster images were read, tiled, segmented, and stitched into high-resolution maps, with evaluation via accuracy and Intersection-over-Union (IoU). While initially serving as a proof-of-concept, the project validated the feasibility of automated segmentation for domain experts and provided a foundation for further development toward reliable satellite image analysis.
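A sketch of the confidence-thresholded pseudo-labeling step; the 0.9 cutoff is illustrative:

```python
import torch

@torch.no_grad()
def make_pseudo_labels(model, unlabeled_tiles, threshold=0.9):
    model.eval()
    probs = torch.softmax(model(unlabeled_tiles), dim=1)
    conf, labels = probs.max(dim=1)
    keep = conf > threshold          # keep only confident predictions
    return unlabeled_tiles[keep], labels[keep]

# The kept (tile, label) pairs are mixed with the human-annotated set
# on the next round; consistency regularization additionally penalizes
# prediction changes under tile augmentations.
```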

  • ResNet-18
  • PyTorch
  • Semi-Supervised Learning
  • Sentinel-2
  • Satellite Imagery
  • Google Earth
  • Rasterio

DoubleA: Controllable Embedding Augmentation for Pretrained Language Models

BSc Thesis · 2021

Proposed a novel embedding-level data augmentation method for transformer-based language models, enabling flexible and controllable generation of synthetic training samples. Designed to improve generalization in low-resource NLP settings.
I introduced DoubleA, an embedding augmentation framework that operates directly in the representation space of pretrained models (BERT), addressing the limitations of word-level text augmentation. The method combines lightweight word-level perturbations with a latent-space modeling approach using SVD and Gaussian sampling to generate arbitrarily many augmented embeddings with tunable strength. DoubleA was evaluated on a downstream IMDb sentiment classification task, where it outperformed standard augmentation baselines (EDA, E-Mixup, E-Stitchup) by roughly 3% accuracy and achieved modest but reliable gains over training without augmentation. This work demonstrates how controllable embedding-space augmentation can enhance robustness and generalization while remaining computationally efficient and compatible with semi-supervised learning pipelines.
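A minimal sketch of the latent-space step: model a batch of encoder embeddings with SVD, then sample new embeddings by perturbing the singular directions with Gaussian noise of tunable strength. The exact sampling scheme below is a simplified illustration:

```python
import torch

def augment_embeddings(E, n_new=64, strength=0.1):
    """E: (n, d) embeddings from the pretrained encoder."""
    mu = E.mean(dim=0, keepdim=True)
    U, S, Vt = torch.linalg.svd(E - mu, full_matrices=False)
    # Sample coefficients around the observed singular-value scales;
    # `strength` tunes how far samples stray from the data manifold.
    coeffs = S * (1 + strength * torch.randn(n_new, S.shape[0]))
    return mu + coeffs @ Vt          # (n_new, d) synthetic embeddings
```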

  • PyTorch
  • Hugging Face
  • BERT
  • IMDb Sentiment Classification

Stock Prediction Via Sentiment Analysis and Hybrid Deep Learning Models

Western University · 2021

Developed a hybrid modeling framework that combines historical stock price data with sentiment signals extracted from StockTwits messages to predict short-term stock price movements. The project investigated whether social-media-driven sentiment provides complementary information beyond price-only models.
The project implemented a pipeline that jointly modeled financial time-series data and sentiment derived from stock-related messages collected from StockTwits. Sentiment features were extracted using a BERT-based classifier trained on domain-specific text, while price dynamics were modeled using neural networks operating on historical price windows and technical indicators. These components were integrated into a hybrid architecture to evaluate the impact of sentiment signals on downstream prediction tasks. Models were evaluated on real-world stock-price and StockTwits datasets using regression and classification metrics. Results showed that incorporating sentiment information led to more stable predictions and modest performance improvements under certain conditions, while also revealing challenges related to noise, timing alignment, and variability in social sentiment signals.
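A schematic of the fusion architecture, with illustrative dimensions: an LSTM summarizes the price window, and its final hidden state is concatenated with precomputed BERT sentiment features before the prediction head:

```python
import torch
import torch.nn as nn

class HybridModel(nn.Module):
    def __init__(self, n_price_feat=8, sent_dim=768, hidden=64):
        super().__init__()
        self.price_lstm = nn.LSTM(n_price_feat, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + sent_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),   # next-step return or movement logit
        )

    def forward(self, prices, sentiment):
        # prices: (B, T, n_price_feat); sentiment: (B, sent_dim)
        _, (h, _) = self.price_lstm(prices)
        return self.head(torch.cat([h[-1], sentiment], dim=1))
```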

  • PyTorch
  • Convolutional Neural Networks
  • Recurrent Neural Networks
  • LSTM
  • Sentiment Analysis
  • BERT
  • Hugging Face

Education

MSc, Computer Science — Western University (4.0/4.0)

BSc, Computer Engineering — Sharif University of Technology (18.88/20)

Talks & Teaching

Look-Ahead Selective Plasticity for Continual Learning of Visual Tasks

Poster Presentation

NeurIPS (UniReps Workshop), 2024

Look-Ahead Selective Plasticity for Continual Learning of Visual Tasks

Invited Talk

University of Tokyo, 2024

The effects of neuromodulation-inspired mechanisms on learning in an open field navigation task

Poster Presentation

Robarts Institute, Western University, 2023

EM-PRISE: a Tool for Anomaly Analysis and One-dimensional Inversion of Electromagnetic Data

Oral Presentation

KEGS PDAC Electromagnetic Mini-Symposium, 2022

Teaching Assistant Work

Fundamentals of Programming II, Artificial Intelligence (2x), Probability and Statistics, Data Structures and Algorithms, Data Transmission

Background & Distinctions

Recipient of Vector Institute Scholarship in AI

2021

Top 0.1% — Ranked 89th nationally in the university entrance exam among 137,788 participants

2017

USA Computing Olympiad — advanced to Gold Division

2015

Selected as a member of Iran's national team to participate in the International Mathematics Competition (IMC)

2012

Contact