Research
I build CMOS/FPGA systems for probabilistic computing with p-bits (Ising/Boltzmann machines), along with distributed architectures for fast sampling, learning, and quantum-inspired optimization. My approach is full-stack (device physics → architectures → algorithms), advancing from single-FPGA prototypes to multi-FPGA asynchronous fabrics that target million-node p-computers.
Research areas
Distributed probabilistic computing
- Balanced multi-chip mapping via probabilistic Potts partitioning
- Asynchronous, latency-tolerant links that preserve solution quality
- Roadmap and measurements toward 100k–1M+ p-bits
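Probabilistic Potts partitioning can be illustrated with a minimal sketch. It assumes one common formulation, not necessarily the one used in this work: each node carries a Potts label in {0, ..., K-1}, the energy is the weighted edge cut plus a quadratic balance penalty λ Σ_k (n_k − N/K)², and labels are Gibbs-sampled while the inverse temperature β is annealed linearly. The names `potts_partition`, `potts_cost`, `lam`, and `beta_max` are hypothetical.

```python
import math
import random


def potts_cost(labels, edges, k, lam):
    """Weighted edge cut plus a quadratic penalty on partition-size imbalance."""
    cut = sum(w for i, j, w in edges if labels[i] != labels[j])
    target = len(labels) / k
    sizes = [labels.count(p) for p in range(k)]
    return cut + lam * sum((n - target) ** 2 for n in sizes)


def potts_partition(n, edges, k, lam=1.0, sweeps=200, beta_max=5.0, seed=0):
    """Anneal a K-state Potts model whose ground state is a balanced min-cut partition."""
    rng = random.Random(seed)
    nbrs = [[] for _ in range(n)]
    for i, j, w in edges:
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    labels = [rng.randrange(k) for _ in range(n)]
    sizes = [labels.count(p) for p in range(k)]
    target = n / k
    for sweep in range(sweeps):
        beta = beta_max * (sweep + 1) / sweeps  # linear annealing schedule
        for i in range(n):
            sizes[labels[i]] -= 1  # take node i out before scoring its options
            energies = []
            for p in range(k):
                # cut weight to neighbours that would sit in a different partition
                cut = sum(w for j, w in nbrs[i] if labels[j] != p)
                # change in the balance penalty if node i joins partition p
                bal = lam * ((sizes[p] + 1 - target) ** 2 - (sizes[p] - target) ** 2)
                energies.append(cut + bal)
            e_min = min(energies)
            weights = [math.exp(-beta * (e - e_min)) for e in energies]
            p = rng.choices(range(k), weights=weights)[0]  # heat-bath (Gibbs) draw
            labels[i] = p
            sizes[p] += 1
    return labels
```

For two 4-node cliques joined by a single edge, `potts_partition(8, edges, 2)` should separate the cliques, giving a cost of 1 (one cut edge, zero imbalance). In a multi-chip setting, each Potts label would stand for one FPGA.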
Probabilistic Generative AI & ML
- Hardware-aware Deep Boltzmann Machines (DBMs) and EBMs
- Contrastive divergence training at extreme sweep counts
- Neural Quantum States (NQS) for quantum & scientific data
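Contrastive divergence can be sketched on a single-layer restricted Boltzmann machine (RBM), a simplification of the DBMs listed above. This minimal pure-Python sketch assumes binary {0, 1} units and a single Gibbs step (CD-1). At the extreme sweep counts mentioned above, that single step would become a long chain of hardware sweeps (CD-k with large k). `cd1_step` and `bernoulli` are hypothetical names.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bernoulli(p, rng):
    """Binary stochastic unit: 1 with probability p (the role a p-bit plays in hardware)."""
    return 1 if rng.random() < p else 0


def cd1_step(W, b, c, v0, lr, rng):
    """One CD-1 update of a binary RBM; W[i][j] couples visible unit i to hidden unit j."""
    nv, nh = len(b), len(c)
    # positive phase: hidden activations driven by the data vector v0
    ph0 = [sigmoid(c[j] + sum(v0[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    h0 = [bernoulli(p, rng) for p in ph0]
    # negative phase: a single Gibbs step v0 -> h0 -> v1 -> ph1
    pv1 = [sigmoid(b[i] + sum(W[i][j] * h0[j] for j in range(nh))) for i in range(nv)]
    v1 = [bernoulli(p, rng) for p in pv1]
    ph1 = [sigmoid(c[j] + sum(v1[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    # gradient estimate <v h>_data - <v h>_model, applied in place
    for i in range(nv):
        for j in range(nh):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(nh):
        c[j] += lr * (ph0[j] - ph1[j])
```

The update is the familiar "data statistics minus model statistics" rule. The quality of the model term depends entirely on how well the sampler mixes, which is why fast hardware sampling matters for this kind of training.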
Quantum-inspired optimization & sampling
- Sparse fabrics that emulate dense couplings efficiently
- Higher-order couplers to improve prefactors and scaling
- Adaptive parallel tempering (APT), simulated quantum annealing (SQA), and non-local Monte Carlo for hard instances
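Higher-order couplers can be illustrated on 3-XORSAT. The sketch assumes the standard mapping m = 1 − 2x, under which x_a ⊕ x_b ⊕ x_c = b becomes the 3-body term J·m_a·m_b·m_c with J = (−1)^b. It samples with the p-bit rule m_i = sgn(tanh(β I_i) − r), r uniform on (−1, 1), under a linear annealing schedule. The function names and the toy instance are hypothetical.

```python
import math
import random


def xorsat_to_clauses(eqs):
    """Map x_a ^ x_b ^ x_c = b onto the 3-body term J * m_a m_b m_c with J = (-1)**b."""
    return [(tuple(idx), (-1) ** b) for idx, b in eqs]


def local_field(i, m, clauses):
    """I_i = -dE/dm_i = sum over clauses containing i of J * (product of the other spins)."""
    total = 0
    for idx, J in clauses:
        if i in idx:
            prod = J
            for j in idx:
                if j != i:
                    prod *= m[j]
            total += prod
    return total


def energy(m, clauses):
    """E(m) = -sum_c J_c * prod_{i in c} m_i; each satisfied clause contributes -1."""
    e = 0
    for idx, J in clauses:
        prod = J
        for j in idx:
            prod *= m[j]
        e -= prod
    return e


def anneal_higher_order(n, clauses, sweeps=300, beta_max=5.0, seed=0):
    rng = random.Random(seed)
    m = [rng.choice((-1, 1)) for _ in range(n)]
    for sweep in range(sweeps):
        beta = beta_max * (sweep + 1) / sweeps  # linear annealing schedule
        for i in range(n):
            # p-bit rule: m_i = sgn(tanh(beta * I_i) - r) with r ~ U(-1, 1)
            r = rng.uniform(-1.0, 1.0)
            m[i] = 1 if math.tanh(beta * local_field(i, m, clauses)) > r else -1
    return m
```

With pairwise couplers only, each 3-body term would have to be decomposed using auxiliary spins. The native 3-body field keeps the variable count equal to the number of XORSAT variables.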
Full-stack focus
Device / physical layer
- Physics-inspired p-bits (CMOS today; stochastic magnetic tunnel junctions, sMTJs, next)
- Monte-Carlo-faithful device models and noise tuning
- Quality tracked by energy / free-energy metrics
- Tight device-to-architecture co-design loops
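For Monte-Carlo faithfulness, the key device-level property is that a bipolar p-bit's time-averaged output follows ⟨m⟩ = tanh(βI). A minimal software model lets you check that property directly. It assumes a p-bit driven by a 32-bit xorshift generator, chosen here only for illustration: a CMOS design might use LFSRs or another PRNG, and an sMTJ would supply physical noise instead. `PBit` and its methods are hypothetical.

```python
import math


class PBit:
    """Digital p-bit sketch: output m = sgn(tanh(beta * I) - r) with r drawn from a PRNG."""

    def __init__(self, seed=0x2545F491):
        self.state = seed & 0xFFFFFFFF or 1  # xorshift32 must not start at zero

    def _uniform(self):
        # xorshift32 (Marsaglia): three shift/XOR steps, kept within 32 bits
        x = self.state
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        self.state = x
        return x / 2**31 - 1.0  # map [1, 2^32 - 1] onto (-1, 1)

    def sample(self, I, beta=1.0):
        return 1 if math.tanh(beta * I) > self._uniform() else -1
```

Averaging many calls to `sample(I)` should reproduce tanh(I) to within sampling error. A biased or correlated noise source would show up as a systematic deviation from that curve.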
Architecture
- Sparse Ising machines (IMs) with multiplexed dense interactions
- Higher-order interactions (e.g., XORSAT) realized efficiently
- Balanced partitioning & async, latency-tolerant links
- Chromatic, massively parallel Gibbs updates
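Chromatic Gibbs updates rest on one observation: nodes that share no edge are conditionally independent given the rest of the graph, so every node of one graph color can be updated in the same clock cycle. The sketch below assumes a greedy coloring and a bipolar Ising model with p-bit updates. `greedy_coloring` and `chromatic_gibbs` are hypothetical names, and hardware would evaluate each color class in parallel rather than in a Python loop.

```python
import math
import random


def greedy_coloring(nbrs):
    """Assign each node the smallest color not used by an already-colored neighbour."""
    color = [-1] * len(nbrs)
    for i in range(len(nbrs)):
        used = {color[j] for j in nbrs[i]}
        c = 0
        while c in used:
            c += 1
        color[i] = c
    return color


def chromatic_gibbs(n, edges, h, beta, sweeps, seed=0):
    rng = random.Random(seed)
    J = [dict() for _ in range(n)]  # sparse couplings J[i][j]
    for i, j, w in edges:
        J[i][j] = w
        J[j][i] = w
    color = greedy_coloring(J)
    classes = {}
    for i, c in enumerate(color):
        classes.setdefault(c, []).append(i)
    m = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(sweeps):
        for c in sorted(classes):
            # One color = one parallel step: every field is read from the same
            # snapshot, which is exact because same-color nodes are never coupled.
            fields = [(i, h[i] + sum(w * m[j] for j, w in J[i].items())) for i in classes[c]]
            for i, field in fields:
                m[i] = 1 if math.tanh(beta * field) > rng.uniform(-1.0, 1.0) else -1
    return m, color
```

A sparse graph needs only a few colors, so one full sweep costs a handful of parallel steps regardless of the number of p-bits.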
Algorithms
- SA / PT with non-local moves (ICM) at scale
- Simulated Quantum Annealing (SQA)
- DBM/EBM training and NQS sampling
- Large-sweep, hardware-aware learning loops
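Replica exchange is the core of parallel tempering. The minimal sketch below assumes a fixed inverse-temperature ladder (an adaptive variant such as APT would tune the ladder during the run), sequential p-bit Gibbs sweeps, and the standard swap acceptance min(1, exp((β_k − β_{k+1})(E_k − E_{k+1}))). Non-local ICM moves are omitted. All function names are hypothetical.

```python
import itertools
import math
import random


def ising_energy(m, J, h):
    """E(m) = -sum_{i<j} J_ij m_i m_j - sum_i h_i m_i for bipolar spins m_i = +-1."""
    n = len(m)
    e = -sum(h[i] * m[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= J[i][j] * m[i] * m[j]
    return e


def gibbs_sweep(m, J, h, beta, rng):
    """Sequential p-bit sweep: m_i = sgn(tanh(beta * I_i) - r), r ~ U(-1, 1)."""
    n = len(m)
    for i in range(n):
        field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
        m[i] = 1 if math.tanh(beta * field) > rng.uniform(-1.0, 1.0) else -1


def parallel_tempering(J, h, betas, sweeps, seed=0):
    """betas must be sorted from hottest (smallest) to coldest (largest)."""
    rng = random.Random(seed)
    n = len(h)
    reps = [[rng.choice((-1, 1)) for _ in range(n)] for _ in betas]
    best = math.inf
    for _ in range(sweeps):
        for m, beta in zip(reps, betas):
            gibbs_sweep(m, J, h, beta, rng)
        energies = [ising_energy(m, J, h) for m in reps]
        best = min(best, min(energies))
        # attempt swaps between neighbouring temperatures
        for k in range(len(betas) - 1):
            delta = (betas[k] - betas[k + 1]) * (energies[k] - energies[k + 1])
            if delta >= 0 or rng.random() < math.exp(delta):
                reps[k], reps[k + 1] = reps[k + 1], reps[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return best, reps


def brute_force_ground(J, h):
    """Exact ground-state energy by enumeration (only for tiny n)."""
    return min(ising_energy(m, J, h) for m in itertools.product((-1, 1), repeat=len(h)))
```

Swapping whole configurations lets a cold replica escape a local minimum by borrowing a state that was explored at high temperature. On tiny instances, `brute_force_ground` provides an exact reference to compare against.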
Key figures of merit
- 1500B flips/s (measured)
- Up to 6 orders of magnitude vs. CPU Gibbs sampling
- ≈100× vs. GPU/TPU (flips/s & energy)
- 6-FPGA UCSB system: ~50k p-bits (async links)
- Synthesized 1M p-bits on 18× VP1902 (Siemens)
- DBM with 4,264 p-bits / ≈30k parameters
- Multi-FPGA system scaling to 1M+ nodes
Methods I use
- Massively parallel (graph-colored) Gibbs sampling
- Simulated Annealing (SA)
- Adaptive Parallel Tempering (APT)
- Simulated Quantum Annealing (SQA)
- Isoenergetic Cluster Moves (ICM)
- Non-equilibrium Monte Carlo (NMC)
- Higher-order p-bits / couplers
- DBM training (hardware-aware)
- NQS sampling & training
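Simulated quantum annealing replaces a transverse-field Ising problem with P coupled classical replicas (Trotter slices). The sketch below assumes the textbook Suzuki-Trotter mapping of H = −Σ J_ij σ^z_i σ^z_j − Γ Σ σ^x_i at temperature T. This mapping gives each slice the problem couplings scaled by 1/P, plus a ferromagnetic inter-slice coupling J⊥ = −(T/2) ln tanh(Γ/(P T)). Conventions differ between papers by factors of P. The function names are hypothetical.

```python
import math


def trotter_coupling(gamma, temperature, n_slices):
    """Ferromagnetic coupling between neighbouring Trotter slices; J_perp > 0 for gamma > 0."""
    return -0.5 * temperature * math.log(math.tanh(gamma / (n_slices * temperature)))


def sqa_energy(slices, J, gamma, temperature):
    """Classical energy of P replicas; slices[k][i] is spin i in slice k (periodic in k)."""
    P = len(slices)
    n = len(slices[0])
    j_perp = trotter_coupling(gamma, temperature, P)
    e = 0.0
    for k in range(P):
        s, s_next = slices[k], slices[(k + 1) % P]
        for i in range(n):
            for j in range(i + 1, n):
                e -= J[i][j] * s[i] * s[j] / P  # problem couplings, shared across slices
            e -= j_perp * s[i] * s_next[i]  # imaginary-time coupling between slices
    return e
```

As Γ is annealed toward zero, J⊥ grows and the slices are forced to agree. Running p-bit sweeps on this classical energy at fixed T while lowering Γ is the basic SQA schedule.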
Current directions
- Million-node p-computers: partitioning and interconnects that preserve solution quality at extreme scale.
- Probabilistic AI & NQS at scale: compact, hardware-efficient learning for scientific/quantum data.
- Heterogeneous p-computers: CMOS + sMTJs to keep improving flips/J and density.
Full publication list: Publications