Research

I build CMOS/FPGA systems for probabilistic computing with p-bits (Ising/Boltzmann machines) and distributed architectures for fast sampling, learning, and quantum-inspired optimization. My approach is full-stack, spanning device physics → architectures → algorithms, advancing from single-FPGA prototypes to multi-FPGA asynchronous fabrics targeting million-node p-computers.

Research areas

Distributed probabilistic computing

  • Balanced multi-chip mapping via probabilistic Potts partitioning (sketched after this list)
  • Asynchronous, latency-tolerant links that preserve solution quality
  • Roadmap and measurements toward 100k–1M+ p-bits
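
A minimal sketch of the partitioning idea, not the production mapper: each node's chip assignment is treated as a Potts state, the energy trades cut edges against a quadratic imbalance penalty, and annealed single-site Gibbs sweeps do the assignment. The names (`potts_partition`, `balance_weight`, `betas`) are hypothetical.

```python
import math
import random

def potts_partition(edges, n_nodes, n_chips, betas, balance_weight=1.0, rng=random):
    """Anneal a Potts model whose states are chip assignments.

    Local cost of putting node i on chip k = edges cut to other chips
    plus a quadratic imbalance penalty; single-site Gibbs sweeps over
    the inverse-temperature schedule `betas`.
    """
    adj = [[] for _ in range(n_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    part = [rng.randrange(n_chips) for _ in range(n_nodes)]
    size = [part.count(k) for k in range(n_chips)]
    target = n_nodes / n_chips

    for beta in betas:
        for i in range(n_nodes):
            cost = []
            for k in range(n_chips):
                cut = sum(1 for j in adj[i] if part[j] != k)
                sz = size[k] - (part[i] == k) + 1   # size of chip k if i joins
                cost.append(cut + balance_weight * (sz - target) ** 2)
            low = min(cost)
            w = [math.exp(-beta * (c - low)) for c in cost]
            r = rng.random() * sum(w)
            k_new = n_chips - 1
            for k, wk in enumerate(w):              # Gibbs: pick chip with prob ~ exp(-beta*cost)
                r -= wk
                if r <= 0:
                    k_new = k
                    break
            size[part[i]] -= 1
            size[k_new] += 1
            part[i] = k_new
    return part
```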

Probabilistic Generative AI & ML

  • Hardware-aware Deep Boltzmann Machines (DBMs) and EBMs
  • Contrastive divergence (CD) at extreme sweep counts (sketched after this list)
  • Neural Quantum States (NQS) for quantum & scientific data
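
For concreteness, a minimal CD-k step for a binary RBM; the DBM case stacks layers and is more involved, so this is just the simplest instance, with hypothetical names. The premise behind "extreme sweep counts" is that the k Gibbs sweeps of the negative phase are the cheap part on p-bit hardware, so k can be pushed far beyond what CPUs afford.

```python
import numpy as np

def cd_k_update(W, a, b, v_data, k=10, lr=0.01, rng=np.random.default_rng()):
    """One contrastive-divergence (CD-k) step on a binary RBM.

    W: (n_visible, n_hidden) weights; a, b: visible/hidden biases;
    v_data: (batch, n_visible) binary training batch.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample(p):
        return (rng.random(p.shape) < p).astype(W.dtype)

    # Positive phase: hidden activations clamped to the data.
    ph_data = sigmoid(v_data @ W + b)

    # Negative phase: k alternating Gibbs sweeps (the part hardware makes cheap).
    v = v_data.astype(W.dtype)
    for _ in range(k):
        h = sample(sigmoid(v @ W + b))
        v = sample(sigmoid(h @ W.T + a))
    ph_model = sigmoid(v @ W + b)

    # CD gradient: data statistics minus model statistics.
    W += lr * (v_data.T @ ph_data - v.T @ ph_model) / len(v_data)
    a += lr * (v_data - v).mean(axis=0)
    b += lr * (ph_data - ph_model).mean(axis=0)
    return W, a, b
```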

Quantum-inspired optimization & sampling

  • Sparse fabrics that emulate dense couplings efficiently
  • Higher-order couplers to improve prefactors and scaling
  • APT, SQA, and non-local MC for hard instances (SQA sketched below)
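
As a reference point for SQA, here is a path-integral Monte Carlo sweep under the standard Suzuki-Trotter mapping: P replicas of the spins along imaginary time, intra-slice couplings scaled by 1/P, and a ferromagnetic inter-slice coupling J⊥ = (1/2β) ln coth(βΓ/P) that stiffens as the transverse field Γ is annealed toward zero. A dense toy version with illustrative names, assuming J has zero diagonal.

```python
import math
import random

def sqa_sweep(slices, J, h, beta, gamma, rng=random):
    """One Metropolis sweep of simulated quantum annealing (path-integral MC).

    `slices` holds P replicas (Trotter slices) of the spin vector. Intra-slice
    couplings J (zero diagonal) are scaled by 1/P; neighboring slices couple
    ferromagnetically with j_perp, which stiffens as gamma is annealed to 0.
    """
    P, n = len(slices), len(slices[0])
    j_perp = (0.5 / beta) * math.log(1.0 / math.tanh(beta * gamma / P))
    for k in range(P):
        s, up, dn = slices[k], slices[(k + 1) % P], slices[(k - 1) % P]
        for i in range(n):
            field = h[i] + sum(J[i][j] * s[j] for j in range(n))
            dE = 2.0 * s[i] * (field / P + j_perp * (up[i] + dn[i]))
            if dE <= 0 or rng.random() < math.exp(-beta * dE):
                s[i] = -s[i]
    return slices
```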

Full-stack focus

Device / physical layer

  • Physics-inspired p-bits (CMOS today; sMTJs next); update rule sketched below
  • Monte-Carlo-faithful device models and noise tuning
  • Quality tracked by energy / free-energy metrics
  • Tight device-to-architecture co-design loops
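
The behavioral model behind these bullets is the standard p-bit update from the literature: m_i = sgn(tanh(β I_i) − r) with r drawn uniformly from (−1, 1), which is statistically identical to a single-site Gibbs update of the underlying Ising model. A one-function sketch (names illustrative, not a device model):

```python
import math
import random

def pbit_update(m, W, h, beta, i, rng=random):
    """Behavioral model of one p-bit: m_i = sgn(tanh(beta * I_i) - r).

    I_i is the synaptic input; r ~ Uniform(-1, 1). Statistically this is a
    single-site Gibbs update of the Ising model defined by (W, h), since
    P(m_i = +1) = (1 + tanh(beta * I_i)) / 2 = 1 / (1 + exp(-2 * beta * I_i)).
    """
    I_i = h[i] + sum(W[i][j] * m[j] for j in range(len(m)))  # assumes W[i][i] == 0
    m[i] = 1 if math.tanh(beta * I_i) > rng.uniform(-1.0, 1.0) else -1
    return m[i]
```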

Architecture

  • Sparse Ising machines (IMs) with multiplexed dense interactions
  • Higher-order interactions (e.g., XORSAT) realized efficiently
  • Balanced partitioning & async, latency-tolerant links
  • Chromatic, massively parallel Gibbs updates (sketched below)
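
A vectorized sketch of the chromatic update, assuming a precomputed valid coloring: spins within a color class share no couplings, so the whole class updates in one shot (one clock phase in hardware). `colors`, the dense `W`, and the function name are illustrative.

```python
import numpy as np

def chromatic_gibbs_sweep(m, W, h, beta, colors, rng):
    """One sweep of graph-colored (chromatic) Gibbs sampling.

    `colors` is a list of index arrays, each a color class of mutually
    non-interacting spins, so every spin in a class updates in parallel.
    m is a +/-1 float vector; W is symmetric with zero diagonal;
    rng is a numpy Generator.
    """
    for idx in colors:
        I = W[idx] @ m + h[idx]                       # inputs to the class
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * I))  # P(spin = +1 | rest)
        m[idx] = np.where(rng.random(len(idx)) < p_up, 1.0, -1.0)
    return m
```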

Algorithms

  • Simulated annealing (SA) / parallel tempering (PT) with non-local moves (ICM) at scale; ICM sketched below
  • Simulated Quantum Annealing (SQA)
  • DBM/EBM training and NQS sampling
  • Large-sweep, hardware-aware learning loops
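
"ICM" above refers to Houdayer-style isoenergetic cluster moves: two replicas at the same temperature exchange a connected cluster of disagreeing spins, which conserves their combined energy, so the move is always accepted. A minimal sketch over an adjacency list (names hypothetical):

```python
import random
from collections import deque

def houdayer_icm_move(s1, s2, adj, rng=random):
    """One isoenergetic cluster move (Houdayer) on two replicas s1, s2.

    Sites where the replicas disagree (s1[i]*s2[i] == -1) form domains;
    pick one disagreeing site, grow its connected domain over the
    interaction graph `adj`, and flip it in BOTH replicas. The combined
    energy of the pair is conserved, so the move is always accepted.
    """
    disagree = [i for i in range(len(s1)) if s1[i] * s2[i] == -1]
    if not disagree:
        return set()                      # replicas agree everywhere
    start = rng.choice(disagree)
    cluster, frontier = {start}, deque([start])
    while frontier:
        i = frontier.popleft()
        for j in adj[i]:
            if j not in cluster and s1[j] * s2[j] == -1:
                cluster.add(j)
                frontier.append(j)
    for i in cluster:
        s1[i], s2[i] = -s1[i], -s2[i]
    return cluster
```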

Key figures of merit

  • 1500B flips/s (measured)
  • Up to 6 orders of magnitude vs. CPU Gibbs
  • ≈100× vs. GPU/TPU (flips/s & energy)
  • 6-FPGA UCSB system: ~50k p-bits (async links)
  • 1M p-bits synthesized on 18× VP1902 (Siemens)
  • 4,264 p-bits / ≈30k params (DBM)
  • Multi-FPGA system to 1M+ nodes

Methods I use

  • Massively parallel (graph-colored) Gibbs
  • Simulated Annealing (SA)
  • Adaptive Parallel Tempering (APT); swap step sketched after this list
  • Simulated Quantum Annealing (SQA)
  • Isoenergetic Cluster Moves (ICM)
  • Non-equilibrium Monte Carlo (NMC)
  • Higher-order p-bits / couplers
  • DBM training (hardware-aware)
  • NQS sampling & training
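
For reference, the replica-exchange step that PT and APT share; the adaptive part (not shown) tunes the β ladder online, e.g. to equalize neighboring swap rates. Illustrative names:

```python
import math
import random

def pt_swap_sweep(betas, energies, rng=random):
    """One replica-exchange sweep for (adaptive) parallel tempering.

    `energies[r]` is the current energy of replica r; `perm[a]` says
    which replica sits at temperature slot a. Adjacent temperatures swap
    replicas with probability min(1, exp((beta_a - beta_b) * (E_a - E_b))).
    """
    perm = list(range(len(betas)))
    for a in range(len(betas) - 1):
        b = a + 1
        d = (betas[a] - betas[b]) * (energies[perm[a]] - energies[perm[b]])
        if d >= 0 or rng.random() < math.exp(d):
            perm[a], perm[b] = perm[b], perm[a]
    return perm
```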

Current directions

  • Million-node p-computers: partitioning and interconnects that preserve solution quality at extreme scale.
  • Probabilistic AI & NQS at scale: compact, hardware-efficient learning for scientific/quantum data.
  • Heterogeneous p-computers: CMOS + sMTJs to keep improving flips/J and density.

Full publication list: Publications