Research
My current research
My research mainly focuses on understanding the inductive bias of overparametrized deep neural networks, especially in the context of multi-task learning, representation learning, and transfer learning. For L2-regularized parameters, we proved a theorem that characterizes the inductive bias of deep infinitely wide ReLU NNs towards multi-task learning (https://arxiv.org/abs/2112.15577). The proof of this theorem led us to the discovery of a fast, almost lossless compression method for ReLU NNs, which we have not yet tested extensively in practice (https://openreview.net/pdf?id=9GUTgHZgKCH).
Further, we developed a method to estimate the epistemic uncertainty of a NN's predictions (https://proceedings.mlr.press/v162/heiss22a.html). We improved the state of the art on multiple market design (combinatorial auction) benchmarks twice: first, by developing a NN architecture that enforces monotonicity constraints and using it within an auction mechanism (https://www.ijcai.org/proceedings/2022/0077.pdf); and second, by adding exploration to our auction mechanism in a Bayesian optimization fashion, using our epistemic uncertainty estimates (https://doi.org/10.1609/aaai.v37i5.25726). For the latter, our simulations suggest that revenue could be increased by more than 200 million USD for an auction comparable to the Canadian 4G spectrum auction, although it is still a very long way until such a mechanism could be deployed in auctions of that scale. Recently, we modified our mechanism to use more practical demand queries instead of value queries (https://www.researchgate.net/publication/373262611_Machine_Learning-powered_Combinatorial_Clock_Auction, accepted at AAAI'24).
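To give a flavour of the monotonicity idea, here is a minimal, generic sketch of a network that is non-decreasing in every input, obtained by reparametrizing all weights to be non-negative and using monotone activations. This is only an assumed toy construction for illustration, not the architecture from the IJCAI'22 paper.

```python
# Minimal sketch (PyTorch): a network that is non-decreasing in each input,
# obtained by constraining all weights to be non-negative via softplus.
# Generic illustration only, not the architecture used in the cited paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.raw_weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        # softplus(raw_weight) >= 0, so this layer is non-decreasing in x
        return F.linear(x, F.softplus(self.raw_weight), self.bias)

class MonotoneNet(nn.Module):
    def __init__(self, d_in, width=64):
        super().__init__()
        self.net = nn.Sequential(
            MonotoneLinear(d_in, width), nn.ReLU(),  # ReLU is monotone
            MonotoneLinear(width, 1),
        )

    def forward(self, x):
        return self.net(x)

model = MonotoneNet(d_in=3)
x = torch.rand(5, 3)
print(model(x))  # composition of monotone maps => monotone in each input
```

Since every layer is coordinate-wise non-decreasing, the composition is as well, so the monotonicity constraint holds by construction rather than being learned from data.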
In another project, we extended the theory and methodology for Path-Dependent Neural Jump ODEs to deal with noisy irregularly observed time series (https://openreview.net/forum?id=0T2OTVCCC1).
Unpublished projects I am currently working on: outlier-robust NNs; deep probabilistic calibration of financial models; a better theoretical understanding of the cold posterior phenomenon in Bayesian neural networks; further improving the theoretical understanding of the inductive bias of various ML methods (e.g., extending https://arxiv.org/abs/1911.02903); and extending our techniques for combinatorial auctions.
My opinion on the generalization strengths of deep neural networks
I am very excited about the empirical fact that deep learning methods can generalize surprisingly well to unseen data points. I am very curious to understand the inductive bias that allows them to do so, and in particular how this inductive bias is influenced by architecture and hyperparameter choices, especially in the context of multi-task learning, transfer learning, representation learning, and feature learning. For some specific design choices, I have derived a theory to understand the inductive bias of NNs. In general, I see the following four main strengths in the inductive bias of neural networks:
- Deep Learning can strongly benefit from multi-task learning, transfer learning, representation learning, and feature learning.
- NNs with standard activation functions (such as ReLU) have an inductive bias towards flat/simple/smooth/non-oscillating functions because of implicit (and explicit) regularization (see the small sketch after this list).
- Some architectures (such as transformers, CNNs, RNNs, GNNs) have (soft) invariances/symmetries that are helpful for certain domains.
- The flexibility of architectures and training algorithms allows for many ways to manipulate the inductive bias through hand-crafted tricks, such as specific forms of data augmentation.
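As a small illustration of the second point, here is a generic toy experiment (an assumed setup for illustration, not taken from any of the papers above): fitting noisy 1D data with a wide ReLU network trained with explicit L2 regularization (weight decay) tends to yield a flatter, less oscillating function than training the same network without regularization.

```python
# Toy illustration (assumed setup, not from the cited papers): fit noisy 1D data
# with a wide ReLU network, with and without explicit L2 regularization
# (weight decay), and compare how much the learned functions oscillate.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)

def fit(weight_decay):
    net = nn.Sequential(nn.Linear(1, 512), nn.ReLU(), nn.Linear(512, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2, weight_decay=weight_decay)
    for _ in range(2000):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return net

grid = torch.linspace(-1, 1, 400).unsqueeze(1)
for wd in (0.0, 1e-3):
    net = fit(wd)
    pred = net(grid).detach().squeeze()
    # sum of absolute second differences as a crude "oscillation" proxy
    osc = pred.diff().diff().abs().sum().item()
    print(f"weight_decay={wd}: oscillation proxy ~ {osc:.3f}")
```

In this kind of toy setting, the regularized fit typically has a noticeably smaller oscillation proxy, which is one concrete way to see the bias towards simple, non-oscillating functions.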
Talks
- ETH AI Center Fellows x Associated Researchers Meetup in Zürich, 2024, Deep Learning Theory on Multi-task Learning (photo)
- SfS-PhD Talk in Zürich, 2023, How to forecast consistently based on noisy incomplete observations at irregular observation times (video)
- Oxford ETH Workshop in Oxford, 2023, Bayesian Optimization-based Combinatorial Assignment (video)
- Post/Doctoral Seminar in Mathematical Finance in Zürich, 2023, Bayesian Optimization-based Combinatorial Assignment (video)
- SfS-PhD Talk in Zürich, 2023, Bayesian Optimization-based Combinatorial Assignment (video)
- Yu Group meeting in Berkeley, 2023, Theory on understanding the inductive bias of deep neural networks towards multi-task learning and methods to reduce the number of neurons
- AAAI’23 in Washington DC, 2023, Bayesian Optimization-based Combinatorial Assignment (video)
- Uncertainty reading group via zoom, 2022, NOMU: Neural Optimization-based Model Uncertainty (video)
- Oxford ETH Workshop in Zürich, 2022, How Infinitely Wide Neural Networks Benefit from Multi-task Learning - an Exact Macroscopic Characterization
- SfS-PhD Talk in Zürich, 2022, How Infinitely Wide Neural Networks Benefit from Multi-task Learning - an Exact Macroscopic Characterization
- Post/Doctoral Seminar in Mathematical Finance in Zürich, 2020, How implicit Regularization of Artificial Neural Networks Characterized the Learned function or a Mathematical Point of View on the Psychology of Artificial Neural Networks
- ViZuS in Vienna, 2019, How implicit regularization of neural networks affects the learned function
- FWZ Seminar in Padova, 2019, Randomized shallow neural networks and their use in understanding gradient descent
My research topics include:
Theory of the inductive bias of infinitely wide deep neural networks (including multi-task learning and transfer learning). Uncertainty and generalization of neural networks. Bayesian optimization with the help of neural networks. Deep learning in market design. Compression of neural networks. Monotonic neural networks. Bayesian neural networks. Outlier-robust neural networks. Irregularly observed time series.