Research

Publications and preprints

Proof of the \(K(\pi,1)\) conjecture for affine Artin groups with Mario Salvetti

Inventiones Mathematicae 224, pp. 487-572, 2021

Show abstract arXiv Journal (open access )

Proof of the \(K(\pi,1)\) conjecture for affine Artin groups

We prove the \(K(\pi,1)\) conjecture for affine Artin groups: the complexified complement of an affine reflection arrangement is a classifying space. This is a long-standing problem, due to Arnol'd, Pham, and Thom. Our proof is based on recent advancements in the theory of dual Coxeter and Artin groups, as well as on several new results and constructions. In particular: we show that all affine noncrossing partition posets are EL-shellable; we use these posets to construct finite classifying spaces for dual affine Artin groups; we introduce new CW models for the orbit configuration spaces associated with arbitrary Coxeter groups; we construct finite classifying spaces for the braided crystallographic groups introduced by McCammond and Sulway.
Structured prediction as translation between augmented natural languages with Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, and Stefano Soatto

International Conference on Learning Representations (ICLR), spotlight paper, 2021

Show abstract arXiv Conference (open access ) Code

Structured prediction as translation between augmented natural languages

We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages, from which the task-relevant information can be easily extracted. Our approach can match or outperform task-specific models on all tasks, and in particular achieves new state-of-the-art results on joint entity and relation extraction (CoNLL04, ADE, NYT, and ACE2005 datasets), relation classification (FewRel and TACRED), and semantic role labeling (CoNLL-2005 and CoNLL-2012). We accomplish this while using the same architecture and hyperparameters for all tasks, and even when training a single model to solve all tasks at the same time (multi-task learning). Finally, we show that our framework can also significantly improve the performance in a low-resource regime, thanks to better use of label semantics.
Dual structures on Coxeter and Artin groups of rank three with Emanuele Delucchi and Mario Salvetti

Geometry & Topology, to appear, 2024

Show abstract arXiv

Dual structures on Coxeter and Artin groups of rank three

We extend the theory of dual Coxeter and Artin groups to all rank-three Coxeter systems, beyond the previously studied spherical and affine cases. Using geometric, combinatorial, and topological techniques, we show that rank-three noncrossing partition posets are EL-shellable lattices and give rise to Garside groups isomorphic to the associated standard Artin groups. Within this framework, we prove the \(K(\pi,1)\) conjecture, the triviality of the center, and the solubility of the word problem for rank-three Artin groups. Some of our constructions apply to general Artin groups; we hope they will help develop complete solutions to the \(K(\pi,1)\) conjecture and other open problems in the area.
Rectangular analogues of the square paths conjecture and the univariate Delta conjecture with Alessandro Iraci, Roberto Pagaria, and Anna Vanden Wyngaerd

Combinatorial Theory 3 (2), 2023

Show abstract arXiv (outdated) Journal

Rectangular analogues of the square paths conjecture and the univariate Delta conjecture

In this paper, we extend the rectangular side of the shuffle conjecture by stating a rectangular analogue of the square paths conjecture. In addition, we describe a set of combinatorial objects and one statistic that are a first step towards a rectangular extension of (the rise version of) the Delta conjecture, and of (the rise version of) the Delta square conjecture, corresponding to the case \( q=1 \) of an expected general statement. We also prove our new rectangular paths conjecture in the special case when the sides of the rectangle are coprime.
Taxonomy expansion for Named Entity Recognition with Karthikeyan K, Yogarshi Vyas, Jie Ma, Neha Anna John, Shuai Wang, Yassine Benajiba, Vittorio Castelli, Dan Roth, and Miguel Ballesteros

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Show abstract arXiv (outdated) Conference (open access )

Taxonomy expansion for Named Entity Recognition

Training a Named Entity Recognition (NER) model often involves fixing a taxonomy of entity types. However, requirements evolve and we might need the NER model to recognize additional entity types. A simple approach is to re-annotate entire dataset with both existing and additional entity types and then train the model on the re-annotated dataset. However, this is an extremely laborious task. To remedy this, we propose a novel approach called Partial Label Model (PLM) that uses only partially annotated datasets. We experiment with 6 diverse datasets and show that PLM consistently performs better than most other approaches (0.5 - 2.5 F1), including in novel settings for taxonomy expansion not considered in prior work. The gap between PLM and all other approaches is especially large in settings where there is limited data available for the additional entity types (as much as 11 F1), thus suggesting a more cost effective approaches to taxonomy expansion.
À-la-carte Prompt Tuning (APT): combining distinct data via composable prompting with Benjamin Bowman, Alessandro Achille, Luca Zancato, Matthew Trager, Pramuditha Perera, and Stefano Soatto

Conference on Computer Vision and Pattern Recognition (CVPR), 2023

Show abstract arXiv (outdated) Conference (open access )

À-la-carte Prompt Tuning (APT): combining distinct data via composable prompting

We introduce À-la-carte Prompt Tuning (APT), a transformer-based scheme to tune prompts on distinct data so that they can be arbitrarily composed at inference time. The individual prompts can be trained in isolation, possibly on different devices, at different times, and on different distributions or domains. Furthermore each prompt only contains information about the subset of data it was exposed to during training. During inference, models can be assembled based on arbitrary selections of data sources, which we call "à-la-carte learning". À-la-carte learning enables constructing bespoke models specific to each user's individual access rights and preferences. We can add or remove information from the model by simply adding or removing the corresponding prompts without retraining from scratch. We demonstrate that à-la-carte built models achieve accuracy within 5% of models trained on the union of the respective sources, with comparable cost in terms of training and inference time. For the continual learning benchmarks Split CIFAR-100 and CORe50, we achieve state-of-the-art performance.
A table of \(n\)-component handlebody links of genus \(n+1\) up to six crossings with Giovanni Bellettini, Maurizio Paolini, and Yi-Sheng Wang

Mathematical Proceedings of the Cambridge Philosophical Society 174 (1), 2023

Show abstract arXiv (outdated) Journal

A table of \(n\)-component handlebody links of genus \(n+1\) up to six crossings

A handlebody link is a union of handlebodies of positive genus embedded in 3-space, which generalizes the notion of links in classical knot theory. In this paper, we consider handlebody links with a genus 2 handlebody and \(n-1\) solid tori, \(n>1\). Our main result is the classification of such handlebody links with six crossings or less, up to ambient isotopy.
DIVA: dataset derivative of a learning task with Yonatan Dukler, Alessandro Achille, Avinash Ravichandran, Marzia Polito, and Stefano Soatto

International Conference on Learning Representations (ICLR), 2022

Show abstract arXiv (outdated) Conference (open access )

DIVA: dataset derivative of a learning task

We present a method to compute the derivative of a learning task with respect to a dataset. A learning task is a function from a training set to the validation error, which can be represented by a trained deep neural network (DNN). The "dataset derivative" is a linear operator, computed around the trained model, that informs how perturbations of the weight of each training sample affect the validation error, usually computed on a separate validation dataset. Our method, DIVA (Differentiable Validation) hinges on a closed-form differentiable expression of the leave-one-out cross-validation error around a pre-trained DNN. Such expression constitutes the dataset derivative. DIVA could be used for dataset auto-curation, for example removing samples with faulty annotations, augmenting a dataset with additional relevant samples, or rebalancing. More generally, DIVA can be used to optimize the dataset, along with the parameters of the model, as part of the training process without the need for a separate validation dataset, unlike bi-level optimization methods customary in AutoML. To illustrate the flexibility of DIVA, we report experiments on sample auto-curation tasks such as outlier rejection, dataset extension, and automatic aggregation of multi-modal data.
Factoring isometries of quadratic spaces into reflections with Jon McCammond

Journal of Algebra 605, pp. 226-252, 2022

Show abstract arXiv (outdated) Journal

Factoring isometries of quadratic spaces into reflections

Let \(V\) be a vector space endowed with a non-degenerate quadratic form \(Q\). If the base field \(\mathbb{F}\) is different from \(\mathbb{F}_2\), it is known that every isometry can be written as a product of reflections. In this article, we detail the structure of the poset of all minimal length reflection factorizations of an isometry. If \(\mathbb{F}\) is an ordered field, we also study factorizations into positive reflections, i.e., reflections defined by vectors of positive norm. We characterize such factorizations, under the hypothesis that the squares of \(\mathbb{F}\) are dense in the positive elements (this includes Archimedean and Euclidean fields). In particular, we show that an isometry is a product of positive reflections if and only if its spinor norm is positive. As a final application, we explicitly describe the poset of all factorizations of isometries of the hyperbolic space.
Stacked residuals of dynamic layers for time series anomaly detection with Luca Zancato, Alessandro Achille, Alessandro Chiuso, and Stefano Soatto

arXiv preprint 2202.12457, 2022

Show abstract arXiv

Stacked residuals of dynamic layers for time series anomaly detection

We present an end-to-end differentiable neural network architecture to perform anomaly detection in multivariate time series by incorporating a Sequential Probability Ratio Test on the prediction residual. The architecture is a cascade of dynamical systems designed to separate linearly predictable components of the signal such as trends and seasonality, from the non-linear ones. The former are modeled by local Linear Dynamic Layers, and their residual is fed to a generic Temporal Convolutional Network that also aggregates global statistics from different time series as context for the local predictions of each one. The last layer implements the anomaly detector, which exploits the temporal structure of the prediction residuals to detect both isolated point anomalies and set-point changes. It is based on a novel application of the classic CUMSUM algorithm, adapted through the use of a variational approximation of f-divergences. The model automatically adapts to the time scales of the observed signals. It approximates a SARIMA model at the get-go, and auto-tunes to the statistics of the signal and its covariates, without the need for supervision, as more data is observed. The resulting system, which we call STRIC, outperforms both state-of-the-art robust statistical methods and deep neural network architectures on multiple anomaly detection benchmarks.
The dual approach to the \(K(\pi, 1)\) conjecture

arXiv preprint 2112.05255, 2021

Show abstract arXiv

The dual approach to the \(K(\pi, 1)\) conjecture

Dual presentations of Coxeter groups have recently led to breakthroughs in our understanding of affine Artin groups. In particular, they led to the proof of the \(K(\pi, 1)\) conjecture and to the solution of the word problem. Will the "dual approach" extend to more general classes of Coxeter and Artin groups? In this paper, we describe the techniques used to prove the \(K(\pi, 1)\) conjecture for affine Artin groups and we ask a series of questions that are mostly open beyond the spherical and affine cases.
Estimating informativeness of samples with smooth unique information with Hrayr Harutyunyan, Alessandro Achille, Orchid Majumder, Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto

International Conference on Learning Representations (ICLR), 2021

Show abstract arXiv (outdated) Conference (open access )

Estimating informativeness of samples with smooth unique information

We define a notion of information that an individual sample provides to the training of a neural network, and we specialize it to measure both how much a sample informs the final weights and how much it informs the function computed by the weights. Though related, we show that these quantities have a qualitatively different behavior. We give efficient approximations of these quantities using a linearized network and demonstrate empirically that the approximation is accurate for real-world architectures, such as pre-trained ResNets. We apply these measures to several problems, such as dataset summarization, analysis of under-sampled classes, comparison of informativeness of different data sources, and detection of adversarial and corrupted examples. Our work generalizes existing frameworks, but enjoys better computational properties for heavily over-parametrized models, which makes it possible to apply it to real-world networks.
Euclidean matchings and minimality of hyperplane arrangements with Davide Lofano

Discrete Mathematics 344 (3), 2021

Show abstract arXiv Journal

Euclidean matchings and minimality of hyperplane arrangements

We construct a new class of maximal acyclic matchings on the Salvetti complex of a locally finite hyperplane arrangement. Using discrete Morse theory, we then obtain an explicit proof of the minimality of the complement. Our construction provides interesting insights also in the well-studied case of finite arrangements, and gives a nice geometric description of the Betti numbers of the complement. In particular, we solve a conjecture of Drton and Klivans on the characteristic polynomial of finite reflection arrangements. The minimal complex is compatible with restrictions, and this allows us to prove the isomorphism of Brieskorn’s Lemma by a simple bijection of the critical cells. Finally, in the case of line arrangements, we describe the algebraic Morse complex which computes the homology with coefficients in an abelian local system.
Representations of torsion-free arithmetic matroids with Roberto Pagaria

European Journal of Combinatorics 93, 2021

Show abstract arXiv (outdated) Journal Code

Representations of torsion-free arithmetic matroids

We study the representability problem for torsion-free arithmetic matroids. By using a new operation called "reduction" and a "signed Hermite normal form", we provide and implement an algorithm to compute all the representations, up to equivalence. As an application, we disprove two conjectures about the poset of layers and the independence poset of a toric arrangement.
The information complexity of learning tasks, their structure and their distance with Alessandro Achille, Glen Mbeng, and Stefano Soatto

Information and Inference: a Journal of the IMA, 2021

Show abstract arXiv (outdated) Journal (open access )

The information complexity of learning tasks, their structure and their distance

We introduce an asymmetric distance in the space of learning tasks, and a framework to compute their complexity. These concepts are foundational to the practice of transfer learning, ubiquitous in Deep Learning, whereby a parametric model is pre-trained for a task, and then used for another after fine-tuning. The framework we develop is intrinsically non-asymptotic, capturing the finite nature of the training dataset, yet it allows distinguishing learning from memorization. It encompasses, as special cases, classical notions from Kolmogorov complexity, Shannon, and Fisher Information. However, unlike some of those frameworks, it can be applied easily to large-scale models and real-world datasets. It is the first framework to explicitly account for the optimization scheme, which plays a crucial role in Deep Learning, in measuring complexity and information.
Impossibility results on stability of phylogenetic consensus methods with Emanuele Delucchi and Linard Hoessly

Systematic Biology 69 (3), pp. 557-565, 2020

Show abstract arXiv Journal Code

Impossibility results on stability of phylogenetic consensus methods

We answer two questions raised by Bryant, Francis and Steel in their work on consensus methods in phylogenetics. Consensus methods apply to every practical instance where it is desired to aggregate a set of given phylogenetic trees (say, gene evolution trees) into a resulting, "consensus" tree (say, a species tree). Various stability criteria have been explored in this context, seeking to model desirable consistency properties of consensus methods as the experimental data is updated (e.g., more taxa, or more trees, are mapped). However, such stability conditions can be incompatible with some basic regularity properties that are widely accepted to be essential in any meaningful consensus method. Here, we prove that such an incompatibility does arise in the case of extension stability on binary trees and in the case of associative stability. Our methods combine general theoretical considerations with the use of computer programs tailored to the given stability requirements.
Where is the information in a deep neural network? with Alessandro Achille and Stefano Soatto

arXiv preprint 1905.12213, 2020

Show abstract arXiv

Where is the information in a deep neural network?

Whatever information a deep neural network has gleaned from past data is encoded in its weights. How this information affects the response of the network to future data is largely an open question. In fact, even how to define and measure information in a network entails some subtleties. We measure information in the weights of a deep neural network as the optimal trade-off between accuracy of the network and complexity of the weights relative to a prior. Depending on the prior, the definition reduces to known information measures such as Shannon Mutual Information and Fisher Information, but in general it affords added flexibility that enables us to relate it to generalization, via the PAC-Bayes bound, and to invariance. For the latter, we introduce a notion of effective information in the activations, which are deterministic functions of future inputs. We relate this to the information in the weights, and use this result to show that models of low (information) complexity not only generalize better, but are bound to learn invariant representations of future inputs. These relations hinge not only on the architecture of the model, but also on how it is trained.
Shellability of generalized Dowling posets

Journal of Combinatorial Theory, Series A 171, 2020

Show abstract arXiv Journal

Shellability of generalized Dowling posets

A generalization of Dowling lattices was recently introduced by Bibby and Gadish, in a work on orbit configuration spaces. The authors left open the question as to whether these posets are shellable. In this paper we prove EL-shellability and use it to determine the homotopy type. Our result generalizes shellability of Dowling lattices and of posets of layers of abelian arrangements defined by root systems. We also show that subposets corresponding to invariant subarrangements are not shellable in general.
On the local homology of Artin groups of finite and affine type

Algebraic & Geometric Topology 19 (7), pp. 3615-3639, 2019

Show abstract arXiv (outdated) Journal Code

On the local homology of Artin groups of finite and affine type

We study the local homology of Artin groups using weighted discrete Morse theory. In all finite and affine cases, we are able to construct Morse matchings of a special type (we call them "precise matchings"). The existence of precise matchings implies that the homology has a square-free torsion. This property was known for Artin groups of finite type, but not in general for Artin groups of affine type. We also use the constructed matchings to compute the local homology in all exceptional cases, correcting some results in the literature.
Shellability of posets of labeled partitions and arrangements defined by root systems with Emanuele Delucchi and Noriane Girard

Electronic Journal of Combinatorics 26 (4), 2019

Show abstract arXiv (outdated) Journal (open access )

Shellability of posets of labeled partitions and arrangements defined by root systems

We prove that the posets of connected components of intersections of toric and elliptic arrangements defined by root systems are EL-shellable and we compute their homotopy type. Our method rests on Bibby's description of such posets by means of "labeled partitions": after giving an EL-labeling and counting homology chains for general posets of labeled partitions, we obtain the stated results by considering the appropriate subposets.
Weighted sheaves and homology of Artin groups with Mario Salvetti

Algebraic & Geometric Topology 18 (7), pp. 3943-4000, 2018

Show abstract arXiv Journal

Weighted sheaves and homology of Artin groups

In this paper we expand the theory of weighted sheaves over posets, and use it to study the local homology of Artin groups. First, we use such theory to relate the homology of classical braid groups with the homology of certain independence complexes of graphs. Then, in the context of discrete Morse theory on weighted sheaves, we introduce a particular class of acyclic matchings. Explicit formulas for the homology of the corresponding Morse complexes are given, in terms of the ranks of the associated incidence matrices. We use such method to perform explicit computations for the new affine case \(\tilde C_n\), as well as for the cases \(A_n\), \(B_n\) and \(\tilde A_n\) (which were already done before by different methods).
Collapsibility to a subcomplex of a given dimension is NP-complete

Discrete & Computational Geometry 59 (1), pp. 246-251, 2018

Show abstract arXiv Journal

Collapsibility to a subcomplex of a given dimension is NP-complete

In this paper we extend the works of Tancer and of Malgouyres and Francés, showing that \((d,k)\)-collapsibility is NP-complete for \(d \geq k+2\) except \((2,0)\). By \((d,k)\)-collapsibility we mean the following problem: determine whether a given \(d\)-dimensional simplicial complex can be collapsed to some \(k\)-dimensional subcomplex. The question of establishing the complexity status of \((d,k)\)-collapsibility was asked by Tancer, who proved NP-completeness of \((d,0)\) and \((d,1)\)-collapsibility (for \(d \geq 3\)). Our extended result, together with the known polynomial-time algorithms for \((2,0)\) and \(d=k+1\), answers the question completely.
An algorithm for canonical forms of finite subsets of \(\mathbb{Z}^d\) up to affinities

Discrete & Computational Geometry 58 (2), pp. 293-312, 2017

Show abstract arXiv Journal

An algorithm for canonical forms of finite subsets of \(\mathbb{Z}^d\) up to affinities

In this paper, we describe an algorithm for the computation of canonical forms of finite subsets of \(\mathbb{Z}^d\), up to affinities over \(\mathbb{Z}\). For fixed dimension \(d\), the worst-case asymptotic complexity of this algorithm is \(O(n \log^2 n \, s \, \mu(s))\), where \(n\) is the number of points in the given subset, \(s\) is an upper bound to the size of the binary representation of any of the \(n\) points, and \(\mu(s)\) is an upper bound to the number of operations required to multiply two \(s\)-bit numbers. This problem arises e.g. in the context of invariants computation of finitely presented groups with abelianized group isomorphic to \(\mathbb{Z}^d\). In that context one needs to decide whether two Laurent polynomials in \(d\) indeterminates, considered as elements of the group ring over the abelianized group, are equivalent with respect to a change of basis.
On the classifying space of Artin monoids

Communications in Algebra 45 (11), pp. 4740-4757, 2017

Show abstract arXiv Journal

On the classifying space of Artin monoids

A theorem proved by Dobrinskaya in 2006 shows that there is a strong connection between the \(K(\pi,1)\) conjecture for Artin groups and the classifying space of Artin monoids. More recently Ozornova obtained a different proof of Dobrinskaya's theorem based on the application of discrete Morse theory to the standard CW model of the classifying space of an Artin monoid. In Ozornova's work there are hints at some deeper connections between the above-mentioned CW model and the Salvetti complex, a CW complex which arises in the combinatorial study of Artin groups. In this work we show that such connections actually exist, and as a consequence we derive yet another proof of Dobrinskaya's theorem.

Theses

Topology and combinatorics of affine reflection arrangements

PhD thesis (2019)

PDF
Discrete Morse theory and the \(K(\pi,1)\) conjecture

Master thesis (2015)

PDF

Research

Proof of the \(K(\pi,1)\) conjecture for affine Artin groups with Mario Salvetti

Structured prediction as translation between augmented natural languages with Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, and Stefano Soatto

Dual structures on Coxeter and Artin groups of rank three with Emanuele Delucchi and Mario Salvetti

Rectangular analogues of the square paths conjecture and the univariate Delta conjecture with Alessandro Iraci, Roberto Pagaria, and Anna Vanden Wyngaerd

Taxonomy expansion for Named Entity Recognition with Karthikeyan K, Yogarshi Vyas, Jie Ma, Neha Anna John, Shuai Wang, Yassine Benajiba, Vittorio Castelli, Dan Roth, and Miguel Ballesteros

À-la-carte Prompt Tuning (APT): combining distinct data via composable prompting with Benjamin Bowman, Alessandro Achille, Luca Zancato, Matthew Trager, Pramuditha Perera, and Stefano Soatto

A table of \(n\)-component handlebody links of genus \(n+1\) up to six crossings with Giovanni Bellettini, Maurizio Paolini, and Yi-Sheng Wang

DIVA: dataset derivative of a learning task with Yonatan Dukler, Alessandro Achille, Avinash Ravichandran, Marzia Polito, and Stefano Soatto

Factoring isometries of quadratic spaces into reflections with Jon McCammond

Stacked residuals of dynamic layers for time series anomaly detection with Luca Zancato, Alessandro Achille, Alessandro Chiuso, and Stefano Soatto

The dual approach to the \(K(\pi, 1)\) conjecture

Estimating informativeness of samples with smooth unique information with Hrayr Harutyunyan, Alessandro Achille, Orchid Majumder, Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto

Euclidean matchings and minimality of hyperplane arrangements with Davide Lofano

Representations of torsion-free arithmetic matroids with Roberto Pagaria

The information complexity of learning tasks, their structure and their distance with Alessandro Achille, Glen Mbeng, and Stefano Soatto

Impossibility results on stability of phylogenetic consensus methods with Emanuele Delucchi and Linard Hoessly

Where is the information in a deep neural network? with Alessandro Achille and Stefano Soatto

Shellability of generalized Dowling posets

On the local homology of Artin groups of finite and affine type

Shellability of posets of labeled partitions and arrangements defined by root systems with Emanuele Delucchi and Noriane Girard

Weighted sheaves and homology of Artin groups with Mario Salvetti

Collapsibility to a subcomplex of a given dimension is NP-complete

An algorithm for canonical forms of finite subsets of \(\mathbb{Z}^d\) up to affinities

On the classifying space of Artin monoids

Topology and combinatorics of affine reflection arrangements

Discrete Morse theory and the \(K(\pi,1)\) conjecture