
Proof of the \(K(\pi,1)\) conjecture for affine Artin groups
with Mario Salvetti
Inventiones Mathematicae 224, pp. 487572,
2021
We prove the \(K(\pi,1)\) conjecture for affine Artin groups: the complexified complement of an affine reflection arrangement is a classifying space. This is a longstanding problem, due to Arnol'd, Pham, and Thom. Our proof is based on recent advancements in the theory of dual Coxeter and Artin groups, as well as on several new results and constructions. In particular: we show that all affine noncrossing partition posets are ELshellable; we use these posets to construct finite classifying spaces for dual affine Artin groups; we introduce new CW models for the orbit configuration spaces associated with arbitrary Coxeter groups; we construct finite classifying spaces for the braided crystallographic groups introduced by McCammond and Sulway.

Structured prediction as translation between augmented natural languages
with Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, and Stefano Soatto
International Conference on Learning Representations (ICLR),
spotlight paper,
2021
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training taskspecific discriminative classifiers, we frame it as a translation task between augmented natural languages, from which the taskrelevant information can be easily extracted. Our approach can match or outperform taskspecific models on all tasks, and in particular achieves new stateoftheart results on joint entity and relation extraction (CoNLL04, ADE, NYT, and ACE2005 datasets), relation classification (FewRel and TACRED), and semantic role labeling (CoNLL2005 and CoNLL2012). We accomplish this while using the same architecture and hyperparameters for all tasks, and even when training a single model to solve all tasks at the same time (multitask learning). Finally, we show that our framework can also significantly improve the performance in a lowresource regime, thanks to better use of label semantics.

Fewer truncations improve language modeling
with Hantian Ding, Zijian Wang, Varun Kumar, Anoop Deoras, Dan Roth, and Stefano Soatto
International Conference on Machine Learning (ICML), to appear,
2024
In large language model training, input documents are typically concatenated together and then split into sequences of equal length to avoid padding tokens. Despite its efficiency, the concatenation approach compromises data integrity  it inevitably breaks many documents into incomplete pieces, leading to excessive truncations that hinder the model from learning to compose logically coherent and factually consistent content that is grounded on the complete context. To address the issue, we propose Bestfit Packing, a scalable and efficient method that packs documents into training sequences through lengthaware combinatorial optimization. Our method completely eliminates unnecessary truncations while retaining the same training efficiency as concatenation. Empirical results from both text and code pretraining show that our method achieves superior performance (e.g., relatively +4.7% on reading comprehension; +16.8% in context following; and +9.2% on program synthesis), and reduces closeddomain hallucination effectively by up to 58.3%.

Dual structures on Coxeter and Artin groups of rank three
with Emanuele Delucchi and Mario Salvetti
Geometry & Topology, to appear,
2024
We extend the theory of dual Coxeter and Artin groups to all rankthree Coxeter systems, beyond the previously studied spherical and affine cases. Using geometric, combinatorial, and topological techniques, we show that rankthree noncrossing partition posets are ELshellable lattices and give rise to Garside groups isomorphic to the associated standard Artin groups. Within this framework, we prove the \(K(\pi,1)\) conjecture, the triviality of the center, and the solubility of the word problem for rankthree Artin groups. Some of our constructions apply to general Artin groups; we hope they will help develop complete solutions to the \(K(\pi,1)\) conjecture and other open problems in the area.

General Purpose Verification for Chain of Thought Prompting
with Robert Vacareanu, Anurag Pratik, Evangelia Spiliopoulou, Zheng Qi, Neha Anna John, Jie Ma, Yassine Benajiba, and Miguel Ballesteros
arXiv preprint 2405.00204,
2024
Many of the recent capabilities demonstrated by Large Language Models (LLMs) arise primarily from their ability to exploit contextual information. In this paper, we explore ways to improve reasoning capabilities of LLMs through (1) exploration of different chains of thought and (2) validation of the individual steps of the reasoning process. We propose three general principles that a model should adhere to while reasoning: (i) Relevance, (ii) Mathematical Accuracy, and (iii) Logical Consistency. We apply these constraints to the reasoning steps generated by the LLM to improve the accuracy of the final generation. The constraints are applied in the form of verifiers: the model itself is asked to verify if the generated steps satisfy each constraint. To further steer the generations towards highquality solutions, we use the perplexity of the reasoning steps as an additional verifier. We evaluate our method on 4 distinct types of reasoning tasks, spanning a total of 9 different datasets. Experiments show that our method is always better than vanilla generation, and, in 6 out of the 9 datasets, it is better than bestof N sampling which samples N reasoning chains and picks the lowest perplexity generation.

A weak supervision approach for fewshot aspect based sentiment analysis
with Robert Vacareanu, Siddharth Varia, Kishaloy Halder, Shuai Wang, Neha Anna John, Miguel Ballesteros, and Smaranda Muresan
Conference of the European Chapter of the Association for Computational Linguistics (EACL),
2024
We explore how weak supervision on abundant unlabeled data can be leveraged to improve fewshot performance in aspectbased sentiment analysis (ABSA) tasks. We propose a pipeline approach to construct a noisy ABSA dataset, and we use it to adapt a pretrained sequencetosequence model to the ABSA tasks. We test the resulting model on three widely used ABSA datasets, before and after finetuning. Our proposed method preserves the full finetuning performance while showing significant improvements (15.84 absolute F1) in the fewshot learning scenario for the harder tasks. In zeroshot (i.e., without finetuning), our method outperforms the previous state of the art on the aspect extraction sentiment classification (AESC) task and is, additionally, capable of performing the harder aspect sentiment triplet extraction (ASTE) task.

Taxonomy expansion for Named Entity Recognition
with Karthikeyan K, Yogarshi Vyas, Jie Ma, Neha Anna John, Shuai Wang, Yassine Benajiba, Vittorio Castelli, Dan Roth, and Miguel Ballesteros
Conference on Empirical Methods in Natural Language Processing (EMNLP),
2023
Training a Named Entity Recognition (NER) model often involves fixing a taxonomy of entity types. However, requirements evolve and we might need the NER model to recognize additional entity types. A simple approach is to reannotate entire dataset with both existing and additional entity types and then train the model on the reannotated dataset. However, this is an extremely laborious task. To remedy this, we propose a novel approach called Partial Label Model (PLM) that uses only partially annotated datasets. We experiment with 6 diverse datasets and show that PLM consistently performs better than most other approaches (0.5  2.5 F1), including in novel settings for taxonomy expansion not considered in prior work. The gap between PLM and all other approaches is especially large in settings where there is limited data available for the additional entity types (as much as 11 F1), thus suggesting a more cost effective approaches to taxonomy expansion.

Rectangular analogues of the square paths conjecture and the univariate Delta conjecture
with Alessandro Iraci, Roberto Pagaria, and Anna Vanden Wyngaerd
Combinatorial Theory 3 (2),
2023
In this paper, we extend the rectangular side of the shuffle conjecture by stating a rectangular analogue of the square paths conjecture. In addition, we describe a set of combinatorial objects and one statistic that are a first step towards a rectangular extension of (the rise version of) the Delta conjecture, and of (the rise version of) the Delta square conjecture, corresponding to the case \( q=1 \) of an expected general statement. We also prove our new rectangular paths conjecture in the special case when the sides of the rectangle are coprime.

Àlacarte Prompt Tuning (APT): combining distinct data via composable prompting
with Benjamin Bowman, Alessandro Achille, Luca Zancato, Matthew Trager, Pramuditha Perera, and Stefano Soatto
Conference on Computer Vision and Pattern Recognition (CVPR),
2023
We introduce Àlacarte Prompt Tuning (APT), a transformerbased scheme to tune prompts on distinct data so that they can be arbitrarily composed at inference time. The individual prompts can be trained in isolation, possibly on different devices, at different times, and on different distributions or domains. Furthermore each prompt only contains information about the subset of data it was exposed to during training. During inference, models can be assembled based on arbitrary selections of data sources, which we call "àlacarte learning". Àlacarte learning enables constructing bespoke models specific to each user's individual access rights and preferences. We can add or remove information from the model by simply adding or removing the corresponding prompts without retraining from scratch. We demonstrate that àlacarte built models achieve accuracy within 5% of models trained on the union of the respective sources, with comparable cost in terms of training and inference time. For the continual learning benchmarks Split CIFAR100 and CORe50, we achieve stateoftheart performance.

A table of \(n\)component handlebody links of genus \(n+1\) up to six crossings
with Giovanni Bellettini, Maurizio Paolini, and YiSheng Wang
Mathematical Proceedings of the Cambridge Philosophical Society 174 (1),
2023
A handlebody link is a union of handlebodies of positive genus embedded in 3space, which generalizes the notion of links in classical knot theory. In this paper, we consider handlebody links with a genus 2 handlebody and \(n1\) solid tori, \(n>1\). Our main result is the classification of such handlebody links with six crossings or less, up to ambient isotopy.

DIVA: dataset derivative of a learning task
with Yonatan Dukler, Alessandro Achille, Avinash Ravichandran, Marzia Polito, and Stefano Soatto
International Conference on Learning Representations (ICLR),
2022
We present a method to compute the derivative of a learning task with respect to a dataset. A learning task is a function from a training set to the validation error, which can be represented by a trained deep neural network (DNN). The "dataset derivative" is a linear operator, computed around the trained model, that informs how perturbations of the weight of each training sample affect the validation error, usually computed on a separate validation dataset. Our method, DIVA (Differentiable Validation) hinges on a closedform differentiable expression of the leaveoneout crossvalidation error around a pretrained DNN. Such expression constitutes the dataset derivative. DIVA could be used for dataset autocuration, for example removing samples with faulty annotations, augmenting a dataset with additional relevant samples, or rebalancing. More generally, DIVA can be used to optimize the dataset, along with the parameters of the model, as part of the training process without the need for a separate validation dataset, unlike bilevel optimization methods customary in AutoML. To illustrate the flexibility of DIVA, we report experiments on sample autocuration tasks such as outlier rejection, dataset extension, and automatic aggregation of multimodal data.

Factoring isometries of quadratic spaces into reflections
with Jon McCammond
Journal of Algebra 605, pp. 226252,
2022
Let \(V\) be a vector space endowed with a nondegenerate quadratic form \(Q\). If the base field \(\mathbb{F}\) is different from \(\mathbb{F}_2\), it is known that every isometry can be written as a product of reflections. In this article, we detail the structure of the poset of all minimal length reflection factorizations of an isometry. If \(\mathbb{F}\) is an ordered field, we also study factorizations into positive reflections, i.e., reflections defined by vectors of positive norm. We characterize such factorizations, under the hypothesis that the squares of \(\mathbb{F}\) are dense in the positive elements (this includes Archimedean and Euclidean fields). In particular, we show that an isometry is a product of positive reflections if and only if its spinor norm is positive. As a final application, we explicitly describe the poset of all factorizations of isometries of the hyperbolic space.

Stacked residuals of dynamic layers for time series anomaly detection
with Luca Zancato, Alessandro Achille, Alessandro Chiuso, and Stefano Soatto
arXiv preprint 2202.12457,
2022
We present an endtoend differentiable neural network architecture to perform anomaly detection in multivariate time series by incorporating a Sequential Probability Ratio Test on the prediction residual. The architecture is a cascade of dynamical systems designed to separate linearly predictable components of the signal such as trends and seasonality, from the nonlinear ones. The former are modeled by local Linear Dynamic Layers, and their residual is fed to a generic Temporal Convolutional Network that also aggregates global statistics from different time series as context for the local predictions of each one. The last layer implements the anomaly detector, which exploits the temporal structure of the prediction residuals to detect both isolated point anomalies and setpoint changes. It is based on a novel application of the classic CUMSUM algorithm, adapted through the use of a variational approximation of fdivergences. The model automatically adapts to the time scales of the observed signals. It approximates a SARIMA model at the getgo, and autotunes to the statistics of the signal and its covariates, without the need for supervision, as more data is observed. The resulting system, which we call STRIC, outperforms both stateoftheart robust statistical methods and deep neural network architectures on multiple anomaly detection benchmarks.

The dual approach to the \(K(\pi, 1)\) conjecture
arXiv preprint 2112.05255,
2021
Dual presentations of Coxeter groups have recently led to breakthroughs in our understanding of affine Artin groups. In particular, they led to the proof of the \(K(\pi, 1)\) conjecture and to the solution of the word problem. Will the "dual approach" extend to more general classes of Coxeter and Artin groups? In this paper, we describe the techniques used to prove the \(K(\pi, 1)\) conjecture for affine Artin groups and we ask a series of questions that are mostly open beyond the spherical and affine cases.

Estimating informativeness of samples with smooth unique information
with Hrayr Harutyunyan, Alessandro Achille, Orchid Majumder, Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto
International Conference on Learning Representations (ICLR),
2021
We define a notion of information that an individual sample provides to the training of a neural network, and we specialize it to measure both how much a sample informs the final weights and how much it informs the function computed by the weights. Though related, we show that these quantities have a qualitatively different behavior. We give efficient approximations of these quantities using a linearized network and demonstrate empirically that the approximation is accurate for realworld architectures, such as pretrained ResNets. We apply these measures to several problems, such as dataset summarization, analysis of undersampled classes, comparison of informativeness of different data sources, and detection of adversarial and corrupted examples. Our work generalizes existing frameworks, but enjoys better computational properties for heavily overparametrized models, which makes it possible to apply it to realworld networks.

Euclidean matchings and minimality of hyperplane arrangements
with Davide Lofano
Discrete Mathematics 344 (3),
2021
We construct a new class of maximal acyclic matchings on the Salvetti complex of a locally finite hyperplane arrangement. Using discrete Morse theory, we then obtain an explicit proof of the minimality of the complement. Our construction provides interesting insights also in the wellstudied case of finite arrangements, and gives a nice geometric description of the Betti numbers of the complement. In particular, we solve a conjecture of Drton and Klivans on the characteristic polynomial of finite reflection arrangements. The minimal complex is compatible with restrictions, and this allows us to prove the isomorphism of Brieskorn’s Lemma by a simple bijection of the critical cells. Finally, in the case of line arrangements, we describe the algebraic Morse complex which computes the homology with coefficients in an abelian local system.

Representations of torsionfree arithmetic matroids
with Roberto Pagaria
European Journal of Combinatorics 93,
2021
We study the representability problem for torsionfree arithmetic matroids. By using a new operation called "reduction" and a "signed Hermite normal form", we provide and implement an algorithm to compute all the representations, up to equivalence. As an application, we disprove two conjectures about the poset of layers and the independence poset of a toric arrangement.

The information complexity of learning tasks, their structure and their distance
with Alessandro Achille, Glen Mbeng, and Stefano Soatto
Information and Inference: a Journal of the IMA,
2021
We introduce an asymmetric distance in the space of learning tasks, and a framework to compute their complexity. These concepts are foundational to the practice of transfer learning, ubiquitous in Deep Learning, whereby a parametric model is pretrained for a task, and then used for another after finetuning. The framework we develop is intrinsically nonasymptotic, capturing the finite nature of the training dataset, yet it allows distinguishing learning from memorization. It encompasses, as special cases, classical notions from Kolmogorov complexity, Shannon, and Fisher Information. However, unlike some of those frameworks, it can be applied easily to largescale models and realworld datasets. It is the first framework to explicitly account for the optimization scheme, which plays a crucial role in Deep Learning, in measuring complexity and information.

Impossibility results on stability of phylogenetic consensus methods
with Emanuele Delucchi and Linard Hoessly
Systematic Biology 69 (3), pp. 557565,
2020
We answer two questions raised by Bryant, Francis and Steel in their work on consensus methods in phylogenetics. Consensus methods apply to every practical instance where it is desired to aggregate a set of given phylogenetic trees (say, gene evolution trees) into a resulting, "consensus" tree (say, a species tree). Various stability criteria have been explored in this context, seeking to model desirable consistency properties of consensus methods as the experimental data is updated (e.g., more taxa, or more trees, are mapped). However, such stability conditions can be incompatible with some basic regularity properties that are widely accepted to be essential in any meaningful consensus method. Here, we prove that such an incompatibility does arise in the case of extension stability on binary trees and in the case of associative stability. Our methods combine general theoretical considerations with the use of computer programs tailored to the given stability requirements.

Where is the information in a deep neural network?
with Alessandro Achille and Stefano Soatto
arXiv preprint 1905.12213,
2020
Whatever information a deep neural network has gleaned from past data is encoded in its weights. How this information affects the response of the network to future data is largely an open question. In fact, even how to define and measure information in a network entails some subtleties. We measure information in the weights of a deep neural network as the optimal tradeoff between accuracy of the network and complexity of the weights relative to a prior. Depending on the prior, the definition reduces to known information measures such as Shannon Mutual Information and Fisher Information, but in general it affords added flexibility that enables us to relate it to generalization, via the PACBayes bound, and to invariance. For the latter, we introduce a notion of effective information in the activations, which are deterministic functions of future inputs. We relate this to the information in the weights, and use this result to show that models of low (information) complexity not only generalize better, but are bound to learn invariant representations of future inputs. These relations hinge not only on the architecture of the model, but also on how it is trained.

Shellability of generalized Dowling posets
Journal of Combinatorial Theory, Series A 171,
2020
A generalization of Dowling lattices was recently introduced by Bibby and Gadish, in a work on orbit configuration spaces. The authors left open the question as to whether these posets are shellable. In this paper we prove ELshellability and use it to determine the homotopy type. Our result generalizes shellability of Dowling lattices and of posets of layers of abelian arrangements defined by root systems. We also show that subposets corresponding to invariant subarrangements are not shellable in general.

On the local homology of Artin groups of finite and affine type
Algebraic & Geometric Topology 19 (7), pp. 36153639,
2019
We study the local homology of Artin groups using weighted discrete Morse theory. In all finite and affine cases, we are able to construct Morse matchings of a special type (we call them "precise matchings"). The existence of precise matchings implies that the homology has a squarefree torsion. This property was known for Artin groups of finite type, but not in general for Artin groups of affine type. We also use the constructed matchings to compute the local homology in all exceptional cases, correcting some results in the literature.

Shellability of posets of labeled partitions and arrangements defined by root systems
with Emanuele Delucchi and Noriane Girard
Electronic Journal of Combinatorics 26 (4),
2019
We prove that the posets of connected components of intersections of toric and elliptic arrangements defined by root systems are ELshellable and we compute their homotopy type. Our method rests on Bibby's description of such posets by means of "labeled partitions": after giving an ELlabeling and counting homology chains for general posets of labeled partitions, we obtain the stated results by considering the appropriate subposets.

Weighted sheaves and homology of Artin groups
with Mario Salvetti
Algebraic & Geometric Topology 18 (7), pp. 39434000,
2018
In this paper we expand the theory of weighted sheaves over posets, and use it to study the local homology of Artin groups. First, we use such theory to relate the homology of classical braid groups with the homology of certain independence complexes of graphs. Then, in the context of discrete Morse theory on weighted sheaves, we introduce a particular class of acyclic matchings. Explicit formulas for the homology of the corresponding Morse complexes are given, in terms of the ranks of the associated incidence matrices. We use such method to perform explicit computations for the new affine case \(\tilde C_n\), as well as for the cases \(A_n\), \(B_n\) and \(\tilde A_n\) (which were already done before by different methods).

Collapsibility to a subcomplex of a given dimension is NPcomplete
Discrete & Computational Geometry 59 (1), pp. 246251,
2018
In this paper we extend the works of Tancer and of Malgouyres and Francés, showing that \((d,k)\)collapsibility is NPcomplete for \(d \geq k+2\) except \((2,0)\). By \((d,k)\)collapsibility we mean the following problem: determine whether a given \(d\)dimensional simplicial complex can be collapsed to some \(k\)dimensional subcomplex. The question of establishing the complexity status of \((d,k)\)collapsibility was asked by Tancer, who proved NPcompleteness of \((d,0)\) and \((d,1)\)collapsibility (for \(d \geq 3\)). Our extended result, together with the known polynomialtime algorithms for \((2,0)\) and \(d=k+1\), answers the question completely.

An algorithm for canonical forms of finite subsets of \(\mathbb{Z}^d\) up to affinities
Discrete & Computational Geometry 58 (2), pp. 293312,
2017
In this paper, we describe an algorithm for the computation of canonical forms of finite subsets of \(\mathbb{Z}^d\), up to affinities over \(\mathbb{Z}\). For fixed dimension \(d\), the worstcase asymptotic complexity of this algorithm is \(O(n \log^2 n \, s \, \mu(s))\), where \(n\) is the number of points in the given subset, \(s\) is an upper bound to the size of the binary representation of any of the \(n\) points, and \(\mu(s)\) is an upper bound to the number of operations required to multiply two \(s\)bit numbers. This problem arises e.g. in the context of invariants computation of finitely presented groups with abelianized group isomorphic to \(\mathbb{Z}^d\). In that context one needs to decide whether two Laurent polynomials in \(d\) indeterminates, considered as elements of the group ring over the abelianized group, are equivalent with respect to a change of basis.

On the classifying space of Artin monoids
Communications in Algebra 45 (11), pp. 47404757,
2017
A theorem proved by Dobrinskaya in 2006 shows that there is a strong connection between the \(K(\pi,1)\) conjecture for Artin groups and the classifying space of Artin monoids. More recently Ozornova obtained a different proof of Dobrinskaya's theorem based on the application of discrete Morse theory to the standard CW model of the classifying space of an Artin monoid. In Ozornova's work there are hints at some deeper connections between the abovementioned CW model and the Salvetti complex, a CW complex which arises in the combinatorial study of Artin groups. In this work we show that such connections actually exist, and as a consequence we derive yet another proof of Dobrinskaya's theorem.