arxiv-updates
arxiv-updates copied to clipboard
New submissions for Mon, 25 Dec 23
Keyword: sgd
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise
- Authors: Authors: Rui Pan, Yuxing Liu, Xiaoyu Wang, Tong Zhang
- Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2312.14567
- Pdf link: https://arxiv.org/pdf/2312.14567
- Abstract Heavy-ball momentum with decaying learning rates is widely used with SGD for optimizing deep learning models. In contrast to its empirical popularity, the understanding of its theoretical property is still quite limited, especially under the standard anisotropic gradient noise condition for quadratic regression problems. Although it is widely conjectured that heavy-ball momentum method can provide accelerated convergence and should work well in large batch settings, there is no rigorous theoretical analysis. In this paper, we fill this theoretical gap by establishing a non-asymptotic convergence bound for stochastic heavy-ball methods with step decay scheduler on quadratic objectives, under the anisotropic gradient noise condition. As a direct implication, we show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrt{\kappa})$ accelerated convergence of the bias term of SGD while still achieving near-optimal convergence rate with respect to the stochastic variance term. The combined effect implies an overall convergence rate within log factors from the statistical minimax rate. This means SGD with heavy-ball momentum is useful in the large-batch settings such as distributed machine learning or federated learning, where a smaller number of iterations can significantly reduce the number of communication rounds, leading to acceleration in practice.
Keyword: optimization
Dynamic Topic Language Model on Heterogeneous Children's Mental Health Clinical Notes
- Authors: Authors: Hanwen Ye, Tatiana Moreno, Adrianne Alpern, Louis Ehwerhemuepha, Annie Qu
- Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2312.14180
- Pdf link: https://arxiv.org/pdf/2312.14180
- Abstract Mental health diseases affect children's lives and well-beings which have received increased attention since the COVID-19 pandemic. Analyzing psychiatric clinical notes with topic models is critical to evaluate children's mental status over time. However, few topic models are built for longitudinal settings, and they fail to keep consistent topics and capture temporal trajectories for each document. To address these challenges, we develop a longitudinal topic model with time-invariant topics and individualized temporal dependencies on the evolving document metadata. Our model preserves the semantic meaning of discovered topics over time and incorporates heterogeneity among documents. In particular, when documents can be categorized, we propose an unsupervised topics learning approach to maximize topic heterogeneity across different document groups. We also present an efficient variational optimization procedure adapted for the multistage longitudinal setting. In this case study, we apply our method to the psychiatric clinical notes from a large tertiary pediatric hospital in Southern California and achieve a 38% increase in the overall coherence of extracted topics. Our real data analysis reveals that children tend to express more negative emotions during state shutdowns and more positive when schools reopen. Furthermore, it suggests that sexual and gender minority (SGM) children display more pronounced reactions to major COVID-19 events and a greater sensitivity to vaccine-related news than non-SGM children. This study examines the progression of children's mental health during the pandemic and offers clinicians valuable insights to recognize the disparities in children's mental health related to their sexual and gender identities.
Optimizing Heat Alert Issuance for Public Health in the United States with Reinforcement Learning
- Authors: Authors: Ellen M. Considine, Rachel C. Nethery, Gregory A. Wellenius, Francesca Dominici, Mauricio Tec
- Subjects: Machine Learning (cs.LG); Applications (stat.AP)
- Arxiv link: https://arxiv.org/abs/2312.14196
- Pdf link: https://arxiv.org/pdf/2312.14196
- Abstract Alerting the public when heat may harm their health is a crucial service, especially considering that extreme heat events will be more frequent under climate change. Current practice for issuing heat alerts in the US does not take advantage of modern data science methods for optimizing local alert criteria. Specifically, application of reinforcement learning (RL) has the potential to inform more health-protective policies, accounting for regional and sociodemographic heterogeneity as well as sequential dependence of alerts. In this work, we formulate the issuance of heat alerts as a sequential decision making problem and develop modifications to the RL workflow to address challenges commonly encountered in environmental health settings. Key modifications include creating a simulator that pairs hierarchical Bayesian modeling of low-signal health effects with sampling of real weather trajectories (exogenous features), constraining the total number of alerts issued as well as preventing alerts on less-hot days, and optimizing location-specific policies. Post-hoc contrastive analysis offers insights into scenarios when using RL for heat alert issuance may protect public health better than the current or alternative policies. This work contributes to a broader movement of advancing data-driven policy optimization for public health and climate change adaptation.
Efficient Architecture Search via Bi-level Data Pruning
- Authors: Authors: Chongjun Tu, Peng Ye, Weihao Lin, Hancheng Ye, Chong Yu, Tao Chen, Baopu Li, Wanli Ouyang
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.14200
- Pdf link: https://arxiv.org/pdf/2312.14200
- Abstract Improving the efficiency of Neural Architecture Search (NAS) is a challenging but significant task that has received much attention. Previous works mainly adopted the Differentiable Architecture Search (DARTS) and improved its search strategies or modules to enhance search efficiency. Recently, some methods have started considering data reduction for speedup, but they are not tightly coupled with the architecture search process, resulting in sub-optimal performance. To this end, this work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization, and then proposes a novel Bi-level Data Pruning (BDP) paradigm that targets the weights and architecture levels of DARTS to enhance efficiency from a data perspective. Specifically, we introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric, to gradually prune unsuitable samples for DARTS during the search. An effective automatic class balance constraint is also integrated into BDP, to suppress potential class imbalances resulting from data-efficient algorithms. Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50% while achieving superior performance when applied to baseline DARTS. Besides, we demonstrate that BDP can harmoniously integrate with advanced DARTS variants, like PC-DARTS and \b{eta}-DARTS, offering an approximately 2 times speedup with minimal performance compromises.
Adversarial Infrared Curves: An Attack on Infrared Pedestrian Detectors in the Physical World
- Authors: Authors: Chengyin Hu, Weiwen Shi
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2312.14217
- Pdf link: https://arxiv.org/pdf/2312.14217
- Abstract Deep neural network security is a persistent concern, with considerable research on visible light physical attacks but limited exploration in the infrared domain. Existing approaches, like white-box infrared attacks using bulb boards and QR suits, lack realism and stealthiness. Meanwhile, black-box methods with cold and hot patches often struggle to ensure robustness. To bridge these gaps, we propose Adversarial Infrared Curves (AdvIC). Using Particle Swarm Optimization, we optimize two Bezier curves and employ cold patches in the physical realm to introduce perturbations, creating infrared curve patterns for physical sample generation. Our extensive experiments confirm AdvIC's effectiveness, achieving 94.8% and 67.2% attack success rates for digital and physical attacks, respectively. Stealthiness is demonstrated through a comparative analysis, and robustness assessments reveal AdvIC's superiority over baseline methods. When deployed against diverse advanced detectors, AdvIC achieves an average attack success rate of 76.8%, emphasizing its robust nature. we explore adversarial defense strategies against AdvIC and examine its impact under various defense mechanisms. Given AdvIC's substantial security implications for real-world vision-based applications, urgent attention and mitigation efforts are warranted.
AutoAugment Input Transformation for Highly Transferable Targeted Attacks
- Authors: Authors: Haobo Lu, Xin Liu, Kun He
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.14218
- Pdf link: https://arxiv.org/pdf/2312.14218
- Abstract Deep Neural Networks (DNNs) are widely acknowledged to be susceptible to adversarial examples, wherein imperceptible perturbations are added to clean examples through diverse input transformation attacks. However, these methods originally designed for non-targeted attacks exhibit low success rates in targeted attacks. Recent targeted adversarial attacks mainly pay attention to gradient optimization, attempting to find the suitable perturbation direction. However, few of them are dedicated to input transformation.In this work, we observe a positive correlation between the logit/probability of the target class and diverse input transformation methods in targeted attacks. To this end, we propose a novel targeted adversarial attack called AutoAugment Input Transformation (AAIT). Instead of relying on hand-made strategies, AAIT searches for the optimal transformation policy from a transformation space comprising various operations. Then, AAIT crafts adversarial examples using the found optimal transformation policy to boost the adversarial transferability in targeted attacks. Extensive experiments conducted on CIFAR-10 and ImageNet-Compatible datasets demonstrate that the proposed AAIT surpasses other transfer-based targeted attacks significantly.
DCFL: Non-IID awareness Data Condensation aided Federated Learning
- Authors: Authors: Shaohan Sha, YaFeng Sun
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
- Arxiv link: https://arxiv.org/abs/2312.14219
- Pdf link: https://arxiv.org/pdf/2312.14219
- Abstract Federated learning is a decentralized learning paradigm wherein a central server trains a global model iteratively by utilizing clients who possess a certain amount of private datasets. The challenge lies in the fact that the client side private data may not be identically and independently distributed, significantly impacting the accuracy of the global model. Existing methods commonly address the Non-IID challenge by focusing on optimization, client selection and data complement. However, most approaches tend to overlook the perspective of the private data itself due to privacy constraints.Intuitively, statistical distinctions among private data on the client side can help mitigate the Non-IID degree. Besides, the recent advancements in dataset condensation technology have inspired us to investigate its potential applicability in addressing Non-IID issues while maintaining privacy. Motivated by this, we propose DCFL which divides clients into groups by using the Centered Kernel Alignment (CKA) method, then uses dataset condensation methods with non-IID awareness to complete clients. The private data from clients within the same group is complementary and their condensed data is accessible to all clients in the group. Additionally, CKA-guided client selection strategy, filtering mechanisms, and data enhancement techniques are incorporated to efficiently and precisely utilize the condensed data, enhance model performance, and minimize communication time. Experimental results demonstrate that DCFL achieves competitive performance on popular federated learning benchmarks including MNIST, FashionMNIST, SVHN, and CIFAR-10 with existing FL protocol.
Deep de Finetti: Recovering Topic Distributions from Large Language Models
- Authors: Authors: Liyi Zhang, R. Thomas McCoy, Theodore R. Sumers, Jian-Qiao Zhu, Thomas L. Griffiths
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2312.14226
- Pdf link: https://arxiv.org/pdf/2312.14226
- Abstract Large language models (LLMs) can produce long, coherent passages of text, suggesting that LLMs, although trained on next-word prediction, must represent the latent structure that characterizes a document. Prior work has found that internal representations of LLMs encode one aspect of latent structure, namely syntax; here we investigate a complementary aspect, namely the document's topic structure. We motivate the hypothesis that LLMs capture topic structure by connecting LLM optimization to implicit Bayesian inference. De Finetti's theorem shows that exchangeable probability distributions can be represented as a mixture with respect to a latent generating distribution. Although text is not exchangeable at the level of syntax, exchangeability is a reasonable starting assumption for topic structure. We thus hypothesize that predicting the next token in text will lead LLMs to recover latent topic distributions. We examine this hypothesis using Latent Dirichlet Allocation (LDA), an exchangeable probabilistic topic model, as a target, and we show that the representations formed by LLMs encode both the topics used to generate synthetic data and those used to explain natural corpus data.
Neural Spline Fields for Burst Image Fusion and Layer Separation
- Authors: Authors: Ilya Chugunov, David Shustin, Ruyu Yan, Chenyang Lei, Felix Heide
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.14235
- Pdf link: https://arxiv.org/pdf/2312.14235
- Abstract Each photo in an image burst can be considered a sample of a complex 3D scene: the product of parallax, diffuse and specular materials, scene motion, and illuminant variation. While decomposing all of these effects from a stack of misaligned images is a highly ill-conditioned task, the conventional align-and-merge burst pipeline takes the other extreme: blending them into a single image. In this work, we propose a versatile intermediate representation: a two-layer alpha-composited image plus flow model constructed with neural spline fields -- networks trained to map input coordinates to spline control points. Our method is able to, during test-time optimization, jointly fuse a burst image capture into one high-resolution reconstruction and decompose it into transmission and obstruction layers. Then, by discarding the obstruction layer, we can perform a range of tasks including seeing through occlusions, reflection suppression, and shadow removal. Validated on complex synthetic and in-the-wild captures we find that, with no post-processing steps or learned priors, our generalizable model is able to outperform existing dedicated single-image and multi-view obstruction removal approaches.
HElium: A Language and Compiler for Fully Homomorphic Encryption with Support for Proxy Re-Encryption
- Authors: Authors: Mirko Günther, Lars Schütze, Kilian Becher, Thorsten Strufe, Jeronimo Castrillon
- Subjects: Cryptography and Security (cs.CR); Programming Languages (cs.PL)
- Arxiv link: https://arxiv.org/abs/2312.14250
- Pdf link: https://arxiv.org/pdf/2312.14250
- Abstract Privacy-preserving analysis of confidential data can increase the value of such data and even improve peoples' lives. Fully homomorphic encryption (FHE) can enable privacy-preserving analysis. However, FHE adds a large amount of computational overhead and its efficient use requires a high level of expertise. Compilers can automate certain aspects such as parameterization and circuit optimizations. This in turn makes FHE accessible to non-cryptographers. Yet, multi-party scenarios remain complicated and exclude many promising use cases such as analyses of large amounts of health records for medical research. Proxy re-encryption (PRE), a technique that allows the conversion of data from multiple sources to a joint encryption key, can enable FHE for multi-party scenarios. Today, there are no optimizing compilers for FHE with PRE capabilities. We propose HElium, the first optimizing FHE compiler with native support for proxy re-encryption. HElium features HEDSL, a domain-specific language (DSL) specifically designed for multi-party scenarios. By tracking encryption keys and transforming the computation circuit during compilation, HElium minimizes the number of expensive PRE operations. We evaluate the effectiveness of HElium's optimizations based on the real-world use case of the tumor recurrence rate, a well-known subject of medical research. Our empirical evaluation shows that HElium substantially reduces the overhead introduced through complex PRE operations, an effect that increases for larger amounts of input data.
DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction)
- Authors: Authors: Qiaoyue Tang, Frederick Shpilevskiy, Mathias Lécuyer
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2312.14334
- Pdf link: https://arxiv.org/pdf/2312.14334
- Abstract The Adam optimizer is a popular choice in contemporary deep learning, due to its strong empirical performance. However we observe that in privacy sensitive scenarios, the traditional use of Differential Privacy (DP) with the Adam optimizer leads to sub-optimal performance on several tasks. We find that this performance degradation is due to a DP bias in Adam's second moment estimator, introduced by the addition of independent noise in the gradient computation to enforce DP guarantees. This DP bias leads to a different scaling for low variance parameter updates, that is inconsistent with the behavior of non-private Adam. We propose DP-AdamBC, an optimization algorithm which removes the bias in the second moment estimation and retrieves the expected behaviour of Adam. Empirically, DP-AdamBC significantly improves the optimization performance of DP-Adam by up to 3.5% in final accuracy in image, text, and graph node classification tasks.
Constraint-Informed Learning for Warm Starting Trajectory Optimization
- Authors: Authors: Julia Briden, Changrak Choi, Kyongsik Yun, Richard Linares, Abhishek Cauligi
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2312.14336
- Pdf link: https://arxiv.org/pdf/2312.14336
- Abstract Future spacecraft and surface robotic missions require increasingly capable autonomy stacks for exploring challenging and unstructured domains and trajectory optimization will be a cornerstone of such autonomy stacks. However, the nonlinear optimization solvers required remain too slow for use on relatively resource constrained flight-grade computers. In this work, we turn towards amortized optimization, a learning-based technique for accelerating optimization run times, and present TOAST: Trajectory Optimization with Merit Function Warm Starts. Offline, using data collected from a simulation, we train a neural network to learn a mapping to the full primal and dual solutions given the problem parameters. Crucially, we build upon recent results from decision-focused learning and present a set of decision-focused loss functions using the notion of merit functions for optimization problems. We show that training networks with such constraint-informed losses can better encode the structure of the trajectory optimization problem and jointly learn to reconstruct the primal-dual solution while also yielding improved constraint satisfaction. Through numerical experiments on a Lunar rover problem, we demonstrate that TOAST outperforms benchmark approaches in terms of both computation times and network prediction constraint satisfaction.
Designing a Skilled Soccer Team for RoboCup: Exploring Skill-Set-Primitives through Reinforcement Learning
- Authors: Authors: Miguel Abreu, Luis Paulo Reis, Nuno Lau
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2312.14360
- Pdf link: https://arxiv.org/pdf/2312.14360
- Abstract The RoboCup 3D Soccer Simulation League serves as a competitive platform for showcasing innovation in autonomous humanoid robot agents through simulated soccer matches. Our team, FC Portugal, developed a new codebase from scratch in Python after RoboCup 2021. The team's performance is based on a set of skills centered around novel unifying primitives and a custom, symmetry-extended version of the Proximal Policy Optimization algorithm. Our methods have been thoroughly tested in official RoboCup matches, where FC Portugal has won the last two main competitions, in 2022 and 2023. This paper presents our training framework, as well as a timeline of skills developed using our skill-set-primitives, which considerably improve the sample efficiency and stability of skills, and motivate seamless transitions. We start with a significantly fast sprint-kick developed in 2021 and progress to the most recent skill set, which includes a multi-purpose omnidirectional walk, a dribble with unprecedented ball control, a solid kick, and a push skill. The push tackles both low-level collision-prone scenarios and high-level strategies to increase ball possession. We address the resource-intensive nature of this task through an innovative multi-agent learning approach. Finally, we release the codebase of our team to the RoboCup community, enabling other teams to transition to Python more easily and providing new teams with a robust and modern foundation upon which they can build new features.
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
- Authors: Authors: Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu
- Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Multimedia (cs.MM)
- Arxiv link: https://arxiv.org/abs/2312.14385
- Pdf link: https://arxiv.org/pdf/2312.14385
- Abstract As the development of large-scale Generative AI models evolve beyond text (1D) generation to include image (2D) and video (3D) generation, processing spatial and temporal information presents unique challenges to quality, performance, and efficiency. We present the first work towards understanding this new system design space for multi-modal text-to-image (TTI) and text-to-video (TTV) generation models. Current model architecture designs are bifurcated into 2 categories: Diffusion- and Transformer-based models. Our systematic performance characterization on a suite of eight representative TTI/TTV models shows that after state-of-the-art optimization techniques such as Flash Attention are applied, Convolution accounts for up to 44% of execution time for Diffusion-based TTI models, while Linear layers consume up to 49% of execution time for Transformer-based models. We additionally observe that Diffusion-based TTI models resemble the Prefill stage of LLM inference, and benefit from 1.1-2.5x greater speedup from Flash Attention than Transformer-based TTI models that resemble the Decode phase. Since optimizations designed for LLMs do not map directly onto TTI/TTV models, we must conduct a thorough characterization of these workloads to gain insights for new optimization opportunities. In doing so, we define sequence length in the context of TTI/TTV models and observe sequence length can vary up to 4x in Diffusion model inference. We additionally observe temporal aspects of TTV workloads pose unique system bottlenecks, with Temporal Attention accounting for over 60% of total Attention time. Overall, our in-depth system performance characterization is a critical first step towards designing efficient and deployable systems for emerging TTI/TTV workloads.
REBEL: A Regularization-Based Solution for Reward Overoptimization in Reinforcement Learning from Human Feedback
- Authors: Authors: Souradip Chakraborty, Amisha Bhaskar, Anukriti Singh, Pratap Tokekar, Dinesh Manocha, Amrit Singh Bedi
- Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.14436
- Pdf link: https://arxiv.org/pdf/2312.14436
- Abstract In this work, we propose REBEL, an algorithm for sample efficient reward regularization based robotic reinforcement learning from human feedback (RRLHF). Reinforcement learning (RL) performance for continuous control robotics tasks is sensitive to the underlying reward function. In practice, the reward function often ends up misaligned with human intent, values, social norms, etc., leading to catastrophic failures in the real world. We leverage human preferences to learn regularized reward functions and eventually align the agents with the true intended behavior. We introduce a novel notion of reward regularization to the existing RRLHF framework, which is termed as agent preferences. So, we not only consider human feedback in terms of preferences, we also propose to take into account the preference of the underlying RL agent while learning the reward function. We show that this helps to improve the over-optimization associated with the design of reward functions in RL. We experimentally show that REBEL exhibits up to 70% improvement in sample efficiency to achieve a similar level of episodic reward returns as compared to the state-of-the-art methods such as PEBBLE and PEBBLE+SURF.
Quantum-Assisted Joint Caching and Power Allocation for Integrated Satellite-Terrestrial Networks
- Authors: Authors: Yu Zhang, Yanmin Gong, Lei Fan, Yu Wang, Zhu Han, Yuanxiong Guo
- Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2312.14448
- Pdf link: https://arxiv.org/pdf/2312.14448
- Abstract Low earth orbit (LEO) satellite network can complement terrestrial networks for achieving global wireless coverage and improving delay-sensitive Internet services. This paper proposes an integrated satellite-terrestrial network (ISTN) architecture to provide ground users with seamless and reliable content delivery services. For optimal service provisioning in this architecture, we formulate an optimization model to maximize the network throughput by jointly optimizing content delivery policy, cache placement, and transmission power allocation. The resulting optimization model is a large-scale mixed-integer nonlinear program (MINLP) that is intractable for classical computer solvers. Inspired by quantum computing techniques, we propose a hybrid quantum-classical generalized Benders' decomposition (HQCGBD) algorithm to address this challenge. Specifically, we first exploit the generalized Benders' decomposition (GBD) to decompose the problem into a master problem and a subproblem and then leverage the state-of-art quantum annealer to solve the challenging master problem.
Dynamic Programming-based Approximate Optimal Control for Model-Based Reinforcement Learning
- Authors: Authors: Prakash Mallick, Zhiyong Chen
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2312.14463
- Pdf link: https://arxiv.org/pdf/2312.14463
- Abstract This article proposes an improved trajectory optimization approach for stochastic optimal control of dynamical systems affected by measurement noise by combining optimal control with maximum likelihood techniques to improve the reduction of the cumulative cost-to-go. A modified optimization objective function that incorporates dynamic programming-based controller design is presented to handle the noise in the system and sensors. Empirical results demonstrate the effectiveness of the approach in reducing stochasticity and allowing for an intermediate step to switch optimization that can allow an efficient balance of exploration and exploitation mechanism for complex tasks by constraining policy parameters to parameters obtained as a result of this improved optimization. This research study also includes theoretical work on the uniqueness of control parameter estimates and also leverages a structure of the likelihood function which has an established theoretical guarantees. Furthermore, a theoretical result is also explored that bridge the gap between the proposed optimization objective function and existing information theory (relative entropy) and optimal control dualities.
Adaptive Differential Evolution with Diversification: Addressing Optimization Challenges
- Authors: Authors: Sarit Maitra
- Subjects: Neural and Evolutionary Computing (cs.NE); Performance (cs.PF)
- Arxiv link: https://arxiv.org/abs/2312.14464
- Pdf link: https://arxiv.org/pdf/2312.14464
- Abstract The existing variants of the Differential Evolution (DE) algorithm come with certain limitations, such as poor local search and susceptibility to premature convergence. This study introduces Adaptive Differential Evolution with Diversification (ADED), a method that dynamically modifies the neighborhood structure by evaluating the trial solutions' fitness. Developed to work with both convex and nonconvex objective functions, ADED is validated with 22 benchmark functions, including Rosenbrock, Rastrigin, Ackley, and DeVilliers-Glasser02. The development is carried out in Google Cloud using Jupyter Notebook and Python v3.10.12, with additional testing conducted on the multi-objective benchmark ZDT test suite. ADED distinguishes itself with its adaptive and diverse approach, which includes adaptive mutation and crossover-rates, diverse mutation tactics, diversification measurements, local search mechanisms, and convergence monitoring. The unique combination of these features collectively enhances ADED's effectiveness in navigating complex and diverse landscapes, positioning it as a promising tool for addressing challenges in both single- and multi-objective optimization scenarios.
Adaptive Reconvergence-driven AIG Rewriting via Strategy Learning
- Authors: Authors: Liwei Ni, Zonglin Yang, Jiaxi Zhang, Junfeng Liu, Huawei Li, Biwei Xie, Xinquan Li
- Subjects: Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
- Arxiv link: https://arxiv.org/abs/2312.14536
- Pdf link: https://arxiv.org/pdf/2312.14536
- Abstract Rewriting is a common procedure in logic synthesis aimed at improving the performance, power, and area (PPA) of circuits. The traditional reconvergence-driven And-Inverter Graph (AIG) rewriting method focuses solely on optimizing the reconvergence cone through Boolean algebra minimization. However, there exist opportunities to incorporate other node-rewriting algorithms that are better suited for specific cones. In this paper, we propose an adaptive reconvergence-driven AIG rewriting algorithm that combines two key techniques: multi-strategy-based AIG rewriting and strategy learning-based algorithm selection. The multi-strategy-based rewriting method expands upon the traditional approach by incorporating support for multi-node-rewriting algorithms, thus expanding the optimization space. Additionally, the strategy learning-based algorithm selection method determines the most suitable node-rewriting algorithm for a given cone. Experimental results demonstrate that our proposed method yields a significant average improvement of 5.567% in size and 5.327% in depth.
Online Covering with Multiple Experts
- Authors: Authors: Enikő Kevi, Kim-Thang Nguyen
- Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.14564
- Pdf link: https://arxiv.org/pdf/2312.14564
- Abstract Designing online algorithms with machine learning predictions is a recent technique beyond the worst-case paradigm for various practically relevant online problems (scheduling, caching, clustering, ski rental, etc.). While most previous learning-augmented algorithm approaches focus on integrating the predictions of a single oracle, we study the design of online algorithms with \emph{multiple} experts. To go beyond the popular benchmark of a static best expert in hindsight, we propose a new \emph{dynamic} benchmark (linear combinations of predictions that change over time). We present a competitive algorithm in the new dynamic benchmark with a performance guarantee of $O(\log K)$, where $K$ is the number of experts, for $0-1$ online optimization problems. Furthermore, our multiple-expert approach provides a new perspective on how to combine in an online manner several online algorithms - a long-standing central subject in the online algorithm research community.
Structure-Guided Automated Reasoning
- Authors: Authors: Max Bannach, Markus Hecher
- Subjects: Logic in Computer Science (cs.LO); Computational Complexity (cs.CC)
- Arxiv link: https://arxiv.org/abs/2312.14620
- Pdf link: https://arxiv.org/pdf/2312.14620
- Abstract Algorithmic meta-theorems state that problems that can be formalized in a fixed logic can be solved efficiently on classes of structures with certain properties. A prominent example is Courcelle's Theorem, which states that all problems expressible in monadic second-order logic can be solved efficiently on structures of small treewidth. Such theorems are usually proven by a generic algorithm for the model-checking problem of the given logic, which is often complex and rarely leads to highly efficient solutions. Alternatively, we can solve the model-checking problem by grounding the given logic to propositional logic, for which dedicated solvers are available. Such encodings will, however, usually not preserve the input's treewidth. This paper investigates whether all problems definable in monadic second-order logic can efficiently be encoded into SAT such that the input's treewidth bounds the treewidth of the resulting formula. We answer this in the affirmative and, hence, provide an alternative proof of Courcelle's Theorem. Our technique can naturally be extended: There are treewidth-aware reductions from the optimization version of Courcelle's Theorem to MaxSAT and from the counting version of the theorem to #SAT. By using encodings to SAT, we obtain, ignoring polynomial factors, the same running time for the model-checking problem as we would with dedicated algorithms. We complement our upper bounds with new lower bounds based on ETH; and we show that the block size of the input's formula and the treewidth of the input's structure are tightly linked. We also provide matching upper and lower bounds for a fragment of guarded MSO, only using SAT-based techniques.
A Mathematical Guide to Operator Learning
- Authors: Authors: Nicolas Boullé, Alex Townsend
- Subjects: Numerical Analysis (math.NA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.14688
- Pdf link: https://arxiv.org/pdf/2312.14688
- Abstract Operator learning aims to discover properties of an underlying dynamical system or partial differential equation (PDE) from data. Here, we present a step-by-step guide to operator learning. We explain the types of problems and PDEs amenable to operator learning, discuss various neural network architectures, and explain how to employ numerical PDE solvers effectively. We also give advice on how to create and manage training data and conduct optimization. We offer intuition behind the various neural network architectures employed in operator learning by motivating them from the point-of-view of numerical linear algebra.
Inverse Transfer Multiobjective Optimization
- Authors: Authors: Jiao Liu, Abhishek Gupta, Yew-Soon Ong
- Subjects: Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2312.14713
- Pdf link: https://arxiv.org/pdf/2312.14713
- Abstract Transfer optimization enables data-efficient optimization of a target task by leveraging experiential priors from related source tasks. This is especially useful in multiobjective optimization settings where a set of trade-off solutions is sought under tight evaluation budgets. In this paper, we introduce a novel concept of inverse transfer in multiobjective optimization. Inverse transfer stands out by employing probabilistic inverse models to map performance vectors in the objective space to population search distributions in task-specific decision space, facilitating knowledge transfer through objective space unification. Building upon this idea, we introduce the first Inverse Transfer Multiobjective Evolutionary Optimizer (invTrEMO). A key highlight of invTrEMO is its ability to harness the common objective functions prevalent in many application areas, even when decision spaces do not precisely align between tasks. This allows invTrEMO to uniquely and effectively utilize information from heterogeneous source tasks as well. Furthermore, invTrEMO yields high-precision inverse models as a significant byproduct, enabling the generation of tailored solutions on-demand based on user preferences. Empirical studies on multi- and many-objective benchmark problems, as well as a practical case study, showcase the faster convergence rate and modelling accuracy of the invTrEMO relative to state-of-the-art evolutionary and Bayesian optimization algorithms. The source code of the invTrEMO is made available at https://github.com/LiuJ-2023/invTrEMO.
Enhancing Text-to-SQL Translation for Financial System Design
- Authors: Authors: Yewei Song, Saad Ezzini, Xunzhu Tang, Cedric Lothritz, Jacques Klein, Tegawendé Bissyandé, Andrey Boytsov, Ulrick Ble, Anne Goujon
- Subjects: Software Engineering (cs.SE)
- Arxiv link: https://arxiv.org/abs/2312.14725
- Pdf link: https://arxiv.org/pdf/2312.14725
- Abstract Text-to-SQL, the task of translating natural language questions into SQL queries, is part of various business processes. Its automation, which is an emerging challenge, will empower software practitioners to seamlessly interact with relational databases using natural language, thereby bridging the gap between business needs and software capabilities. In this paper, we consider Large Language Models (LLMs), which have achieved state of the art for various NLP tasks. Specifically, we benchmark Text-to-SQL performance, the evaluation methodologies, as well as input optimization (e.g., prompting). In light of the empirical observations that we have made, we propose two novel metrics that were designed to adequately measure the similarity between SQL queries. Overall, we share with the community various findings, notably on how to select the right LLM on Text-to-SQL tasks. We further demonstrate that a tree-based edit distance constitutes a reliable metric for assessing the similarity between generated SQL queries and the oracle for benchmarking Text2SQL approaches. This metric is important as it relieves researchers from the need to perform computationally expensive experiments such as executing generated queries as done in prior works. Our work implements financial domain use cases and, therefore contributes to the advancement of Text2SQL systems and their practical adoption in this domain.
ESBMC v7.4: Harnessing the Power of Intervals
- Authors: Authors: Rafael Menezes, Mohannad Aldughaim, Bruno Farias, Xianzhiyu Li, Edoardo Manino, Fedor Shmarov, Kunjian Song, Franz Brauße, Mikhail R. Gadelha, Norbert Tihanyi, Konstantin Korovin, Lucas C. Cordeiro
- Subjects: Software Engineering (cs.SE)
- Arxiv link: https://arxiv.org/abs/2312.14746
- Pdf link: https://arxiv.org/pdf/2312.14746
- Abstract ESBMC implements many state-of-the-art techniques for model checking. We report on new and improved features that allow us to obtain verification results for previously unsupported programs and properties. ESBMC employs a new static interval analysis of expressions in programs to increase verification performance. This includes interval-based reasoning over booleans and integers, forward and backward contractors, and particular optimizations related to singleton intervals because of their ubiquity. Other relevant improvements concern the verification of concurrent programs, as well as several operational models, internal ones, and also those of libraries such as pthread and the C mathematics library. An extended memory safety analysis now allows tracking of memory leaks that are considered still reachable.
Enhanced Latent Multi-view Subspace Clustering
- Authors: Authors: Long Shi, Lei Cao, Jun Wang, Badong Chen
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.14763
- Pdf link: https://arxiv.org/pdf/2312.14763
- Abstract Latent multi-view subspace clustering has been demonstrated to have desirable clustering performance. However, the original latent representation method vertically concatenates the data matrices from multiple views into a single matrix along the direction of dimensionality to recover the latent representation matrix, which may result in an incomplete information recovery. To fully recover the latent space representation, we in this paper propose an Enhanced Latent Multi-view Subspace Clustering (ELMSC) method. The ELMSC method involves constructing an augmented data matrix that enhances the representation of multi-view data. Specifically, we stack the data matrices from various views into the block-diagonal locations of the augmented matrix to exploit the complementary information. Meanwhile, the non-block-diagonal entries are composed based on the similarity between different views to capture the consistent information. In addition, we enforce a sparse regularization for the non-diagonal blocks of the augmented self-representation matrix to avoid redundant calculations of consistency information. Finally, a novel iterative algorithm based on the framework of Alternating Direction Method of Multipliers (ADMM) is developed to solve the optimization problem for ELMSC. Extensive experiments on real-world datasets demonstrate that our proposed ELMSC is able to achieve higher clustering performance than some state-of-art multi-view clustering methods.
EMF-Constrained Artificial Noise for Secrecy Rates with Stochastic Eavesdropper Channels
- Authors: Authors: Stefan Roth, Aydin Sezgin
- Subjects: Information Theory (cs.IT)
- Arxiv link: https://arxiv.org/abs/2312.14791
- Pdf link: https://arxiv.org/pdf/2312.14791
- Abstract An information-theoretic confidential communication is achievable if the eavesdropper has a degraded channel compared to the legitimate receiver. In wireless channels, beamforming and artificial noise can enable such confidentiality. However, only distribution knowledge of the eavesdropper channels can be assumed. Moreover, the transmission of artificial noise can lead to an increased electromagnetic field (EMF) exposure, which depends on the considered location and can thus also be seen as a random variable. Hence, we optimize the $\varepsilon$-outage secrecy rate under a $\delta$-outage exposure constraint in a setup, where the base station (BS) is communicating to a user equipment (UE), while a single-antenna eavesdropper with Rayleigh distributed channels is present. Therefore, we calculate the secrecy outage probability (SOP) in closed-form. Based on this, we convexify the optimization problem and optimize the $\varepsilon$-outage secrecy rate iteratively. Numerical results show that for a moderate exposure constraint, artificial noise from the BS has a relatively large impact due to beamforming, while for a strict exposure constraint artificial noise from the UE is more important.
Accelerating Bayesian Optimal Experimental Design with Derivative-Informed Neural Operators
- Authors: Authors: Jinwoo Go, Peng Chen
- Subjects: Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC); Methodology (stat.ME)
- Arxiv link: https://arxiv.org/abs/2312.14810
- Pdf link: https://arxiv.org/pdf/2312.14810
- Abstract We consider optimal experimental design (OED) for nonlinear Bayesian inverse problems governed by large-scale partial differential equations (PDEs). For the optimality criteria of Bayesian OED, we consider both expected information gain and summary statistics including the trace and determinant of the information matrix that involves the evaluation of the parameter-to-observable (PtO) map and its derivatives. However, it is prohibitive to compute and optimize these criteria when the PDEs are very expensive to solve, the parameters to estimate are high-dimensional, and the optimization problem is combinatorial, high-dimensional, and non-convex. To address these challenges, we develop an accurate, scalable, and efficient computational framework to accelerate the solution of Bayesian OED. In particular, the framework is developed based on derivative-informed neural operator (DINO) surrogates with proper dimension reduction techniques and a modified swapping greedy algorithm. We demonstrate the high accuracy of the DINO surrogates in the computation of the PtO map and the optimality criteria compared to high-fidelity finite element approximations. We also show that the proposed method is scalable with increasing parameter dimensions. Moreover, we demonstrate that it achieves high efficiency with over 1000X speedup compared to a high-fidelity Bayesian OED solution for a three-dimensional PDE example with tens of thousands of parameters, including both online evaluation and offline construction costs of the surrogates.
Learning Lagrangian Multipliers for the Travelling Salesman Problem
- Authors: Authors: Augustin Parjadis, Quentin Cappart, Bistra Dilkina, Aaron Ferber, Louis-Martin Rousseau
- Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2312.14836
- Pdf link: https://arxiv.org/pdf/2312.14836
- Abstract Lagrangian relaxation is a versatile mathematical technique employed to relax constraints in an optimization problem, enabling the generation of dual bounds to prove the optimality of feasible solutions and the design of efficient propagators in constraint programming (such as the weighted circuit constraint). However, the conventional process of deriving Lagrangian multipliers (e.g., using subgradient methods) is often computationally intensive, limiting its practicality for large-scale or time-sensitive problems. To address this challenge, we propose an innovative unsupervised learning approach that harnesses the capabilities of graph neural networks to exploit the problem structure, aiming to generate accurate Lagrangian multipliers efficiently. We apply this technique to the well-known Held-Karp Lagrangian relaxation for the travelling salesman problem. The core idea is to predict accurate Lagrangian multipliers and to employ them as a warm start for generating Held-Karp relaxation bounds. These bounds are subsequently utilized to enhance the filtering process carried out by branch-and-bound algorithms. In contrast to much of the existing literature, which primarily focuses on finding feasible solutions, our approach operates on the dual side, demonstrating that learning can also accelerate the proof of optimality. We conduct experiments across various distributions of the metric travelling salesman problem, considering instances with up to 200 cities. The results illustrate that our approach can improve the filtering level of the weighted circuit global constraint, reduce the optimality gap by a factor two for unsolved instances up to a timeout, and reduce the execution time for solved instances by 10%.
Automating the Design of Multigrid Methods with Evolutionary Program Synthesis
- Authors: Authors: Jonas Schmitt
- Subjects: Numerical Analysis (math.NA); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Formal Languages and Automata Theory (cs.FL); Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2312.14875
- Pdf link: https://arxiv.org/pdf/2312.14875
- Abstract Many of the most fundamental laws of nature can be formulated as partial differential equations (PDEs). Understanding these equations is, therefore, of exceptional importance for many branches of modern science and engineering. However, since the general solution of many PDEs is unknown, the efficient approximate solution of these equations is one of humanity's greatest challenges. While multigrid represents one of the most effective methods for solving PDEs numerically, in many cases, the design of an efficient or at least working multigrid solver is an open problem. This thesis demonstrates that grammar-guided genetic programming, an evolutionary program synthesis technique, can discover multigrid methods of unprecedented structure that achieve a high degree of efficiency and generalization. For this purpose, we develop a novel context-free grammar that enables the automated generation of multigrid methods in a symbolically-manipulable formal language, based on which we can apply the same multigrid-based solver to problems of different sizes without having to adapt its internal structure. Treating the automated design of an efficient multigrid method as a program synthesis task allows us to find novel sequences of multigrid operations, including the combination of different smoothing and coarse-grid correction steps on each level of the discretization hierarchy. To prove the feasibility of this approach, we present its implementation in the form of the Python framework EvoStencils, which is freely available as open-source software. This implementation comprises all steps from representing the algorithmic sequence of a multigrid method in the form of a directed acyclic graph of Python objects to its automatic generation and optimization using the capabilities of the code generation framework ExaStencils and the evolutionary computation library DEAP.
Strong anti-Hebbian plasticity alters the convexity of network attractor landscapes
- Authors: Authors: Lulu Gong, Xudong Chen, ShiNung Ching
- Subjects: Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY); Dynamical Systems (math.DS); Neurons and Cognition (q-bio.NC)
- Arxiv link: https://arxiv.org/abs/2312.14896
- Pdf link: https://arxiv.org/pdf/2312.14896
- Abstract In this paper, we study recurrent neural networks in the presence of pairwise learning rules. We are specifically interested in how the attractor landscapes of such networks become altered as a function of the strength and nature (Hebbian vs. anti-Hebbian) of learning, which may have a bearing on the ability of such rules to mediate large-scale optimization problems. Through formal analysis, we show that a transition from Hebbian to anti-Hebbian learning brings about a pitchfork bifurcation that destroys convexity in the network attractor landscape. In larger-scale settings, this implies that anti-Hebbian plasticity will bring about multiple stable equilibria, and such effects may be outsized at interconnection or `choke' points. Furthermore, attractor landscapes are more sensitive to slower learning rates than faster ones. These results provide insight into the types of objective functions that can be encoded via different pairwise plasticity rules.
Keyword: adam
DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction)
- Authors: Authors: Qiaoyue Tang, Frederick Shpilevskiy, Mathias Lécuyer
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2312.14334
- Pdf link: https://arxiv.org/pdf/2312.14334
- Abstract The Adam optimizer is a popular choice in contemporary deep learning, due to its strong empirical performance. However we observe that in privacy sensitive scenarios, the traditional use of Differential Privacy (DP) with the Adam optimizer leads to sub-optimal performance on several tasks. We find that this performance degradation is due to a DP bias in Adam's second moment estimator, introduced by the addition of independent noise in the gradient computation to enforce DP guarantees. This DP bias leads to a different scaling for low variance parameter updates, that is inconsistent with the behavior of non-private Adam. We propose DP-AdamBC, an optimization algorithm which removes the bias in the second moment estimation and retrieves the expected behaviour of Adam. Empirically, DP-AdamBC significantly improves the optimization performance of DP-Adam by up to 3.5% in final accuracy in image, text, and graph node classification tasks.
Keyword: gradient
On Early Detection of Hallucinations in Factual Question Answering
- Authors: Authors: Ben Snyder, Marius Moisescu, Muhammad Bilal Zafar
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.14183
- Pdf link: https://arxiv.org/pdf/2312.14183
- Abstract While large language models (LLMs) have taken great strides towards helping humans with a plethora of tasks like search and summarization, hallucinations remain a major impediment towards gaining user trust. The fluency and coherence of model generations even when hallucinating makes it difficult to detect whether or not a model is hallucinating. In this work, we explore if the artifacts associated with the model generations can provide hints that the generation will contain hallucinations. Specifically, we probe LLMs at 1) the inputs via Integrated Gradients based token attribution, 2) the outputs via the Softmax probabilities, and 3) the internal state via self-attention and fully-connected layer activations for signs of hallucinations on open-ended question answering tasks. Our results show that the distributions of these artifacts differ between hallucinated and non-hallucinated generations. Building on this insight, we train binary classifiers that use these artifacts as input features to classify model generations into hallucinations and non-hallucinations. These hallucination classifiers achieve up to 0.80 AUROC. We further show that tokens preceding a hallucination can predict the subsequent hallucination before it occurs.
AutoAugment Input Transformation for Highly Transferable Targeted Attacks
- Authors: Authors: Haobo Lu, Xin Liu, Kun He
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.14218
- Pdf link: https://arxiv.org/pdf/2312.14218
- Abstract Deep Neural Networks (DNNs) are widely acknowledged to be susceptible to adversarial examples, wherein imperceptible perturbations are added to clean examples through diverse input transformation attacks. However, these methods originally designed for non-targeted attacks exhibit low success rates in targeted attacks. Recent targeted adversarial attacks mainly pay attention to gradient optimization, attempting to find the suitable perturbation direction. However, few of them are dedicated to input transformation.In this work, we observe a positive correlation between the logit/probability of the target class and diverse input transformation methods in targeted attacks. To this end, we propose a novel targeted adversarial attack called AutoAugment Input Transformation (AAIT). Instead of relying on hand-made strategies, AAIT searches for the optimal transformation policy from a transformation space comprising various operations. Then, AAIT crafts adversarial examples using the found optimal transformation policy to boost the adversarial transferability in targeted attacks. Extensive experiments conducted on CIFAR-10 and ImageNet-Compatible datasets demonstrate that the proposed AAIT surpasses other transfer-based targeted attacks significantly.
DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction)
- Authors: Authors: Qiaoyue Tang, Frederick Shpilevskiy, Mathias Lécuyer
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2312.14334
- Pdf link: https://arxiv.org/pdf/2312.14334
- Abstract The Adam optimizer is a popular choice in contemporary deep learning, due to its strong empirical performance. However we observe that in privacy sensitive scenarios, the traditional use of Differential Privacy (DP) with the Adam optimizer leads to sub-optimal performance on several tasks. We find that this performance degradation is due to a DP bias in Adam's second moment estimator, introduced by the addition of independent noise in the gradient computation to enforce DP guarantees. This DP bias leads to a different scaling for low variance parameter updates, that is inconsistent with the behavior of non-private Adam. We propose DP-AdamBC, an optimization algorithm which removes the bias in the second moment estimation and retrieves the expected behaviour of Adam. Empirically, DP-AdamBC significantly improves the optimization performance of DP-Adam by up to 3.5% in final accuracy in image, text, and graph node classification tasks.
Training Neural Networks with Internal State, Unconstrained Connectivity, and Discrete Activations
- Authors: Authors: Alexander Grushin
- Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2312.14359
- Pdf link: https://arxiv.org/pdf/2312.14359
- Abstract Today's most powerful machine learning approaches are typically designed to train stateless architectures with predefined layers and differentiable activation functions. While these approaches have led to unprecedented successes in areas such as natural language processing and image recognition, the trained models are also susceptible to making mistakes that a human would not. In this paper, we take the view that true intelligence may require the ability of a machine learning model to manage internal state, but that we have not yet discovered the most effective algorithms for training such models. We further postulate that such algorithms might not necessarily be based on gradient descent over a deep architecture, but rather, might work best with an architecture that has discrete activations and few initial topological constraints (such as multiple predefined layers). We present one attempt in our ongoing efforts to design such a training algorithm, applied to an architecture with binary activations and only a single matrix of weights, and show that it is able to form useful representations of natural language text, but is also limited in its ability to leverage large quantities of training data. We then provide ideas for improving the algorithm and for designing other training algorithms for similar architectures. Finally, we discuss potential benefits that could be gained if an effective training algorithm is found, and suggest experiments for evaluating whether these benefits exist in practice.
Federated Learning with Projected Trajectory Regularization
- Authors: Authors: Tiejin Chen, Yuanpu Cao, Yujia Wang, Cho-Jui Hsieh, Jinghui Chen
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.14380
- Pdf link: https://arxiv.org/pdf/2312.14380
- Abstract Federated learning enables joint training of machine learning models from distributed clients without sharing their local data. One key challenge in federated learning is to handle non-identically distributed data across the clients, which leads to deteriorated model training performances. Prior works in this line of research mainly focus on utilizing last-step global model parameters/gradients or the linear combinations of the past model parameters/gradients, which do not fully exploit the potential of global information from the model training trajectory. In this paper, we propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data heterogeneity issue, which proposes a unique way to better extract the essential global information from the model training trajectory. Specifically, FedPTR allows local clients or the server to optimize an auxiliary (synthetic) dataset that mimics the learning dynamics of the recent model update and utilizes it to project the next-step model trajectory for local training regularization. We conduct rigorous theoretical analysis for our proposed framework under nonconvex stochastic settings to verify its fast convergence under heterogeneous data distributions. Experiments on various benchmark datasets and non-i.i.d. settings validate the effectiveness of our proposed framework.
A Generalized Shuffle Framework for Privacy Amplification: Strengthening Privacy Guarantees and Enhancing Utility
- Authors: Authors: E Chen, Yang Cao, Yifei Ge
- Subjects: Cryptography and Security (cs.CR); Combinatorics (math.CO)
- Arxiv link: https://arxiv.org/abs/2312.14388
- Pdf link: https://arxiv.org/pdf/2312.14388
- Abstract The shuffle model of local differential privacy is an advanced method of privacy amplification designed to enhance privacy protection with high utility. It achieves this by randomly shuffling sensitive data, making linking individual data points to specific individuals more challenging. However, most existing studies have focused on the shuffle model based on $(\epsilon_0,0)$-Locally Differentially Private (LDP) randomizers, with limited consideration for complex scenarios such as $(\epsilon_0,\delta_0)$-LDP or personalized LDP (PLDP). This hinders a comprehensive understanding of the shuffle model's potential and limits its application in various settings. To bridge this research gap, we propose a generalized shuffle framework that can be applied to any $(\epsilon_i,\delta_i)$-PLDP setting with personalized privacy parameters. This generalization allows for a broader exploration of the privacy-utility trade-off and facilitates the design of privacy-preserving analyses in diverse contexts. We prove that shuffled $(\epsilon_i,\delta_i)$-PLDP process approximately preserves $\mu$-Gaussian Differential Privacy with \mu = \sqrt{\frac{2}{\sum_{i=1}^{n} \frac{1-\delta_i}{1+e^{\epsilon_i}}-\max_{i}{\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}}}}. $ This approach allows us to avoid the limitations and potential inaccuracies associated with inequality estimations. To strengthen the privacy guarantee, we improve the lower bound by utilizing hypothesis testing} instead of relying on rough estimations like the Chernoff bound or Hoeffding's inequality. Furthermore, extensive comparative evaluations clearly show that our approach outperforms existing methods in achieving strong central privacy guarantees while preserving the utility of the global model. We have also carefully designed corresponding algorithms for average function, frequency estimation, and stochastic gradient descent.
GROOD: GRadient-aware Out-Of-Distribution detection in interpolated manifolds
- Authors: Authors: Mostafa ElAraby, Sabyasachi Sahoo, Yann Pequignot, Paul Novello, Liam Paull
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.14427
- Pdf link: https://arxiv.org/pdf/2312.14427
- Abstract Deep neural networks (DNNs) often fail silently with over-confident predictions on out-of-distribution (OOD) samples, posing risks in real-world deployments. Existing techniques predominantly emphasize either the feature representation space or the gradient norms computed with respect to DNN parameters, yet they overlook the intricate gradient distribution and the topology of classification regions. To address this gap, we introduce GRadient-aware Out-Of-Distribution detection in interpolated manifolds (GROOD), a novel framework that relies on the discriminative power of gradient space to distinguish between in-distribution (ID) and OOD samples. To build this space, GROOD relies on class prototypes together with a prototype that specifically captures OOD characteristics. Uniquely, our approach incorporates a targeted mix-up operation at an early intermediate layer of the DNN to refine the separation of gradient spaces between ID and OOD samples. We quantify OOD detection efficacy using the distance to the nearest neighbor gradients derived from the training set, yielding a robust OOD score. Experimental evaluations substantiate that the introduction of targeted input mix-upamplifies the separation between ID and OOD in the gradient space, yielding impressive results across diverse datasets. Notably, when benchmarked against ImageNet-1k, GROOD surpasses the established robustness of state-of-the-art baselines. Through this work, we establish the utility of leveraging gradient spaces and class prototypes for enhanced OOD detection for DNN in image classification.
Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks
- Authors: Authors: Haz Sameen Shahgir, Xianghao Kong, Greg Ver Steeg, Yue Dong
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2312.14440
- Pdf link: https://arxiv.org/pdf/2312.14440
- Abstract The widespread use of Text-to-Image (T2I) models in content generation requires careful examination of their safety, including their robustness to adversarial attacks. Despite extensive research into this, the reasons for their effectiveness are underexplored. This paper presents an empirical study on adversarial attacks against T2I models, focusing on analyzing factors associated with attack success rates (ASRs). We introduce a new attack objective - entity swapping using adversarial suffixes and two gradient-based attack algorithms. Human and automatic evaluations reveal the asymmetric nature of ASRs on entity swap: for example, it is easier to replace "human" with "robot" in the prompt "a human dancing in the rain." with an adversarial suffix but is significantly harder in reverse. We further propose probing metrics to establish indicative signals from the model's beliefs to the adversarial ASR. We identify conditions resulting in a 60% success probability for adversarial attacks and others where this likelihood drops below 5%.
Hutchinson Trace Estimation for High-Dimensional and High-Order Physics-Informed Neural Networks
- Authors: Authors: Zheyuan Hu, Zekun Shi, George Em Karniadakis, Kenji Kawaguchi
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Numerical Analysis (math.NA); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2312.14499
- Pdf link: https://arxiv.org/pdf/2312.14499
- Abstract Physics-Informed Neural Networks (PINNs) have proven effective in solving partial differential equations (PDEs), especially when some data are available by blending seamlessly data and physics. However, extending PINNs to high-dimensional and even high-order PDEs encounters significant challenges due to the computational cost associated with automatic differentiation in the residual loss. Herein, we address the limitations of PINNs in handling high-dimensional and high-order PDEs by introducing Hutchinson Trace Estimation (HTE). Starting with the second-order high-dimensional PDEs ubiquitous in scientific computing, HTE transforms the calculation of the entire Hessian matrix into a Hessian vector product (HVP). This approach alleviates the computational bottleneck via Taylor-mode automatic differentiation and significantly reduces memory consumption from the Hessian matrix to HVP. We further showcase HTE's convergence to the original PINN loss and its unbiased behavior under specific conditions. Comparisons with Stochastic Dimension Gradient Descent (SDGD) highlight the distinct advantages of HTE, particularly in scenarios with significant variance among dimensions. We further extend HTE to higher-order and higher-dimensional PDEs, specifically addressing the biharmonic equation. By employing tensor-vector products (TVP), HTE efficiently computes the colossal tensor associated with the fourth-order high-dimensional biharmonic equation, saving memory and enabling rapid computation. The effectiveness of HTE is illustrated through experimental setups, demonstrating comparable convergence rates with SDGD under memory and speed constraints. Additionally, HTE proves valuable in accelerating the Gradient-Enhanced PINN (gPINN) version as well as the Biharmonic equation. Overall, HTE opens up a new capability in scientific machine learning for tackling high-order and high-dimensional PDEs.
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise
- Authors: Authors: Rui Pan, Yuxing Liu, Xiaoyu Wang, Tong Zhang
- Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2312.14567
- Pdf link: https://arxiv.org/pdf/2312.14567
- Abstract Heavy-ball momentum with decaying learning rates is widely used with SGD for optimizing deep learning models. In contrast to its empirical popularity, the understanding of its theoretical property is still quite limited, especially under the standard anisotropic gradient noise condition for quadratic regression problems. Although it is widely conjectured that heavy-ball momentum method can provide accelerated convergence and should work well in large batch settings, there is no rigorous theoretical analysis. In this paper, we fill this theoretical gap by establishing a non-asymptotic convergence bound for stochastic heavy-ball methods with step decay scheduler on quadratic objectives, under the anisotropic gradient noise condition. As a direct implication, we show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrt{\kappa})$ accelerated convergence of the bias term of SGD while still achieving near-optimal convergence rate with respect to the stochastic variance term. The combined effect implies an overall convergence rate within log factors from the statistical minimax rate. This means SGD with heavy-ball momentum is useful in the large-batch settings such as distributed machine learning or federated learning, where a smaller number of iterations can significantly reduce the number of communication rounds, leading to acceleration in practice.
Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps
- Authors: Authors: Till Beemelmanns, Wassim Zahr, Lutz Eckstein
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.14606
- Pdf link: https://arxiv.org/pdf/2312.14606
- Abstract Vision Transformers (ViTs) have achieved state-of-the-art results on various computer vision tasks, including 3D object detection. However, their end-to-end implementation also makes ViTs less explainable, which can be a challenge for deploying them in safety-critical applications, such as autonomous driving, where it is important for authorities, developers, and users to understand the model's reasoning behind its predictions. In this paper, we propose a novel method for generating saliency maps for a DetR-like ViT with multiple camera inputs used for 3D object detection. Our method is based on the raw attention and is more efficient than gradient-based methods. We evaluate the proposed method on the nuScenes dataset using extensive perturbation tests and show that it outperforms other explainability methods in terms of visual quality and quantitative metrics. We also demonstrate the importance of aggregating attention across different layers of the transformer. Our work contributes to the development of explainable AI for ViTs, which can help increase trust in AI applications by establishing more transparency regarding the inner workings of AI models.
Resilient asynchronous primal Schur method
- Authors: Authors: Guillaume Gbikpi-Benissan, Frédéric Magoulès
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2312.14715
- Pdf link: https://arxiv.org/pdf/2312.14715
- Abstract This paper introduces the application of the asynchronous iterations theory within the framework of the primal Schur domain decomposition method. A suitable relaxation scheme is designed, which asynchronous convergence is established under classical spectral radius conditions. For the usual case where the local Schur complement matrices are not constructed, suitable splittings only based on explicitly generated matrices are provided. Numerical experiments are conducted on a supercomputer for both Poisson's and linear elasticity problems. The asynchronous Schur solver outperformed the classical conjugate-gradient-based one in case of compute node failures.
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
- Authors: Authors: Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger
- Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
- Arxiv link: https://arxiv.org/abs/2312.14751
- Pdf link: https://arxiv.org/pdf/2312.14751
- Abstract Public release of the weights of pretrained foundation models, otherwise known as downloadable access \citep{solaiman_gradient_2023}, enables fine-tuning without the prohibitive expense of pretraining. Our work argues that increasingly accessible fine-tuning of downloadable models may increase hazards. First, we highlight research to improve the accessibility of fine-tuning. We split our discussion into research that A) reduces the computational cost of fine-tuning and B) improves the ability to share that cost across more actors. Second, we argue that increasingly accessible fine-tuning methods may increase hazard through facilitating malicious use and making oversight of models with potentially dangerous capabilities more difficult. Third, we discuss potential mitigatory measures, as well as benefits of more accessible fine-tuning. Given substantial remaining uncertainty about hazards, we conclude by emphasizing the urgent need for the development of mitigations.
Learning Lagrangian Multipliers for the Travelling Salesman Problem
- Authors: Authors: Augustin Parjadis, Quentin Cappart, Bistra Dilkina, Aaron Ferber, Louis-Martin Rousseau
- Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2312.14836
- Pdf link: https://arxiv.org/pdf/2312.14836
- Abstract Lagrangian relaxation is a versatile mathematical technique employed to relax constraints in an optimization problem, enabling the generation of dual bounds to prove the optimality of feasible solutions and the design of efficient propagators in constraint programming (such as the weighted circuit constraint). However, the conventional process of deriving Lagrangian multipliers (e.g., using subgradient methods) is often computationally intensive, limiting its practicality for large-scale or time-sensitive problems. To address this challenge, we propose an innovative unsupervised learning approach that harnesses the capabilities of graph neural networks to exploit the problem structure, aiming to generate accurate Lagrangian multipliers efficiently. We apply this technique to the well-known Held-Karp Lagrangian relaxation for the travelling salesman problem. The core idea is to predict accurate Lagrangian multipliers and to employ them as a warm start for generating Held-Karp relaxation bounds. These bounds are subsequently utilized to enhance the filtering process carried out by branch-and-bound algorithms. In contrast to much of the existing literature, which primarily focuses on finding feasible solutions, our approach operates on the dual side, demonstrating that learning can also accelerate the proof of optimality. We conduct experiments across various distributions of the metric travelling salesman problem, considering instances with up to 200 cities. The results illustrate that our approach can improve the filtering level of the weighted circuit global constraint, reduce the optimality gap by a factor two for unsolved instances up to a timeout, and reduce the execution time for solved instances by 10%.
Keyword: super-resolution
There is no result