arxiv-updates
arxiv-updates copied to clipboard
New submissions for Tue, 14 Nov 23
Keyword: sgd
Inference and Interference: The Role of Clipping, Pruning and Loss Landscapes in Differentially Private Stochastic Gradient Descent
- Authors: Authors: Lauren Watson, Eric Gan, Mohan Dantam, Baharan Mirzasoleiman, Rik Sarkar
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2311.06839
- Pdf link: https://arxiv.org/pdf/2311.06839
- Abstract Differentially private stochastic gradient descent (DP-SGD) is known to have poorer training and test performance on large neural networks, compared to ordinary stochastic gradient descent (SGD). In this paper, we perform a detailed study and comparison of the two processes and unveil several new insights. By comparing the behavior of the two processes separately in early and late epochs, we find that while DP-SGD makes slower progress in early stages, it is the behavior in the later stages that determines the end result. This separate analysis of the clipping and noise addition steps of DP-SGD shows that while noise introduces errors to the process, gradient descent can recover from these errors when it is not clipped, and clipping appears to have a larger impact than noise. These effects are amplified in higher dimensions (large neural networks), where the loss basin occupies a lower dimensional space. We argue theoretically and using extensive experiments that magnitude pruning can be a suitable dimension reduction technique in this regard, and find that heavy pruning can improve the test accuracy of DPSGD.
DAGC: Data-Volume-Aware Adaptive Sparsification Gradient Compression for Distributed Machine Learning in Mobile Computing
- Authors: Authors: Rongwei Lu, Yutong Jiang, Yinan Mao, Chen Tang, Bin Chen, Laizhong Cui, Zhi Wang
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.07324
- Pdf link: https://arxiv.org/pdf/2311.07324
- Abstract Distributed machine learning (DML) in mobile environments faces significant communication bottlenecks. Gradient compression has emerged as an effective solution to this issue, offering substantial benefits in environments with limited bandwidth and metered data. Yet, they encounter severe performance drop in non-IID environments due to a one-size-fits-all compression approach, which does not account for the varying data volumes across workers. Assigning varying compression ratios to workers with distinct data distributions and volumes is thus a promising solution. This study introduces an analysis of distributed SGD with non-uniform compression, which reveals that the convergence rate (indicative of the iterations needed to achieve a certain accuracy) is influenced by compression ratios applied to workers with differing volumes. Accordingly, we frame relative compression ratio assignment as an $n$-variables chi-square nonlinear optimization problem, constrained by a fixed and limited communication budget. We propose DAGC-R, which assigns the worker handling larger data volumes the conservative compression. Recognizing the computational limitations of mobile devices, we DAGC-A, which are computationally less demanding and enhances the robustness of the absolute gradient compressor in non-IID scenarios. Our experiments confirm that both the DAGC-A and DAGC-R can achieve better performance when dealing with highly imbalanced data volume distribution and restricted communication.
Keyword: optimization
A novel method of restoration path optimization for the AC-DC bulk power grid after a major blackout
- Authors: Authors: Chao Yang, Gaoshen Liang, Tianle Cheng, Yang Li, Shaoyan Li
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.06279
- Pdf link: https://arxiv.org/pdf/2311.06279
- Abstract The restoration control of the modern alternating current-direct current (AC-DC) hybrid power grid after a major blackout is difficult and complex. Taking into account the interaction between the line-commutated converter high-voltage direct current (LCC-HVDC) and the AC power grid, this paper proposes a novel optimization method of restoration path to reconfigure the skeleton network for the blackout power grid. Based on the system strength, the supporting capability of the AC power grid for the LCC-HVDC is first analysed from the aspects of start-up and operation of LCC-HVDCs. Subsequently, the quantitative relationship between the restoration path and the restoration characteristic of LCC-HVDC is derived in detail based on the system strength indices of the short-circuit capacity and the frequency regulation capability. Then, an optimization model of restoration path considering non-tree paths is formulated and a feasible optimization algorithm is proposed to achieve the optimal path restoration scheme. A modified IEEE 39-bus system and a partial power grid of Southwest China are simulated to show that the proposed method is suitable for the restoration of AC-DC power grids and can improve restoration efficiency. This research can be an important guidance for operators to rapidly restore the AC-DC power grid.
Data-driven Spatio-Temporal Scaling of Travel Times for AMoD Simulations
- Authors: Authors: Arslan Ali Syed, Yunfei Zhang, Klaus Bogenberger
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.06291
- Pdf link: https://arxiv.org/pdf/2311.06291
- Abstract With the widespread adoption of mobility-on-demand (MoD) services and the advancements in autonomous vehicle (AV) technology, the research interest into the AVs based MoD (AMoD) services has grown immensely. Often agent-based simulation frameworks are used to study the AMoD services using the trip data of current Taxi or MoD services. For reliable results of AMoD simulations, a realistic city network and travel times play a crucial part. However, many times the researchers do not have access to the actual network state corresponding to the trip data used for AMoD simulations reducing the reliability of results. Therefore, this paper introduces a spatio-temporal optimization strategy for scaling the link-level network travel times using the simulated trip data without additional data sources on the network state. The method is tested on the widely used New York City (NYC) Taxi data and shows that the travel times produced using the scaled network are very close to the recorded travel times in the original data. Additionally, the paper studies the performance differences of AMoD simulation when the scaled network is used. The results indicate that realistic travel times can significantly impact AMoD simulation outcomes.
Can Machine Learning Uncover Insights into Vehicle Travel Demand from Our Built Environment?
- Authors: Authors: Zixun Huang, Hao Zheng
- Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
- Arxiv link: https://arxiv.org/abs/2311.06321
- Pdf link: https://arxiv.org/pdf/2311.06321
- Abstract In this paper, we propose a machine learning-based approach to address the lack of ability for designers to optimize urban land use planning from the perspective of vehicle travel demand. Research shows that our computational model can help designers quickly obtain feedback on the vehicle travel demand, which includes its total amount and temporal distribution based on the urban function distribution designed by the designers. It also assists in design optimization and evaluation of the urban function distribution from the perspective of vehicle travel. We obtain the city function distribution information and vehicle hours traveled (VHT) information by collecting the city point-of-interest (POI) data and online vehicle data. The artificial neural networks (ANNs) with the best performance in prediction are selected. By using data sets collected in different regions for mutual prediction and remapping the predictions onto a map for visualization, we evaluate the extent to which the computational model sees use across regions in an attempt to reduce the workload of future urban researchers. Finally, we demonstrate the application of the computational model to help designers obtain feedback on vehicle travel demand in the built environment and combine it with genetic algorithms to optimize the current state of the urban environment to provide recommendations to designers.
Relation Extraction in underexplored biomedical domains: A diversity-optimised sampling and synthetic data generation approach
- Authors: Authors: Maxime Delmas, Magdalena Wysocka, André Freitas
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2311.06364
- Pdf link: https://arxiv.org/pdf/2311.06364
- Abstract The sparsity of labelled data is an obstacle to the development of Relation Extraction models and the completion of databases in various biomedical areas. While being of high interest in drug-discovery, the natural-products literature, reporting the identification of potential bioactive compounds from organisms, is a concrete example of such an overlooked topic. To mark the start of this new task, we created the first curated evaluation dataset and extracted literature items from the LOTUS database to build training sets. To this end, we developed a new sampler inspired by diversity metrics in ecology, named Greedy Maximum Entropy sampler, or GME-sampler (https://github.com/idiap/gme-sampler). The strategic optimization of both balance and diversity of the selected items in the evaluation set is important given the resource-intensive nature of manual curation. After quantifying the noise in the training set, in the form of discrepancies between the input abstracts text and the expected output labels, we explored different strategies accordingly. Framing the task as an end-to-end Relation Extraction, we evaluated the performance of standard fine-tuning as a generative task and few-shot learning with open Large Language Models (LLaMA 7B-65B). In addition to their evaluation in few-shot settings, we explore the potential of open Large Language Models (Vicuna-13B) as synthetic data generator and propose a new workflow for this purpose. All evaluated models exhibited substantial improvements when fine-tuned on synthetic abstracts rather than the original noisy data. We provide our best performing (f1-score=59.0) BioGPT-Large model for end-to-end RE of natural-products relationships along with all the generated synthetic data and the evaluation dataset. See more details at https://github.com/idiap/abroad-re.
Flatness-aware Adversarial Attack
- Authors: Authors: Mingyuan Fan, Xiaodan Li, Cen Chen, Yinggui Wang
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2311.06423
- Pdf link: https://arxiv.org/pdf/2311.06423
- Abstract The transferability of adversarial examples can be exploited to launch black-box attacks. However, adversarial examples often present poor transferability. To alleviate this issue, by observing that the diversity of inputs can boost transferability, input regularization based methods are proposed, which craft adversarial examples by combining several transformed inputs. We reveal that input regularization based methods make resultant adversarial examples biased towards flat extreme regions. Inspired by this, we propose an attack called flatness-aware adversarial attack (FAA) which explicitly adds a flatness-aware regularization term in the optimization target to promote the resultant adversarial examples towards flat extreme regions. The flatness-aware regularization term involves gradients of samples around the resultant adversarial examples but optimizing gradients requires the evaluation of Hessian matrix in high-dimension spaces which generally is intractable. To address the problem, we derive an approximate solution to circumvent the construction of Hessian matrix, thereby making FAA practical and cheap. Extensive experiments show the transferability of adversarial examples crafted by FAA can be considerably boosted compared with state-of-the-art baselines.
Electronic Communication Data Link Encryption Simulation Based on Wireless Communication
- Authors: Authors: Rulin Bai
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.06462
- Pdf link: https://arxiv.org/pdf/2311.06462
- Abstract In order to improve the simulation effect of electronic communication data link encryption, the author proposes a solution based on wireless communication. The main content of this technology is based on the research of wireless communication, improve the elliptic curve cryptographic algorithm to build a system encryption model, obtain legal and valid node private keys, evaluate and analyze the relevant security attributes of the system, verify the security of the keys, and realize the encryption optimization of wireless network communication. Experimental results show that: Using the improved elliptic curve to simulate the system data chain encryption under the certificateless public key cryptosystem in network communication, the time is only 2.31 milliseconds, which is lower than other algorithms. Conclusion: It is proved that the technology research based on wireless communication can effectively improve the encryption simulation effect of electronic communication data link.
Nonsmooth-Optimization-Based Bandwidth Optimal Control for Precision Motion Systems
- Authors: Authors: Jingjie Wu, Lei Zhou
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.06491
- Pdf link: https://arxiv.org/pdf/2311.06491
- Abstract Precision motion systems are at the core of various manufacturing equipment. The rapidly increasing demand for higher productivity necessitates higher control bandwidth in the motion systems to effectively reject disturbances while maintaining excellent positioning accuracy. However, most existing optimal control methods do not explicitly optimize for control bandwidth, and the classic loop-shaping method suffers from conservative designs and fails to address cross-couplings, which motivates the development of new control solutions for bandwidth optimization. This paper proposes a novel bandwidth optimal control formulation based on nonsmooth optimization for precision motion systems. Our proposed method explicitly optimizes the system's MIMO control bandwidth while constraining the H-infinity norm of the closed-loop sensitivity function for robustness. A nonsmooth optimization solver, GRANSO, is used to solve the proposed program, and an augmented quadratic programming (QP)--based descent direction search is proposed to facilitate convergence. Simulation evaluations show that the bandwidth optimal control method can achieve a 23% higher control bandwidth than conventional loop-shaping design, and the QP-based descent direction search can reduce iteration number by 60%, which illustrates the effectiveness and efficiency of the proposed approach.
An Intelligent Social Learning-based Optimization Strategy for Black-box Robotic Control with Reinforcement Learning
- Authors: Authors: Xubo Yang, Jian Gao, Ting Wang, Yaozhen He
- Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2311.06576
- Pdf link: https://arxiv.org/pdf/2311.06576
- Abstract Implementing intelligent control of robots is a difficult task, especially when dealing with complex black-box systems, because of the lack of visibility and understanding of how these robots work internally. This paper proposes an Intelligent Social Learning (ISL) algorithm to enable intelligent control of black-box robotic systems. Inspired by mutual learning among individuals in human social groups, ISL includes learning, imitation, and self-study styles. Individuals in the learning style use the Levy flight search strategy to learn from the best performer and form the closest relationships. In the imitation style, individuals mimic the best performer with a second-level rapport by employing a random perturbation strategy. In the self-study style, individuals learn independently using a normal distribution sampling method while maintaining a distant relationship with the best performer. Individuals in the population are regarded as autonomous intelligent agents in each style. Neural networks perform strategic actions in three styles to interact with the environment and the robot and iteratively optimize the network policy. Overall, ISL builds on the principles of intelligent optimization, incorporating ideas from reinforcement learning, and possesses strong search capabilities, fast computation speed, fewer hyperparameters, and insensitivity to sparse rewards. The proposed ISL algorithm is compared with four state-of-the-art methods on six continuous control benchmark cases in MuJoCo to verify its effectiveness and advantages. Furthermore, ISL is adopted in the simulation and experimental grasping tasks of the UR3 robot for validations, and satisfactory solutions are yielded.
Hub-Based Platoon Formation: Optimal Release Policies and Approximate Solutions
- Authors: Authors: Alexander Johansson, Ehsan Nekouei, Xiaotong Sun, Karl Henrik Johansson, Jonas Mårtensson
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.06604
- Pdf link: https://arxiv.org/pdf/2311.06604
- Abstract This paper studies the optimal hub-based platoon formation at hubs along a highway under decentralized, distributed, and centralized policies. Hubs are locations along highways where trucks can wait for other trucks to form platoons. A coordinator at each hub decides the departure time of trucks, and the released trucks from the hub will form platoons. The problem is cast as an optimization problem where the objective is to maximize the platooning reward. We first show that the optimal release policy in the decentralized case, where the hubs do not exchange information, is to release all trucks at the hub when the number of trucks exceeds a threshold computed by dynamic programming. We develop efficient approximate release policies for the dependent arrival case using this result. To study the value of information exchange among hubs on platoon formation, we next study the distributed and centralized platoon formation policies which require information exchange among hubs. To this end, we develop receding horizon solutions for the distributed and centralized platoon formation at hubs using the dynamic programming technique. Finally, we perform a simulation study over three hubs in northern Sweden. The profits of the decentralized policies are shown to be approximately 3.5% lower than the distributed policy and 8% lower than the centralized release policy. This observation suggests that decentralized policies are prominent solutions for hub-based platooning as they do not require information exchange among hubs and can achieve a similar performance compared with distributed and centralized policies.
Metric Optimization and Mainstream Bias Mitigation in Recommender Systems
- Authors: Authors: Roger Zhe Li
- Subjects: Information Retrieval (cs.IR)
- Arxiv link: https://arxiv.org/abs/2311.06689
- Pdf link: https://arxiv.org/pdf/2311.06689
- Abstract The first part of this thesis focuses on maximizing the overall recommendation accuracy. This accuracy is usually evaluated with some user-oriented metric tailored to the recommendation scenario, but because recommendation is usually treated as a machine learning problem, recommendation models are trained to maximize some other generic criteria that does not necessarily align with the criteria ultimately captured by the user-oriented evaluation metric. Recent research aims at bridging this gap between training and evaluation via direct ranking optimization, but still assumes that the metric used for evaluation should also be the metric used for training. We challenge this assumption, mainly because some metrics are more informative than others. Indeed, we show that models trained via the optimization of a loss inspired by Rank-Biased Precision (RBP) tend to yield higher accuracy, even when accuracy is measured with metrics other than RBP. However, the superiority of this RBP-inspired loss stems from further benefiting users who are already well-served, rather than helping those who are not. This observation inspires the second part of this thesis, where our focus turns to helping non-mainstream users. These are users who are difficult to recommend to either because there is not enough data to model them, or because they have niche taste and thus few similar users to look at when recommending in a collaborative way. These differences in mainstreamness introduce a bias reflected in an accuracy gap between users or user groups, which we try to narrow.
Equal Incremental Cost-Based Optimization Method to Enhance Efficiency for IPOP-Type Converters
- Authors: Authors: Hanfeng Cai, Haiyang Liu, Heyang Sun, Qiao Wang
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.06705
- Pdf link: https://arxiv.org/pdf/2311.06705
- Abstract Systematic optimization over a wide power range is often achieved through the combination of modules of different power levels. This paper addresses the issue of enhancing the efficiency of a multiple module system connected in parallel during operation and proposes an algorithm based on equal incremental cost for dynamic load allocation. Initially, a polynomial fitting technique is employed to fit efficiency test points for individual modules. Subsequently, the equal incremental cost-based optimization is utilized to formulate an efficiency optimization and allocation scheme for the multi-module system. A simulated annealing algorithm is applied to determine the optimal power output strategy for each module at given total power flow requirement. Finally, a dual active bridge (DAB) experimental prototype with two input-parallel-output-parallel (IPOP) configurations is constructed to validate the effectiveness of the proposed strategy. Experimental results demonstrate that under the 800W operating condition, the approach in this paper achieves an efficiency improvement of over 0.74% by comparison with equal power sharing between both modules.
Learning Predictive Safety Filter via Decomposition of Robust Invariant Set
- Authors: Authors: Zeyang Li, Chuxiong Hu, Weiye Zhao, Changliu Liu
- Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.06769
- Pdf link: https://arxiv.org/pdf/2311.06769
- Abstract Ensuring safety of nonlinear systems under model uncertainty and external disturbances is crucial, especially for real-world control tasks. Predictive methods such as robust model predictive control (RMPC) require solving nonconvex optimization problems online, which leads to high computational burden and poor scalability. Reinforcement learning (RL) works well with complex systems, but pays the price of losing rigorous safety guarantee. This paper presents a theoretical framework that bridges the advantages of both RMPC and RL to synthesize safety filters for nonlinear systems with state- and action-dependent uncertainty. We decompose the robust invariant set (RIS) into two parts: a target set that aligns with terminal region design of RMPC, and a reach-avoid set that accounts for the rest of RIS. We propose a policy iteration approach for robust reach-avoid problems and establish its monotone convergence. This method sets the stage for an adversarial actor-critic deep RL algorithm, which simultaneously synthesizes a reach-avoid policy network, a disturbance policy network, and a reach-avoid value network. The learned reach-avoid policy network is utilized to generate nominal trajectories for online verification, which filters potentially unsafe actions that may drive the system into unsafe regions when worst-case disturbances are applied. We formulate a second-order cone programming (SOCP) approach for online verification using system level synthesis, which optimizes for the worst-case reach-avoid value of any possible trajectories. The proposed safety filter requires much lower computational complexity than RMPC and still enjoys persistent robust safety guarantee. The effectiveness of our method is illustrated through a numerical example.
Data-Driven Moving Horizon Estimation Using Bayesian Optimization
- Authors: Authors: Qing Sun, Shuai Niu, Minrui Fei
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.06787
- Pdf link: https://arxiv.org/pdf/2311.06787
- Abstract In this work, an innovative data-driven moving horizon state estimation is proposed for model dynamic-unknown systems based on Bayesian optimization. As long as the measurement data is received, a locally linear dynamics model can be obtained from one Bayesian optimization-based offline learning framework. Herein, the learned model is continuously updated iteratively based on the actual observed data to approximate the actual system dynamic with the intent of minimizing the cost function of the moving horizon estimator until the desired performance is achieved. Meanwhile, the characteristics of Bayesian optimization can guarantee the closest approximation of the learned model to the actual system dynamic. Thus, one effective data-driven moving horizon estimator can be designed further on the basis of this learned model. Finally, the efficiency of the proposed state estimation algorithm is demonstrated by several numerical simulations.
Energy-efficient Beamforming for RISs-aided Communications: Gradient Based Meta Learning
- Authors: Authors: Xinquan Wang, Fenghao Zhu, Qianyun Zhou, Qihao Yu, Chongwen Huang, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.06861
- Pdf link: https://arxiv.org/pdf/2311.06861
- Abstract Reconfigurable intelligent surfaces (RISs) have become a promising technology to meet the requirements of energy efficiency and scalability in future six-generation (6G) communications. However, a significant challenge in RISs-aided communications is the joint optimization of active and passive beamforming at base stations (BSs) and RISs respectively. Specifically, the main difficulty is attributed to the highly non-convex optimization space of beamforming matrices at both BSs and RISs, as well as the diversity and mobility of communication scenarios. To address this, we present a greenly gradient based meta learning beamforming (GMLB) approach. Unlike traditional deep learning based methods which take channel information directly as input, GMLB feeds the gradient of sum rate into neural networks. Coherently, we design a differential regulator to address the phase shift optimization of RISs. Moreover, we use the meta learning to iteratively optimize the beamforming matrices of BSs and RISs. These techniques make the proposed method to work well without requiring energy-consuming pre-training. Simulations show that GMLB could achieve higher sum rate than that of typical alternating optimization algorithms with the energy consumption by two orders of magnitude less.
Distributed Sequential Receding Horizon Control of Multi-Agent Systems under Recurring Signal Temporal Logic
- Authors: Authors: Eleftherios E. Vlahakis, Lars Lindemann, Dimos V. Dimarogonas
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.06890
- Pdf link: https://arxiv.org/pdf/2311.06890
- Abstract We consider the synthesis problem of a multi-agent system under Signal Temporal Logic (STL) specifications representing bounded-time tasks that need to be satisfied recurrently over an infinite horizon. Motivated by the limited approaches to handling recurring STL systematically, we tackle the infinite-horizon control problem with a receding horizon scheme equipped with additional STL constraints that introduce minimal complexity and a backward-reachability-based terminal condition that is straightforward to construct and ensures recursive feasibility. Subsequently, assuming a separable performance index, we decompose the global receding horizon optimization problem defined at the multi-agent level into local programs at the individual-agent level the objective of which is to minimize the local cost function subject to local and joint STL constraints. We propose a scheduling policy that allows individual agents to sequentially optimize their control actions while maintaining recursive feasibility. This results in a distributed strategy that can operate online as a model predictive controller. Last, we illustrate the effectiveness of our method via a multi-agent system example assigned a surveillance task.
Symbol-Error Probability Constrained Power Minimization for Reconfigurable Intelligent Surfaces-based Passive Transmitter
- Authors: Authors: Erico S. P. Lopes, Lukas T. N. Landau
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.06900
- Pdf link: https://arxiv.org/pdf/2311.06900
- Abstract This study considers a virtual multiuser multiple-input multiple-output system with PSK modulation realized via the reconfigurable intelligent surface-based passive transmitter setup. Under this framework, the study derives the formulation for the union-bound symbol-error probability, which is an upper bound on the actual symbol-error probability. Based on this, a symbol-level precoding power minimization problem under the condition that the union-bound symbol-error probability is below a given requirement is proposed. The problem is formulated as a constrained optimization on an oblique manifold, and solved via a bisection method. The method consists of successively optimizing transmit power while evaluating the feasibility of the union-bound symbol-error probability requisite by solving, via the Riemannian conjugate gradient algorithm, an auxiliary problem dependent only on the reflection coefficients of the reconfigurable intelligent surface elements. Numerical results demonstrate the effectiveness of the proposed approach in minimizing the transmit power for different symbol-error probability requirements.
Robust Regression over Averaged Uncertainty
- Authors: Authors: Dimitris Bertsimas, Yu Ma
- Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2311.06960
- Pdf link: https://arxiv.org/pdf/2311.06960
- Abstract We propose a new formulation of robust regression by integrating all realizations of the uncertainty set and taking an averaged approach to obtain the optimal solution for the ordinary least-squared regression problem. We show that this formulation surprisingly recovers ridge regression and establishes the missing link between robust optimization and the mean squared error approaches for existing regression problems. We first prove the equivalence for four uncertainty sets: ellipsoidal, box, diamond, and budget, and provide closed-form formulations of the penalty term as a function of the sample size, feature size, as well as perturbation protection strength. We then show in synthetic datasets with different levels of perturbations, a consistent improvement of the averaged formulation over the existing worst-case formulation in out-of-sample performance. Importantly, as the perturbation level increases, the improvement increases, confirming our method's advantage in high-noise environments. We report similar improvements in the out-of-sample datasets in real-world regression problems obtained from UCI datasets.
An Expandable Machine Learning-Optimization Framework to Sequential Decision-Making
- Authors: Authors: Dogacan Yilmaz, İ. Esra Büyüktahtakın
- Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2311.06972
- Pdf link: https://arxiv.org/pdf/2311.06972
- Abstract We present an integrated prediction-optimization (PredOpt) framework to efficiently solve sequential decision-making problems by predicting the values of binary decision variables in an optimal solution. We address the key issues of sequential dependence, infeasibility, and generalization in machine learning (ML) to make predictions for optimal solutions to combinatorial problems. The sequential nature of the combinatorial optimization problems considered is captured with recurrent neural networks and a sliding-attention window. We integrate an attention-based encoder-decoder neural network architecture with an infeasibility-elimination and generalization framework to learn high-quality feasible solutions to time-dependent optimization problems. In this framework, the required level of predictions is optimized to eliminate the infeasibility of the ML predictions. These predictions are then fixed in mixed-integer programming (MIP) problems to solve them quickly with the aid of a commercial solver. We demonstrate our approach to tackling the two well-known dynamic NP-Hard optimization problems: multi-item capacitated lot-sizing (MCLSP) and multi-dimensional knapsack (MSMK). Our results show that models trained on shorter and smaller-dimensional instances can be successfully used to predict longer and larger-dimensional problems. The solution time can be reduced by three orders of magnitude with an average optimality gap below 0.1%. We compare PredOpt with various specially designed heuristics and show that our framework outperforms them. PredOpt can be advantageous for solving dynamic MIP problems that need to be solved instantly and repetitively.
Managing Large Enclaves in a Data Center
- Authors: Authors: Sandeep Kumar, Smruti R. Sarangi
- Subjects: Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2311.06991
- Pdf link: https://arxiv.org/pdf/2311.06991
- Abstract Live migration of an application or VM is a well-known technique for load balancing, performance optimization, and resource management. To minimize the total downtime during migration, two popular methods -- pre-copy or post-copy -- are used in practice. These methods scale to large VMs and applications since the downtime is independent of the memory footprint of an application. However, in a secure, trusted execution environment (TEE) like Intel's scalable SGX, the state-of-the-art still uses the decade-old stop-and-copy method, where the total downtime is proportional to the application's memory footprint. This is primarily due to the fact that TEEs like Intel SGX do not expose memory and page table accesses to the OS, quite unlike unsecure applications. However, with modern TEE solutions that efficiently support large applications, such as Intel's Scalable SGX and AMD's Epyc, it is high time that TEE migration methods also evolve to enable live migration of large TEE applications with minimal downtime (stop-and-copy cannot be used any more). We present OptMig, an end-to-end solution for live migrating large memory footprints in TEE-enabled applications. Our approach does not require a developer to modify the application; however, we need a short, separate compilation pass and specialized software library support. Our optimizations reduce the total downtime by 98% for a representative microbenchmark that uses 20GB of secure memory and by 90 -- 96% for a suite of Intel SGX applications that have multi-GB memory footprints.
Improved Task Scheduling for Virtual Machines in the Cloud based on the Gravitational Search Algorithm
- Authors: Authors: Basilis Mamalis, Marios Perlitis
- Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
- Arxiv link: https://arxiv.org/abs/2311.07004
- Pdf link: https://arxiv.org/pdf/2311.07004
- Abstract The rapid and convenient provision of the available computing resources is a crucial requirement in modern cloud computing environments. However, if only the execution time is taken into account when the resources are scheduled, it could lead to imbalanced workloads as well as to significant under-utilisation of the involved Virtual Machines (VMs). In the present work a novel task scheduling scheme is introduced, which is based on the proper adaptation of a modern and quite effective evolutionary optimization method, the Gravitational Search Algorithm (GSA). The proposed scheme aims at optimizing the entire scheduling procedure, in terms of both the tasks execution time and the system (VMs) resource utilisation. Moreover, the fitness function was properly selected considering both the above factors in an appropriately weighted function in order to obtain better results for large inputs. Sufficient simulation experiments show the efficiency of the proposed scheme, as well as its excellence over related approaches of the bibliography, with similar objectives.
Embarassingly Simple Dataset Distillation
- Authors: Authors: Feng Yunzhen, Vedantam Ramakrishna, Kempe Julia
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.07025
- Pdf link: https://arxiv.org/pdf/2311.07025
- Abstract Dataset distillation extracts a small set of synthetic training samples from a large dataset with the goal of achieving competitive performance on test data when trained on this sample. In this work, we tackle dataset distillation at its core by treating it directly as a bilevel optimization problem. Re-examining the foundational back-propagation through time method, we study the pronounced variance in the gradients, computational burden, and long-term dependencies. We introduce an improved method: Random Truncated Backpropagation Through Time (RaT-BPTT) to address them. RaT-BPTT incorporates a truncation coupled with a random window, effectively stabilizing the gradients and speeding up the optimization while covering long dependencies. This allows us to establish new state-of-the-art for a variety of standard dataset benchmarks. A deeper dive into the nature of distilled data unveils pronounced intercorrelation. In particular, subsets of distilled datasets tend to exhibit much worse performance than directly distilled smaller datasets of the same size. Leveraging RaT-BPTT, we devise a boosting mechanism that generates distilled datasets that contain subsets with near optimal performance across different data budgets.
PROPANE: Prompt design as an inverse problem
- Authors: Authors: Rimon Melamed, Lucas H. McCabe, Tanay Wakhare, Yejin Kim, H. Howie Huang, Enric Boix-Adsera
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2311.07064
- Pdf link: https://arxiv.org/pdf/2311.07064
- Abstract Carefully-designed prompts are key to inducing desired behavior in Large Language Models (LLMs). As a result, great effort has been dedicated to engineering prompts that guide LLMs toward particular behaviors. In this work, we propose an automatic prompt optimization framework, PROPANE, which aims to find a prompt that induces semantically similar outputs to a fixed set of examples without user intervention. We further demonstrate that PROPANE can be used to (a) improve existing prompts, and (b) discover semantically obfuscated prompts that transfer between models.
Sensing Mutual Information with Random Signals in Gaussian Channels
- Authors: Authors: Lei Xie, Fan Liu, Zhanyuan Xie, Zheng Jiang, Shenghui Song
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.07081
- Pdf link: https://arxiv.org/pdf/2311.07081
- Abstract Sensing performance is typically evaluated by classical metrics, such as Cramer-Rao bound and signal-to-clutter-plus-noise ratio. The recent development of the integrated sensing and communication (ISAC) framework motivated the efforts to unify the metric for sensing and communication, where researchers have proposed to utilize mutual information (MI) to measure the sensing performance with deterministic signals. However, the need to communicate in ISAC systems necessitates the use of random signals for sensing applications and the closed-form evaluation for the sensing mutual information (SMI) with random signals is not yet available in the literature. This paper investigates the achievable performance and precoder design for sensing applications with random signals. For that purpose, we first derive the closed-form expression for the SMI with random signals by utilizing random matrix theory. The result reveals some interesting physical insights regarding the relation between the SMI with deterministic and random signals. The derived SMI is then utilized to optimize the precoder by leveraging a manifold-based optimization approach. The effectiveness of the proposed methods is validated by simulation results.
Optimal Configuration of Reconfigurable Intelligent Surfaces with Arbitrary Discrete Phase Shifts
- Authors: Authors: Seyedkhashayar Hashemi, Hai Jiang, Masoud Ardakani
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.07096
- Pdf link: https://arxiv.org/pdf/2311.07096
- Abstract We address the reflection optimization problem for a reconfigurable intelligent surface (RIS), where the RIS elements feature a set of non-uniformly spaced discrete phase shifts. This is motivated by the actual behavior of practical RIS elements, where it is shown that a uniform phase shift assumption is not realistic. A problem is formulated to find the optimal refection amplitudes and reflection phase shifts of the RIS elements such that the channel capacity of the target user is maximized. We first prove that in the optimal configuration, each RIS element is either turned off or operates at maximum amplitude. We then develop a method that finds the optimal reflection amplitudes and phases with complexity linear in the number of RIS elements. Some new and interesting insight into the reflection optimization problem is also provided.
Secure Wireless Communication via Movable-Antenna Array
- Authors: Authors: Guojie Hu, Qingqing Wu, Kui Xu, Jiangbo Si, Naofal Al-Dhahir
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.07104
- Pdf link: https://arxiv.org/pdf/2311.07104
- Abstract Movable antenna (MA) array is a novel technology recently developed where positions of transmit/receive antennas can be flexibly adjusted in the specified region to reconfigure the wireless channel and achieve a higher capacity. In this letter, we, for the first time, investigate the MA array-assisted physical-layer security where the confidential information is transmitted from a MA array-enabled Alice to a single-antenna Bob, in the presence of multiple single-antenna and colluding eavesdroppers. We aim to maximize the achievable secrecy rate by jointly designing the transmit beamforming and positions of all antennas at Alice subject to the transmit power budget and specified regions for positions of all transmit antennas. The resulting problem is highly non-convex, for which the projected gradient ascent (PGA) and the alternating optimization methods are utilized to obtain a high-quality suboptimal solution. Simulation results demonstrate that since the additional spatial degree of freedom (DoF) can be fully exploited, the MA array significantly enhances the secrecy rate compared to the conventional fixed-position antenna (FPA) array.
A Survey of Source Code Search: A 3-Dimensional Perspective
- Authors: Authors: Weisong Sun, Chunrong Fang, Yifei Ge, Yuling Hu, Yuchen Chen, Quanjun Zhang, Xiuting Ge, Yang Liu, Zhenyu Chen
- Subjects: Software Engineering (cs.SE)
- Arxiv link: https://arxiv.org/abs/2311.07107
- Pdf link: https://arxiv.org/pdf/2311.07107
- Abstract (Source) code search is widely concerned by software engineering researchers because it can improve the productivity and quality of software development. Given a functionality requirement usually described in a natural language sentence, a code search system can retrieve code snippets that satisfy the requirement from a large-scale code corpus, e.g., GitHub. To realize effective and efficient code search, many techniques have been proposed successively. These techniques improve code search performance mainly by optimizing three core components, including query understanding component, code understanding component, and query-code matching component. In this paper, we provide a 3-dimensional perspective survey for code search. Specifically, we categorize existing code search studies into query-end optimization techniques, code-end optimization techniques, and match-end optimization techniques according to the specific components they optimize. Considering that each end can be optimized independently and contributes to the code search performance, we treat each end as a dimension. Therefore, this survey is 3-dimensional in nature, and it provides a comprehensive summary of each dimension in detail. To understand the research trends of the three dimensions in existing code search studies, we systematically review 68 relevant literatures. Different from existing code search surveys that only focus on the query end or code end or introduce various aspects shallowly (including codebase, evaluation metrics, modeling technique, etc.), our survey provides a more nuanced analysis and review of the evolution and development of the underlying techniques used in the three ends. Based on a systematic review and summary of existing work, we outline several open challenges and opportunities at the three ends that remain to be addressed in future work.
Sum Rate Maximization under AoI Constraints for RIS-Assisted mmWave Communications
- Authors: Authors: Ziqi Guo, Yong Niu, Shiwen Mao, Changming Zhang, Ning Wang, Zhangdui Zhong, Bo Ai
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.07128
- Pdf link: https://arxiv.org/pdf/2311.07128
- Abstract The concept of age of information (AoI) has been proposed to quantify information freshness, which is crucial for time-sensitive applications. However, in millimeter wave (mmWave) communication systems, the link blockage caused by obstacles and the severe path loss greatly impair the freshness of information received by the user equipments (UEs). In this paper, we focus on reconfigurable intelligent surface (RIS)-assisted mmWave communications, where beamforming is performed at transceivers to provide directional beam gain and a RIS is deployed to combat link blockage. We aim to maximize the system sum rate while satisfying the information freshness requirements of UEs by jointly optimizing the beamforming at transceivers, the discrete RIS reflection coefficients, and the UE scheduling strategy. To facilitate a practical solution, we decompose the problem into two subproblems. For the first per-UE data rate maximization problem, we further decompose it into a beamforming optimization subproblem and a RIS reflection coefficient optimization subproblem. Considering the difficulty of channel estimation, we utilize the hierarchical search method for the former and the local search method for the latter, and then adopt the block coordinate descent (BCD) method to alternately solve them. For the second scheduling strategy design problem, a low-complexity heuristic scheduling algorithm is designed. Simulation results show that the proposed algorithm can effectively improve the system sum rate while satisfying the information freshness requirements of all UEs.
Learning Symmetrization for Equivariance with Orbit Distance Minimization
- Authors: Authors: Tien Dat Nguyen, Jinwoo Kim, Hongseok Yang, Seunghoon Hong
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.07143
- Pdf link: https://arxiv.org/pdf/2311.07143
- Abstract We present a general framework for symmetrizing an arbitrary neural-network architecture and making it equivariant with respect to a given group. We build upon the proposals of Kim et al. (2023); Kaba et al. (2023) for symmetrization, and improve them by replacing their conversion of neural features into group representations, with an optimization whose loss intuitively measures the distance between group orbits. This change makes our approach applicable to a broader range of matrix groups, such as the Lorentz group O(1, 3), than these two proposals. We experimentally show our method's competitiveness on the SO(2) image classification task, and also its increased generality on the task with O(1, 3). Our implementation will be made accessible at https://github.com/tiendatnguyen-vision/Orbit-symmetrize.
Solving Inverse Obstacle Scattering Problem with Latent Surface Representations
- Authors: Authors: Junqing Chen, Bangti Jin, Haibo Liu
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2311.07187
- Pdf link: https://arxiv.org/pdf/2311.07187
- Abstract We propose a novel iterative numerical method to solve the three-dimensional inverse obstacle scattering problem of recovering the shape of the obstacle from far-field measurements. To address the inherent ill-posed nature of the inverse problem, we advocate the use of a trained latent representation of surfaces as the generative prior. This prior enjoys excellent expressivity within the given class of shapes, and meanwhile, the latent dimensionality is low, which greatly facilitates the computation. Thus, the admissible manifold of surfaces is realistic and the resulting optimization problem is less ill-posed. We employ the shape derivative to evolve the latent surface representation, by minimizing the loss, and we provide a local convergence analysis of a gradient descent type algorithm to a stationary point of the loss. We present several numerical examples, including also backscattered and phaseless data, to showcase the effectiveness of the proposed algorithm.
On Elastic Language Models
- Authors: Authors: Chen Zhang, Benyou Wang, Dawei Song
- Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.07204
- Pdf link: https://arxiv.org/pdf/2311.07204
- Abstract Large-scale pretrained language models have achieved compelling performance in a wide range of language understanding and information retrieval tasks. Knowledge distillation offers an opportunity to compress a large language model to a small one, in order to reach a reasonable latency-performance tradeoff. However, for scenarios where the number of requests (e.g., queries submitted to a search engine) is highly variant, the static tradeoff attained by the compressed language model might not always fit. Once a model is assigned with a static tradeoff, it could be inadequate in that the latency is too high when the number of requests is large or the performance is too low when the number of requests is small. To this end, we propose an elastic language model (ElasticLM) that elastically adjusts the tradeoff according to the request stream. The basic idea is to introduce a compute elasticity to the compressed language model, so that the tradeoff could vary on-the-fly along scalable and controllable compute. Specifically, we impose an elastic structure to enable ElasticLM with compute elasticity and design an elastic optimization to learn ElasticLM under compute elasticity. To serve ElasticLM, we apply an elastic schedule. Considering the specificity of information retrieval, we adapt ElasticLM to dense retrieval and reranking and present ElasticDenser and ElasticRanker respectively. Offline evaluation is conducted on a language understanding benchmark GLUE; and several information retrieval tasks including Natural Question, Trivia QA, and MS MARCO. The results show that ElasticLM along with ElasticDenser and ElasticRanker can perform correctly and competitively compared with an array of static baselines. Furthermore, online simulation with concurrency is also carried out. The results demonstrate that ElasticLM can provide elastic tradeoffs with respect to varying request stream.
DAGC: Data-Volume-Aware Adaptive Sparsification Gradient Compression for Distributed Machine Learning in Mobile Computing
- Authors: Authors: Rongwei Lu, Yutong Jiang, Yinan Mao, Chen Tang, Bin Chen, Laizhong Cui, Zhi Wang
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.07324
- Pdf link: https://arxiv.org/pdf/2311.07324
- Abstract Distributed machine learning (DML) in mobile environments faces significant communication bottlenecks. Gradient compression has emerged as an effective solution to this issue, offering substantial benefits in environments with limited bandwidth and metered data. Yet, they encounter severe performance drop in non-IID environments due to a one-size-fits-all compression approach, which does not account for the varying data volumes across workers. Assigning varying compression ratios to workers with distinct data distributions and volumes is thus a promising solution. This study introduces an analysis of distributed SGD with non-uniform compression, which reveals that the convergence rate (indicative of the iterations needed to achieve a certain accuracy) is influenced by compression ratios applied to workers with differing volumes. Accordingly, we frame relative compression ratio assignment as an $n$-variables chi-square nonlinear optimization problem, constrained by a fixed and limited communication budget. We propose DAGC-R, which assigns the worker handling larger data volumes the conservative compression. Recognizing the computational limitations of mobile devices, we DAGC-A, which are computationally less demanding and enhances the robustness of the absolute gradient compressor in non-IID scenarios. Our experiments confirm that both the DAGC-A and DAGC-R can achieve better performance when dealing with highly imbalanced data volume distribution and restricted communication.
Throughput Maximization in Multi-Band Optical Networks with Column Generation
- Authors: Authors: Cao Chen, Shilin Xiao, Fen Zhou, Massimo Tornatore
- Subjects: Networking and Internet Architecture (cs.NI)
- Arxiv link: https://arxiv.org/abs/2311.07335
- Pdf link: https://arxiv.org/pdf/2311.07335
- Abstract Multi-band transmission is a promising technical direction for spectrum and capacity expansion of existing optical networks. Due to the increase in the number of usable wavelengths in multi-band optical networks, the complexity of resource allocation problems becomes a major concern. Moreover, the transmission performance, spectrum width, and cost constraint across optical bands may be heterogeneous. Assuming a worst-case transmission margin in U, L, and C-bands, this paper investigates the problem of throughput maximization in multi-band optical networks, including the optimization of route, wavelength, and band assignment. We propose a low-complexity decomposition approach based on Column Generation (CG) to address the scalability issue faced by traditional methodologies. We numerically compare the results obtained by our CG-based approach to an integer linear programming model, confirming the near-optimal network throughput. Our results also demonstrate the scalability of the CG-based approach when the number of wavelengths increases, with the computation time in the magnitude order of 10 s for cases varying from 75 to 1200 wavelength channels per link in a 14-node network.
Goal-oriented Estimation of Multiple Markov Sources in Resource-constrained Systems
- Authors: Authors: Jiping Luo, Nikolaos Pappas
- Subjects: Systems and Control (eess.SY); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
- Arxiv link: https://arxiv.org/abs/2311.07346
- Pdf link: https://arxiv.org/pdf/2311.07346
- Abstract This paper investigates goal-oriented communication for remote estimation of multiple Markov sources in resource-constrained networks. An agent selects the update order of the sources and transmits the packet to a remote destination over an unreliable delay channel. The destination is tasked with source reconstruction for the purpose of actuation. We utilize the metric cost of actuation error (CAE) to capture the significance (semantics) of error at the point of actuation. We aim to find an optimal sampling policy that minimizes the time-averaged CAE subject to average resource constraints. We formulate this problem as an average-cost constrained Markov Decision Process (CMDP) and transform it into an unconstrained MDP by utilizing Lyapunov drift techniques. Then, we propose a low-complexity drift-plus-penalty(DPP) policy for systems with known source/channel statistics and a Lyapunov optimization-based deep reinforcement learning (LO-DRL) policy for unknown environments. Our policies achieve near-optimal performance in CAE minimization and significantly reduce the number of uninformative transmissions.
Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse
- Authors: Authors: Ang Lv, Kaiyi Zhang, Shufang Xie, Quan Tu, Yuhan Chen, Ji-Rong Wen, Rui Yan
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.07468
- Pdf link: https://arxiv.org/pdf/2311.07468
- Abstract Recent studies have highlighted a phenomenon in large language models (LLMs) known as "the reversal curse," in which the order of knowledge entities in the training data biases the models' comprehension. For example, if a model is trained on sentences where entity A consistently appears before entity B, it can respond to queries about A by providing B. However, it may encounter confusion when presented with questions concerning B. We contend that the reversal curse is partially a result of specific model training objectives, particularly evident in the prevalent use of the next-token prediction within most causal language models. For the next-token prediction, models solely focus on a token's preceding context, resulting in a restricted comprehension of the input. In contrast, we illustrate that the GLM, trained using the autoregressive blank infilling objective where tokens to be predicted have access to the entire context, exhibits better resilience against the reversal curse. We propose a novel training method, BIdirectional Casual language modeling Optimization (BICO), designed to mitigate the reversal curse when fine-tuning pretrained causal language models on new data. BICO modifies the causal attention mechanism to function bidirectionally and employs a mask denoising optimization. In the task designed to assess the reversal curse, our approach improves Llama's accuracy from the original 0% to around 70%. We hope that more attention can be focused on exploring and addressing these inherent weaknesses of the current LLMs, in order to achieve a higher level of intelligence.
Reducing the Need for Backpropagation and Discovering Better Optima With Explicit Optimizations of Neural Networks
- Authors: Authors: Jake Ryland Williams, Haoran Zhao
- Subjects: Machine Learning (cs.LG); Probability (math.PR); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.07498
- Pdf link: https://arxiv.org/pdf/2311.07498
- Abstract Iterative differential approximation methods that rely upon backpropagation have enabled the optimization of neural networks; however, at present, they remain computationally expensive, especially when training models at scale. In this paper, we propose a computationally efficient alternative for optimizing neural networks that can both reduce the costs of scaling neural networks and provide high-efficiency optimizations for low-resource applications. We derive an explicit solution to a simple feed-forward language model (LM) by mathematically analyzing its gradients. This solution generalizes from single-layer LMs to the class of all single-layer feed-forward softmax-activated neural models trained on positive-valued features, as is demonstrated by our extension of this solution application to MNIST digit classification. For both LM and digit classifiers, we find computationally that explicit solutions perform near-optimality in experiments showing that 1) iterative optimization only marginally improves the explicit solution parameters and 2) randomly initialized parameters iteratively optimize towards the explicit solution. We also preliminarily apply the explicit solution locally by layer in multi-layer networks and discuss how the solution's computational savings increase with model complexity -- for both single- and mult-layer applications of the explicit solution, we emphasize that the optima achieved cannot be reached by backpropagation alone, i.e., better optima appear discoverable only after explicit solutions are applied. Finally, we discuss the solution's computational savings alongside its impact on model interpretability and suggest future directions for the derivation of explicit solutions to complex- and multi-layer architectures.
Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units
- Authors: Authors: Jake Ryland Williams, Haoran Zhao
- Subjects: Machine Learning (cs.LG); Probability (math.PR); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.07510
- Pdf link: https://arxiv.org/pdf/2311.07510
- Abstract Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive, especially when used at scale. This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications. We will discuss a general result about feed-forward neural networks and then extend this solution to compositional (mult-layer) networks, which are applied to a simplified transformer block containing feed-forward and self-attention layers. These models are used to train highly-specified and complex multi-layer neural architectures that we refer to as self-attentive feed-forward unit (SAFFU) layers, which we use to develop a transformer that appears to generalize well over small, cognitively-feasible, volumes of data. Testing demonstrates explicit solutions outperform models optimized by backpropagation alone. Moreover, further application of backpropagation after explicit solutions leads to better optima from smaller scales of data, training effective models from much less data is enabled by explicit solution warm starts. We then carry out ablation experiments training a roadmap of about 250 transformer models over 1-million tokens to determine ideal settings. We find that multiple different architectural variants produce highly-performant models, and discover from this ablation that some of the best are not the most parameterized. This appears to indicate well-generalized models could be reached using less data by using explicit solutions, and that architectural exploration using explicit solutions pays dividends in guiding the search for efficient variants with fewer parameters, and which could be incorporated into low-resource hardware where AI might be embodied.
Interpretable Fine-Tuning for Graph Neural Network Surrogate Models
- Authors: Authors: Shivam Barwey, Romit Maulik
- Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)
- Arxiv link: https://arxiv.org/abs/2311.07548
- Pdf link: https://arxiv.org/pdf/2311.07548
- Abstract Data-based surrogate modeling has surged in capability in recent years with the emergence of graph neural networks (GNNs), which can operate directly on mesh-based representations of data. The goal of this work is to introduce an interpretable fine-tuning strategy for GNNs, with application to unstructured mesh-based fluid dynamics modeling. The end result is a fine-tuned GNN that adds interpretability to a pre-trained baseline GNN through an adaptive sub-graph sampling strategy that isolates regions in physical space intrinsically linked to the forecasting task, while retaining the predictive capability of the baseline. The structures identified by the fine-tuned GNNs, which are adaptively produced in the forward pass as explicit functions of the input, serve as an accessible link between the baseline model architecture, the optimization goal, and known problem-specific physics. Additionally, through a regularization procedure, the fine-tuned GNNs can also be used to identify, during inference, graph nodes that correspond to a majority of the anticipated forecasting error, adding a novel interpretable error-tagging capability to baseline models. Demonstrations are performed using unstructured flow data sourced from flow over a backward-facing step at high Reynolds numbers.
Sound Gradual Verification with Symbolic Execution
- Authors: Authors: Conrad Zimmerman, Jenna DiVincenzo, Jonathan Aldrich
- Subjects: Programming Languages (cs.PL)
- Arxiv link: https://arxiv.org/abs/2311.07559
- Pdf link: https://arxiv.org/pdf/2311.07559
- Abstract Gradual verification, which supports explicitly partial specifications and verifies them with a combination of static and dynamic checks, makes verification more incremental and provides earlier feedback to developers. While an abstract, weakest precondition-based approach to gradual verification was previously proven sound, the approach did not provide sufficient guidance for implementation and optimization of the required run-time checks. More recently, gradual verification was implemented using symbolic execution techniques, but the soundness of the approach (as with related static checkers based on implicit dynamic frames) was an open question. This paper puts practical gradual verification on a sound footing with a formalization of symbolic execution, optimized run-time check generation, and run time execution. We prove our approach is sound; our proof also covers a core subset of the Viper tool, for which we are aware of no previous soundness result. Our formalization enabled us to find a soundness bug in an implemented gradual verification tool and describe the fix necessary to make it sound.
Keyword: adam
Convolution quadrature for Hadamard fractional calculus and correction methods for the subdiffusion with singular source terms
- Authors: Authors: Baoli Yin, Guoyu Zhang, Yang Liu, Hong Li
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2311.06869
- Pdf link: https://arxiv.org/pdf/2311.06869
- Abstract The convolution quadrature method originally developed for the Riemann-Liouville fractional calculus is extended in this work to the Hadamard fractional calculus by using the exponential type meshes. Local truncation error analysis is presented for singular solutions. By adopting the fractional BDF-$p(1\leq p \leq 6)$ for the Caputo-Hadamard fractional derivative in solving subdiffusion problem with singular source terms, and using the finite element method to discretize the space variable, we carry out the sharp error analysis rigorously and obtain the optimal accuracy by the novel correction technique. Our correction method is a natural generalization of the one developed for subdiffusion problems with smooth source terms. Numerical tests confirm the correctness of our theoretical results.
ADAMM: Anomaly Detection of Attributed Multi-graphs with Metadata: A Unified Neural Network Approach
- Authors: Authors: Konstantinos Sotiropoulos, Lingxiao Zhao, Pierre Jinghong Liang, Leman Akoglu
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.07355
- Pdf link: https://arxiv.org/pdf/2311.07355
- Abstract Given a complex graph database of node- and edge-attributed multi-graphs as well as associated metadata for each graph, how can we spot the anomalous instances? Many real-world problems can be cast as graph inference tasks where the graph representation could capture complex relational phenomena (e.g., transactions among financial accounts in a journal entry), along with metadata reflecting tabular features (e.g. approver, effective date, etc.). While numerous anomaly detectors based on Graph Neural Networks (GNNs) have been proposed, none are capable of directly handling directed graphs with multi-edges and self-loops. Furthermore, the simultaneous handling of relational and tabular features remains an unexplored area. In this work we propose ADAMM, a novel graph neural network model that handles directed multi-graphs, providing a unified end-to-end architecture that fuses metadata and graph-level representation learning through an unsupervised anomaly detection objective. Experiments on datasets from two different domains, namely, general-ledger journal entries from different firms (accounting) as well as human GPS trajectories from thousands of individuals (urban mobility) validate ADAMM's generality and detection effectiveness of expert-guided and ground-truth anomalies. Notably, ADAMM outperforms existing baselines that handle the two data modalities (graph and metadata) separately with post hoc synthesis efforts.
Keyword: gradient
R$^2$NMPC: A Real-Time Reduced Robustified Nonlinear Model Predictive Control with Ellipsoidal Uncertainty Sets for Autonomous Vehicle Motion Control
- Authors: Authors: Baha Zarrouki, João Nunes, Johannes Betz
- Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2311.06420
- Pdf link: https://arxiv.org/pdf/2311.06420
- Abstract In this paper, we present a novel Reduced Robustified NMPC (R$^2$NMPC) algorithm that has the same complexity as an equivalent nominal NMPC while enhancing it with robustified constraints based on the dynamics of ellipsoidal uncertainty sets. This promises both a closed-loop- and constraint satisfaction performance equivalent to common Robustified NMPC approaches, while drastically reducing the computational complexity. The main idea lies in approximating the ellipsoidal uncertainty sets propagation over the prediction horizon with the system dynamics' sensitivities inferred from the last optimal control problem (OCP) solution, and similarly for the gradients to robustify the constraints. Thus, we do not require the decision variables related to the uncertainty propagation within the OCP, rendering it computationally tractable. Next, we illustrate the real-time control capabilities of our algorithm in handling a complex, high-dimensional, and highly nonlinear system, namely the trajectory following of an autonomous passenger vehicle modeled with a dynamic nonlinear single-track model. Our experimental findings, alongside a comparative assessment against other Robust NMPC approaches, affirm the robustness of our method in effectively tracking an optimal racetrack trajectory while satisfying the nonlinear constraints. This performance is achieved while fully utilizing the vehicle's interface limits, even at high speeds of up to 37.5m/s, and successfully managing state estimation disturbances. Remarkably, our approach maintains a mean solving frequency of 144Hz.
Flatness-aware Adversarial Attack
- Authors: Authors: Mingyuan Fan, Xiaodan Li, Cen Chen, Yinggui Wang
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2311.06423
- Pdf link: https://arxiv.org/pdf/2311.06423
- Abstract The transferability of adversarial examples can be exploited to launch black-box attacks. However, adversarial examples often present poor transferability. To alleviate this issue, by observing that the diversity of inputs can boost transferability, input regularization based methods are proposed, which craft adversarial examples by combining several transformed inputs. We reveal that input regularization based methods make resultant adversarial examples biased towards flat extreme regions. Inspired by this, we propose an attack called flatness-aware adversarial attack (FAA) which explicitly adds a flatness-aware regularization term in the optimization target to promote the resultant adversarial examples towards flat extreme regions. The flatness-aware regularization term involves gradients of samples around the resultant adversarial examples but optimizing gradients requires the evaluation of Hessian matrix in high-dimension spaces which generally is intractable. To address the problem, we derive an approximate solution to circumvent the construction of Hessian matrix, thereby making FAA practical and cheap. Extensive experiments show the transferability of adversarial examples crafted by FAA can be considerably boosted compared with state-of-the-art baselines.
Modeling Power Systems Dynamics with Symbolic Physics-Informed Neural Networks
- Authors: Authors: Huynh T. T. Tran, Hieu T. Nguyen
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.06580
- Pdf link: https://arxiv.org/pdf/2311.06580
- Abstract In recent years, scientific machine learning, particularly physic-informed neural networks (PINNs), has introduced new innovative methods to understanding the differential equations that describe power system dynamics, providing a more efficient alternative to traditional methods. However, using a single neural network to capture patterns of all variables requires a large enough size of networks, leading to a long time of training and still high computational costs. In this paper, we utilize the interfacing of PINNs with symbolic techniques to construct multiple single-output neural networks by taking the loss function apart and integrating it over the relevant domain. Also, we reweigh the factors of the components in the loss function to improve the performance of the network for instability systems. Our results show that the symbolic PINNs provide higher accuracy with significantly fewer parameters and faster training time. By using the adaptive weight method, the symbolic PINNs can avoid the vanishing gradient problem and numerical instability.
Deep JKO: time-implicit particle methods for general nonlinear gradient flows
- Authors: Authors: Wonjun Lee, Li Wang, Wuchen Li
- Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
- Arxiv link: https://arxiv.org/abs/2311.06700
- Pdf link: https://arxiv.org/pdf/2311.06700
- Abstract We develop novel neural network-based implicit particle methods to compute high-dimensional Wasserstein-type gradient flows with linear and nonlinear mobility functions. The main idea is to use the Lagrangian formulation in the Jordan--Kinderlehrer--Otto (JKO) framework, where the velocity field is approximated using a neural network. We leverage the formulations from the neural ordinary differential equation (neural ODE) in the context of continuous normalizing flow for efficient density computation. Additionally, we make use of an explicit recurrence relation for computing derivatives, which greatly streamlines the backpropagation process. Our methodology demonstrates versatility in handling a wide range of gradient flows, accommodating various potential functions and nonlinear mobility scenarios. Extensive experiments demonstrate the efficacy of our approach, including an illustrative example from Bayesian inverse problems. This underscores that our scheme provides a viable alternative solver for the Kalman-Wasserstein gradient flow.
Personalized Federated Learning via ADMM with Moreau Envelope
- Authors: Authors: Shengkun Zhu, Jinshan Zeng, Sheng Wang, Yuan Sun, Zhiyong Peng
- Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
- Arxiv link: https://arxiv.org/abs/2311.06756
- Pdf link: https://arxiv.org/pdf/2311.06756
- Abstract Personalized federated learning (PFL) is an approach proposed to address the issue of poor convergence on heterogeneous data. However, most existing PFL frameworks require strong assumptions for convergence. In this paper, we propose an alternating direction method of multipliers (ADMM) for training PFL models with Moreau envelope (FLAME), which achieves a sublinear convergence rate, relying on the relatively weak assumption of gradient Lipschitz continuity. Moreover, due to the gradient-free nature of ADMM, FLAME alleviates the need for hyperparameter tuning, particularly in avoiding the adjustment of the learning rate when training the global model. In addition, we propose a biased client selection strategy to expedite the convergence of training of PFL models. Our theoretical analysis establishes the global convergence under both unbiased and biased client selection strategies. Our experiments validate that FLAME, when trained on heterogeneous data, outperforms state-of-the-art methods in terms of model performance. Regarding communication efficiency, it exhibits an average speedup of 3.75x compared to the baselines. Furthermore, experimental results validate that the biased client selection strategy speeds up the convergence of both personalized and global models.
Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for Cross-Lingual Machine Reading Comprehension
- Authors: Authors: Tingfeng Cao, Chengyu Wang, Chuanqi Tan, Jun Huang, Jinhui Zhu
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2311.06758
- Pdf link: https://arxiv.org/pdf/2311.06758
- Abstract In cross-lingual language understanding, machine translation is often utilized to enhance the transferability of models across languages, either by translating the training data from the source language to the target, or from the target to the source to aid inference. However, in cross-lingual machine reading comprehension (MRC), it is difficult to perform a deep level of assistance to enhance cross-lingual transfer because of the variation of answer span positions in different languages. In this paper, we propose X-STA, a new approach for cross-lingual MRC. Specifically, we leverage an attentive teacher to subtly transfer the answer spans of the source language to the answer output space of the target. A Gradient-Disentangled Knowledge Sharing technique is proposed as an improved cross-attention block. In addition, we force the model to learn semantic alignments from multiple granularities and calibrate the model outputs with teacher guidance to enhance cross-lingual transferability. Experiments on three multi-lingual MRC datasets show the effectiveness of our method, outperforming state-of-the-art approaches.
Inference and Interference: The Role of Clipping, Pruning and Loss Landscapes in Differentially Private Stochastic Gradient Descent
- Authors: Authors: Lauren Watson, Eric Gan, Mohan Dantam, Baharan Mirzasoleiman, Rik Sarkar
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2311.06839
- Pdf link: https://arxiv.org/pdf/2311.06839
- Abstract Differentially private stochastic gradient descent (DP-SGD) is known to have poorer training and test performance on large neural networks, compared to ordinary stochastic gradient descent (SGD). In this paper, we perform a detailed study and comparison of the two processes and unveil several new insights. By comparing the behavior of the two processes separately in early and late epochs, we find that while DP-SGD makes slower progress in early stages, it is the behavior in the later stages that determines the end result. This separate analysis of the clipping and noise addition steps of DP-SGD shows that while noise introduces errors to the process, gradient descent can recover from these errors when it is not clipped, and clipping appears to have a larger impact than noise. These effects are amplified in higher dimensions (large neural networks), where the loss basin occupies a lower dimensional space. We argue theoretically and using extensive experiments that magnitude pruning can be a suitable dimension reduction technique in this regard, and find that heavy pruning can improve the test accuracy of DPSGD.
Energy-efficient Beamforming for RISs-aided Communications: Gradient Based Meta Learning
- Authors: Authors: Xinquan Wang, Fenghao Zhu, Qianyun Zhou, Qihao Yu, Chongwen Huang, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.06861
- Pdf link: https://arxiv.org/pdf/2311.06861
- Abstract Reconfigurable intelligent surfaces (RISs) have become a promising technology to meet the requirements of energy efficiency and scalability in future six-generation (6G) communications. However, a significant challenge in RISs-aided communications is the joint optimization of active and passive beamforming at base stations (BSs) and RISs respectively. Specifically, the main difficulty is attributed to the highly non-convex optimization space of beamforming matrices at both BSs and RISs, as well as the diversity and mobility of communication scenarios. To address this, we present a greenly gradient based meta learning beamforming (GMLB) approach. Unlike traditional deep learning based methods which take channel information directly as input, GMLB feeds the gradient of sum rate into neural networks. Coherently, we design a differential regulator to address the phase shift optimization of RISs. Moreover, we use the meta learning to iteratively optimize the beamforming matrices of BSs and RISs. These techniques make the proposed method to work well without requiring energy-consuming pre-training. Simulations show that GMLB could achieve higher sum rate than that of typical alternating optimization algorithms with the energy consumption by two orders of magnitude less.
Symbol-Error Probability Constrained Power Minimization for Reconfigurable Intelligent Surfaces-based Passive Transmitter
- Authors: Authors: Erico S. P. Lopes, Lukas T. N. Landau
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.06900
- Pdf link: https://arxiv.org/pdf/2311.06900
- Abstract This study considers a virtual multiuser multiple-input multiple-output system with PSK modulation realized via the reconfigurable intelligent surface-based passive transmitter setup. Under this framework, the study derives the formulation for the union-bound symbol-error probability, which is an upper bound on the actual symbol-error probability. Based on this, a symbol-level precoding power minimization problem under the condition that the union-bound symbol-error probability is below a given requirement is proposed. The problem is formulated as a constrained optimization on an oblique manifold, and solved via a bisection method. The method consists of successively optimizing transmit power while evaluating the feasibility of the union-bound symbol-error probability requisite by solving, via the Riemannian conjugate gradient algorithm, an auxiliary problem dependent only on the reflection coefficients of the reconfigurable intelligent surface elements. Numerical results demonstrate the effectiveness of the proposed approach in minimizing the transmit power for different symbol-error probability requirements.
Resource-Aware Hierarchical Federated Learning for Video Caching in Wireless Networks
- Authors: Authors: Md Ferdous Pervej, Andreas F Molisch
- Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.06918
- Pdf link: https://arxiv.org/pdf/2311.06918
- Abstract Video caching can significantly improve backhaul traffic congestion by locally storing the popular content that users frequently request. A privacy-preserving method is desirable to learn how users' demands change over time. As such, this paper proposes a novel resource-aware hierarchical federated learning (RawHFL) solution to predict users' future content requests under the realistic assumptions that content requests are sporadic and users' datasets can only be updated based on the requested content's information. Considering a partial client participation case, we first derive the upper bound of the global gradient norm that depends on the clients' local training rounds and the successful reception of their accumulated gradients over the wireless links. Under delay, energy and radio resource constraints, we then optimize client selection and their local rounds and central processing unit (CPU) frequencies to minimize a weighted utility function that facilitates RawHFL's convergence in an energy-efficient way. Our simulation results show that the proposed solution significantly outperforms the considered baselines in terms of prediction accuracy and total energy expenditure.
AGRAMPLIFIER: Defending Federated Learning Against Poisoning Attacks Through Local Update Amplification
- Authors: Authors: Zirui Gong, Liyue Shen, Yanjun Zhang, Leo Yu Zhang, Jingwei Wang, Guangdong Bai, Yong Xiang
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.06996
- Pdf link: https://arxiv.org/pdf/2311.06996
- Abstract The collaborative nature of federated learning (FL) poses a major threat in the form of manipulation of local training data and local updates, known as the Byzantine poisoning attack. To address this issue, many Byzantine-robust aggregation rules (AGRs) have been proposed to filter out or moderate suspicious local updates uploaded by Byzantine participants. This paper introduces a novel approach called AGRAMPLIFIER, aiming to simultaneously improve the robustness, fidelity, and efficiency of the existing AGRs. The core idea of AGRAMPLIFIER is to amplify the "morality" of local updates by identifying the most repressive features of each gradient update, which provides a clearer distinction between malicious and benign updates, consequently improving the detection effect. To achieve this objective, two approaches, namely AGRMP and AGRXAI, are proposed. AGRMP organizes local updates into patches and extracts the largest value from each patch, while AGRXAI leverages explainable AI methods to extract the gradient of the most activated features. By equipping AGRAMPLIFIER with the existing Byzantine-robust mechanisms, we successfully enhance the model's robustness, maintaining its fidelity and improving overall efficiency. AGRAMPLIFIER is universally compatible with the existing Byzantine-robust mechanisms. The paper demonstrates its effectiveness by integrating it with all mainstream AGR mechanisms. Extensive evaluations conducted on seven datasets from diverse domains against seven representative poisoning attacks consistently show enhancements in robustness, fidelity, and efficiency, with average gains of 40.08%, 39.18%, and 10.68%, respectively.
Embarassingly Simple Dataset Distillation
- Authors: Authors: Feng Yunzhen, Vedantam Ramakrishna, Kempe Julia
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.07025
- Pdf link: https://arxiv.org/pdf/2311.07025
- Abstract Dataset distillation extracts a small set of synthetic training samples from a large dataset with the goal of achieving competitive performance on test data when trained on this sample. In this work, we tackle dataset distillation at its core by treating it directly as a bilevel optimization problem. Re-examining the foundational back-propagation through time method, we study the pronounced variance in the gradients, computational burden, and long-term dependencies. We introduce an improved method: Random Truncated Backpropagation Through Time (RaT-BPTT) to address them. RaT-BPTT incorporates a truncation coupled with a random window, effectively stabilizing the gradients and speeding up the optimization while covering long dependencies. This allows us to establish new state-of-the-art for a variety of standard dataset benchmarks. A deeper dive into the nature of distilled data unveils pronounced intercorrelation. In particular, subsets of distilled datasets tend to exhibit much worse performance than directly distilled smaller datasets of the same size. Leveraging RaT-BPTT, we devise a boosting mechanism that generates distilled datasets that contain subsets with near optimal performance across different data budgets.
Non-approximability of constructive global $\mathcal{L}^2$ minimizers by gradient descent in Deep Learning
- Authors: Authors: Thomas Chen, Patricia Muñoz Ewald
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Mathematical Physics (math-ph); Optimization and Control (math.OC); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.07065
- Pdf link: https://arxiv.org/pdf/2311.07065
- Abstract We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. In particular, we prove that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost obtained constructively in [Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks can generically not be approximated via the gradient descent flow. We therefore conclude that the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient descent method.
Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning
- Authors: Authors: Yue Yu, Jiaming Shen, Tianqi Liu, Zhen Qin, Jing Nathan Yan, Jialu Liu, Chao Zhang, Michael Bendersky
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.07099
- Pdf link: https://arxiv.org/pdf/2311.07099
- Abstract Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks. With only a few demonstration examples, these LLMs can quickly adapt to target tasks without expensive gradient updates. Common strategies to boost such 'in-context' learning ability are to ensemble multiple model decoded results and require the model to generate an explanation along with the prediction. However, these models often treat different class predictions equally and neglect the potential discrepancy between the explanations and predictions. To fully unleash the power of explanations, we propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs. We design two techniques, explanation-guided ensemble, and soft probability aggregation, to mitigate the effect of unreliable explanations and improve the consistency between explanations and final predictions. Experiments on seven natural language understanding tasks and four varying-size LLMs demonstrate the effectiveness of our proposed framework.
Secure Wireless Communication via Movable-Antenna Array
- Authors: Authors: Guojie Hu, Qingqing Wu, Kui Xu, Jiangbo Si, Naofal Al-Dhahir
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.07104
- Pdf link: https://arxiv.org/pdf/2311.07104
- Abstract Movable antenna (MA) array is a novel technology recently developed where positions of transmit/receive antennas can be flexibly adjusted in the specified region to reconfigure the wireless channel and achieve a higher capacity. In this letter, we, for the first time, investigate the MA array-assisted physical-layer security where the confidential information is transmitted from a MA array-enabled Alice to a single-antenna Bob, in the presence of multiple single-antenna and colluding eavesdroppers. We aim to maximize the achievable secrecy rate by jointly designing the transmit beamforming and positions of all antennas at Alice subject to the transmit power budget and specified regions for positions of all transmit antennas. The resulting problem is highly non-convex, for which the projected gradient ascent (PGA) and the alternating optimization methods are utilized to obtain a high-quality suboptimal solution. Simulation results demonstrate that since the additional spatial degree of freedom (DoF) can be fully exploited, the MA array significantly enhances the secrecy rate compared to the conventional fixed-position antenna (FPA) array.
Solving Inverse Obstacle Scattering Problem with Latent Surface Representations
- Authors: Authors: Junqing Chen, Bangti Jin, Haibo Liu
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2311.07187
- Pdf link: https://arxiv.org/pdf/2311.07187
- Abstract We propose a novel iterative numerical method to solve the three-dimensional inverse obstacle scattering problem of recovering the shape of the obstacle from far-field measurements. To address the inherent ill-posed nature of the inverse problem, we advocate the use of a trained latent representation of surfaces as the generative prior. This prior enjoys excellent expressivity within the given class of shapes, and meanwhile, the latent dimensionality is low, which greatly facilitates the computation. Thus, the admissible manifold of surfaces is realistic and the resulting optimization problem is less ill-posed. We employ the shape derivative to evolve the latent surface representation, by minimizing the loss, and we provide a local convergence analysis of a gradient descent type algorithm to a stationary point of the loss. We present several numerical examples, including also backscattered and phaseless data, to showcase the effectiveness of the proposed algorithm.
Input Convex LSTM: A Convex Approach for Fast Lyapunov-Based Model Predictive Control
- Authors: Authors: Zihao Wang, Zhe Wu
- Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.07202
- Pdf link: https://arxiv.org/pdf/2311.07202
- Abstract Leveraging Input Convex Neural Networks (ICNNs), ICNN-based Model Predictive Control (MPC) successfully attains globally optimal solutions by upholding convexity within the MPC framework. However, current ICNN architectures encounter the issue of vanishing gradients, which limits their ability to serve as deep neural networks for complex tasks. Additionally, the current neural network-based MPC, including conventional neural network-based MPC and ICNN-based MPC, faces slower convergence speed when compared to MPC based on first-principles models. In this study, we leverage the principles of ICNNs to propose a novel Input Convex LSTM for Lyapunov-based MPC, with the specific goal of reducing convergence time and mitigating the vanishing gradient problem while ensuring closed-loop stability. From a simulation study of a nonlinear chemical reactor, we observed a mitigation of vanishing gradient problem and a reduction in convergence time, with a percentage decrease of 46.7%, 31.3%, and 20.2% compared to baseline plain RNN, plain LSTM, and Input Convex Recurrent Neural Network, respectively.
How are Prompts Different in Terms of Sensitivity?
- Authors: Authors: Sheng Lu, Hendrik Schuff, Iryna Gurevych
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2311.07230
- Pdf link: https://arxiv.org/pdf/2311.07230
- Abstract In-context learning (ICL) has become one of the most popular learning paradigms. While there is a growing body of literature focusing on prompt engineering, there is a lack of systematic analysis comparing the effects of prompts across different models and tasks. To address this gap, we present a comprehensive prompt analysis based on the sensitivity of a function. Our analysis reveals that sensitivity is an unsupervised proxy for model performance, as it exhibits a strong negative correlation with accuracy. We use gradient-based saliency scores to empirically demonstrate how different prompts affect the relevance of input tokens to the output, resulting in different levels of sensitivity. Furthermore, we introduce sensitivity-aware decoding which incorporates sensitivity estimation as a penalty term in the standard greedy decoding. We show that this approach is particularly helpful when information in the input is scarce. Our work provides a fresh perspective on the analysis of prompts, and contributes to a better understanding of the mechanism of ICL.
DAGC: Data-Volume-Aware Adaptive Sparsification Gradient Compression for Distributed Machine Learning in Mobile Computing
- Authors: Authors: Rongwei Lu, Yutong Jiang, Yinan Mao, Chen Tang, Bin Chen, Laizhong Cui, Zhi Wang
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.07324
- Pdf link: https://arxiv.org/pdf/2311.07324
- Abstract Distributed machine learning (DML) in mobile environments faces significant communication bottlenecks. Gradient compression has emerged as an effective solution to this issue, offering substantial benefits in environments with limited bandwidth and metered data. Yet, they encounter severe performance drop in non-IID environments due to a one-size-fits-all compression approach, which does not account for the varying data volumes across workers. Assigning varying compression ratios to workers with distinct data distributions and volumes is thus a promising solution. This study introduces an analysis of distributed SGD with non-uniform compression, which reveals that the convergence rate (indicative of the iterations needed to achieve a certain accuracy) is influenced by compression ratios applied to workers with differing volumes. Accordingly, we frame relative compression ratio assignment as an $n$-variables chi-square nonlinear optimization problem, constrained by a fixed and limited communication budget. We propose DAGC-R, which assigns the worker handling larger data volumes the conservative compression. Recognizing the computational limitations of mobile devices, we DAGC-A, which are computationally less demanding and enhances the robustness of the absolute gradient compressor in non-IID scenarios. Our experiments confirm that both the DAGC-A and DAGC-R can achieve better performance when dealing with highly imbalanced data volume distribution and restricted communication.
Boolean Variation and Boolean Logic BackPropagation
- Authors: Authors: Van Minh Nguyen
- Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Logic in Computer Science (cs.LO); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2311.07427
- Pdf link: https://arxiv.org/pdf/2311.07427
- Abstract The notion of variation is introduced for the Boolean set and based on which Boolean logic backpropagation principle is developed. Using this concept, deep models can be built with weights and activations being Boolean numbers and operated with Boolean logic instead of real arithmetic. In particular, Boolean deep models can be trained directly in the Boolean domain without latent weights. No gradient but logic is synthesized and backpropagated through layers.
Reducing the Need for Backpropagation and Discovering Better Optima With Explicit Optimizations of Neural Networks
- Authors: Authors: Jake Ryland Williams, Haoran Zhao
- Subjects: Machine Learning (cs.LG); Probability (math.PR); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.07498
- Pdf link: https://arxiv.org/pdf/2311.07498
- Abstract Iterative differential approximation methods that rely upon backpropagation have enabled the optimization of neural networks; however, at present, they remain computationally expensive, especially when training models at scale. In this paper, we propose a computationally efficient alternative for optimizing neural networks that can both reduce the costs of scaling neural networks and provide high-efficiency optimizations for low-resource applications. We derive an explicit solution to a simple feed-forward language model (LM) by mathematically analyzing its gradients. This solution generalizes from single-layer LMs to the class of all single-layer feed-forward softmax-activated neural models trained on positive-valued features, as is demonstrated by our extension of this solution application to MNIST digit classification. For both LM and digit classifiers, we find computationally that explicit solutions perform near-optimality in experiments showing that 1) iterative optimization only marginally improves the explicit solution parameters and 2) randomly initialized parameters iteratively optimize towards the explicit solution. We also preliminarily apply the explicit solution locally by layer in multi-layer networks and discuss how the solution's computational savings increase with model complexity -- for both single- and mult-layer applications of the explicit solution, we emphasize that the optima achieved cannot be reached by backpropagation alone, i.e., better optima appear discoverable only after explicit solutions are applied. Finally, we discuss the solution's computational savings alongside its impact on model interpretability and suggest future directions for the derivation of explicit solutions to complex- and multi-layer architectures.
Finding planted cliques using Markov chain Monte Carlo
- Authors: Authors: Reza Gheissari, Aukosh Jagannath, Yiming Xu
- Subjects: Data Structures and Algorithms (cs.DS); Probability (math.PR); Statistics Theory (math.ST)
- Arxiv link: https://arxiv.org/abs/2311.07540
- Pdf link: https://arxiv.org/pdf/2311.07540
- Abstract The planted clique problem is a paradigmatic model of statistical-to-computational gaps: the planted clique is information-theoretically detectable if its size $k\ge 2\log_2 n$ but polynomial-time algorithms only exist for the recovery task when $k= \Omega(\sqrt{n})$. By now, there are many simple and fast algorithms that succeed as soon as $k = \Omega(\sqrt{n})$. Glaringly, however, no MCMC approach to the problem had been shown to work, including the Metropolis process on cliques studied by Jerrum since 1992. In fact, Chen, Mossel, and Zadik recently showed that any Metropolis process whose state space is the set of cliques fails to find any sub-linear sized planted clique in polynomial time if initialized naturally from the empty set. Here, we redeem MCMC performance for the planted clique problem by relaxing the state space to all vertex subsets and adding a corresponding energy penalty for missing edges. With that, we prove that energy-minimizing Markov chains (gradient descent and a low-temperature relaxation of it) succeed at recovering planted cliques of size $k = \Omega(\sqrt{n})$ if initialized from the full graph. Importantly, initialized from the empty set, the relaxation still does not help the gradient descent find sub-linear planted cliques. We also demonstrate robustness of these Markov chain approaches under a natural contamination model.
Keyword: super-resolution
There is no result