arxiv-updates New submissions for Wed, 4 Oct 23

zoq opened this issue 1 year ago

Keyword: sgd

Stochastic Gradient Descent with Preconditioned Polyak Step-size

Authors: Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Martin Takáč
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.02093
Pdf link: https://arxiv.org/pdf/2310.02093
Abstract Stochastic Gradient Descent (SGD) is one of the many iterative optimization methods that are widely used in solving machine learning problems. These methods display valuable properties and attract researchers and industrial machine learning engineers with their simplicity. However, one weakness of this type of method is the need to tune the learning rate (step size) for every combination of loss function and dataset in order to solve an optimization problem and achieve efficient performance within a given time budget. Stochastic Gradient Descent with Polyak Step-size (SPS) is a method that offers an update rule that alleviates the need for fine-tuning the learning rate of an optimizer. In this paper, we propose an extension of SPS that employs preconditioning techniques, such as Hutchinson's method, Adam, and AdaGrad, to improve its performance on badly scaled and/or ill-conditioned datasets.
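
A minimal sketch of the Polyak step-size rule measured in a preconditioner's norm, gamma = (f_i(x) - f_i*) / (c * g^T D^{-1} g) followed by x <- x - gamma * D^{-1} g. It assumes least-squares losses in which every sample can be fit exactly (so each f_i* = 0) and a fixed, hand-chosen diagonal D instead of the Hutchinson, Adam, or AdaGrad preconditioners named in the abstract. The function name and its parameters are hypothetical, not the authors' code.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def preconditioned_sps(A, b, d, steps=2000, c=0.5, seed=0):
    """SGD with a Polyak step size measured in a diagonal preconditioner's norm.

    Illustrative assumptions: per-sample losses f_i(x) = 0.5 * (a_i . x - b_i)**2,
    interpolation holds so every f_i* = 0, and d is a fixed positive diagonal of D.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(steps):
        i = rng.randrange(len(A))
        r = dot(A[i], x) - b[i]                    # residual of sample i
        loss = 0.5 * r * r                         # f_i(x); f_i* is assumed to be 0
        g = [r * a for a in A[i]]                  # stochastic gradient of f_i
        pg = [gj / dj for gj, dj in zip(g, d)]     # preconditioned direction D^{-1} g
        denom = c * dot(g, pg)                     # c * ||g||^2 in the D^{-1} norm
        if denom == 0.0:
            continue                               # sample i is already fit exactly
        gamma = loss / denom                       # Polyak step size
        x = [xj - gamma * pj for xj, pj in zip(x, pg)]
    return x


# Small consistent system whose exact solution is x = [1, -2].
A = [[2.0, 1.0], [1.0, 3.0], [4.0, -1.0]]
b = [0.0, -5.0, 6.0]
x = preconditioned_sps(A, b, d=[21.0, 11.0])   # d: squared column norms of A (a hypothetical choice)
```

With c = 0.5 and f_i* = 0, each step lands exactly on the hyperplane a_i . x = b_i in the D-weighted norm, which is why no learning rate has to be tuned in this toy.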

Symmetric Single Index Learning

Authors: Aaron Zweig, Joan Bruna
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.02117
Pdf link: https://arxiv.org/pdf/2310.02117
Abstract Few neural architectures lend themselves to provable learning with gradient-based methods. One popular model is the single-index model, in which labels are produced by composing an unknown linear projection with a possibly unknown scalar link function. Learning this model with SGD is relatively well-understood, whereby the so-called information exponent of the link function governs a polynomial sample complexity rate. However, extending this analysis to deeper or more complicated architectures remains challenging. In this work, we consider single-index learning in the setting of symmetric neural networks. Under analytic assumptions on the activation and maximum degree assumptions on the link function, we prove that gradient flow recovers the hidden planted direction, represented as a finitely supported vector in the feature space of power sum polynomials. We characterize a notion of information exponent adapted to our setting that controls the efficiency of learning.

Chunking: Forgetting Matters in Continual Learning even without Changing Tasks

Authors: Thomas L. Lee, Amos Storkey
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.02206
Pdf link: https://arxiv.org/pdf/2310.02206
Abstract Work on continual learning (CL) has largely focused on the problems arising from the dynamically-changing data distribution. However, CL can be decomposed into two sub-problems: (a) shifts in the data distribution, and (b) dealing with the fact that the data is split into chunks and so only a part of the data is available to be trained on at any point in time. In this work, we look at the latter sub-problem -- the chunking of data -- and note that previous analysis of chunking in the CL literature is sparse. We show that chunking is an important part of CL, accounting for around half of the performance drop from offline learning in our experiments. Furthermore, our results reveal that current CL algorithms do not address the chunking sub-problem, only performing as well as plain SGD training when there is no shift in the data distribution. We analyse why performance drops when learning occurs on chunks of data, and find that forgetting, which is often seen to be a problem due to distribution shift, still arises and is a significant problem. Motivated by an analysis of the linear case, we show that per-chunk weight averaging improves performance in the chunking setting and that this performance transfers to the full CL setting. Hence, we argue that work on chunking can help advance CL in general.
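
A minimal sketch of the chunking setting and of one plausible reading of per-chunk weight averaging: the model sees each chunk once, in order, and a running mean of the weights reached at the end of every chunk is kept alongside the last iterate. The 1-D linear regression, the synthetic data without distribution shift, and all function names are assumptions made for illustration; they are not the paper's experimental setup.

```python
import random


def sgd_on_chunk(w, chunk, lr, epochs):
    """Plain SGD on one chunk for the 1-D linear model y ~ w[0] + w[1] * x."""
    w = list(w)
    for _ in range(epochs):
        for x, y in chunk:
            err = w[0] + w[1] * x - y      # gradient factor of the loss 0.5 * err**2
            w[0] -= lr * err
            w[1] -= lr * err * x
    return w


def train_on_chunks(chunks, lr=0.1, epochs=20):
    """Visit chunks once, in order, and keep a running mean of end-of-chunk weights."""
    w = [0.0, 0.0]
    avg = [0.0, 0.0]
    history = []
    for k, chunk in enumerate(chunks):
        w = sgd_on_chunk(w, chunk, lr, epochs)   # only the current chunk is available
        history.append(list(w))
        avg = [a + (wi - a) / (k + 1) for a, wi in zip(avg, w)]
    return w, avg, history


# Synthetic data with no distribution shift: y = 2x + 1 plus small noise, split into 5 chunks.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
data = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]
chunks = [data[i:i + 20] for i in range(0, len(data), 20)]
w_last, w_avg, history = train_on_chunks(chunks)
```

Here w_last is what plain sequential training returns and w_avg is the averaged alternative. The toy only shows the mechanics; it makes no claim about the size of the gains reported in the paper.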

Keyword: optimization

Enhancing Secrecy in UAV RSMA Networks: Deep Unfolding Meets Deep Reinforcement Learning

Authors: Abuzar B. M. Adam, Mohammed A. M. Elhassan
Subjects: Cryptography and Security (cs.CR); Emerging Technologies (cs.ET); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.01437
Pdf link: https://arxiv.org/pdf/2310.01437
Abstract In this paper, we consider the maximization of the secrecy rate in a multi-unmanned aerial vehicle (UAV) rate-splitting multiple access (RSMA) network. The resulting joint beamforming, rate allocation, and UAV trajectory optimization problem is nonconvex, so it is transformed into a Markov decision problem, and a novel multiagent deep reinforcement learning (DRL) framework is designed. The proposed framework (named DUN-DRL) combines deep unfolding to design beamforming and rate allocation, a data-driven approach to design the UAV trajectory, and deep deterministic policy gradient (DDPG) for the learning procedure. The proposed DUN-DRL shows strong performance and outperforms other DRL-based methods in the literature.

Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets

Authors: Aakash Tripathi, Asim Waqas, Kavya Venkatesan, Yasin Yilmaz, Ghulam Rasool
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01438
Pdf link: https://arxiv.org/pdf/2310.01438
Abstract The advancements in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need for integrating data from multiple sources is further pronounced in complex diseases such as cancer for enabling precision medicine and personalized treatments. This work proposes Multimodal Integration of Oncology Data System (MINDS) - a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms guarantee pipelines' scalability and security. MINDS overcomes the limitations of existing biomedical data silos via an interoperable metadata-driven approach that represents a pivotal step toward the future of oncology data integration.

FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models

Authors: Jingwei Sun, Ziyue Xu, Hongxu Yin, Dong Yang, Daguang Xu, Yiran Chen, Holger R. Roth
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01467
Pdf link: https://arxiv.org/pdf/2310.01467
Abstract Pre-trained language models (PLMs) have revolutionized the NLP landscape, achieving stellar performances across diverse tasks. These models, while benefiting from vast training data, often require fine-tuning on specific data to cater to distinct downstream tasks. However, this data adaptation process has inherent security and privacy concerns, primarily when leveraging user-generated, device-residing data. Federated learning (FL) provides a solution, allowing collaborative model fine-tuning without centralized data collection. However, applying FL to fine-tune PLMs is hampered by challenges, including restricted model parameter access, high computational requirements, and communication overheads. This paper introduces Federated Black-box Prompt Tuning (FedBPT), a framework designed to address these challenges. FedBPT does not require the clients to access the model parameters. By focusing on training optimal prompts and utilizing gradient-free optimization methods, FedBPT reduces the number of exchanged variables, boosts communication efficiency, and minimizes computational and storage costs. Experiments highlight the framework's ability to drastically cut communication and memory costs while maintaining competitive performance. Ultimately, FedBPT presents a promising solution for efficient, privacy-preserving fine-tuning of PLMs in the age of large language models.
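
A minimal sketch of the general pattern of federated, gradient-free optimization: each client improves a shared vector (standing in for a soft prompt) using only loss evaluations, and the server averages what comes back, so only that small vector is exchanged. The (1+1) evolution strategy, the FedAvg-style averaging, the quadratic client losses that replace a real language-model prompt loss, and every name below are assumptions; the abstract does not specify FedBPT's optimizer in this form.

```python
import random


def local_es(start, loss, steps, sigma, rng):
    """(1+1) evolution strategy: queries loss values only, never gradients."""
    best, best_loss = list(start), loss(start)
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, sigma) for v in best]
        cand_loss = loss(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best


def federated_es(client_losses, dim, rounds=30, local_steps=20, sigma=0.1, seed=0):
    """Each round: clients refine the shared vector locally; the server averages the results."""
    rng = random.Random(seed)
    global_p = [0.0] * dim
    for _ in range(rounds):
        updates = [local_es(global_p, f, local_steps, sigma, rng) for f in client_losses]
        global_p = [sum(u[j] for u in updates) / len(updates) for j in range(dim)]
    return global_p


def make_client_loss(target):
    # Hypothetical stand-in for a client's black-box loss: squared distance to a private target.
    return lambda p: sum((a - t) ** 2 for a, t in zip(p, target))


targets = [[1.0, -1.0, 0.5], [0.6, -0.8, 0.9], [1.4, -1.2, 0.1]]
losses = [make_client_loss(t) for t in targets]
p = federated_es(losses, dim=3)
total_loss = sum(f(p) for f in losses)
```

The client targets never leave the clients; only the 3-dimensional vector travels in each round.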

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

Authors: Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.01506
Pdf link: https://arxiv.org/pdf/2310.01506
Abstract Text-guided diffusion models have revolutionized image generation and editing, offering exceptional realism and diversity. Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model. This vector is subsequently fed into separate source and target diffusion branches for editing. The accuracy of this inversion process significantly impacts the final editing outcome, influencing both essential content preservation of the source image and edit fidelity according to the target prompt. Prior inversion techniques aimed at finding a unified solution in both the source and target diffusion branches. However, our theoretical and empirical analyses reveal that disentangling these branches leads to a distinct separation of responsibilities for preserving essential content and ensuring edit fidelity. Building on this insight, we introduce "Direct Inversion," a novel technique achieving optimal performance of both branches with just three lines of code. To assess image editing performance, we present PIE-Bench, an editing benchmark with 700 images showcasing diverse scenes and editing types, accompanied by versatile annotations and comprehensive evaluation metrics. Compared to state-of-the-art optimization-based inversion techniques, our solution not only yields superior performance across 8 editing methods but also achieves nearly an order-of-magnitude speed-up.

Decision-Oriented Intervention Cost Prediction for Multi-robot Persistent Monitoring

Authors: Guangyao Shi, Chak Lam Shek, Nare Karapetyan, Pratap Tokekar
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.01519
Pdf link: https://arxiv.org/pdf/2310.01519
Abstract In this paper, we present a differentiable, decision-oriented learning technique for a class of vehicle routing problems. Specifically, we consider a scenario where a team of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) are persistently monitoring an environment. The UGVs are occasionally taken over by humans to take detours to recharge the depleted UAVs. The goal is to select routes for the UGVs so that they can efficiently monitor the environment while reducing the cost of interventions. The former is modeled as a monotone, submodular function whereas the latter is a linear function of the routes of the UGVs. We consider a scenario where the former is known but the latter depends on the context (e.g., wind and terrain conditions) that must be learned. Typically, we first learn to predict the cost function and then solve the optimization problem. However, the loss function used in prediction may be misaligned with our final goal of finding good routes. We propose a decision-oriented learning framework that incorporates task optimization as a differentiable layer in the prediction phase. To make the task optimization (which is a non-monotone submodular function) differentiable, we propose the Differentiable Cost Scaled Greedy algorithm. We demonstrate the efficacy of the proposed framework through numerical simulations. The results show that the proposed framework can result in better performance than the traditional approach.

Primal-dual hybrid gradient algorithms for computing time-implicit Hamilton-Jacobi equations

Authors: Tingwei Meng, Wenbo Hao, Siting Liu, Stanley J. Osher, Wuchen Li
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.01605
Pdf link: https://arxiv.org/pdf/2310.01605
Abstract Hamilton-Jacobi (HJ) partial differential equations (PDEs) have diverse applications spanning physics, optimal control, game theory, and imaging sciences. This research introduces a first-order optimization-based technique for HJ PDEs, which formulates the time-implicit update of HJ PDEs as saddle point problems. We remark that the saddle point formulation for HJ equations is aligned with the primal-dual formulation of optimal transport and potential mean-field games (MFGs). This connection enables us to extend MFG techniques and design numerical schemes for solving HJ PDEs. We employ the primal-dual hybrid gradient (PDHG) method to solve the saddle point problems, benefiting from the simple structures that enable fast computations in updates. Remarkably, the method caters to a broader range of Hamiltonians, encompassing non-smooth and spatiotemporally dependent cases. The approach's effectiveness is verified through various numerical examples in one and two dimensions, such as quadratic and $L^1$ Hamiltonians with spatial and time dependence.
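
The PDHG iteration itself is generic, so it can be sketched on a much simpler saddle-point problem than a time-implicit HJ update: 1-D total-variation denoising, min_x 0.5*||x - f||^2 + lam*||Kx||_1 with K the forward difference, written as min_x max_{|y| <= lam} <Kx, y> + 0.5*||x - f||^2. The problem choice, the step sizes tau = sigma = 0.45 (picked so that tau * sigma * ||K||^2 < 1), and the function names are assumptions for illustration only.

```python
def pdhg_tv_denoise(f, lam, iters=2000, tau=0.45, sigma=0.45):
    """PDHG (Chambolle-Pock, theta = 1) for min_x 0.5*||x - f||^2 + lam*||K x||_1.

    K is the forward difference (K x)_i = x[i+1] - x[i]; tau * sigma * ||K||^2 < 1 holds
    because ||K||^2 <= 4 and 0.45 * 0.45 * 4 = 0.81.
    """
    n = len(f)
    x = list(f)
    x_bar = list(f)
    y = [0.0] * (n - 1)                                   # dual variable, one per difference
    for _ in range(iters):
        # Dual step: ascent along K x_bar, then projection onto [-lam, lam] (prox of F*).
        y = [min(lam, max(-lam, y[i] + sigma * (x_bar[i + 1] - x_bar[i]))) for i in range(n - 1)]
        # K^T y, the adjoint of the forward difference.
        kty = [0.0] * n
        for i in range(n - 1):
            kty[i] -= y[i]
            kty[i + 1] += y[i]
        # Primal step: closed-form prox of 0.5*||x - f||^2 evaluated at x - tau * K^T y.
        x_new = [(x[j] - tau * kty[j] + tau * f[j]) / (1.0 + tau) for j in range(n)]
        # Over-relaxation: x_bar = 2 * x_new - x.
        x_bar = [2.0 * a - b for a, b in zip(x_new, x)]
        x = x_new
    return x


def tv_objective(x, f, lam):
    fit = 0.5 * sum((a - b) ** 2 for a, b in zip(x, f))
    return fit + lam * sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))


f = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
x = pdhg_tv_denoise(f, lam=0.5)
```

For this step signal the exact minimizer is 1/6 on the left plateau and 5/6 on the right (each plateau moves by lam divided by its length), which gives an independent check on the iterates.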

Distributionally Robust Path Integral Control

Authors: Hyuk Park, Duo Zhou, Grani A. Hanasusanto, Takashi Tanaka
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.01633
Pdf link: https://arxiv.org/pdf/2310.01633
Abstract We consider a continuous-time continuous-space stochastic optimal control problem, where the controller lacks exact knowledge of the underlying diffusion process, relying instead on a finite set of historical disturbance trajectories. In situations where data collection is limited, the controller synthesized from empirical data may exhibit poor performance. To address this issue, we introduce a novel approach named Distributionally Robust Path Integral (DRPI). The proposed method employs distributionally robust optimization (DRO) to robustify the resulting policy against the unknown diffusion process. Notably, the DRPI scheme shows similarities with risk-sensitive control, which enables us to utilize the path integral control (PIC) framework as an efficient solution scheme. We derive theoretical performance guarantees for the DRPI scheme, which closely aligns with selecting a risk parameter in risk-sensitive control. We validate the efficacy of our scheme and showcase its superiority when compared to risk-neutral PIC policies in the absence of the true diffusion process.
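
At the core of sampling-based path integral control is an exponentially weighted average of sampled control perturbations, with weights proportional to exp(-S_k / lam) for trajectory cost S_k and temperature lam. A minimal sketch of that generic risk-neutral update, not of the distributionally robust DRPI scheme; the function names are hypothetical, and the costs and perturbations are supplied by the caller rather than simulated from a diffusion model.

```python
import math


def path_integral_weights(costs, lam):
    """Softmin weights w_k proportional to exp(-S_k / lam); subtracting min(costs) avoids overflow."""
    s_min = min(costs)
    raw = [math.exp(-(s - s_min) / lam) for s in costs]
    total = sum(raw)
    return [r / total for r in raw]


def path_integral_update(u, noises, costs, lam):
    """Return the updated control sequence u + sum_k w_k * eps_k."""
    w = path_integral_weights(costs, lam)
    return [u[t] + sum(wk * eps[t] for wk, eps in zip(w, noises)) for t in range(len(u))]


u = [0.0, 0.0, 0.0]
noises = [[0.5, 0.2, -0.1], [-0.4, 0.1, 0.3], [0.1, -0.3, 0.2]]   # sampled perturbations
costs = [1.0, 3.0, 2.0]                                             # their rollout costs
u_new = path_integral_update(u, noises, costs, lam=0.1)
```

A small lam concentrates the update on the cheapest rollout, while a very large lam approaches a plain average of the perturbations; the temperature plays a role similar to the risk parameter mentioned in the abstract.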

Estimating and Implementing Conventional Fairness Metrics With Probabilistic Protected Features

Authors: Hadi Elzayn, Emily Black, Patrick Vossler, Nathanael Jo, Jacob Goldin, Daniel E. Ho
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.01679
Pdf link: https://arxiv.org/pdf/2310.01679
Abstract The vast majority of techniques to train fair models require access to the protected attribute (e.g., race, gender), either at train time or in production. However, in many important applications this protected attribute is largely unavailable. In this paper, we develop methods for measuring and reducing fairness violations in a setting with limited access to protected attribute labels. Specifically, we assume access to protected attribute labels on a small subset of the dataset of interest, but only probabilistic estimates of protected attribute labels (e.g., via Bayesian Improved Surname Geocoding) for the rest of the dataset. With this setting in mind, we propose a method to estimate bounds on common fairness metrics for an existing model, as well as a method for training a model to limit fairness violations by solving a constrained non-convex optimization problem. Unlike similar existing approaches, our methods take advantage of contextual information -- specifically, the relationships between a model's predictions and the probabilistic prediction of protected attributes, given the true protected attribute, and vice versa -- to provide tighter bounds on the true disparity. We provide an empirical illustration of our methods using voting data. First, we show our measurement method can bound the true disparity up to 5.5x tighter than previous methods in these applications. Then, we demonstrate that our training technique effectively reduces disparity while incurring smaller fairness-accuracy trade-offs than other fair optimization methods with limited access to protected attributes.

Decentralized Micro Water-Energy Co-Optimization for Small Communities

Authors: Jesus Silva-Rodriguez, Xingpeng Li
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.01681
Pdf link: https://arxiv.org/pdf/2310.01681
Abstract The water-energy nexus encompasses the interdependencies between water and energy resources identifying the existing links between the production and distribution of these resources. Therefore, understanding the water-energy nexus is crucial for developing sustainable and integrated resource management approaches. This paper proposes a decentralized co-optimization model for a micro water-energy nexus system (MWEN), aiming to optimize the combined supply of both resources to end consumers. The approach respects the separate ownership and management of the water and energy sectors while bridging the gap between their optimized operations. An enhanced version of the alternating direction method of multipliers (ADMM) is proposed, the objective-based ADMM (OB-ADMM), which is able to robustly optimize each system independently towards a common objective, only sharing information about the power consumption of water management, providing privacy for each resource provider.
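
OB-ADMM itself is not specified in the abstract, but the standard scaled-form ADMM it extends can be sketched on a toy coupling problem: a water operator and an energy operator each hold a private quadratic cost over one shared quantity (for example, the power drawn by water pumping) and must agree on its value. Each side solves its own subproblem, and only the proposed value and a scaled dual variable are exchanged. The quadratic costs, the penalty rho, and the function name are assumptions.

```python
def admm_two_agents(a, p, b, q, rho=1.0, iters=200):
    """Scaled ADMM for min 0.5*a*(x - p)**2 + 0.5*b*(z - q)**2 subject to x = z."""
    x = z = u = 0.0
    for _ in range(iters):
        # Water side: minimize 0.5*a*(x - p)**2 + (rho/2)*(x - z + u)**2 in closed form.
        x = (a * p + rho * (z - u)) / (a + rho)
        # Energy side: minimize 0.5*b*(z - q)**2 + (rho/2)*(x - z + u)**2 in closed form.
        z = (b * q + rho * (x + u)) / (b + rho)
        # Scaled dual update on the consensus constraint x = z.
        u += x - z
    return x, z


x, z = admm_two_agents(a=1.0, p=0.0, b=3.0, q=4.0)
# Centralized optimum for comparison: (a*p + b*q) / (a + b) = 3.0
```

Neither side ever sees the other's cost parameters; they coordinate only through x, z, and u, which mirrors the limited information sharing described in the abstract.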

DynAMO: Multi-agent reinforcement learning for dynamic anticipatory mesh optimization with applications to hyperbolic conservation laws

Authors: Tarik Dzanic, Ketan Mittal, Dohyun Kim, Jiachen Yang, Socratis Petrides, Brendan Keith, Robert Anderson
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2310.01695
Pdf link: https://arxiv.org/pdf/2310.01695
Abstract We introduce DynAMO, a reinforcement learning paradigm for Dynamic Anticipatory Mesh Optimization. Adaptive mesh refinement is an effective tool for optimizing computational cost and solution accuracy in numerical methods for partial differential equations. However, traditional adaptive mesh refinement approaches for time-dependent problems typically rely only on instantaneous error indicators to guide adaptivity. As a result, standard strategies often require frequent remeshing to maintain accuracy. In the DynAMO approach, multi-agent reinforcement learning is used to discover new local refinement policies that can anticipate and respond to future solution states by producing meshes that deliver more accurate solutions for longer time intervals. By applying DynAMO to discontinuous Galerkin methods for the linear advection and compressible Euler equations in two dimensions, we demonstrate that this new mesh refinement paradigm can outperform conventional threshold-based strategies while also generalizing to different mesh sizes, remeshing and simulation times, and initial conditions.

RETRO: Reactive Trajectory Optimization for Real-Time Robot Motion Planning in Dynamic Environments

Authors: Apan Dastider, Hao Fang, Mingjie Lin
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.01738
Pdf link: https://arxiv.org/pdf/2310.01738
Abstract Reactive trajectory optimization for robotics presents formidable challenges, demanding the rapid generation of purposeful robot motion in complex and swiftly changing dynamic environments. While much existing research predominantly addresses robotic motion planning with predefined objectives, emerging problems in robotic trajectory optimization frequently involve dynamically evolving objectives and stochastic motion dynamics. However, effectively addressing such reactive trajectory optimization challenges for robot manipulators proves difficult due to inefficient, high-dimensional trajectory representations and a lack of consideration for time optimization. In response, we introduce a novel trajectory optimization framework called RETRO. RETRO employs adaptive optimization techniques that span both spatial and temporal dimensions. As a result, it achieves a computational complexity of $O(T^{2.4}) + O(Tn^{2})$, a significant improvement over the traditional application of DDP, which has a complexity of $O(n^{4})$ when reasonable time step sizes are used. To evaluate RETRO's performance in terms of error, we conducted a comprehensive analysis of its regret bounds, comparing it to an Oracle value function obtained through an Oracle trajectory optimization algorithm. Our analytical findings demonstrate that RETRO's total regret can be upper-bounded by a function of the chosen time step size. Moreover, our approach delivers smoothly optimized robot trajectories within the joint space, offering flexibility and adaptability for various tasks. It can seamlessly integrate task-specific requirements such as collision avoidance while maintaining real-time control rates. We validate the effectiveness of our framework through extensive simulations and real-world robot experiments in closed-loop manipulation scenarios.

Randomized Dimension Reduction with Statistical Guarantees

Authors: Yijun Dong
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.01739
Pdf link: https://arxiv.org/pdf/2310.01739
Abstract Large models and enormous data are essential driving forces of the unprecedented successes achieved by modern algorithms, especially in scientific computing and machine learning. Nevertheless, the growing dimensionality and model complexity, as well as the non-negligible workload of data pre-processing, also bring formidable costs to such successes in both computation and data aggregation. As the slowdown of Moore's Law limits hardware-level reductions in the cost of computation, fast heuristics for expensive classical routines and efficient algorithms for exploiting limited data are increasingly indispensable for pushing the limit of algorithm potency. This thesis explores some such algorithms for fast execution and efficient data utilization. From the computational efficiency perspective, we design and analyze fast randomized low-rank decomposition algorithms for large matrices based on "matrix sketching", which can be regarded as a dimension reduction strategy in the data space. These include the randomized pivoting-based interpolative and CUR decomposition discussed in Chapter 2 and the randomized subspace approximations discussed in Chapter 3. From the sample efficiency perspective, we focus on learning algorithms with various incorporations of data augmentation that provably improve generalization and distributional robustness. Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization where we view sample efficiency from the lens of dimension reduction in the function space. Then in Chapter 5, we introduce an adaptively weighted data augmentation consistency regularization algorithm for distributionally robust optimization with applications in medical image segmentation.
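
Matrix sketching can be illustrated with the basic randomized range finder: multiply A by a small Gaussian test matrix Omega, orthonormalize the sketch Y = A * Omega to obtain Q, and approximate A by Q (Q^T A). This is the standard primitive underlying many randomized decompositions, not the thesis's pivoting-based interpolative or CUR algorithms; the pure-Python helpers, the oversampling amount, and the rank-2 test matrix are assumptions for illustration.

```python
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def orthonormal_columns(Y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of Y; returns Q as a list of rows."""
    cols = []
    for v in transpose(Y):
        v = list(v)
        start_norm = sum(a * a for a in v) ** 0.5
        for q in cols:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > tol * start_norm:        # skip columns already in the span (rank deficiency)
            cols.append([a / norm for a in v])
    return transpose(cols)


def randomized_low_rank(A, k, oversample=2, seed=0):
    """Sketch A with a Gaussian test matrix and return Q, B with A ~ Q @ B."""
    rng = random.Random(seed)
    n = len(A[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k + oversample)] for _ in range(n)]
    Q = orthonormal_columns(matmul(A, omega))   # orthonormal basis for range(A @ Omega)
    B = matmul(transpose(Q), A)                 # small matrix Q^T A
    return Q, B


# Exactly rank-2 5x4 test matrix built from two outer products.
u1, v1 = [1.0, 2.0, 0.0, 1.0, 3.0], [1.0, 0.0, 2.0, 1.0]
u2, v2 = [0.0, 1.0, 1.0, -1.0, 2.0], [2.0, 1.0, 0.0, -1.0]
A = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(4)] for i in range(5)]
Q, B = randomized_low_rank(A, k=2)
approx = matmul(Q, B)
err = max(abs(approx[i][j] - A[i][j]) for i in range(5) for j in range(4))
```

Only the thin sketch A * Omega and the small matrix B are formed, which is where the savings come from when A is large and k is small.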

Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling

Authors: Cunzhi Zhao, Xingpeng Li
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.01758
Pdf link: https://arxiv.org/pdf/2310.01758
Abstract Neural networks have been widely applied in the power system area. They can be used for better predicting input information and modeling system performance with increased accuracy. In some applications, such as microgrid day-ahead energy scheduling with a neural-network-based battery degradation model, the input features of the trained learning model are variables to be solved in optimization models that enforce limits on the output of the same learning model. This creates a neural network-embedded optimization problem; the nonlinear activation functions in the neural network make such problems extremely hard, if not impossible, to solve. To address this emerging challenge, this paper investigates different methods for linearizing the nonlinear activation functions, with a particular focus on the widely used rectified linear unit (ReLU) function. Four linearization methods tailored to the ReLU activation function are developed, analyzed, and compared in this paper. Each method employs a set of linear constraints to replace the ReLU function, effectively linearizing the optimization problem, which can overcome the computational challenges associated with the nonlinearity of the neural network model. These proposed linearization methods provide valuable tools for effectively solving optimization problems that integrate neural network models with ReLU activation functions.
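
One widely used exact linearization of y = max(0, x), assuming known finite bounds L < 0 < U on the pre-activation, is the big-M mixed-integer formulation y >= 0, y >= x, y <= x - L(1 - z), y <= U z with binary z. The abstract does not say whether it is among the paper's four methods. The sketch below only checks, for each value of z, which y the linear constraints allow, showing that the feasible set collapses to y = max(0, x); the function names and default bounds are hypothetical.

```python
def feasible_y_interval(x, L, U, z):
    """Interval of y allowed by the big-M ReLU constraints for fixed x and binary z (None if empty)."""
    lo = max(0.0, x)                      # y >= 0 and y >= x
    hi = min(x - L * (1 - z), U * z)      # y <= x - L(1 - z) and y <= U z
    return (lo, hi) if lo <= hi else None


def relu_via_big_m(x, L=-10.0, U=10.0):
    """Collect every y admitted by some binary z; for L <= x <= U this is exactly {max(0, x)}."""
    values = set()
    for z in (0, 1):
        interval = feasible_y_interval(x, L, U, z)
        if interval is not None:
            lo, hi = interval
            assert lo == hi               # within the bounds, the constraints pin y to one value
            values.add(lo)
    return values


print(relu_via_big_m(3.0), relu_via_big_m(-2.0))
```

z = 1 selects the active branch (y = x, feasible only for x >= 0) and z = 0 the inactive one (y = 0, feasible only for x <= 0). A pre-activation outside [L, U] makes the model infeasible, which is why the bounds must be valid.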

A simple connection from loss flatness to compressed representations in neural networks

Authors: Shirui Chen, Stefano Recanatesi, Eric Shea-Brown
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01770
Pdf link: https://arxiv.org/pdf/2310.01770
Abstract Deep neural networks' generalization capacity has been studied in a variety of ways, including at least two distinct categories of approach: one based on the shape of the loss landscape in parameter space, and the other based on the structure of the representation manifold in feature space (that is, in the space of unit activities). These two approaches are related, but they are rarely studied together and explicitly connected. Here, we present a simple analysis that makes such a connection. We show that, in the last phase of learning of deep neural networks, compression of the volume of the manifold of neural representations correlates with the flatness of the loss around the minima explored by ongoing parameter optimization. We show that this is predicted by a relatively simple mathematical relationship: loss flatness implies compression of neural representations. Our results build closely on prior work of \citet{ma_linear_2021}, which shows how flatness (i.e., small eigenvalues of the loss Hessian) develops in late phases of learning and leads to robustness to perturbations in network inputs. Moreover, we show there is no similarly direct connection between local dimensionality and sharpness, suggesting that this property may be controlled by different mechanisms than volume and hence may play a complementary role in neural representations. Overall, we advance a dual perspective on generalization in neural networks in both parameter and feature space.
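
Flatness in the sense used here (small eigenvalues of the loss Hessian) is commonly summarized by the largest Hessian eigenvalue, often called sharpness. A minimal sketch that estimates it by power iteration on Hessian-vector products approximated with central finite differences of a user-supplied gradient. The quadratic test loss and the function names are assumptions, and real networks would normally obtain Hessian-vector products through automatic differentiation instead.

```python
import random


def hvp(grad, w, v, eps=1e-5):
    """Finite-difference Hessian-vector product: (g(w + eps v) - g(w - eps v)) / (2 eps)."""
    gp = grad([a + eps * b for a, b in zip(w, v)])
    gm = grad([a - eps * b for a, b in zip(w, v)])
    return [(p - m) / (2 * eps) for p, m in zip(gp, gm)]


def top_hessian_eigenvalue(grad, w, iters=100, seed=0):
    """Power iteration; returns the Rayleigh quotient v^T H v for the final unit vector v."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    eig = 0.0
    for _ in range(iters):
        norm = sum(a * a for a in v) ** 0.5
        v = [a / norm for a in v]
        hv = hvp(grad, w, v)
        eig = sum(a * b for a, b in zip(v, hv))
        v = hv
    return eig


# Toy loss 0.5 * (3*w0**2 + w1**2 + 0.2*w2**2), whose Hessian is diag(3, 1, 0.2).
def grad(w):
    return [3.0 * w[0], 1.0 * w[1], 0.2 * w[2]]


sharpness = top_hessian_eigenvalue(grad, [0.5, -1.0, 2.0])
```

A smaller returned value corresponds to a flatter minimum in this sense.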

STAMP: Differentiable Task and Motion Planning via Stein Variational Gradient Descent

Authors: Yewon Lee (1), Philip Huang (2), Krishna Murthy Jatavallabhula (3), Andrew Z. Li (1), Fabian Damken (1 and 4), Eric Heiden (5), Kevin Smith (3), Derek Nowrouzezahrai (6), Fabio Ramos (5 and 7), Florian Shkurti (1) ((1) University of Toronto, (2) Carnegie Mellon University, (3) Massachusetts Institute of Technology, (4) Technische Universität Darmstadt, (5) NVIDIA, (6) McGill University, (7) University of Sydney)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01775
Pdf link: https://arxiv.org/pdf/2310.01775
Abstract Planning for many manipulation tasks, such as using tools or assembling parts, often requires both symbolic and geometric reasoning. Task and Motion Planning (TAMP) algorithms typically solve these problems by conducting a tree search over high-level task sequences while checking for kinematic and dynamic feasibility. While performant, most existing algorithms are highly inefficient as their time complexity grows exponentially with the number of possible actions and objects. Additionally, they only find a single solution to problems in which many feasible plans may exist. To address these limitations, we propose a novel algorithm called Stein Task and Motion Planning (STAMP) that leverages parallelization and differentiable simulation to efficiently search for multiple diverse plans. STAMP relaxes discrete-and-continuous TAMP problems into continuous optimization problems that can be solved using variational inference. Our algorithm builds upon Stein Variational Gradient Descent, a gradient-based variational inference algorithm, and parallelized differentiable physics simulators on the GPU to efficiently obtain gradients for inference. Further, we employ imitation learning to introduce action abstractions that reduce the inference problem to lower dimensions. We demonstrate our method on two TAMP problems and empirically show that STAMP is able to: 1) produce multiple diverse plans in parallel; and 2) search for plans more efficiently compared to existing TAMP baselines.
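
Stein Variational Gradient Descent, on which STAMP builds, moves a set of particles x_i along phi(x_i) = (1/n) * sum_j [k(x_j, x_i) * grad log p(x_j) + grad_{x_j} k(x_j, x_i)]: a kernel-smoothed pull toward high density plus a repulsive term that keeps the particles apart, which is what yields multiple diverse solutions. A minimal 1-D sketch, assuming a Gaussian target N(mu, 1), an RBF kernel with fixed bandwidth h, and a fixed step size. None of these choices come from the paper, which applies SVGD to relaxed TAMP problems with differentiable simulators.

```python
import math


def svgd_1d(particles, grad_log_p, step=0.1, h=1.0, iters=500):
    """Plain SVGD in one dimension with the RBF kernel k(a, b) = exp(-(a - b)**2 / h)."""
    x = list(particles)
    n = len(x)
    for _ in range(iters):
        phi = []
        for xi in x:
            total = 0.0
            for xj in x:
                k = math.exp(-((xj - xi) ** 2) / h)    # kernel weight between x_j and x_i
                grad_k = -2.0 * (xj - xi) / h * k       # d k / d x_j: pushes x_i away from x_j
                total += k * grad_log_p(xj) + grad_k
            phi.append(total / n)
        x = [xi + step * p for xi, p in zip(x, phi)]
    return x


mu = 2.0
start = [-1.0 + 2.0 * i / 19 for i in range(20)]        # 20 particles evenly spread on [-1, 1]
final = svgd_1d(start, lambda t: -(t - mu), iters=1000)  # grad log p for N(mu, 1)
mean = sum(final) / len(final)
std = (sum((a - mean) ** 2 for a in final) / len(final)) ** 0.5
```

Without the grad_k term every particle would follow the same smoothed gradient and the set would not spread out to cover the target; with it, the particles settle around mu while staying spread apart.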

Comparative study of microgrid optimal scheduling under multi-optimization algorithm fusion

Authors: Hongyi Duan, Qingyang Li, Yuchen Li, Jianan Zhang, Yuming Xie
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01805
Pdf link: https://arxiv.org/pdf/2310.01805
Abstract As global attention on renewable and clean energy grows, the research and implementation of microgrids become paramount. This paper explores the relationship between the operational and environmental costs of microgrids through multi-objective optimization models. By integrating various optimization algorithms like Genetic Algorithm, Simulated Annealing, Ant Colony Optimization, and Particle Swarm Optimization, we propose an integrated approach for microgrid optimization. Simulation results show that these algorithms provide different dispatch results under economic and environmental dispatch, revealing distinct roles of diesel generators and micro gas turbines in microgrids. Overall, this study offers in-depth insights and practical guidance for microgrid design and operation.

Improvement and Enhancement of YOLOv5 Small Target Recognition Based on Multi-module Optimization

Authors: Qingyang Li, Yuchen Li, Hongyi Duan, JiaLiang Kang, Jianan Zhang, Xueqian Gan, Ruotong Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01806
Pdf link: https://arxiv.org/pdf/2310.01806
Abstract In this paper, the limitations of YOLOv5s model on small target detection task are deeply studied and improved. The performance of the model is successfully enhanced by introducing GhostNet-based convolutional module, RepGFPN-based Neck module optimization, CA and Transformer's attention mechanism, and loss function improvement using NWD. The experimental results validate the positive impact of these improvement strategies on model precision, recall and mAP. In particular, the improved model shows significant superiority in dealing with complex backgrounds and tiny targets in real-world application tests. This study provides an effective optimization strategy for the YOLOv5s model on small target detection, and lays a solid foundation for future related research and applications.
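
NWD here presumably refers to the Normalized Gaussian Wasserstein Distance proposed for tiny-object detection: each box (cx, cy, w, h) is modeled as a 2-D Gaussian, the squared 2-Wasserstein distance between two such Gaussians reduces to the squared Euclidean distance between the vectors (cx, cy, w/2, h/2), and NWD = exp(-sqrt(W2^2) / C). A minimal sketch; the constant C is dataset-dependent and 10.0 below is only a placeholder, and building the loss as 1 - NWD is an assumption since the abstract does not give the exact form used.

```python
import math


def nwd(box_a, box_b, c):
    """Normalized Gaussian Wasserstein distance between boxes given as (cx, cy, w, h)."""
    (xa, ya, wa, ha), (xb, yb, wb, hb) = box_a, box_b
    w2_sq = (xa - xb) ** 2 + (ya - yb) ** 2 + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(pred, target, c):
    return 1.0 - nwd(pred, target, c)


# Two 4x4 boxes whose centers are 6 pixels apart: they do not overlap, so IoU is 0.
a = (10.0, 10.0, 4.0, 4.0)
b = (16.0, 10.0, 4.0, 4.0)
score = nwd(a, b, c=10.0)      # c = 10.0 is a placeholder, not a recommended value
```

Unlike IoU, which is 0 for any pair of disjoint boxes, NWD still decreases smoothly with distance, so the loss keeps giving a graded signal for tiny, slightly misplaced predictions.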

AutoLoRa: A Parameter-Free Automated Robust Fine-Tuning Framework

Authors: Xilie Xu, Jingfeng Zhang, Mohan Kankanhalli
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.01818
Pdf link: https://arxiv.org/pdf/2310.01818
Abstract Robust Fine-Tuning (RFT) is a low-cost strategy to obtain adversarial robustness in downstream applications without requiring substantial computational resources or large amounts of data. This paper uncovers an issue with the existing RFT, where optimizing both adversarial and natural objectives through the feature extractor (FE) yields significantly divergent gradient directions. This divergence introduces instability in the optimization process, thereby hindering the attainment of adversarial robustness and rendering RFT highly sensitive to hyperparameters. To mitigate this issue, we propose a low-rank (LoRa) branch that disentangles RFT into two distinct components: optimizing natural objectives via the LoRa branch and adversarial objectives via the FE. In addition, we introduce heuristic strategies for automating the scheduling of the learning rate and the scalars of loss terms. Extensive empirical evaluations demonstrate that our proposed automated RFT disentangled via the LoRa branch (AutoLoRa) achieves new state-of-the-art results across a range of downstream tasks. AutoLoRa holds significant practical utility, as it automatically converts a pre-trained FE into an adversarially robust model for downstream tasks without the need for searching hyperparameters.
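To picture what a low-rank branch next to a frozen feature extractor looks like, here is a minimal sketch of a generic LoRA-style linear layer. It assumes the common parameterization in which a frozen weight W is augmented by a trainable product (alpha / r) B A, with B initialized to zero so that the branch initially leaves the pre-trained features unchanged. The class name, the `use_lora` flag, and the initialization scale are illustrative assumptions; AutoLoRa's actual branch placement, its routing of natural versus adversarial objectives, and its automated learning-rate and loss-scalar schedules are not reproduced here.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank branch (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                   # frozen pre-trained weights
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))  # down-projection (trainable)
        self.B = np.zeros((out_dim, rank))                     # up-projection (trainable), zero at start
        self.scale = alpha / rank

    def forward(self, x, use_lora=True):
        """Apply the layer to a batch x of shape (batch, in_dim)."""
        base = x @ self.weight.T                               # frozen path
        if not use_lora:
            return base
        # Low-rank path: (batch, in) -> (batch, rank) -> (batch, out)
        return base + self.scale * (x @ self.A.T) @ self.B.T
```

Because B starts at zero, `forward(x)` equals `x @ W.T` until the branch is trained, so adding the branch does not disturb the pre-trained model at initialization.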

Adaptive Hybrid Model for Enhanced Stock Market Predictions Using Improved VMD and Stacked Informer

Authors: Jianan Zhang, Hongyi Duan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01884
Pdf link: https://arxiv.org/pdf/2310.01884
Abstract This paper introduces an innovative adaptive hybrid model for stock market predictions, leveraging the capabilities of an enhanced Variational Mode Decomposition (VMD), Feature Engineering (FE), and a stacked Informer integrated with an adaptive loss function. Through rigorous experimentation, the proposed model, termed Adam+GC+enhanced Informer (VMGCformer), demonstrates significant proficiency in addressing the intricate dynamics and volatile nature of stock market data. Experimental results, derived from multiple benchmark datasets, underscore the model's superiority in terms of prediction accuracy, responsiveness, and generalization capabilities over traditional and other hybrid models. The research further highlights potential avenues for optimization and introduces future directions to enhance predictive modeling, especially for small enterprises and feature engineering.

Approximating Voltage Stability Boundary Under High Variability of Renewables Using Differential Geometry

Authors: Dan Wu, Franz-Erich Wolter, Sijia Geng
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.01911
Pdf link: https://arxiv.org/pdf/2310.01911
Abstract This paper proposes a novel method rooted in differential geometry to approximate the voltage stability boundary of power systems under high variability of renewable generation. We extract intrinsic geometric information of the power flow solution manifold at a given operating point. Specifically, coefficients of the Levi-Civita connection are constructed to approximate the geodesics of the manifold starting at an operating point along any directions of interest that represent possible fluctuations in generation and load. Then, based on the geodesic approximation, we further predict the voltage collapse point by solving a few univariate quadratic equations. Conventional methods mostly rely on either expensive numerical continuation along specified directions or numerical optimization. Instead, the proposed approach constructs the Christoffel symbols of the second kind from the Riemannian metric tensor to characterize the complete local geometry, which is then extended to the proximity of the stability boundary with efficient computations. As a result, this approach is suitable for handling high-dimensional variability in operating points due to the large-scale integration of renewable resources. Using various case studies, we demonstrate the advantages of the proposed method and provide additional insights and discussions on voltage stability in renewable-rich power systems.
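The Christoffel symbols mentioned here follow from the metric tensor via Gamma^k_ij = 1/2 g^{kl} (d_i g_jl + d_j g_il - d_l g_ij), and a geodesic leaving x0 with velocity v can be approximated to second order as x(t) = x0 + v t - 1/2 Gamma^k_ij v^i v^j t^2. The sketch applies these generic formulas to a toy polar-coordinate metric rather than to the power-flow solution manifold; the function names, the finite-difference step, and the test metric are illustrative assumptions, and the paper's voltage-collapse prediction step is not shown.

```python
import numpy as np


def christoffel(metric, x, h=1e-5):
    """Christoffel symbols of the second kind, returned as Gamma[k, i, j].

    Gamma^k_ij = 1/2 g^{kl} (d_i g_jl + d_j g_il - d_l g_ij), with the metric
    derivatives taken by central finite differences.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    g_inv = np.linalg.inv(metric(x))
    dg = np.empty((n, n, n))            # dg[l, i, j] = d g_ij / d x_l
    for l in range(n):
        e = np.zeros(n)
        e[l] = h
        dg[l] = (metric(x + e) - metric(x - e)) / (2 * h)
    # term[l, i, j] = d_i g_jl + d_j g_il - d_l g_ij
    term = np.einsum("ijl->lij", dg) + np.einsum("jil->lij", dg) - dg
    return 0.5 * np.einsum("kl,lij->kij", g_inv, term)


def geodesic_taylor(metric, x0, v, t):
    """Second-order geodesic approximation x(t) = x0 + v t - 1/2 Gamma^k_ij v^i v^j t^2."""
    x0 = np.asarray(x0, dtype=float)
    v = np.asarray(v, dtype=float)
    accel = -np.einsum("kij,i,j->k", christoffel(metric, x0), v, v)
    return x0 + v * t + 0.5 * accel * t**2


def polar_metric(q):
    """Euclidean plane in polar coordinates (r, theta): ds^2 = dr^2 + r^2 dtheta^2."""
    return np.diag([1.0, q[0] ** 2])
```

As a sanity check, starting at r = 1, theta = 0 and moving tangentially, the true geodesic is a straight line with r(t) = sqrt(1 + t^2), roughly 1 + t^2 / 2, which is what `geodesic_taylor(polar_metric, [1.0, 0.0], [0.0, 1.0], t)` reproduces to second order.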

PyHexTop: a compact Python code for topology optimization using hexagonal elements

Authors: Aditi Agarwal, Anupam Saxena, Prabhat Kumar
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2310.01968
Pdf link: https://arxiv.org/pdf/2310.01968
Abstract Python serves as an open-source and cost-effective alternative to the MATLAB programming language. This paper introduces a concise topology optimization Python code, named "PyHexTop," primarily intended for educational purposes. The code employs hexagonal elements to parameterize design domains, as such elements naturally provide checkerboard-free optimized designs. PyHexTop is developed based on the "HoneyTop90" MATLAB code [kumar2023honeytop90] and uses the NumPy and SciPy libraries. The code is straightforward and easily comprehensible, making it a helpful tool for newcomers to the topology optimization field to learn and explore. PyHexTop is specifically tailored to compliance minimization with a specified volume constraint. The paper provides a detailed explanation of the code for solving the MBB design problem and extensions to problems with varying boundary and force conditions. The code is publicly shared at https://github.com/PrabhatIn/PyHexTop.
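Educational compliance-minimization codes commonly update element densities with the optimality criteria (OC) scheme, bisecting on the Lagrange multiplier of the volume constraint, as in the well-known top88 MATLAB code. Whether PyHexTop uses exactly this update is an assumption here. The function name, move limit, and damping exponent are illustrative, and the sketch assumes equal element areas (as with regular hexagons), so the volume fraction is simply the mean density.

```python
import numpy as np


def oc_update(x, dc, dv, volfrac, move=0.2, eta=0.5, max_bisect=200):
    """One optimality-criteria density update with a volume constraint.

    x: element densities in [0, 1]; dc: compliance sensitivities (<= 0);
    dv: volume sensitivities (> 0). Bisects on the Lagrange multiplier until
    the mean density matches volfrac (equal element areas assumed).
    """
    l1, l2 = 0.0, 1e9
    lower = np.maximum(0.0, x - move)      # move limits keep each update small
    upper = np.minimum(1.0, x + move)
    for _ in range(max_bisect):
        lmid = 0.5 * (l1 + l2)
        x_new = np.clip(x * (-dc / (dv * lmid)) ** eta, lower, upper)
        if x_new.mean() > volfrac:
            l1 = lmid                      # too much material: raise the multiplier
        else:
            l2 = lmid
        if (l2 - l1) / (l1 + l2) < 1e-3:
            break
    return x_new
```

Elements with larger compliance sensitivity magnitudes gain density while the bisection holds the total material at `volfrac`; the finite-element analysis that produces `dc` is outside this sketch.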

UAV Swarm-enabled Collaborative Secure Relay Communications with Time-domain Colluding Eavesdropper

Authors: Chuang Zhang, Geng Sun, Qingqing Wu, Jiahui Li, Shuang Liang, Dusit Niyato, Victor C.M. Leung
Subjects: Networking and Internet Architecture (cs.NI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2310.01980
Pdf link: https://arxiv.org/pdf/2310.01980
Abstract Unmanned aerial vehicles (UAVs) as aerial relays are practically appealing for assisting Internet of Things (IoT) networks. In this work, we aim to utilize a UAV swarm to assist the secure communication between the micro base station (MBS) equipped with a planar array antenna (PAA) and the IoT terminal devices by collaborative beamforming (CB), so as to counteract the effects of collusive eavesdropping attacks in the time domain. Specifically, we formulate a UAV swarm-enabled secure relay multi-objective optimization problem (US2RMOP) for simultaneously maximizing the achievable sum rate of associated IoT terminal devices, minimizing the achievable sum rate of the eavesdropper, and minimizing the energy consumption of the UAV swarm, by jointly optimizing the excitation current weights of both the MBS and the UAV swarm, the selection of the UAV receiver, the positions of the UAVs, and the user association order of IoT terminal devices. Furthermore, the formulated US2RMOP is proved to be a non-convex, NP-hard, and large-scale optimization problem. Therefore, we propose an improved multi-objective grasshopper algorithm (IMOGOA) with some specific designs to address the problem. Simulation results exhibit the effectiveness of the proposed UAV swarm-enabled collaborative secure relay strategy and demonstrate the superiority of IMOGOA.

DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

Authors: Aochuan Chen, Yimeng Zhang, Jinghan Jia, James Diffenderfer, Jiancheng Liu, Konstantinos Parasyris, Yihua Zhang, Zheng Zhang, Bhavya Kailkhura, Sijia Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.02025
Pdf link: https://arxiv.org/pdf/2310.02025
Abstract Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems when first-order (FO) information is difficult or impossible to obtain. However, the scalability of ZO optimization remains an open problem: Its use has primarily been limited to relatively small-scale ML problems, such as sample-wise adversarial attack generation. To the best of our knowledge, no prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance. To overcome this roadblock, we develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch through three primary innovations. First, we demonstrate the advantages of coordinate-wise gradient estimation (CGE) over randomized vector-wise gradient estimation in training accuracy and computational efficiency. Second, we propose a sparsity-induced ZO training protocol that extends the model pruning methodology using only finite differences to explore and exploit the sparse DL prior in CGE. Third, we develop the methods of feature reuse and forward parallelization to advance the practical implementations of ZO training. Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time. Furthermore, we show the practical utility of DeepZero in applications of certified adversarial defense and DL-based partial differential equation error correction, achieving 10-20% improvement over SOTA. We believe our results will inspire future research on scalable ZO optimization and contribute to advancing black-box DL.
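Coordinate-wise gradient estimation builds a gradient from function values alone by perturbing one coordinate at a time, which costs d + 1 evaluations per estimate for d parameters. A minimal sketch using forward differences and plain gradient descent on a toy quadratic; the function names, the smoothing parameter `mu`, and the learning rate are illustrative assumptions, and DeepZero's sparsity-induced training, feature reuse, and forward parallelization are not shown.

```python
import numpy as np


def cge_gradient(f, x, mu=1e-4):
    """Coordinate-wise gradient estimate from function values only.

    Perturbs one coordinate at a time (forward differences), so each
    estimate costs d + 1 evaluations of f for a d-dimensional x.
    """
    x = np.asarray(x, dtype=float)
    fx = f(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = mu
        grad[i] = (f(x + e) - fx) / mu
    return grad


def zo_gradient_descent(f, x0, lr=0.1, steps=200, mu=1e-4):
    """Plain gradient descent driven by the zeroth-order estimate above."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x -= lr * cge_gradient(f, x, mu)
    return x
```

Unlike randomized vector-wise estimators, each coordinate estimate here has only an O(mu) bias and no sampling variance, at the price of a query count that grows linearly with dimension.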

View-Independent Adjoint Light Tracing for Lighting Design Optimization

Authors: Lukas Lipp, David Hahn, Pierre Ecormier-Nocca, Florian Rist, Michael Wimmer
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2310.02043
Pdf link: https://arxiv.org/pdf/2310.02043
Abstract Controlling light is a central element when composing a scene, enabling artistic expression, as well as the design of comfortable living spaces. In contrast to previous camera-based inverse rendering approaches, we introduce a novel method for interactive, view-independent differentiable global illumination. Our method first performs a forward light-tracing pass, starting from the light sources and storing the resulting radiance field on the scene geometry, representing specular highlights via hemi-spherical harmonics. We then evaluate an objective function on the entire radiance data and propagate derivatives back to the lighting parameters by formulating a novel, analytical adjoint light-tracing step. Our method builds on GPU ray tracing, which allows us to optimize all lighting parameters at interactive rates, even for complex geometry. Instead of specifying optimization targets as view-specific images, our method allows us to optimize the lighting of an entire scene to match either baked illumination (e.g., lightmaps), regulatory lighting requirements for work spaces, or artistic sketches drawn directly on the geometry. This approach provides a more direct and intuitive user experience for designers. We visualize our adjoint gradients and compare them to image-based state-of-the-art differentiable rendering methods. We also compare the convergence behavior of various optimization algorithms when using our gradient data vs. image-based differentiable rendering methods. Qualitative comparisons with real-world scenes underline the practical applicability of our method.

De Novo Drug Design with Joint Transformers

Authors: Adam Izdebski, Ewelina Weglarz-Tomczak, Ewa Szczurek, Jakub M. Tomczak
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.02066
Pdf link: https://arxiv.org/pdf/2310.02066
Abstract De novo drug design requires simultaneously generating novel molecules outside of training data and predicting their target properties, making it a hard task for generative models. To address this, we propose Joint Transformer, which combines a Transformer decoder, a Transformer encoder, and a predictor in a joint generative model with shared weights. We show that training the model with a penalized log-likelihood objective results in state-of-the-art performance in molecule generation, while decreasing the prediction error on newly sampled molecules by 42% compared to a fine-tuned decoder-only Transformer. Finally, we propose a probabilistic black-box optimization algorithm that employs Joint Transformer to generate novel molecules with improved target properties compared to the training data, outperforming other SMILES-based optimization methods in de novo drug design.

TOaCNN: Adaptive Convolutional Neural Network for Multidisciplinary Topology Optimization

Authors: Khaish Singh Chadha, Prabhat Kumar
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2310.02069
Pdf link: https://arxiv.org/pdf/2310.02069
Abstract This paper presents an adaptive convolutional neural network (CNN) architecture that can automate diverse topology optimization (TO) problems having different underlying physics. The architecture uses an encoder-decoder network with dense layers in the middle, including an additional adaptive layer to capture complex geometrical features. The network is trained on a dataset obtained from three open-source TO codes involving different physics. The robustness and success of the presented adaptive CNN are demonstrated on compliance minimization problems with constant and design-dependent loads and on material bulk modulus optimization. The architecture takes a user-specified volume fraction as input and instantly generates optimized designs resembling their counterparts obtained via open-source TO codes, with negligible performance and volume fraction errors.

Stochastic Gradient Descent with Preconditioned Polyak Step-size

Authors: Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Martin Takáč
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.02093
Pdf link: https://arxiv.org/pdf/2310.02093
Abstract Stochastic Gradient Descent (SGD) is one of the many iterative optimization methods that are widely used in solving machine learning problems. These methods display valuable properties and attract researchers and industrial machine learning engineers with their simplicity. However, one of the weaknesses of these methods is the necessity to tune the learning rate (step-size) for every loss function and dataset combination to solve an optimization problem and achieve efficient performance in a given time budget. Stochastic Gradient Descent with Polyak Step-size (SPS) is a method that offers an update rule that alleviates the need for fine-tuning the learning rate of an optimizer. In this paper, we propose an extension of SPS that employs preconditioning techniques, such as Hutchinson's method, Adam, and AdaGrad, to improve its performance on badly scaled and/or ill-conditioned datasets.
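For reference, the SPS step for a sampled loss f_i is gamma = (f_i(x) - f_i*) / (c ||grad f_i(x)||^2), optionally capped at gamma_max (the SPS_max variant). A minimal sketch of a preconditioned version, assuming a fixed positive diagonal preconditioner D: the gradient norm is measured in the D^{-1} norm and the update is x <- x - gamma D^{-1} grad f_i. The paper's Hutchinson, Adam, and AdaGrad preconditioners would instead update D during training; the least-squares problem with f_i* = 0 (interpolation), the Jacobi-style choice of D, and all names are assumptions for illustration.

```python
import numpy as np


def sps_least_squares(A, b, precond=None, c=0.5, gamma_max=10.0, epochs=100, seed=0):
    """SGD with (optionally preconditioned) Polyak step-size on
    f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2.

    Assumes interpolation (a consistent system), so every f_i* = 0.
    `precond` holds the positive diagonal of a fixed preconditioner D.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    d_inv = np.ones(d) if precond is None else 1.0 / np.asarray(precond, dtype=float)
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            r = A[i] @ x - b[i]
            gap = 0.5 * r**2                # f_i(x) - f_i*, with f_i* = 0
            g = r * A[i]                    # gradient of f_i at x
            g_norm2 = g @ (d_inv * g)       # ||g||^2 measured in the D^{-1} norm
            if g_norm2 == 0.0:              # sample already fitted exactly
                continue
            step = min(gap / (c * g_norm2), gamma_max)  # SPS_max step
            x -= step * d_inv * g           # preconditioned update
    return x


# Usage: badly scaled columns, preconditioned by the mean squared column entries.
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5)) * np.array([1.0, 10.0, 100.0, 1.0, 0.1])
x_true = rng.normal(size=5)
x_hat = sps_least_squares(A, A @ x_true, precond=np.mean(A**2, axis=0))
```

With c = 0.5 on least squares, the uncapped step reduces to 1 / ||a_i||^2 in the D^{-1} norm, so each update projects x onto the hyperplane a_i . x = b_i (a weighted Kaczmarz step); the preconditioner changes which projection is used, which is what helps on badly scaled columns.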

Adaptive Gait Modeling and Optimization for Principally Kinematic Systems

Authors: Siming Deng, Noah J. Cowan, Brian A. Bittner
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.02141
Pdf link: https://arxiv.org/pdf/2310.02141
Abstract Robotic adaptation to unanticipated operating conditions is crucial to achieving persistence and robustness in complex real-world settings. For a wide range of cutting-edge robotic systems, such as micro- and nano-scale robots, soft robots, medical robots, and bio-hybrid robots, it is infeasible to anticipate the operating environment a priori due to complexities that arise from numerous factors including imprecision in manufacturing, chemo-mechanical forces, and poorly understood contact mechanics. Drawing inspiration from data-driven modeling, geometric mechanics (or gauge theory), and adaptive control, we employ an adaptive system identification framework and demonstrate its efficacy in enhancing the performance of principally kinematic locomotors (those governed by Rayleigh dissipation or zero momentum conservation). We showcase the capability of the adaptive model to efficiently accommodate varying terrains and iteratively modified behaviors within a behavior optimization framework. This provides both the ability to improve fundamental behaviors and to perform precise motion tracking. Notably, we are capable of optimizing the gaits of the Purcell swimmer using approximately 10 cycles per link, which for the nine-link Purcell swimmer provides a factor of ten improvement in optimization speed over the state of the art. Beyond simply a computational speed-up, this ten-fold improvement may enable this method to be successfully deployed for in-situ behavior refinement, injury recovery, and terrain adaptation, particularly in domains where simulations provide poor guides for the real world.

Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization

Authors: Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2310.02170
Pdf link: https://arxiv.org/pdf/2310.02170
Abstract Large language model (LLM) agents have been shown to be effective on a wide range of tasks, and by ensembling multiple LLM agents, their performance could be further improved. Existing approaches employ a fixed set of agents to interact with each other in a static architecture, which limits their generalizability to various tasks and requires strong human priors in designing these agents. In this work, we propose to construct a strategic team of agents communicating in a dynamic interaction architecture based on the task query. Specifically, we build a framework named Dynamic LLM-Agent Network (DyLAN) for LLM-agent collaboration on complicated tasks like reasoning and code generation. DyLAN enables agents to interact for multiple rounds in a dynamic architecture with inference-time agent selection and an early-stopping mechanism to improve performance and efficiency. We further design an automatic agent team optimization algorithm based on an unsupervised metric termed Agent Importance Score, enabling the selection of the best agents based on the contribution each agent makes. Empirically, we demonstrate that DyLAN performs well in both reasoning and code generation tasks with reasonable computational cost. DyLAN achieves 13.0% and 13.3% improvement on MATH and HumanEval, respectively, compared to a single execution on GPT-35-turbo. On specific subjects of MMLU, agent team optimization in DyLAN increases accuracy by up to 25.0%.

Joint Optimization of Charging Infrastructure Placement and Operational Schedules for a Fleet of Battery Electric Trucks

Authors: Juan Pablo Bertucci, Theo Hofman, Mauro Salazar
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.02181
Pdf link: https://arxiv.org/pdf/2310.02181
Abstract This paper examines the challenges and requirements for transitioning logistics distribution networks to electric fleets. To maintain their current operations, fleet operators need a clear understanding of the charging infrastructure required and its relationship to existing power grid limitations and fleet schedules. In this context, this paper presents a modeling framework to optimize the charging infrastructure and charging schedules for a logistics distribution network in a joint fashion. Specifically, we cast the joint infrastructure design and operational scheduling problem as a mixed-integer linear program that can be solved with off-the-shelf optimization algorithms providing global optimality guarantees. For a case study in the Netherlands, we assess the impact of different parameters in our optimization problem, specifically, the allowed deviation from existing operations with conventional diesel trucks and the cost factor for daily peak energy usage. We examine the effects on infrastructure design and power requirements, comparing our co-design algorithm with planned infrastructure solutions. The results indicate that current charging and electric machine technologies for trucks can perform the itineraries of conventional trucks for our case study, but to maintain critical time requirements and navigate grid congestion, co-design can significantly reduce the total cost of ownership (an average 3.51% decrease in total costs compared to rule-based design solutions).

Optimum Monitoring of Heterogeneous Continuous Time Markov Chains

Authors: Nail Akar, Sennur Ulukus
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2310.02223
Pdf link: https://arxiv.org/pdf/2310.02223
Abstract We study a remote monitoring system in which a collection of ergodic, aperiodic, mutually independent, and heterogeneous continuous time Markov chain (CTMC) based information sources is considered. In this system, a common remote monitor samples the states of the individual CTMCs according to a Poisson process with possibly different per-source sampling rates, in order to maintain remote estimates of the states of each of the sources. Three information freshness models are considered to quantify the accuracy of the remote estimates: fresh when equal (FWE), fresh when sampled (FWS) and fresh when close (FWC). For each of these freshness models, closed-form expressions are derived for mean information freshness for a given source. Using these expressions, optimum sampling rates for all sources are obtained so as to maximize the weighted sum freshness of the monitoring system under an overall sampling rate constraint. This optimization problem possesses a water-filling solution with quadratic worst case computational complexity in the number of information sources. Numerical examples are provided to validate the effectiveness of the optimum sampler in comparison to several baseline sampling policies.
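The abstract does not give the closed-form freshness expressions, so this sketch substitutes an assumed concave utility w_i log(a_i + r_i) purely to show the water-filling structure: every source's rate is topped up to a common level fixed by a single Lagrange multiplier, and sources whose offset already exceeds that level get zero rate. The multiplier is found by bisection, which is not the quadratic-complexity procedure the paper describes; the function name, weights, and offsets are hypothetical.

```python
import numpy as np


def water_fill(weights, offsets, total, iters=200):
    """Maximize sum_i w_i * log(a_i + r_i) s.t. sum_i r_i = total, r_i >= 0.

    KKT conditions give r_i = max(0, w_i / nu - a_i) for a common multiplier nu
    (1 / nu is the "water level"); nu is found by bisection.
    """
    w = np.asarray(weights, dtype=float)
    a = np.asarray(offsets, dtype=float)          # must be positive
    lo, hi = 0.0, float(np.max(w / a))            # at nu = hi every r_i is zero
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if np.maximum(0.0, w / nu - a).sum() > total:
            lo = nu                               # level too high: raise nu
        else:
            hi = nu
    return np.maximum(0.0, w / hi - a)
```

For example, equal weights (1, 1, 1), offsets (1, 2, 4), and a total budget of 3 give a water level of 3 and rates (2, 1, 0): the third source sits above the water line and receives nothing.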

Keyword: adam

Adaptive Hybrid Model for Enhanced Stock Market Predictions Using Improved VMD and Stacked Informer

Authors: Jianan Zhang, Hongyi Duan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01884
Pdf link: https://arxiv.org/pdf/2310.01884
Abstract This paper introduces an innovative adaptive hybrid model for stock market predictions, leveraging the capabilities of an enhanced Variational Mode Decomposition (VMD), Feature Engineering (FE), and a stacked Informer integrated with an adaptive loss function. Through rigorous experimentation, the proposed model, termed Adam+GC+enhanced Informer (VMGCformer), demonstrates significant proficiency in addressing the intricate dynamics and volatile nature of stock market data. Experimental results, derived from multiple benchmark datasets, underscore the model's superiority in terms of prediction accuracy, responsiveness, and generalization capabilities over traditional and other hybrid models. The research further highlights potential avenues for optimization and introduces future directions to enhance predictive modeling, especially for small enterprises and feature engineering.

Stochastic Gradient Descent with Preconditioned Polyak Step-size

Authors: Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Martin Takáč
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.02093
Pdf link: https://arxiv.org/pdf/2310.02093
Abstract Stochastic Gradient Descent (SGD) is one of the many iterative optimization methods that are widely used in solving machine learning problems. These methods display valuable properties and attract researchers and industrial machine learning engineers with their simplicity. However, one of the weaknesses of these methods is the necessity to tune the learning rate (step-size) for every loss function and dataset combination to solve an optimization problem and achieve efficient performance in a given time budget. Stochastic Gradient Descent with Polyak Step-size (SPS) is a method that offers an update rule that alleviates the need for fine-tuning the learning rate of an optimizer. In this paper, we propose an extension of SPS that employs preconditioning techniques, such as Hutchinson's method, Adam, and AdaGrad, to improve its performance on badly scaled and/or ill-conditioned datasets.

Keyword: gradient

Enhancing Secrecy in UAV RSMA Networks: Deep Unfolding Meets Deep Reinforcement Learning

Authors: Abuzar B. M. Adam, Mohammed A. M. Elhassan
Subjects: Cryptography and Security (cs.CR); Emerging Technologies (cs.ET); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.01437
Pdf link: https://arxiv.org/pdf/2310.01437
Abstract In this paper, we consider the maximization of the secrecy rate in a multi-unmanned aerial vehicle (UAV) rate-splitting multiple access (RSMA) network. A joint beamforming, rate allocation, and UAV trajectory optimization problem is formulated, which is nonconvex. Hence, the problem is transformed into a Markov decision problem, and a novel multiagent deep reinforcement learning (DRL) framework is designed. The proposed framework (named DUN-DRL) combines deep unfolding to design the beamforming and rate allocation, a data-driven method to design the UAV trajectory, and deep deterministic policy gradient (DDPG) for the learning procedure. The proposed DUN-DRL has shown strong performance and outperformed other DRL-based methods in the literature.

FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models

Authors: Jingwei Sun, Ziyue Xu, Hongxu Yin, Dong Yang, Daguang Xu, Yiran Chen, Holger R. Roth
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01467
Pdf link: https://arxiv.org/pdf/2310.01467
Abstract Pre-trained language models (PLM) have revolutionized the NLP landscape, achieving stellar performances across diverse tasks. These models, while benefiting from vast training data, often require fine-tuning on specific data to cater to distinct downstream tasks. However, this data adaptation process has inherent security and privacy concerns, primarily when leveraging user-generated, device-residing data. Federated learning (FL) provides a solution, allowing collaborative model fine-tuning without centralized data collection. However, applying FL to finetune PLMs is hampered by challenges, including restricted model parameter access, high computational requirements, and communication overheads. This paper introduces Federated Black-box Prompt Tuning (FedBPT), a framework designed to address these challenges. FedBPT does not require the clients to access the model parameters. By focusing on training optimal prompts and utilizing gradient-free optimization methods, FedBPT reduces the number of exchanged variables, boosts communication efficiency, and minimizes computational and storage costs. Experiments highlight the framework's ability to drastically cut communication and memory costs while maintaining competitive performance. Ultimately, FedBPT presents a promising solution for efficient, privacy-preserving fine-tuning of PLM in the age of large language models.

Primal-dual hybrid gradient algorithms for computing time-implicit Hamilton-Jacobi equations

Authors: Tingwei Meng, Wenbo Hao, Siting Liu, Stanley J. Osher, Wuchen Li
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.01605
Pdf link: https://arxiv.org/pdf/2310.01605
Abstract Hamilton-Jacobi (HJ) partial differential equations (PDEs) have diverse applications spanning physics, optimal control, game theory, and imaging sciences. This research introduces a first-order optimization-based technique for HJ PDEs, which formulates the time-implicit update of HJ PDEs as saddle point problems. We remark that the saddle point formulation for HJ equations is aligned with the primal-dual formulation of optimal transport and potential mean-field games (MFGs). This connection enables us to extend MFG techniques and design numerical schemes for solving HJ PDEs. We employ the primal-dual hybrid gradient (PDHG) method to solve the saddle point problems, benefiting from the simple structures that enable fast computations in updates. Remarkably, the method caters to a broader range of Hamiltonians, encompassing non-smooth and spatiotemporally dependent cases. The approach's effectiveness is verified through various one- and two-dimensional numerical examples, such as quadratic and $L^1$ Hamiltonians with spatial and time dependence.
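PDHG alternates a proximal ascent step on the dual variable, a proximal descent step on the primal variable, and an extrapolation of the primal iterate. A minimal sketch on a deliberately simple saddle-point problem, min_x max_y <Kx, y> - f*(y) + g(x) with f(z) = ||z - b||^2 / 2 and g(x) = ||x||^2 / 2, chosen because both proximal steps have closed forms and the solution (K^T K + I)^{-1} K^T b is known. The HJ saddle-point formulation from the paper is not reproduced; the function name and the step-size choice tau = sigma = 0.9 / ||K|| (so that tau sigma ||K||^2 < 1) are assumptions.

```python
import numpy as np


def pdhg_ridge(K, b, iters=1000):
    """PDHG (Chambolle-Pock) for min_x 0.5||Kx - b||^2 + 0.5||x||^2.

    Saddle form: min_x max_y <Kx, y> - f*(y) + g(x), with
    f(z) = 0.5||z - b||^2 (so f*(y) = 0.5||y||^2 + <y, b>) and g(x) = 0.5||x||^2.
    """
    L = np.linalg.norm(K, 2)            # operator norm (largest singular value) of K
    tau = sigma = 0.9 / L               # ensures tau * sigma * L^2 < 1
    x = np.zeros(K.shape[1])
    x_bar = x.copy()
    y = np.zeros(K.shape[0])
    for _ in range(iters):
        # Dual step: prox of sigma * f*, i.e. (v - sigma * b) / (1 + sigma)
        y = (y + sigma * (K @ x_bar) - sigma * b) / (1 + sigma)
        # Primal step: prox of tau * g, i.e. v / (1 + tau)
        x_new = (x - tau * (K.T @ y)) / (1 + tau)
        # Extrapolation with theta = 1
        x_bar = 2 * x_new - x
        x = x_new
    return x
```

Each iteration uses only products with K and K^T plus cheap elementwise proximal maps, which is the structural property the abstract credits for fast updates.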

Intractability of Learning the Discrete Logarithm with Gradient-Based Methods

Authors: Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhibek Kadyrsizova, Zhenisbek Assylbekov
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.01611
Pdf link: https://arxiv.org/pdf/2310.01611
Abstract The discrete logarithm problem is a fundamental challenge in number theory with significant implications for cryptographic protocols. In this paper, we investigate the limitations of gradient-based methods for learning the parity bit of the discrete logarithm in finite cyclic groups of prime order. Our main result, supported by theoretical analysis and empirical verification, reveals the concentration of the gradient of the loss function around a fixed point, independent of the logarithm's base used. This concentration property leads to a restricted ability to learn the parity bit efficiently using gradient-based methods, irrespective of the complexity of the network architecture being trained. Our proof relies on the Boas-Bellman inequality in inner product spaces and involves establishing approximate orthogonality of the discrete logarithm's parity bit functions through the spectral norm of certain matrices. Empirical experiments using a neural network-based approach further verify the limitations of gradient-based learning, demonstrating the decreasing success rate in predicting the parity bit as the group order increases.
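To make the learning target concrete: for a generator g of a cyclic group of prime order q, each element h = g^k (0 <= k < q) is labeled with the parity bit k mod 2. A brute-force sketch that tabulates these labels for a tiny subgroup of Z_p^*; the hypothetical example uses p = 23, whose quadratic residues form a subgroup of prime order q = 11 generated by 4. The function name is illustrative, and the paper's network architectures and training setup are not modeled.

```python
def dlog_parity_labels(p, g, q):
    """Label each element h of the order-q subgroup <g> of Z_p^* with the
    parity bit of its discrete logarithm k = log_g(h), k in {0, ..., q-1}.
    Brute force, so only suitable for tiny groups.
    """
    labels = {}
    h = 1
    for k in range(q):
        labels[h] = k % 2          # h = g^k mod p
        h = h * g % p
    if h != 1 or len(labels) != q:
        raise ValueError("g must generate a subgroup of order q modulo p")
    return labels


labels = dlog_parity_labels(23, 4, 11)
```

Here `labels[4]` is 1 (since 4 = 4^1) while `labels[2]` is 0 (since 2 = 4^6 mod 23), so numerically adjacent elements can carry unrelated labels; that lack of exploitable structure is the intuition behind the hardness result.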

From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression

Authors: Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.01687
Pdf link: https://arxiv.org/pdf/2310.01687
Abstract We conduct a comprehensive investigation into the dynamics of gradient descent using large-order constant step-sizes in the context of quadratic regression models. Within this framework, we reveal that the dynamics can be encapsulated by a specific cubic map, naturally parameterized by the step-size. Through a fine-grained bifurcation analysis concerning the step-size parameter, we delineate five distinct training phases: (1) monotonic, (2) catapult, (3) periodic, (4) chaotic, and (5) divergent, precisely demarcating the boundaries of each phase. As illustrations, we provide examples involving phase retrieval and two-layer neural networks employing quadratic activation functions and constant outer-layers, utilizing orthogonal training data. Our simulations indicate that these five phases also manifest with generic non-orthogonal data. We also empirically investigate the generalization performance when training in the various non-monotonic (and non-divergent) phases. In particular, we observe that performing an ergodic trajectory averaging stabilizes the test error in non-monotonic (and non-divergent) phases.
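A one-weight toy version of quadratic regression, L(w) = (w^2 - y)^2 / 4, already turns gradient descent into a cubic map, w <- w - eta (w^2 - y) w. With y = 1, the fixed point w = 1 has derivative 1 - 2 eta, so from a start in (0, 1) the iterates rise monotonically to it for eta <= 1/2, converge locally with oscillation for 1/2 < eta < 1, lose stability at eta = 1, and blow up for large eta. This toy is an illustrative assumption, not the paper's cubic map or its phase boundaries; the function name and defaults are hypothetical.

```python
def gd_trajectory(eta, w0=0.5, y=1.0, steps=500, blowup=1e6):
    """Gradient descent on the scalar loss L(w) = (w**2 - y)**2 / 4.

    The update w <- w - eta * (w**2 - y) * w is a cubic map in w whose
    behaviour is governed by the step size eta.
    """
    traj = [w0]
    w = w0
    for _ in range(steps):
        w = w - eta * (w**2 - y) * w    # L'(w) = (w**2 - y) * w
        traj.append(w)
        if abs(w) > blowup:             # stop once the iterates have clearly diverged
            break
    return traj
```

Sweeping `eta` on a fine grid and plotting the tail of each trajectory gives a bifurcation diagram for this toy map, the same kind of analysis the abstract applies to its own cubic map.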

Blending Imitation and Reinforcement Learning for Robust Policy Improvement

Authors: Xuefeng Liu, Takuma Yoneda, Rick L. Stevens, Matthew R. Walter, Yuxin Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.01737
Pdf link: https://arxiv.org/pdf/2310.01737
Abstract While reinforcement learning (RL) has shown promising performance, its sample complexity continues to be a substantial hurdle, restricting its broader application across a variety of domains. Imitation learning (IL) utilizes oracles to improve sample efficiency, yet it is often constrained by the quality of the oracles deployed. To address this, we introduce a novel algorithm, Robust Policy Improvement (RPI), which actively interleaves between IL and RL based on an online estimate of their performance. RPI draws on the strengths of IL, using oracle queries to facilitate exploration, an aspect that is notably challenging in sparse-reward RL, particularly during the early stages of learning. As learning unfolds, RPI gradually transitions to RL, effectively treating the learned policy as an improved oracle. This algorithm is capable of learning from and improving upon a diverse set of black-box oracles. Integral to RPI are Robust Active Policy Selection (RAPS) and Robust Policy Gradient (RPG), both of which reason over whether to perform state-wise imitation from the oracles or learn from its own value function when the learner's performance surpasses that of the oracles in a specific state. Empirical evaluations and theoretical analysis validate that RPI excels in comparison to existing state-of-the-art methodologies, demonstrating superior performance across various benchmark domains.
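
The state-wise choice described above (imitate an oracle unless the learner already does better at that state) can be caricatured as a max-over-value-estimates rule. This is a hypothetical sketch: `select_policy`, `oracle_values`, and `learner_value` are invented names, the value estimates are taken as given, and the paper's RAPS and RPG components are more involved than this.

```python
def select_policy(oracle_values, learner_value):
    """Pick which policy to follow at a single state.

    oracle_values: dict mapping oracle name -> estimated value V_k(s)
    learner_value: estimated value of the learner's own policy at s
    Returns "learner" (learn from its own value function, i.e. RL) when the
    learner matches or beats every oracle at this state; otherwise returns
    the name of the best oracle to imitate (IL).
    """
    best_oracle = max(oracle_values, key=oracle_values.get)
    if learner_value >= oracle_values[best_oracle]:
        return "learner"
    return best_oracle


print(select_policy({"oracle_a": 0.4, "oracle_b": 0.7}, 0.5))  # oracle_b
print(select_policy({"oracle_a": 0.4, "oracle_b": 0.7}, 0.9))  # learner
```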

Exploring Counterfactual Alignment Loss towards Human-centered AI

Authors: Mingzhou Liu, Xinwei Sun, Ching-Wen Lee, Yu Qiao, Yizhou Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.01766
Pdf link: https://arxiv.org/pdf/2310.01766
Abstract Deep neural networks have demonstrated impressive accuracy in supervised learning tasks. However, their lack of transparency makes it hard for humans to trust their results, especially in safety-critical domains such as healthcare. To address this issue, recent explanation-guided learning approaches have proposed aligning the gradient-based attention map with image regions annotated by human experts, thereby obtaining an intrinsically human-centered model. However, the attention map these methods rely on may fail to causally attribute the model predictions, thus compromising their validity for alignment. To address this issue, we propose a novel human-centered framework based on counterfactual generation. In particular, we utilize the counterfactual generation's ability for causal attribution to introduce a novel loss called the CounterFactual Alignment (CF-Align) loss. This loss guarantees that the features attributed by the counterfactual generation for the classifier align with the human annotations. To optimize the proposed loss, which entails a counterfactual generation with an implicit function form, we leverage the implicit function theorem for backpropagation. Our method is architecture-agnostic and, therefore, can be applied to any neural network. We demonstrate the effectiveness of our method on a lung cancer diagnosis dataset, showcasing faithful alignment to humans.
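
The implicit-function-theorem trick mentioned above can be seen on a scalar toy problem (assumed for illustration, unrelated to counterfactual generation): x*(θ) is defined implicitly by g(x, θ) = x³ + x − θ = 0. Instead of backpropagating through the solver's iterations, the theorem gives dx*/dθ = −(∂g/∂θ)/(∂g/∂x) = 1/(3x*² + 1), evaluated directly at the solution. The function names are hypothetical.

```python
def solve(theta, x=0.0, iters=50):
    """Newton's method for g(x, theta) = x**3 + x - theta = 0."""
    for _ in range(iters):
        x -= (x**3 + x - theta) / (3 * x**2 + 1)
    return x


def implicit_grad(theta):
    """dx*/dtheta via the implicit function theorem, evaluated at the solution."""
    x = solve(theta)
    dg_dx = 3 * x**2 + 1
    dg_dtheta = -1.0
    return -dg_dtheta / dg_dx


theta, eps = 2.0, 1e-6
fd = (solve(theta + eps) - solve(theta - eps)) / (2 * eps)  # finite-difference check
print(solve(theta), implicit_grad(theta), fd)  # x* = 1, both gradients close to 0.25
```

Only the solution x* is needed for the gradient, which is why the theorem is attractive when the forward solver is iterative or not differentiable step by step.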

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

Authors: Nuoya Xiong, Lijun Ding, Simon S. Du
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.01769
Pdf link: https://arxiv.org/pdf/2310.01769
Abstract This paper rigorously shows how over-parameterization changes the convergence behaviors of gradient descent (GD) for the matrix sensing problem, where the goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements. First, we consider the symmetric setting with the symmetric parameterization, where $M^* \in \mathbb{R}^{n \times n}$ is a positive semi-definite unknown matrix of rank $r \ll n$, and one uses a symmetric parameterization $XX^\top$ to learn $M^*$. Here $X \in \mathbb{R}^{n \times k}$ with $k > r$ is the factor matrix. We give a novel $\Omega(1/T^2)$ lower bound for randomly initialized GD in the over-parameterized case ($k > r$), where $T$ is the number of iterations. This is in stark contrast to the exact-parameterization scenario ($k = r$), where the convergence rate is $\exp(-\Omega(T))$. Next, we study the asymmetric setting, where $M^* \in \mathbb{R}^{n_1 \times n_2}$ is the unknown matrix of rank $r \ll \min\{n_1, n_2\}$, and one uses an asymmetric parameterization $FG^\top$ to learn $M^*$, where $F \in \mathbb{R}^{n_1 \times k}$ and $G \in \mathbb{R}^{n_2 \times k}$. Building on prior work, we give a global exact convergence result for randomly initialized GD in the exact-parameterization case ($k = r$) with an $\exp(-\Omega(T))$ rate. Furthermore, we give the first global exact convergence result for the over-parameterization case ($k > r$) with an $\exp(-\Omega(\alpha^2 T))$ rate, where $\alpha$ is the initialization scale. This linear convergence result in the over-parameterization case is especially significant because one can apply the asymmetric parameterization to the symmetric setting to speed up from $\Omega(1/T^2)$ to linear convergence. On the other hand, we propose a novel method that only modifies one step of GD and obtains a convergence rate independent of $\alpha$, recovering the rate in the exact-parameterization case.
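
The contrast between $\Omega(1/T^2)$ and $\exp(-\Omega(T))$ can be reproduced in a one-dimensional caricature (an assumption for illustration, not the paper's matrix setting). Along a direction where the ground truth has no mass, as for an extra column when k > r, GD on f(x) = x⁴/4 follows x → x − ηx³, so x decays only like 1/√T and the loss like 1/T². Along a direction with target value 1, f(x) = (x² − 1)²/4 is locally strongly convex and the error contracts geometrically. The helper `gd_loss` is hypothetical.

```python
def gd_loss(target, eta=0.1, x0=0.5, steps=1000):
    """Run GD on f(x) = (x**2 - target)**2 / 4 and return the final loss."""
    x = x0
    for _ in range(steps):
        x -= eta * x * (x * x - target)
    return (x * x - target) ** 2 / 4


# "Over-parameterized" direction (target 0): doubling T cuts the loss by roughly 4x.
ratio = gd_loss(0.0, steps=1000) / gd_loss(0.0, steps=2000)
# "Exactly parameterized" direction (target 1): the loss is already tiny after 100 steps.
print(ratio, gd_loss(1.0, steps=100))
```

The factor of about 4 per doubling of T is the signature of a 1/T² rate, while the target-1 loss shrinks by a constant factor per step.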

STAMP: Differentiable Task and Motion Planning via Stein Variational Gradient Descent

Authors: Yewon Lee (1), Philip Huang (2), Krishna Murthy Jatavallabhula (3), Andrew Z. Li (1), Fabian Damken (1 and 4), Eric Heiden (5), Kevin Smith (3), Derek Nowrouzezahrai (6), Fabio Ramos (5 and 7), Florian Shkurti (1) ((1) University of Toronto, (2) Carnegie Mellon University, (3) Massachusetts Institute of Technology, (4) Technische Universitat Darmstadt, (5) NVIDIA, (6) McGill University, (7) University of Sydney)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01775
Pdf link: https://arxiv.org/pdf/2310.01775
Abstract Planning for many manipulation tasks, such as using tools or assembling parts, often requires both symbolic and geometric reasoning. Task and Motion Planning (TAMP) algorithms typically solve these problems by conducting a tree search over high-level task sequences while checking for kinematic and dynamic feasibility. While performant, most existing algorithms are highly inefficient as their time complexity grows exponentially with the number of possible actions and objects. Additionally, they only find a single solution to problems in which many feasible plans may exist. To address these limitations, we propose a novel algorithm called Stein Task and Motion Planning (STAMP) that leverages parallelization and differentiable simulation to efficiently search for multiple diverse plans. STAMP relaxes discrete-and-continuous TAMP problems into continuous optimization problems that can be solved using variational inference. Our algorithm builds upon Stein Variational Gradient Descent, a gradient-based variational inference algorithm, and parallelized differentiable physics simulators on the GPU to efficiently obtain gradients for inference. Further, we employ imitation learning to introduce action abstractions that reduce the inference problem to lower dimensions. We demonstrate our method on two TAMP problems and empirically show that STAMP is able to: 1) produce multiple diverse plans in parallel; and 2) search for plans more efficiently compared to existing TAMP baselines.
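
STAMP builds on Stein Variational Gradient Descent. For orientation, here is a minimal pure-Python SVGD sketch on a 1-D standard normal target, not STAMP's planning objective. Assumptions: an RBF kernel k(a, b) = exp(−(a − b)²/h) with a median-heuristic bandwidth, a fixed step size, and 50 particles; `svgd_step` is a hypothetical name.

```python
import math
import random


def svgd_step(xs, grad_logp, step=0.1):
    """One Stein variational gradient descent update for 1-D particles.

    Uses the RBF kernel k(a, b) = exp(-(a - b)**2 / h) with the median
    heuristic h = median(squared pairwise distances) / log(n + 1), and
    phi(x_i) = mean_j [ k(x_j, x_i) * grad_logp(x_j) + d/dx_j k(x_j, x_i) ].
    """
    n = len(xs)
    sq = [(a - b) ** 2 for a in xs for b in xs]
    h = sorted(sq)[len(sq) // 2] / math.log(n + 1) + 1e-12
    grads = [grad_logp(x) for x in xs]
    new = []
    for xi in xs:
        phi = 0.0
        for xj, gj in zip(xs, grads):
            k = math.exp(-((xj - xi) ** 2) / h)
            # driving term (towards high density) + repulsive term (keeps particles apart)
            phi += k * gj + 2.0 * (xi - xj) / h * k
        new.append(xi + step * phi / n)
    return new


# Target: standard normal, so grad log p(x) = -x. Particles start off-center and narrow.
random.seed(0)
particles = [random.gauss(2.0, 0.5) for _ in range(50)]
for _ in range(1000):
    particles = svgd_step(particles, lambda x: -x)

mean = sum(particles) / len(particles)
std = math.sqrt(sum((p - mean) ** 2 for p in particles) / len(particles))
print(round(mean, 2), round(std, 2))
```

The repulsive kernel-gradient term is what lets a set of particles cover several modes at once, which is the property STAMP exploits to obtain multiple diverse plans in parallel.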

AutoLoRa: A Parameter-Free Automated Robust Fine-Tuning Framework

Authors: Xilie Xu, Jingfeng Zhang, Mohan Kankanhalli
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.01818
Pdf link: https://arxiv.org/pdf/2310.01818
Abstract Robust Fine-Tuning (RFT) is a low-cost strategy to obtain adversarial robustness in downstream applications without requiring substantial computational resources or large amounts of collected data. This paper uncovers an issue with existing RFT: optimizing both adversarial and natural objectives through the feature extractor (FE) yields significantly divergent gradient directions. This divergence introduces instability in the optimization process, thereby hindering the attainment of adversarial robustness and rendering RFT highly sensitive to hyperparameters. To mitigate this issue, we propose a low-rank (LoRa) branch that disentangles RFT into two distinct components: optimizing natural objectives via the LoRa branch and adversarial objectives via the FE. In addition, we introduce heuristic strategies for automating the scheduling of the learning rate and the scalars of loss terms. Extensive empirical evaluations demonstrate that our proposed automated RFT disentangled via the LoRa branch (AutoLoRa) achieves new state-of-the-art results across a range of downstream tasks. AutoLoRa holds significant practical utility, as it automatically converts a pre-trained FE into an adversarially robust model for downstream tasks without the need for searching hyperparameters.
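
A low-rank branch of the kind described above adds a rank-r product B·A next to a full weight W. The sketch below is a generic LoRA-style layer in pure Python, not AutoLoRa's exact design: how the paper routes the natural and adversarial objectives through the two paths is not modeled, and `lora_forward` and `n_params` are hypothetical helpers. It uses the common LoRA convention of initializing B to zero.

```python
def matvec(M, v):
    """Dense matrix-vector product for row-major nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B (A x): a full weight plus a rank-r branch.

    W: d_out x d_in (e.g. a pre-trained feature-extractor weight)
    A: r x d_in, B: d_out x r, with r much smaller than d_in and d_out
    """
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + scale * l for b, l in zip(base, low_rank)]


def n_params(d_in, d_out, r):
    """Parameter counts of a full d_out x d_in update vs. a rank-r branch."""
    return d_in * d_out, r * (d_in + d_out)


# With B = 0 the branch initially leaves W x unchanged.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]          # rank r = 1
B = [[0.0], [0.0]]
print(lora_forward(W, A, B, [3.0, 4.0]))   # [3.0, 4.0]
print(n_params(768, 768, 8))               # (589824, 12288)
```

The parameter count shows why such a branch is cheap to add: at width 768 and rank 8 it has about 2% of the parameters of a full update.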

Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation

Authors: Hossein Shreim, Abdul Karim Gizzini, Ali J. Ghandour
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.01828
Pdf link: https://arxiv.org/pdf/2310.01828
Abstract eXplainable Artificial Intelligence (XAI) has emerged as an essential requirement when dealing with mission-critical applications, ensuring transparency and interpretability of the employed black-box AI models. The significance of XAI spans various domains, from healthcare to finance, where understanding the decision-making process of deep learning algorithms is essential. Most AI-based computer vision models are black boxes; hence, providing explainability of deep neural networks in image processing is crucial for their wide adoption and deployment in medical image analysis, autonomous driving, and remote sensing applications. Recently, several XAI methods for image classification tasks have been introduced. In contrast, image segmentation has received comparatively less attention in the context of explainability, although it is a fundamental task in computer vision applications, especially in remote sensing. Only a few studies propose gradient-based XAI algorithms for image segmentation. This paper adapts the recent gradient-free Sobol XAI method for semantic segmentation. To measure the performance of the Sobol method for segmentation, we propose a quantitative XAI evaluation method based on a learnable noise model. The main objective of this model is to induce noise on the explanation maps, where higher induced noise signifies low accuracy and vice versa. A benchmark analysis is conducted to evaluate and compare the performance of three XAI methods, Seg-Grad-CAM, Seg-Grad-CAM++, and Seg-Sobol, using the proposed noise-based evaluation technique. This constitutes the first attempt to run and evaluate XAI methods using high-resolution satellite images.

Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data

Authors: Xuran Meng, Difan Zou, Yuan Cao
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.01975
Pdf link: https://arxiv.org/pdf/2310.01975
Abstract Modern deep learning models are usually highly over-parameterized so that they can overfit the training data. Surprisingly, such overfitting neural networks can usually still achieve high prediction accuracy. To study this "benign overfitting" phenomenon, a line of recent works has theoretically studied the learning of linear models and two-layer neural networks. However, most of these analyses are still limited to very simple learning problems where the Bayes-optimal classifier is linear. In this work, we investigate a class of XOR-type classification tasks with label-flipping noise. We show that, under a certain condition on the sample complexity and signal-to-noise ratio, an over-parameterized ReLU CNN trained by gradient descent can achieve near Bayes-optimal accuracy. Moreover, we also establish a matching lower bound result showing that when the previous condition is not satisfied, the prediction accuracy of the obtained CNN is an absolute constant away from the Bayes-optimal rate. Our result demonstrates that CNNs have a remarkable capacity to efficiently learn XOR problems, even in the presence of highly correlated features.

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

Authors: Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.02012
Pdf link: https://arxiv.org/pdf/2310.02012
Abstract Normalization layers are one of the key building blocks for deep neural networks. Several theoretical studies have shown that batch normalization improves signal propagation by preventing the representations from becoming collinear across the layers. However, results on the mean-field theory of batch normalization also conclude that this benefit comes at the expense of exploding gradients in depth. Motivated by these two aspects of batch normalization, in this study we pose the following question: "Can a batch-normalized network keep the optimal signal propagation properties, but avoid exploding gradients?" We answer this question in the affirmative by giving a particular construction of a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth. Based on Weingarten calculus, we develop a rigorous and non-asymptotic theory for this constructed MLP that gives a precise characterization of forward signal propagation, while proving that gradients remain bounded for linearly independent input samples, which holds in most practical settings. Inspired by our theory, we also design an activation shaping scheme that empirically achieves the same properties for certain non-linear activations.
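
For reference, the batch-normalization operation discussed above normalizes each feature across the batch. This is a minimal training-mode forward pass in pure Python, without the learnable scale and shift, and not the paper's specific MLP construction; `batch_norm` is a hypothetical helper name.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) of a batch to zero mean and unit variance.

    batch: list of samples, each a list of d features.
    Uses the biased (population) variance over the batch, as in BN's
    training-mode forward pass; the learnable scale and shift are omitted.
    """
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mu = sum(col) / n
        var = sum((v - mu) ** 2 for v in col) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mu) / math.sqrt(var + eps)
    return out


normed = batch_norm([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
print([round(r[0], 3) for r in normed])
```

Because the statistics are computed across samples, every output depends on the whole batch, which is the coupling that mean-field analyses of batch normalization have to track.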

DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

Authors: Aochuan Chen, Yimeng Zhang, Jinghan Jia, James Diffenderfer, Jiancheng Liu, Konstantinos Parasyris, Yihua Zhang, Zheng Zhang, Bhavya Kailkhura, Sijia Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.02025
Pdf link: https://arxiv.org/pdf/2310.02025
Abstract Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems when first-order (FO) information is difficult or impossible to obtain. However, the scalability of ZO optimization remains an open problem: its use has primarily been limited to relatively small-scale ML problems, such as sample-wise adversarial attack generation. To the best of our knowledge, no prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance. To overcome this roadblock, we develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch through three primary innovations. First, we demonstrate the advantages of coordinate-wise gradient estimation (CGE) over randomized vector-wise gradient estimation in training accuracy and computational efficiency. Second, we propose a sparsity-induced ZO training protocol that extends the model pruning methodology using only finite differences to explore and exploit the sparse DL prior in CGE. Third, we develop the methods of feature reuse and forward parallelization to advance the practical implementations of ZO training. Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time. Furthermore, we show the practical utility of DeepZero in applications of certified adversarial defense and DL-based partial differential equation error correction, achieving 10-20% improvement over SOTA. We believe our results will inspire future research on scalable ZO optimization and contribute to advancing DL with black box.
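
Coordinate-wise gradient estimation replaces backpropagation with one finite difference per coordinate. The sketch below uses a central difference, which is our assumption (DeepZero's exact estimator and smoothing parameter may differ), and the helper names `cge_gradient` and `zo_gd` are hypothetical.

```python
def cge_gradient(f, x, mu=1e-3):
    """Coordinate-wise gradient estimate using only function evaluations.

    For each coordinate i, perturb x by +/- mu along e_i and take a central
    finite difference; this costs 2 * len(x) queries of f and no backprop.
    """
    grad = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += mu
        minus = list(x)
        minus[i] -= mu
        grad.append((f(plus) - f(minus)) / (2 * mu))
    return grad


def zo_gd(f, x, lr=0.1, steps=100):
    """Plain gradient descent driven by the zeroth-order estimate."""
    for _ in range(steps):
        g = cge_gradient(f, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def quad(v):
    return sum(t * t for t in v)


print(cge_gradient(quad, [1.0, 2.0, 3.0]))  # approximately [2.0, 4.0, 6.0]
print(zo_gd(quad, [1.0, 2.0, 3.0]))         # close to [0.0, 0.0, 0.0]
```

The query count grows linearly with the number of parameters, which is why restricting estimation to a sparse subset of coordinates, as the abstract's sparsity-induced protocol does, matters at DNN scale.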

View-Independent Adjoint Light Tracing for Lighting Design Optimization

Authors: Lukas Lipp, David Hahn, Pierre Ecormier-Nocca, Florian Rist, Michael Wimmer
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2310.02043
Pdf link: https://arxiv.org/pdf/2310.02043
Abstract Controlling light is a central element when composing a scene, enabling artistic expression, as well as the design of comfortable living spaces. In contrast to previous camera-based inverse rendering approaches, we introduce a novel method for interactive, view-independent differentiable global illumination. Our method first performs a forward light-tracing pass, starting from the light sources and storing the resulting radiance field on the scene geometry, representing specular highlights via hemi-spherical harmonics. We then evaluate an objective function on the entire radiance data and propagate derivatives back to the lighting parameters by formulating a novel, analytical adjoint light-tracing step. Our method builds on GPU ray tracing, which allows us to optimize all lighting parameters at interactive rates, even for complex geometry. Instead of specifying optimization targets as view-specific images, our method allows us to optimize the lighting of an entire scene to match either baked illumination (e.g., lightmaps), regulatory lighting requirements for work spaces, or artistic sketches drawn directly on the geometry. This approach provides a more direct and intuitive user experience for designers. We visualize our adjoint gradients and compare them to image-based state-of-the-art differentiable rendering methods. We also compare the convergence behavior of various optimization algorithms when using our gradient data vs. image-based differentiable rendering methods. Qualitative comparisons with real-world scenes underline the practical applicability of our method.

Stochastic Gradient Descent with Preconditioned Polyak Step-size

Authors: Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Martin Takáč
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.02093
Pdf link: https://arxiv.org/pdf/2310.02093
Abstract Stochastic Gradient Descent (SGD) is one of the many iterative optimization methods that are widely used in solving machine learning problems. These methods display valuable properties and attract researchers and industrial machine learning engineers with their simplicity. However, one weakness of such methods is the need to tune the learning rate (step-size) for every combination of loss function and dataset in order to solve an optimization problem efficiently within a given time budget. Stochastic Gradient Descent with Polyak Step-size (SPS) offers an update rule that alleviates the need to fine-tune the learning rate of an optimizer. In this paper, we propose an extension of SPS that employs preconditioning techniques, such as Hutchinson's method, Adam, and AdaGrad, to improve its performance on badly scaled and/or ill-conditioned datasets.
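
Assuming the preconditioned Polyak step replaces the Euclidean norm in SPS by the B⁻¹-weighted norm, i.e. x_new = x − (f(x) − f*) / (gᵀB⁻¹g) · B⁻¹g with f* the known minimum value, a deterministic single-function sketch shows why preconditioning helps on badly scaled problems. Here B is the exact diagonal Hessian, an idealized stand-in for the Hutchinson, Adam, or AdaGrad estimates named in the abstract; `polyak_step` is a hypothetical helper, and the stochastic sampling of f_i is omitted.

```python
def polyak_step(x, f, grad, f_star=0.0, precond=None):
    """One (optionally diagonally preconditioned) Polyak step.

    step = (f(x) - f_star) / (g^T B^-1 g),   x_new = x - step * B^-1 g,
    where B = diag(precond); precond=None means B = I (plain Polyak step).
    """
    g = grad(x)
    d = precond if precond is not None else [1.0] * len(x)
    b_inv_g = [gi / di for gi, di in zip(g, d)]
    norm_sq = sum(gi * bi for gi, bi in zip(g, b_inv_g))  # g^T B^-1 g
    step = (f(x) - f_star) / norm_sq
    return [xi - step * bi for xi, bi in zip(x, b_inv_g)]


# Badly scaled quadratic f(x) = 0.5 * (1 * x0**2 + 100 * x1**2) with minimum value 0.
a = [1.0, 100.0]


def f(x):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def grad(x):
    return [ai * xi for ai, xi in zip(a, x)]


print(polyak_step([1.0, 1.0], f, grad))             # x0 barely moves
print(polyak_step([1.0, 1.0], f, grad, precond=a))  # [0.5, 0.5]
```

Without preconditioning the step is dominated by the stiff coordinate, so x0 moves by only about 0.005; with the exact diagonal preconditioner every coordinate is halved regardless of scaling.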

Symmetric Single Index Learning

Authors: Aaron Zweig, Joan Bruna
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.02117
Pdf link: https://arxiv.org/pdf/2310.02117
Abstract Few neural architectures lend themselves to provable learning with gradient-based methods. One popular model is the single-index model, in which labels are produced by composing an unknown linear projection with a possibly unknown scalar link function. Learning this model with SGD is relatively well understood, whereby the so-called information exponent of the link function governs a polynomial sample complexity rate. However, extending this analysis to deeper or more complicated architectures remains challenging. In this work, we consider single-index learning in the setting of symmetric neural networks. Under analytic assumptions on the activation and maximum degree assumptions on the link function, we prove that gradient flow recovers the hidden planted direction, represented as a finitely supported vector in the feature space of power sum polynomials. We characterize a notion of information exponent adapted to our setting that controls the efficiency of learning.
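
The power sum polynomials mentioned above are p_k(x) = Σᵢ xᵢᵏ, and each is invariant to permuting the entries of x. The sketch shows that feature map together with a hypothetical symmetric single-index label model y = link(⟨w, p(x)⟩) for a finitely supported w; the paper's symmetric networks, activation assumptions, and gradient-flow analysis are not modeled, and the helper names are invented.

```python
import math


def power_sums(x, max_degree):
    """Power-sum features p_k(x) = sum_i x_i**k for k = 1..max_degree.

    Each p_k is unchanged by permuting the entries of x.
    """
    return [sum(xi ** k for xi in x) for k in range(1, max_degree + 1)]


def symmetric_single_index(x, w, link=math.tanh):
    """Hypothetical label model y = link(<w, p(x)>) with finitely supported w."""
    feats = power_sums(x, len(w))
    return link(sum(wk * pk for wk, pk in zip(w, feats)))


print(power_sums([1.0, 2.0, 3.0], 3))   # [6.0, 14.0, 36.0]
w = [0.1, -0.05]
print(symmetric_single_index([1.0, 2.0, 3.0], w) == symmetric_single_index([3.0, 1.0, 2.0], w))
```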

Keyword: super-resolution

RF-ULM: Deep Learning for Radio-Frequency Ultrasound Localization Microscopy

Authors: Christopher Hahne, Georges Chabouh, Arthur Chavignon, Olivier Couture, Raphael Sznitman
Subjects: Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Arxiv link: https://arxiv.org/abs/2310.01545
Pdf link: https://arxiv.org/pdf/2310.01545
Abstract In Ultrasound Localization Microscopy (ULM), achieving high-resolution images relies on the precise localization of contrast agent particles across consecutive beamformed frames. However, our study uncovers an enormous potential: The process of delay-and-sum beamforming leads to an irreversible reduction of Radio-Frequency (RF) data, while its implications for localization remain largely unexplored. The rich contextual information embedded within RF wavefronts, including their hyperbolic shape and phase, offers great promise for guiding Deep Neural Networks (DNNs) in challenging localization scenarios. To fully exploit this data, we propose to directly localize scatterers in RF signals. Our approach involves a custom super-resolution DNN using learned feature channel shuffling and a novel semi-global convolutional sampling block tailored for reliable and accurate localization in RF input data. Additionally, we introduce a geometric point transformation that facilitates seamless mapping between B-mode and RF spaces. To validate the effectiveness of our method and understand the impact of beamforming, we conduct an extensive comparison with State-Of-The-Art (SOTA) techniques in ULM. We present the inaugural in vivo results from an RF-trained DNN, highlighting its real-world practicality. Our findings show that RF-ULM bridges the domain gap between synthetic and real datasets, offering a considerable advantage in terms of precision and complexity. To enable the broader research community to benefit from our findings, our code and the associated SOTA methods are made available at https://github.com/hahnec/rf-ulm.
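
The hyperbolic wavefront and the information loss of delay-and-sum beamforming can be illustrated with textbook geometry. This sketch is not the paper's point transformation; it assumes a 0-degree plane-wave transmit, a linear array lying at depth z = 0, a speed of sound of 1540 m/s, and nearest-sample interpolation. `arrival_times` and `das_pixel` are hypothetical helpers.

```python
import math

SPEED_OF_SOUND = 1540.0  # m/s, a typical soft-tissue value (assumed)


def arrival_times(scatterer, element_xs, c=SPEED_OF_SOUND):
    """Two-way travel times for a 0-degree plane-wave transmit.

    The transmit leg to a scatterer at (x, z) takes z / c; the receive leg to
    an element at (x_e, 0) takes sqrt((x - x_e)**2 + z**2) / c, which traces
    a hyperbola across the array. Positions are in metres.
    """
    xs, zs = scatterer
    return [(zs + math.hypot(xs - xe, zs)) / c for xe in element_xs]


def das_pixel(rf, element_xs, pixel, fs, c=SPEED_OF_SOUND):
    """Delay-and-sum: sample each element's RF trace at the pixel's delay and add."""
    total = 0.0
    for trace, t in zip(rf, arrival_times(pixel, element_xs, c)):
        idx = int(round(t * fs))  # nearest-sample interpolation
        if 0 <= idx < len(trace):
            total += trace[idx]
    return total


# Synthetic RF data: one impulse per element at the echo time of a point scatterer.
fs = 40e6
els = [-0.002, -0.001, 0.0, 0.001, 0.002]
scatterer = (0.0, 0.01)
rf = [[0.0] * 2000 for _ in els]
for trace, t in zip(rf, arrival_times(scatterer, els)):
    trace[int(round(t * fs))] = 1.0

print(das_pixel(rf, els, scatterer, fs))        # 5.0: all echoes add coherently
print(das_pixel(rf, els, (0.003, 0.01), fs))    # 0.0: delays miss the echoes
```

Summing over elements collapses the per-element hyperbola into one value per pixel, which is the irreversible reduction the abstract refers to; an RF-domain network instead sees the full set of traces.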

CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems

Authors: Priyanshu Burark, Karn Tiwari, Meer Mehran Rashid, Prathosh A P, N M Anoop Krishnan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2310.01650
Pdf link: https://arxiv.org/pdf/2310.01650
Abstract Continuous dynamical systems, characterized by differential equations, are ubiquitously used to model important problems such as plasma dynamics, flow through porous media, weather forecasting, and epidemic dynamics. Recently, a wide range of data-driven models has been used successfully to model these systems. However, in contrast to established fields like computer vision, few studies analyze the strengths and potential applications of the different classes of these models in a way that could steer decision-making in scientific machine learning. Here, we introduce CoDBench, an exhaustive benchmarking suite comprising 11 state-of-the-art data-driven models for solving differential equations. Specifically, we comprehensively evaluate 4 distinct categories of models, viz., feed-forward neural networks, deep operator regression models, frequency-based neural operators, and transformer architectures, against 8 widely applicable benchmark datasets encompassing challenges from fluid and solid mechanics. We conduct extensive experiments, assessing the operators' capabilities in learning, zero-shot super-resolution, data efficiency, robustness to noise, and computational efficiency. Interestingly, our findings highlight that current operators struggle with the newer mechanics datasets, motivating the need for more robust neural operators. All the datasets and codes will be shared in an easy-to-use fashion for the scientific community. We hope this resource will be an impetus for accelerated progress and exploration in modeling dynamical systems.

CoNO: Complex Neural Operator for Continuous Dynamical Systems

Authors: Karn Tiwari, N M Anoop Krishnan, Prathosh A P
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.02094
Pdf link: https://arxiv.org/pdf/2310.02094
Abstract Neural operators extend data-driven models to map between infinite-dimensional functional spaces. These models have successfully solved continuous dynamical systems represented by differential equations, viz., weather forecasting, fluid flow, or solid mechanics. However, the existing operators still rely on real space, thereby losing rich representations potentially captured in the complex space by functional transforms. In this paper, we introduce a Complex Neural Operator (CoNO) that parameterizes the integral kernel in the complex fractional Fourier domain. Additionally, by employing a complex-valued neural network along with aliasing-free activation functions, the model preserves complex values and complex algebraic properties, thereby enabling improved representation, robustness to noise, and generalization. We show that the model effectively captures the underlying partial differential equation with a single complex fractional Fourier transform. We perform an extensive empirical evaluation of CoNO on several datasets and additional tasks such as zero-shot super-resolution, evaluation of out-of-distribution data, data efficiency, and robustness to noise. CoNO exhibits comparable or superior performance to all the state-of-the-art models in these tasks. Altogether, CoNO presents a robust and superior model for modeling continuous dynamical systems, providing a fillip to scientific machine learning.
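
CoNO parameterizes its kernel in the complex fractional Fourier domain. For orientation, the sketch below shows the ordinary-Fourier special case (the fractional Fourier transform of order 1 is the standard Fourier transform), i.e. an FNO-style spectral convolution that scales the lowest modes by complex weights and discards the rest. It is a swapped-in, simpler technique, not CoNO's layer; the naive O(n²) DFT and the helper names are our assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft, including the 1/n normalization."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_conv(x, weights):
    """Scale the lowest len(weights) Fourier modes by complex weights, zero the rest.

    For a real input, mode n - k is the conjugate mirror of mode k, so it gets the
    conjugate weight, which keeps the output real. Assumes len(weights) <= n // 2.
    """
    X = dft(x)
    n = len(X)
    Y = [0j] * n
    for k, w in enumerate(weights):
        Y[k] = w * X[k]
        if k > 0:
            Y[n - k] = w.conjugate() * X[n - k]
    return [y.real for y in idft(Y)]


n = 8
low = [math.cos(2 * math.pi * t / n) for t in range(n)]       # mode 1
high = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]  # mode 3
kept = spectral_conv(low, [1 + 0j, 1 + 0j])     # mode 1 is kept
removed = spectral_conv(high, [1 + 0j, 1 + 0j]) # mode 3 is truncated
print([round(v, 6) for v in kept])
print(max(abs(v) for v in removed))
```

Because the learned weights live on Fourier modes rather than grid points, the same layer can be evaluated on a finer grid, which is the basis of the zero-shot super-resolution evaluations mentioned above.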

Oct 04 '23 06:10 zoq


New submissions for Wed, 4 Oct 23

Keyword: sgd

Stochastic Gradient Descent with Preconditioned Polyak Step-size

Symmetric Single Index Learning

Chunking: Forgetting Matters in Continual Learning even without Changing Tasks

Keyword: optimization

Enhancing Secrecy in UAV RSMA Networks: Deep Unfolding Meets Deep Reinforcement Learning

Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets

FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

Decision-Oriented Intervention Cost Prediction for Multi-robot Persistent Monitoring

Primal-dual hybrid gradient algorithms for computing time-implicit Hamilton-Jacobi equations

Distributionally Robust Path Integral Control

Estimating and Implementing Conventional Fairness Metrics With Probabilistic Protected Features

Decentralized Micro Water-Energy Co-Optimization for Small Communities

DynAMO: Multi-agent reinforcement learning for dynamic anticipatory mesh optimization with applications to hyperbolic conservation laws

RETRO: Reactive Trajectory Optimization for Real-Time Robot Motion Planning in Dynamic Environments

Randomized Dimension Reduction with Statistical Guarantees

Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling

A simple connection from loss flatness to compressed representations in neural networks

STAMP: Differentiable Task and Motion Planning via Stein Variational Gradient Descent

Comparative study of microgrid optimal scheduling under multi-optimization algorithm fusion

Improvement and Enhancement of YOLOv5 Small Target Recognition Based on Multi-module Optimization

AutoLoRa: A Parameter-Free Automated Robust Fine-Tuning Framework

Adaptive Hybrid Model for Enhanced Stock Market Predictions Using Improved VMD and Stacked Informer

Approximating Voltage Stability Boundary Under High Variability of Renewables Using Differential Geometry

PyHexTop: a compact Python code for topology optimization using hexagonal elements

UAV Swarm-enabled Collaborative Secure Relay Communications with Time-domain Colluding Eavesdropper

DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

View-Independent Adjoint Light Tracing for Lighting Design Optimization

De Novo Drug Design with Joint Transformers

TOaCNN: Adaptive Convolutional Neural Network for Multidisciplinary Topology Optimization

Stochastic Gradient Descent with Preconditioned Polyak Step-size

Adaptive Gait Modeling and Optimization for Principally Kinematic Systems

Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization

Joint Optimization of Charging Infrastructure Placement and Operational Schedules for a Fleet of Battery Electric Trucks

Optimum Monitoring of Heterogeneous Continuous Time Markov Chains

Keyword: adam

Adaptive Hybrid Model for Enhanced Stock Market Predictions Using Improved VMD and Stacked Informer

Stochastic Gradient Descent with Preconditioned Polyak Step-size

Keyword: gradient

Enhancing Secrecy in UAV RSMA Networks: Deep Unfolding Meets Deep Reinforcement Learning

FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models

Primal-dual hybrid gradient algorithms for computing time-implicit Hamilton-Jacobi equations

Intractability of Learning the Discrete Logarithm with Gradient-Based Methods

From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression

Blending Imitation and Reinforcement Learning for Robust Policy Improvement

Exploring Counterfactual Alignment Loss towards Human-centered AI

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

STAMP: Differentiable Task and Motion Planning via Stein Variational Gradient Descent

AutoLoRa: A Parameter-Free Automated Robust Fine-Tuning Framework

Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation

Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

View-Independent Adjoint Light Tracing for Lighting Design Optimization

Stochastic Gradient Descent with Preconditioned Polyak Step-size

Symmetric Single Index Learning

Keyword: super-resolution

RF-ULM: Deep Learning for Radio-Frequency Ultrasound Localization Microscopy

CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems

CoNO: Complex Neural Operator for Continuous Dynamical Systems
