arxiv-updates New submissions for Tue, 19 Sep 23

New submissions for Tue, 19 Sep 23

Open zoq opened this issue 1 year ago • 0 comments

Keyword: sgd

Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets

Authors: Authors: Pulkit Gopalani, Samyak Jha, Anirbit Mukherjee
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.09258
Pdf link: https://arxiv.org/pdf/2309.09258
Abstract In this note, we demonstrate a first-of-its-kind provable convergence of SGD to the global minima of appropriately regularized logistic empirical risk of depth $2$ nets -- for arbitrary data and with any number of gates with adequately smooth and bounded activations like sigmoid and tanh. We also prove an exponentially fast convergence rate for continuous time SGD that also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius norm regularized logistic loss functions on constant-sized neural nets which are "Villani functions" and thus be able to build on recent progress with analyzing SGD on such objectives.

Keyword: optimization

Maneuver Decision-Making Through Proximal Policy Optimization And Monte Carlo Tree Search

Authors: Authors: Zhang Hong-Peng
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.08611
Pdf link: https://arxiv.org/pdf/2309.08611
Abstract Maneuver decision-making can be regarded as a Markov decision process and can be address by reinforcement learning. However, original reinforcement learning algorithms can hardly solve the maneuvering decision-making problem. One reason is that agents use random actions in the early stages of training, which makes it difficult to get rewards and learn how to make effective decisions. To address this issue, a method based on proximal policy optimization and Monte Carlo tree search is proposed. The method uses proximal policy optimization to train the agent, and regards the results of air combat as targets to train the value network. Then, based on the value network and the visit count of each node, Monte Carlo tree search is used to find the actions with more expected returns than random actions, which can improve the training performance. The ablation studies and simulation experiments indicate that agents trained by the proposed method can make different decisions according to different states, which demonstrates that the method can solve the maneuvering decision problem that the original reinforcement learning algorithm cannot solve.

A Stochastic Online Forecast-and-Optimize Framework for Real-Time Energy Dispatch in Virtual Power Plants under Uncertainty

Authors: Authors: Wei Jiang, Zhongkai Yi, Li Wang, Hanwei Zhang, Jihai Zhang, Fangquan Lin, Cheng Yang
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2309.08642
Pdf link: https://arxiv.org/pdf/2309.08642
Abstract Aggregating distributed energy resources in power systems significantly increases uncertainties, in particular caused by the fluctuation of renewable energy generation. This issue has driven the necessity of widely exploiting advanced predictive control techniques under uncertainty to ensure long-term economics and decarbonization. In this paper, we propose a real-time uncertainty-aware energy dispatch framework, which is composed of two key elements: (i) A hybrid forecast-and-optimize sequential task, integrating deep learning-based forecasting and stochastic optimization, where these two stages are connected by the uncertainty estimation at multiple temporal resolutions; (ii) An efficient online data augmentation scheme, jointly involving model pre-training and online fine-tuning stages. In this way, the proposed framework is capable to rapidly adapt to the real-time data distribution, as well as to target on uncertainties caused by data drift, model discrepancy and environment perturbations in the control process, and finally to realize an optimal and robust dispatch solution. The proposed framework won the championship in CityLearn Challenge 2022, which provided an influential opportunity to investigate the potential of AI application in the energy domain. In addition, comprehensive experiments are conducted to interpret its effectiveness in the real-life scenario of smart building energy management.

Cure the headache of Transformers via Collinear Constrained Attention

Authors: Authors: Shiyi Zhu, Jing Ye, Wei Jiang, Qi Zhang, Yifan Wu, Jianguo Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.08646
Pdf link: https://arxiv.org/pdf/2309.08646
Abstract As the rapid progression of practical applications based on Large Language Models continues, the importance of extrapolating performance has grown exponentially in the research domain. In our study, we identified an anomalous behavior in Transformer models that had been previously overlooked, leading to a chaos around closest tokens which carried the most important information. We've coined this discovery the "headache of Transformers". To address this at its core, we introduced a novel self-attention structure named Collinear Constrained Attention (CoCA). This structure can be seamlessly integrated with existing extrapolation, interpolation methods, and other optimization strategies designed for traditional Transformer models. We have achieved excellent extrapolating performance even for 16 times to 24 times of sequence lengths during inference without any fine-tuning on our model. We have also enhanced CoCA's computational and spatial efficiency to ensure its practicality. We plan to open-source CoCA shortly. In the meantime, we've made our code available in the appendix for reappearing experiments.

Speeding up charge exchange recombination spectroscopy analysis in support of NERSC/DIII-D realtime workflow

Authors: Authors: Aarushi Jain, Laurie Stephey, Erik Linsenmayer, Colin Chrystal, Jonathan Dursi, Hannah Ross
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Plasma Physics (physics.plasm-ph)
Arxiv link: https://arxiv.org/abs/2309.08687
Pdf link: https://arxiv.org/pdf/2309.08687
Abstract We report optimization work made in support of the development of a realtime Superfacility workflow between DIII-D and NERSC. At DIII-D, the ion properties measured by charge exchange recombination (CER) spectroscopy are required inputs for a Superfacility realtime workflow that computes the full plasma kinetic equilibrium. In this workflow, minutes matter since the results must be ready during the brief 10-15 minute pause between plasma discharges. Prior to this work, a sample CERFIT analysis took approximately 15 minutes. Because the problem consists of many calculations that can be done independently, we were able to restructure the CERFIT code to leverage this parallelism with Slurm job arrays. We reduced the runtime to approximately 51 seconds -- a speedup of roughly 20x, saving valuable time for both the scientists interested in the CER results and also for the larger equilibrium reconstruction workflow.

Probabilistic Constellation Shaping With Denoising Diffusion Probabilistic Models: A Novel Approach

Authors: Authors: Mehdi Letafati, Samad Ali, Matti Latva-aho
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.08688
Pdf link: https://arxiv.org/pdf/2309.08688
Abstract With the incredible results achieved from generative pre-trained transformers (GPT) and diffusion models, generative AI (GenAI) is envisioned to yield remarkable breakthroughs in various industrial and academic domains. In this paper, we utilize denoising diffusion probabilistic models (DDPM), as one of the state-of-the-art generative models, for probabilistic constellation shaping in wireless communications. While the geometry of constellations is predetermined by the networking standards, probabilistic constellation shaping can help enhance the information rate and communication performance by designing the probability of occurrence (generation) of constellation symbols. Unlike conventional methods that deal with an optimization problem over the discrete distribution of constellations, we take a radically different approach. Exploiting the denoise-and-generate'' characteristic of DDPMs, the key idea is to learn how to generate constellation symbols out of noise, mimicking'' the way the receiver performs symbol reconstruction. By doing so, we make the constellation symbols sent by the transmitter, and what is inferred (reconstructed) at the receiver become as similar as possible. Our simulations show that the proposed scheme outperforms deep neural network (DNN)-based benchmark and uniform shaping, while providing network resilience as well as robust out-of-distribution performance under low-SNR regimes and non-Gaussian noise. Notably, a threefold improvement in terms of mutual information is achieved compared to DNN-based approach for 64-QAM geometry.

Wasserstein Distributionally Robust Control Barrier Function using Conditional Value-at-Risk with Differentiable Convex Programming

Authors: Authors: Alaa Eddine Chriat, Chuangchuang Sun
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.08700
Pdf link: https://arxiv.org/pdf/2309.08700
Abstract Control Barrier functions (CBFs) have attracted extensive attention for designing safe controllers for their deployment in real-world safety-critical systems. However, the perception of the surrounding environment is often subject to stochasticity and further distributional shift from the nominal one. In this paper, we present distributional robust CBF (DR-CBF) to achieve resilience under distributional shift while keeping the advantages of CBF, such as computational efficacy and forward invariance. To achieve this goal, we first propose a single-level convex reformulation to estimate the conditional value at risk (CVaR) of the safety constraints under distributional shift measured by a Wasserstein metric, which is by nature tri-level programming. Moreover, to construct a control barrier condition to enforce the forward invariance of the CVaR, the technique of differentiable convex programming is applied to enable differentiation through the optimization layer of CVaR estimation. We also provide an approximate variant of DR-CBF for higher-order systems. Simulation results are presented to validate the chance-constrained safety guarantee under the distributional shift in both first and second-order systems.

Clustered Multi-Agent Linear Bandits

Authors: Authors: Hamza Cherkaoui, Merwan Barlier, Igor Colin
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.08710
Pdf link: https://arxiv.org/pdf/2309.08710
Abstract We address in this paper a particular instance of the multi-agent linear stochastic bandit problem, called clustered multi-agent linear bandits. In this setting, we propose a novel algorithm leveraging an efficient collaboration between the agents in order to accelerate the overall optimization problem. In this contribution, a network controller is responsible for estimating the underlying cluster structure of the network and optimizing the experiences sharing among agents within the same groups. We provide a theoretical analysis for both the regret minimization problem and the clustering quality. Through empirical evaluation against state-of-the-art algorithms on both synthetic and real data, we demonstrate the effectiveness of our approach: our algorithm significantly improves regret minimization while managing to recover the true underlying cluster partitioning.

RoSSO: A High-Performance Python Package for Robotic Surveillance Strategy Optimization Using JAX

Authors: Authors: Yohan John, Connor Hughes, Gilberto Diaz-Garcia, Jason R. Marden, Francesco Bullo
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.08742
Pdf link: https://arxiv.org/pdf/2309.08742
Abstract To enable the computation of effective randomized patrol routes for single- or multi-robot teams, we present RoSSO, a Python package designed for solving Markov chain optimization problems. We exploit machine-learning techniques such as reverse-mode automatic differentiation and constraint parametrization to achieve superior efficiency compared to general-purpose nonlinear programming solvers. Additionally, we supplement a game-theoretic stochastic surveillance formulation in the literature with a novel greedy algorithm and multi-robot extension. We close with numerical results for a police district in downtown San Francisco that demonstrate RoSSO's capabilities on our new formulations and the prior work.

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

Authors: Authors: Yi Shen, Pan Xu, Michael M. Zavlanos
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.08748
Pdf link: https://arxiv.org/pdf/2309.08748
Abstract Without direct interaction with the environment. Often, the environment in which the data are collected differs from the environment in which the learned policy is applied. To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed that compute worst-case bounds on the policy values assuming that the distribution of the new environment lies within an uncertainty set. Typically, this uncertainty set is defined based on the KL divergence around the empirical distribution computed from the logging dataset. However, the KL uncertainty set fails to encompass distributions with varying support and lacks awareness of the geometry of the distribution support. As a result, KL approaches fall short in addressing practical environment mismatches and lead to over-fitting to worst-case scenarios. To overcome these limitations, we propose a novel DRO approach that employs the Wasserstein distance instead. While Wasserstein DRO is generally computationally more expensive compared to KL DRO, we present a regularized method and a practical (biased) stochastic gradient descent method to optimize the policy efficiently. We also provide a theoretical analysis of the finite sample complexity and iteration complexity for our proposed method. We further validate our approach using a public dataset that was recorded in a randomized stoke trial.

A Control Approach for Nonlinear Stochastic State Uncertain Systems with Probabilistic Safety Guarantees

Authors: Authors: Mohammad S. Ramadan, Mohammad Alsuwaidan, Ahmed Atallah, Sylvia Herbert
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.08767
Pdf link: https://arxiv.org/pdf/2309.08767
Abstract This paper presents an algorithm to apply nonlinear control design approaches in the case of stochastic systems with partial state observation. Deterministic nonlinear control approaches are formulated under the assumption of full state access and, often, relative degree one. We propose a control design approach that first generates a control policy for nonlinear deterministic models with full state observation. The resulting control policy is then used to build an importance-like probability distribution over the space of control sequences which are to be evaluated for the true stochastic and state-uncertain dynamics. This distribution serves in the sampling step within a random search control optimization procedure, to focus the exploration effort on certain regions of the control space. The sampled control sequences are assigned costs determined by a prescribed finite-horizon performance and safety measure, which is based on the stochastic dynamics. This sampling algorithm is parallelizable and shown to have computational complexity indifferent to the state dimension, and to be able to guarantee safety over the prescribed prediction horizon. A numerical simulation is provided to test the applicability and effectiveness of the presented approach and compare it to a certainty equivalence controller.

SHAPNN: Shapley Value Regularized Tabular Neural Network

Authors: Authors: Qisen Cheng, Shuhui Qu, Janghwan Lee
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.08799
Pdf link: https://arxiv.org/pdf/2309.08799
Abstract We present SHAPNN, a novel deep tabular data modeling architecture designed for supervised learning. Our approach leverages Shapley values, a well-established technique for explaining black-box models. Our neural network is trained using standard backward propagation optimization methods, and is regularized with realtime estimated Shapley values. Our method offers several advantages, including the ability to provide valid explanations with no computational overhead for data instances and datasets. Additionally, prediction with explanation serves as a regularizer, which improves the model's performance. Moreover, the regularized prediction enhances the model's capability for continual learning. We evaluate our method on various publicly available datasets and compare it with state-of-the-art deep neural network models, demonstrating the superior performance of SHAPNN in terms of AUROC, transparency, as well as robustness to streaming data.

Geometric Projectors: Geometric Constraints based Optimization for Robot Behaviors

Authors: Authors: Xuemin Chi, Tobias Löw, Yiming Li, Zhitao Liu, Sylvain Calinon
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08802
Pdf link: https://arxiv.org/pdf/2309.08802
Abstract Generating motion for robots that interact with objects of various shapes is a complex challenge, further complicated when the robot's own geometry and multiple desired behaviors are considered. To address this issue, we introduce a new framework based on Geometric Projectors (GeoPro) for constrained optimization. This novel framework allows for the generation of task-agnostic behaviors that are compliant with geometric constraints. GeoPro streamlines the design of behaviors in both task and configuration spaces, offering diverse functionalities such as collision avoidance and goal-reaching, while maintaining high computational efficiency. We validate the efficacy of our work through simulations and Franka Emika robotic experiments, comparing its performance against state-of-the-art methodologies. This comprehensive evaluation highlights GeoPro's versatility in accommodating robots with varying dynamics and precise geometric shapes. For additional materials, please visit: https://www.xueminchi.com/publications/geopro

Distributionally Robust CVaR-Based Safety Filtering for Motion Planning in Uncertain Environments

Authors: Authors: Sleiman Safaoui, Tyler H. Summers
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.08821
Pdf link: https://arxiv.org/pdf/2309.08821
Abstract Safety is a core challenge of autonomous robot motion planning, especially in the presence of dynamic and uncertain obstacles. Many recent results use learning and deep learning-based motion planners and prediction modules to predict multiple possible obstacle trajectories and generate obstacle-aware ego robot plans. However, planners that ignore the inherent uncertainties in such predictions incur collision risks and lack formal safety guarantees. In this paper, we present a computationally efficient safety filtering solution to reduce the collision risk of ego robot motion plans using multiple samples of obstacle trajectory predictions. The proposed approach reformulates the collision avoidance problem by computing safe halfspaces based on obstacle sample trajectories using distributionally robust optimization (DRO) techniques. The safe halfspaces are used in a model predictive control (MPC)-like safety filter to apply corrections to the reference ego trajectory thereby promoting safer planning. The efficacy and computational efficiency of our approach are demonstrated through numerical simulations.

Distributionally Robust Post-hoc Classifiers under Prior Shifts

Authors: Authors: Jiaheng Wei, Harikrishna Narasimhan, Ehsan Amid, Wen-Sheng Chu, Yang Liu, Abhishek Kumar
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.08825
Pdf link: https://arxiv.org/pdf/2309.08825
Abstract The generalization ability of machine learning models degrades significantly when the test distribution shifts away from the training distribution. We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors. The presence of skewed training priors can often lead to the models overfitting to spurious features. Unlike existing methods, which optimize for either the worst or the average performance over classes or groups, our work is motivated by the need for finer control over the robustness properties of the model. We present an extremely lightweight post-hoc approach that performs scaling adjustments to predictions from a pre-trained model, with the goal of minimizing a distributionally robust loss around a chosen target distribution. These adjustments are computed by solving a constrained optimization problem on a validation set and applied to the model during test time. Our constrained optimization objective is inspired by a natural notion of robustness to controlled distribution shifts. Our method comes with provable guarantees and empirically makes a strong case for distributional robust post-hoc classifiers. An empirical implementation is available at https://github.com/weijiaheng/Drops.

Intention-Aware Planner for Robust and Safe Aerial Tracking

Authors: Authors: Qiuyu Ren, Huan Yu, Jiajun Dai, Zhi Zheng, Jun Meng, Li Xu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08854
Pdf link: https://arxiv.org/pdf/2309.08854
Abstract The intention of the target can help us to estimate its future motion state more accurately. This paper proposes an intention-aware planner to enhance safety and robustness in aerial tracking applications. Firstly, we utilize the Mediapipe framework to estimate target's pose. A risk assessment function and a state observation function are designed to predict the target intention. Afterwards, an intention-driven hybrid A* method is proposed for target motion prediction, ensuring that the target's future positions align with its intention. Finally, an intention-aware optimization approach, in conjunction with particular penalty formulations, is designed to generate a spatial-temporal optimal trajectory. Benchmark comparisons validate the superior performance of our proposed methodology across diverse scenarios. This is attributed to the integration of the target intention into the planner through coupled formulations.

Towards Geometric Motion Planning for High-Dimensional Systems: Gait-Based Coordinate Optimization and Local Metrics

Authors: Authors: Yanhao Yang, Ross L. Hatton
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08871
Pdf link: https://arxiv.org/pdf/2309.08871
Abstract Geometric motion planning offers effective and interpretable gait analysis and optimization tools for locomoting systems. However, due to the curse of dimensionality in coordinate optimization, a key component of geometric motion planning, it is almost infeasible to apply current geometric motion planning to high-dimensional systems. In this paper, we propose a gait-based coordinate optimization method that overcomes the curse of dimensionality. We also identify a unified geometric representation of locomotion by generalizing various nonholonomic constraints into local metrics. By combining these two approaches, we take a step towards geometric motion planning for high-dimensional systems. We test our method in two classes of high-dimensional systems - low Reynolds number swimmers and free-falling Cassie - with up to 11-dimensional shape variables. The resulting optimal gait in the high-dimensional system shows better efficiency compared to that of the reduced-order model. Furthermore, we provide a geometric optimality interpretation of the optimal gait.

Semantic Information Extraction for Text Data with Probability Graph

Authors: Authors: Zhouxiang Zhao, Zhaohui Yang, Ye Hu, Licheng Lin, Zhaoyang Zhang
Subjects: Computation and Language (cs.CL); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.08879
Pdf link: https://arxiv.org/pdf/2309.08879
Abstract In this paper, the problem of semantic information extraction for resource constrained text data transmission is studied. In the considered model, a sequence of text data need to be transmitted within a communication resource-constrained network, which only allows limited data transmission. Thus, at the transmitter, the original text data is extracted with natural language processing techniques. Then, the extracted semantic information is captured in a knowledge graph. An additional probability dimension is introduced in this graph to capture the importance of each information. This semantic information extraction problem is posed as an optimization framework whose goal is to extract most important semantic information for transmission. To find an optimal solution for this problem, a Floyd's algorithm based solution coupled with an efficient sorting mechanism is proposed. Numerical results testify the effectiveness of the proposed algorithm with regards to two novel performance metrics including semantic uncertainty and semantic similarity.

GRaCE: Optimizing Grasps to Satisfy Ranked Criteria in Complex Scenario

Authors: Authors: Tasbolat Taunyazov, Kelvin Lin, Harold Soh
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08887
Pdf link: https://arxiv.org/pdf/2309.08887
Abstract This paper addresses the multi-faceted problem of robot grasping, where multiple criteria may conflict and differ in importance. We introduce Grasp Ranking and Criteria Evaluation (GRaCE), a novel approach that employs hierarchical rule-based logic and a rank-preserving utility function to optimize grasps based on various criteria such as stability, kinematic constraints, and goal-oriented functionalities. Additionally, we propose GRaCE-OPT, a hybrid optimization strategy that combines gradient-based and gradient-free methods to effectively navigate the complex, non-convex utility function. Experimental results in both simulated and real-world scenarios show that GRaCE requires fewer samples to achieve comparable or superior performance relative to existing methods. The modular architecture of GRaCE allows for easy customization and adaptation to specific application needs.

Efficient Methods for Non-stationary Online Learning

Authors: Authors: Peng Zhao, Yan-Feng Xie, Lijun Zhang, Zhi-Hua Zhou
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.08911
Pdf link: https://arxiv.org/pdf/2309.08911
Abstract Non-stationary online learning has drawn much attention in recent years. In particular, dynamic regret and adaptive regret are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of the non-stationarity, in which a group of base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises the concern about the computational complexity -- those methods typically maintain $\mathcal{O}(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$. Moreover, our obtained algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods. Empirical studies verify our theoretical findings.

Delving into Multimodal Prompting for Fine-grained Visual Classification

Authors: Authors: Xin Jiang, Hao Tang, Junyao Gao, Xiaoyu Du, Shengfeng He, Zechao Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2309.08912
Pdf link: https://arxiv.org/pdf/2309.08912
Abstract Fine-grained visual classification (FGVC) involves categorizing fine subdivisions within a broader category, which poses challenges due to subtle inter-class discrepancies and large intra-class variations. However, prevailing approaches primarily focus on uni-modal visual concepts. Recent advancements in pre-trained vision-language models have demonstrated remarkable performance in various high-level vision tasks, yet the applicability of such models to FGVC tasks remains uncertain. In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the contrastive language-image pertaining (CLIP) model. Our MP-FGVC comprises a multimodal prompts scheme and a multimodal adaptation scheme. The former includes Subcategory-specific Vision Prompt (SsVP) and Discrepancy-aware Text Prompt (DaTP), which explicitly highlights the subcategory-specific discrepancies from the perspectives of both vision and language. The latter aligns the vision and text prompting elements in a common semantic space, facilitating cross-modal collaborative reasoning through a Vision-Language Fusion Module (VLFM) for further improvement on FGVC. Moreover, we tailor a two-stage optimization strategy for MP-FGVC to fully leverage the pre-trained CLIP model and expedite efficient adaptation for FGVC. Extensive experiments conducted on four FGVC datasets demonstrate the effectiveness of our MP-FGVC.

Bidirectional Graph GAN: Representing Brain Structure-Function Connections for Alzheimer's Disease

Authors: Authors: Shuqiang Wang, Chen Ding
Subjects: Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2309.08916
Pdf link: https://arxiv.org/pdf/2309.08916
Abstract The relationship between brain structure and function is critical for revealing the pathogenesis of brain disease, including Alzheimer's disease (AD). However, it is a great challenge to map brain structure-function connections due to various reasons. In this work, a bidirectional graph generative adversarial networks (BGGAN) is proposed to represent brain structure-function connections. Specifically, by designing a module incorporating inner graph convolution network (InnerGCN), the generators of BGGAN can employ features of direct and indirect brain regions to learn the mapping function between structural domain and functional domain. Besides, a new module named Balancer is designed to counterpoise the optimization between generators and discriminators. By introducing the Balancer into BGGAN, both the structural generator and functional generator can not only alleviate the issue of mode collapse but also learn complementarity of structural and functional features. Experimental results using ADNI datasets show that the both the generated structure connections and generated function connections can improve the identification accuracy of AD. More importantly, based the proposed model, it is found that the relationship between brain structure and function is not a complete one-to-one correspondence. Brain structure is the basis of brain function. The strong structural connections are almost accompanied by strong functional connections.

Inverse classification with logistic and softmax classifiers: efficient optimization

Authors: Authors: Miguel Á. Carreira-Perpiñán, Suryabhan Singh Hada
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.08945
Pdf link: https://arxiv.org/pdf/2309.08945
Abstract In recent years, a certain type of problems have become of interest where one wants to query a trained classifier. Specifically, one wants to find the closest instance to a given input instance such that the classifier's predicted label is changed in a desired way. Examples of these ``inverse classification'' problems are counterfactual explanations, adversarial examples and model inversion. All of them are fundamentally optimization problems over the input instance vector involving a fixed classifier, and it is of interest to achieve a fast solution for interactive or real-time applications. We focus on solving this problem efficiently for two of the most widely used classifiers: logistic regression and softmax classifiers. Owing to special properties of these models, we show that the optimization can be solved in closed form for logistic regression, and iteratively but extremely fast for the softmax classifier. This allows us to solve either case exactly (to nearly machine precision) in a runtime of milliseconds to around a second even for very high-dimensional instances and many classes.

ExBluRF: Efficient Radiance Fields for Extreme Motion Blurred Images

Authors: Authors: Dongwoo Lee, Jeongtaek Oh, Jaesung Lim, Sunghyun Cho, Kyoung Mu Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08957
Pdf link: https://arxiv.org/pdf/2309.08957
Abstract We present ExBluRF, a novel view synthesis method for extreme motion blurred images based on efficient radiance fields optimization. Our approach consists of two main components: 6-DOF camera trajectory-based motion blur formulation and voxel-based radiance fields. From extremely blurred images, we optimize the sharp radiance fields by jointly estimating the camera trajectories that generate the blurry images. In training, multiple rays along the camera trajectory are accumulated to reconstruct single blurry color, which is equivalent to the physical motion blur operation. We minimize the photo-consistency loss on blurred image space and obtain the sharp radiance fields with camera trajectories that explain the blur of all images. The joint optimization on the blurred image space demands painfully increasing computation and resources proportional to the blur size. Our method solves this problem by replacing the MLP-based framework to low-dimensional 6-DOF camera poses and voxel-based radiance fields. Compared with the existing works, our approach restores much sharper 3D scenes from challenging motion blurred views with the order of 10 times less training time and GPU memory consumption.

FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization

Authors: Authors: Nan Ma, Mohan Wang, Yiheng Han, Yong-Jin Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08966
Pdf link: https://arxiv.org/pdf/2309.08966
Abstract Cross-modality point cloud registration is confronted with significant challenges due to inherent differences in modalities between different sensors. We propose a cross-modality point cloud registration framework FF-LOGO: a cross-modality point cloud registration method with feature filtering and local-global optimization. The cross-modality feature correlation filtering module extracts geometric transformation-invariant features from cross-modality point clouds and achieves point selection by feature matching. We also introduce a cross-modality optimization process, including a local adaptive key region aggregation module and a global modality consistency fusion optimization module. Experimental results demonstrate that our two-stage optimization significantly improves the registration accuracy of the feature association and selection module. Our method achieves a substantial increase in recall rate compared to the current state-of-the-art methods on the 3DCSR dataset, improving from 40.59% to 75.74%. Our code will be available at https://github.com/wangmohan17/FFLOGO.

Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations

Authors: Authors: Fucheng Jia, Shiqi Jiang, Ting Cao, Wei Cui, Tianrui Xia, Xu Cao, Yuanchun Li, Deyu Zhang, Ju Ren, Yunxin Liu, Lili Qiu, Mao Yang
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.08978
Pdf link: https://arxiv.org/pdf/2309.08978
Abstract Web applications are increasingly becoming the primary platform for AI service delivery, making in-browser deep learning (DL) inference more prominent. However, current in-browser inference systems fail to effectively utilize advanced web programming techniques and customize kernels for various client devices, leading to suboptimal performance. To address the issues, this paper presents the first in-browser inference system, nn-JIT.web, which enables just-in-time (JIT) auto-generation of optimized kernels for both CPUs and GPUs during inference. The system achieves this by using two novel web programming techniques that can significantly reduce kernel generation time, compared to other tensor compilers such as TVM, while maintaining or even improving performance. The first technique, Tensor-Web Compiling Co-Design, lowers compiling costs by unifying tensor and web compiling and eliminating redundant and ineffective compiling passes. The second technique, Web-Specific Lite Kernel Optimization Space Design, reduces kernel tuning costs by focusing on web programming requirements and efficient hardware resource utilization, limiting the optimization space to only dozens. nn-JIT.web is evaluated for modern transformer models on a range of client devices, including the mainstream CPUs and GPUs from ARM, Intel, AMD and Nvidia. Results show that nn-JIT.web can achieve up to 8.2x faster within 30 seconds compared to the baselines across various models.

QTOS: An Open-Source Quadruped Trajectory Optimization Stack

Authors: Authors: Alexy Skoutnev, Andrew Cinar, Praful Sigdel, Forrest Laine
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09058
Pdf link: https://arxiv.org/pdf/2309.09058
Abstract We introduce a new open-source framework, Quadruped Trajectory Optimization Stack (QTOS), which integrates a global planner, local planner, simulator, controller, and robot interface into a single package. QTOS serves as a full-stack interface, simplifying continuous motion planning on an open-source quadruped platform by bridging the gap between middleware and gait planning. It empowers users to effortlessly translate high-level navigation objectives into low-level robot commands. Furthermore, QTOS enhances the stability and adaptability of long-distance gait planning across challenging terrain.

Achieving Ultra-Reliable Low-Latency Communication (URLLC) in Next-Generation Cellular Networks with Programmable Data Planes

Authors: Authors: Kerim Gökarslan
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2309.09079
Pdf link: https://arxiv.org/pdf/2309.09079
Abstract Recent advancements in wireless technologies towards the next-generation cellular networks have brought a new era that made it possible to apply cellular technology on traditionally-wired networks with tighter requirements, such as industrial networks. The next-generation cellular technologies (e.g., 5G and Beyond) introduce the concept of ultra-reliable low-latency communications (URLLC). This thesis presents a Software-Defined Networking (SDN) architecture with programmable data planes for the next-generation cellular networks to achieve URLLC. Our design deploys programmable switches between the cellular core and Radio Access Networks (RAN) to monitor and modify data traffic at the line speed. We introduce the concept of \textit{intra-cellular optimization}, a relaxation in cellular networks to allow pre-authorized in-network devices to communicate without being required to signal the core network. We also present a control structure, Unified Control Plane (UCP), containing a novel Ethernet Layer control protocol and an adapted version of link-state routing information distribution among the programmable switches. Our implementation uses P4 with an 5G implementation (Open5Gs) and a UE/RAN simulator. We implement a Python simulator to evaluate the performance of our system on multi-switch topologies by simulating the switch behavior. Our evaluation indicates latency reduction up to 2x with \textit{intra-cellular optimization} compared to the conventional architecture. We show that our design has a ten-millisecond level of control latency, and achieves fine-grained network security and monitoring.

CppFlow: Generative Inverse Kinematics for Efficient and Robust Cartesian Path Planning

Authors: Authors: Jeremy Morgan, David Millard, Gaurav S. Sukhatme
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09102
Pdf link: https://arxiv.org/pdf/2309.09102
Abstract In this work we present CppFlow - a novel and performant planner for the Cartesian Path Planning problem, which finds valid trajectories up to 129x faster than current methods, while also succeeding on more difficult problems where others fail. At the core of the proposed algorithm is the use of a learned, generative Inverse Kinematics solver, which is able to efficiently produce promising entire candidate solution trajectories on the GPU. Precise, valid solutions are then found through classical approaches such as differentiable programming, global search, and optimization. In combining approaches from these two paradigms we get the best of both worlds - efficient approximate solutions from generative AI which are made exact using the guarantees of traditional planning and optimization. We evaluate our system against other state of the art methods on a set of established baselines as well as new ones introduced in this work and find that our method significantly outperforms others in terms of the time to find a valid solution and planning success rate, and performs comparably in terms of trajectory length over time. The work is made open source and available for use upon acceptance.

Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors

Authors: Authors: Ziwei Liao, Jun Yang, Jingxing Qian, Angela P. Schoellig, Steven L. Waslander
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09118
Pdf link: https://arxiv.org/pdf/2309.09118
Abstract 3D object-level mapping is a fundamental problem in robotics, which is especially challenging when object CAD models are unavailable during inference. In this work, we propose a framework that can reconstruct high-quality object-level maps for unknown objects. Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected objects. The core idea of our approach is to leverage a learnt generative model for shape categories as a prior and to formulate a probabilistic, uncertainty-aware optimization framework for 3D reconstruction. We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions. Unlike current state-of-the-art approaches, we explicitly model the uncertainty of the object shapes and poses during our optimization, resulting in a high-quality object-level mapping system. Moreover, the resulting shape and pose uncertainties, which we demonstrate can accurately reflect the true errors of our object maps, can also be useful for downstream robotics tasks such as active vision. We perform extensive evaluations on indoor and outdoor real-world datasets, achieving achieves substantial improvements over state-of-the-art methods. Our code will be available at https://github.com/TRAILab/UncertainShapePose.

Conditional Mutual Information Constrained Deep Learning for Classification

Authors: Authors: En-Hui Yang, Shayan Mohajer Hamidi, Linfeng Ye, Renhao Tan, Beverly Yang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.09123
Pdf link: https://arxiv.org/pdf/2309.09123
Abstract The concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN) in the output probability distribution space of the DNN, where CMI and the ratio between CMI and NCMI represent the intra-class concentration and inter-class separation of the DNN, respectively. By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies over ImageNet validation data set are more or less inversely proportional to their NCMI values. Based on this observation, the standard deep learning (DL) framework is further modified to minimize the standard cross entropy function subject to an NCMI constraint, yielding CMI constrained deep learning (CMIC-DL). A novel alternating learning algorithm is proposed to solve such a constrained optimization problem. Extensive experiment results show that DNNs trained within CMIC-DL outperform the state-of-the-art models trained within the standard DL and other loss functions in the literature in terms of both accuracy and robustness against adversarial attacks. In addition, visualizing the evolution of learning process through the lens of CMI and NCMI is also advocated.

A Contracting Dynamical System Perspective toward Interval Markov Decision Processes

Authors: Authors: Saber Jafarpour, Samuel Coogan
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.09146
Pdf link: https://arxiv.org/pdf/2309.09146
Abstract Interval Markov decision processes are a class of Markov models where the transition probabilities between the states belong to intervals. In this paper, we study the problem of efficient estimation of the optimal policies in Interval Markov Decision Processes (IMDPs) with continuous action-space. Given an IMDP, we show that the pessimistic (resp. the optimistic) value iterations, i.e., the value iterations under the assumption of a competitive adversary (resp. cooperative agent), are monotone dynamical systems and are contracting with respect to the $\ell_{\infty}$-norm. Inspired by this dynamical system viewpoint, we introduce another IMDP, called the action-space relaxation IMDP. We show that the action-space relaxation IMDP has two key features: (i) its optimal value is an upper bound for the optimal value of the original IMDP, and (ii) its value iterations can be efficiently solved using tools and techniques from convex optimization. We then consider the policy optimization problems at each step of the value iterations as a feedback controller of the value function. Using this system-theoretic perspective, we propose an iteration-distributed implementation of the value iterations for approximating the optimal value of the action-space relaxation IMDP.

Consensus-Based Leader-Follower Formation Tracking for Control-Affine Nonlinear Multiagent Systems

Authors: Authors: Clinton Enwerem, John S. Baras
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.09156
Pdf link: https://arxiv.org/pdf/2309.09156
Abstract In the typical multiagent formation tracking problem centered on consensus, the prevailing assumption in the literature is that the agents' nonlinear models can be approximated by integrator systems, by their feedback-linearized equivalents, or by dynamics composed of deterministic linear and nonlinear terms. The resulting approaches associated with such assumptions, however, are hardly applicable to general nonlinear systems. To this end, we present consensus-based control laws for multiagent formation tracking in finite-dimensional state space, with the agents represented by a more general class of dynamics: control-affine nonlinear systems. The agents also exchange information via a leader-follower communication topology modeled as an undirected and connected graph with a single leader node. By leveraging standard tools from algebraic graph theory and Lyapunov analysis, we first derive a locally asymptotically stabilizing formation tracking law. Next, to demonstrate the effectiveness of our approach, we present results from numerical simulations of an example in robotics. These results -- together with a comparison of the formation errors obtained with our approach and those realized via an optimization-based method -- further validate our theoretical propositions.

Spline-Based Minimum-Curvature Trajectory Optimization for Autonomous Racing

Authors: Authors: Haoru Xue, Tianwei Yue, John M. Dolan
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.09186
Pdf link: https://arxiv.org/pdf/2309.09186
Abstract We propose a novel B-spline trajectory optimization method for autonomous racing. We consider the unavailability of sophisticated race car and race track dynamics in early-stage autonomous motorsports development and derive methods that work with limited dynamics data and additional conservative constraints. We formulate a minimum-curvature optimization problem with only the spline control points as optimization variables. We then compare the current state-of-the-art method with our optimization result, which achieves a similar level of optimality with a 90% reduction on the decision variable dimension, and in addition offers mathematical smoothness guarantee and flexible manipulation options. We concurrently reduce the problem computation time from seconds to milliseconds for a long race track, enabling future online adaptation of the previously offline technique.

MFRL-BI: Design of a Model-free Reinforcement Learning Process Control Scheme by Using Bayesian Inference

Authors: Authors: Yanrong Li, Juan Du, Wei Jiang
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.09205
Pdf link: https://arxiv.org/pdf/2309.09205
Abstract Design of process control scheme is critical for quality assurance to reduce variations in manufacturing systems. Taking semiconductor manufacturing as an example, extensive literature focuses on control optimization based on certain process models (usually linear models), which are obtained by experiments before a manufacturing process starts. However, in real applications, pre-defined models may not be accurate, especially for a complex manufacturing system. To tackle model inaccuracy, we propose a model-free reinforcement learning (MFRL) approach to conduct experiments and optimize control simultaneously according to real-time data. Specifically, we design a novel MFRL control scheme by updating the distribution of disturbances using Bayesian inference to reduce their large variations during manufacturing processes. As a result, the proposed MFRL controller is demonstrated to perform well in a nonlinear chemical mechanical planarization (CMP) process when the process model is unknown. Theoretical properties are also guaranteed when disturbances are additive. The numerical studies also demonstrate the effectiveness and efficiency of our methodology.

Neural Gradient Learning and Optimization for Oriented Point Normal Estimation

Authors: Authors: Qing Li, Huifang Feng, Kanle Shi, Yi Fang, Yu-Shen Liu, Zhizhong Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.09211
Pdf link: https://arxiv.org/pdf/2309.09211
Abstract We propose Neural Gradient Learning (NGL), a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation. It has excellent gradient approximation properties for the underlying geometry of the data. We utilize a simple neural network to parameterize the objective function to produce gradients at points using a global implicit representation. However, the derived gradients usually drift away from the ground-truth oriented normals due to the lack of local detail descriptions. Therefore, we introduce Gradient Vector Optimization (GVO) to learn an angular distance field based on local plane geometry to refine the coarse gradient vectors. Finally, we formulate our method with a two-phase pipeline of coarse estimation followed by refinement. Moreover, we integrate two weighting functions, i.e., anisotropic kernel and inlier score, into the optimization to improve the robust and detail-preserving performance. Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description. This leads to a state-of-the-art normal estimator that is robust to noise, outliers and point density variations. Extensive evaluations show that our method outperforms previous works in both unoriented and oriented normal estimation on widely used benchmarks. The source code and pre-trained models are available at https://github.com/LeoQLi/NGLO.

Fresh Multiple Access: A Unified Framework Based on Large Models and Mean-Field Approximations

Authors: Authors: Haiming Hui, Shuqi Wei, Wei Chen
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2309.09226
Pdf link: https://arxiv.org/pdf/2309.09226
Abstract Information freshness has attracted increasingly attention in the past decade as it plays a critical role in the emerging real-time applications. Age of information (AoI) holds the promise of effectively characterizing the information freshness, hence widely considered as a fundamental performance metric. However, in multiple-device scenarios, most existing works focus on the analysis and optimization of AoI based on queueing systems. The study for a unified approach for general multiple access control scheme in freshness-oriented scenarios remains open. In this paper, we take into consideration the combination of the fundamental freshness metric AoI and multiple access control schemes to achieve efficient cross-layer analysis and optimization in freshness-oriented scenarios, which is referred to as fresh multiple access. To this end, we build a unified framework with a discrete-time tandem queue model for fresh multiple access. The unified framework enables the analysis and optimization for general multiple access protocols in fresh multiple access. To handle the high dimension framework embedded in fresh multiple access, we introduce large model approaches for the Markov chain formulation in AoI oriented scenarios. Two typical AoI-based metric are studied including age of incorrect information (AoII) and peak AoII. Moreover, to address the computational complexity of the large model, we present mean-field approximations which significantly reduces the dimension of the Markov chain model by approximating the integral affect of massive devices in fresh multiple access.

Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems

Authors: Authors: Huayu Wang, Chen Luo, Taofeng Xie, Qiyu Jin, Guoqing Chen, Zhuo-Xu Cui, Dong Liang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2309.09250
Pdf link: https://arxiv.org/pdf/2309.09250
Abstract Recently, data-driven techniques have demonstrated remarkable effectiveness in addressing challenges related to MR imaging inverse problems. However, these methods still exhibit certain limitations in terms of interpretability and robustness. In response, we introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm. CLEAR represents a fusion of deep learning (DL) and variational regularization. Specifically, we employ a latent optimization technique to adversarially train an input convex neural network, and its set of minima can fully represent the real data manifold. We utilize it as a convex regularizer to formulate a CLEAR-informed variational regularization model that guides the solution of the imaging inverse problem on the real data manifold. Leveraging its inherent convexity, we have established the convergence of the projected subgradient descent algorithm for the CLEAR-informed regularization model. This convergence guarantees the attainment of a unique solution to the imaging inverse problem, subject to certain assumptions. Furthermore, we have demonstrated the robustness of our CLEAR-informed model, explicitly showcasing its capacity to achieve stable reconstruction even in the presence of measurement interference. Finally, we illustrate the superiority of our approach using MRI reconstruction as an example. Our method consistently outperforms conventional data-driven techniques and traditional regularization approaches, excelling in both reconstruction quality and robustness.

User Assignment and Resource Allocation for Hierarchical Federated Learning over Wireless Networks

Authors: Authors: Tinghao Zhang, Kwok-Yan Lam, Jun Zhao
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.09253
Pdf link: https://arxiv.org/pdf/2309.09253
Abstract The large population of wireless users is a key driver of data-crowdsourced Machine Learning (ML). However, data privacy remains a significant concern. Federated Learning (FL) encourages data sharing in ML without requiring data to leave users' devices but imposes heavy computation and communications overheads on mobile devices. Hierarchical FL (HFL) alleviates this problem by performing partial model aggregation at edge servers. HFL can effectively reduce energy consumption and latency through effective resource allocation and appropriate user assignment. Nevertheless, resource allocation in HFL involves optimizing multiple variables, and the objective function should consider both energy consumption and latency, making the development of resource allocation algorithms very complicated. Moreover, it is challenging to perform user assignment, which is a combinatorial optimization problem in a large search space. This article proposes a spectrum resource optimization algorithm (SROA) and a two-stage iterative algorithm (TSIA) for HFL. Given an arbitrary user assignment pattern, SROA optimizes CPU frequency, transmit power, and bandwidth to minimize system cost. TSIA aims to find a user assignment pattern that considerably reduces the total system cost. Experimental results demonstrate the superiority of the proposed HFL framework over existing studies in energy and latency reduction.

RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation

Authors: Authors: Lijun Li, Linrui Tian1, Xindi Zhang, Qi Wang, Bang Zhang, Liefeng Bo, Mengyuan Liu, Chen Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.09301
Pdf link: https://arxiv.org/pdf/2309.09301
Abstract The current interacting hand (IH) datasets are relatively simplistic in terms of background and texture, with hand joints being annotated by a machine annotator, which may result in inaccuracies, and the diversity of pose distribution is limited. However, the variability of background, pose distribution, and texture can greatly influence the generalization ability. Therefore, we present a large-scale synthetic dataset RenderIH for interacting hands with accurate and diverse pose annotations. The dataset contains 1M photo-realistic images with varied backgrounds, perspectives, and hand textures. To generate natural and diverse interacting poses, we propose a new pose optimization algorithm. Additionally, for better pose estimation accuracy, we introduce a transformer-based pose estimation network, TransHand, to leverage the correlation between interacting hands and verify the effectiveness of RenderIH in improving results. Our dataset is model-agnostic and can improve more accuracy of any hand pose estimation method in comparison to other real or synthetic datasets. Experiments have shown that pretraining on our synthetic data can significantly decrease the error from 6.76mm to 5.79mm, and our Transhand surpasses contemporary methods. Our dataset and code are available at https://github.com/adwardlee/RenderIH.

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Authors: Authors: Yuxi Ren, Jie Wu, Peng Zhang, Manlin Zhang, Xuefeng Xiao, Qian He, Rui Wang, Min Zheng, Xin Pan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.09310
Pdf link: https://arxiv.org/pdf/2309.09310
Abstract Recent years have witnessed the prevailing progress of Generative Adversarial Networks (GANs) in image-to-image translation. However, the success of these GAN models hinges on ponderous computational costs and labor-expensive training data. Current efficient GAN learning techniques often fall into two orthogonal aspects: i) model slimming via reduced calculation costs; ii)data/label-efficient learning with fewer training data/labels. To combine the best of both worlds, we propose a new learning paradigm, Unified GAN Compression (UGC), with a unified optimization objective to seamlessly prompt the synergy of model-efficient and label-efficient learning. UGC sets up semi-supervised-driven network architecture search and adaptive online semi-supervised distillation stages sequentially, which formulates a heterogeneous mutual learning scheme to obtain an architecture-flexible, label-efficient, and performance-excellent model.

Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study Of Sentence Embeddings

Authors: Authors: Stephen Fitz
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2309.09397
Pdf link: https://arxiv.org/pdf/2309.09397
Abstract As Large Language Models are deployed within Artificial Intelligence systems, that are increasingly integrated with human society, it becomes more important than ever to study their internal structures. Higher level abilities of LLMs such as GPT-3.5 emerge in large part due to informative language representations they induce from raw text data during pre-training on trillions of words. These embeddings exist in vector spaces of several thousand dimensions, and their processing involves mapping between multiple vector spaces, with total number of parameters on the order of trillions. Furthermore, these language representations are induced by gradient optimization, resulting in a black box system that is hard to interpret. In this paper, we take a look at the topological structure of neuronal activity in the "brain" of Chat-GPT's foundation language model, and analyze it with respect to a metric representing the notion of fairness. We develop a novel approach to visualize GPT's moral dimensions. We first compute a fairness metric, inspired by social psychology literature, to identify factors that typically influence fairness assessments in humans, such as legitimacy, need, and responsibility. Subsequently, we summarize the manifold's shape using a lower-dimensional simplicial complex, whose topology is derived from this metric. We color it with a heat map associated with this fairness metric, producing human-readable visualizations of the high-dimensional sentence manifold. Our results show that sentence embeddings based on GPT-3.5 can be decomposed into two submanifolds corresponding to fair and unfair moral judgments. This indicates that GPT-based language models develop a moral dimension within their representation spaces and induce an understanding of fairness during their training process.

A Schedule of Duties in the Cloud Space Using a Modified Salp Swarm Algorithm

Authors: Authors: Hossein Jamali, Ponkoj Chandra Shill, David Feil-Seifer, Frederick C. Harris, Jr., Sergiu M. Dascalu
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.09441
Pdf link: https://arxiv.org/pdf/2309.09441
Abstract Cloud computing is a concept introduced in the information technology era, with the main components being the grid, distributed, and valuable computing. The cloud is being developed continuously and, naturally, comes up with many challenges, one of which is scheduling. A schedule or timeline is a mechanism used to optimize the time for performing a duty or set of duties. A scheduling process is accountable for choosing the best resources for performing a duty. The main goal of a scheduling algorithm is to improve the efficiency and quality of the service while at the same time ensuring the acceptability and effectiveness of the targets. The task scheduling problem is one of the most important NP-hard issues in the cloud domain and, so far, many techniques have been proposed as solutions, including using genetic algorithms (GAs), particle swarm optimization, (PSO), and ant colony optimization (ACO). To address this problem, in this paper, one of the collective intelligence algorithms, called the Salp Swarm Algorithm (SSA), has been expanded, improved, and applied. The performance of the proposed algorithm has been compared with that of GAs, PSO, continuous ACO, and the basic SSA. The results show that our algorithm has generally higher performance than the other algorithms. For example, compared to the basic SSA, the proposed method has an average reduction of approximately 21% in makespan.

Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles

Authors: Authors: Noah Golowich, Dhruv Rohatgi, Ankur Moitra
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.09457
Pdf link: https://arxiv.org/pdf/2309.09457
Abstract The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map $\phi(x, a)$ that maps state-action pairs to $d$-dimensional vectors, and that the rewards and transitions are linear functions in this representation. But where do these features come from? In the absence of expert domain knowledge, a tempting strategy is to use the ``kitchen sink" approach and hope that the true features are included in a much larger set of potential features. In this paper we revisit linear MDPs from the perspective of feature selection. In a $k$-sparse linear MDP, there is an unknown subset $S \subset [d]$ of size $k$ containing all the relevant features, and the goal is to learn a near-optimal policy in only poly$(k,\log d)$ interactions with the environment. Our main result is the first polynomial-time algorithm for this problem. In contrast, earlier works either made prohibitively strong assumptions that obviated the need for exploration, or required solving computationally intractable optimization problems. Along the way we introduce the notion of an emulator: a succinct approximate representation of the transitions that suffices for computing certain Bellman backups. Since linear MDPs are a non-parametric model, it is not even obvious whether polynomial-sized emulators exist. We show that they do exist and can be computed efficiently via convex programming. As a corollary of our main result, we give an algorithm for learning a near-optimal policy in block MDPs whose decoding function is a low-depth decision tree; the algorithm runs in quasi-polynomial time and takes a polynomial number of samples. This can be seen as a reinforcement learning analogue of classic results in computational learning theory. Furthermore, it gives a natural model where improving the sample complexity via representation learning is computationally feasible.

Stealthy Physical Masked Face Recognition Attack via Adversarial Style Optimization

Authors: Authors: Huihui Gong, Minjing Dong, Siqi Ma, Seyit Camtepe, Surya Nepal, Chang Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.09480
Pdf link: https://arxiv.org/pdf/2309.09480
Abstract Deep neural networks (DNNs) have achieved state-of-the-art performance on face recognition (FR) tasks in the last decade. In real scenarios, the deployment of DNNs requires taking various face accessories into consideration, like glasses, hats, and masks. In the COVID-19 pandemic era, wearing face masks is one of the most effective ways to defend against the novel coronavirus. However, DNNs are known to be vulnerable to adversarial examples with a small but elaborated perturbation. Thus, a facial mask with adversarial perturbations may pose a great threat to the widely used deep learning-based FR models. In this paper, we consider a challenging adversarial setting: targeted attack against FR models. We propose a new stealthy physical masked FR attack via adversarial style optimization. Specifically, we train an adversarial style mask generator that hides adversarial perturbations inside style masks. Moreover, to ameliorate the phenomenon of sub-optimization with one fixed style, we propose to discover the optimal style given a target through style optimization in a continuous relaxation manner. We simultaneously optimize the generator and the style selection for generating strong and stealthy adversarial style masks. We evaluated the effectiveness and transferability of our proposed method via extensive white-box and black-box digital experiments. Furthermore, we also conducted physical attack experiments against local FR models and online platforms.

LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

Authors: Authors: Zecheng Tang, Chenfei Wu, Juntao Li, Nan Duan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.09506
Pdf link: https://arxiv.org/pdf/2309.09506
Abstract Graphic layout generation, a growing research field, plays a significant role in user engagement and information perception. Existing methods primarily treat layout generation as a numerical optimization task, focusing on quantitative aspects while overlooking the semantic information of layout, such as the relationship between each layout element. In this paper, we propose LayoutNUWA, the first model that treats layout generation as a code generation task to enhance semantic information and harness the hidden layout expertise of large language models~(LLMs). More concretely, we develop a Code Instruct Tuning (CIT) approach comprising three interconnected modules: 1) the Code Initialization (CI) module quantifies the numerical conditions and initializes them as HTML code with strategically placed masks; 2) the Code Completion (CC) module employs the formatting knowledge of LLMs to fill in the masked portions within the HTML code; 3) the Code Rendering (CR) module transforms the completed code into the final layout output, ensuring a highly interpretable and transparent layout generation procedure that directly maps code to a visualized layout. We attain significant state-of-the-art performance (even over 50% improvements) on multiple datasets, showcasing the strong capabilities of LayoutNUWA. Our code is available at https://github.com/ProjectNUWA/LayoutNUWA.

Pruning Large Language Models via Accuracy Predictor

Authors: Authors: Yupeng Ji, Yibo Cao, Jiucai Liu
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.09507
Pdf link: https://arxiv.org/pdf/2309.09507
Abstract Large language models(LLMs) containing tens of billions of parameters (or even more) have demonstrated impressive capabilities in various NLP tasks. However, substantial model size poses challenges to training, inference, and deployment so that it is necessary to compress the model. At present, most model compression for LLMs requires manual design of pruning features, which has problems such as complex optimization pipeline and difficulty in retaining the capabilities of certain parts of the model.Therefore, we propose a novel pruning approach: firstly, a training set of a certain number of architecture-accuracy pairs is established, and then a non-neural model is trained as an accuracy predictor. Using the accuracy predictor to further optimize the search space and search, the optimal model can be automatically selected. Experiments show that our proposed approach is effective and efficient. Compared with the baseline, the perplexity(PPL) on Wikitext2 and PTB dropped by 9.48% and 5,76% respectively, and the average accuracy of MMLU increased by 6.28%.

Proof-of-Prospect-Theory: A Novel Game-based Consensus Mechanism for Blockchain

Authors: Authors: Yuqi Xie, Changbing Tang, Feilong Lin, Guanrong Chen, Zhao Zhang, Zhonglong Zheng
Subjects: Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.09529
Pdf link: https://arxiv.org/pdf/2309.09529
Abstract Blockchain technology is a breakthrough in changing the ways of business and organization operations, in which the consensus problem is challenging with practical constraints, such as computational power and consensus standard. In this paper, a novel consensus mechanism named Proof-of-Prospect-Theory (PoPT) is designed from the view of game theory, where the game prospect value is considered as an important election criterion of the block-recorder. PoPT portrays the popularity of a node in the network as an attribute, which is constituted by the subjective sensibilities of nodes. Furthermore, the performances of the PoPT and the willingness of ordinary nodes to participate in the consensus are analyzed, exploring fairness, decentralization, credibility, and the motivating ability of the consensus mechanism. Finally, numerical simulations with optimization of the PoPT consensus mechanism are demonstrated in the scenario of a smart grid system to illustrate the effectiveness of the PoPT.

Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation

Authors: Authors: Huan Liu, Zichang Tan, Qiang Chen, Yunchao Wei, Yao Zhao, Jingdong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.09667
Pdf link: https://arxiv.org/pdf/2309.09667
Abstract Detecting and grounding multi-modal media manipulation (DGM^4) has become increasingly crucial due to the widespread dissemination of face forgery and text misinformation. In this paper, we present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM^4 problem. Unlike previous state-of-the-art methods that solely focus on the image (RGB) domain to describe visual forgery features, we additionally introduce the frequency domain as a complementary viewpoint. By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts. Then, our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands. Moreover, to address the semantic conflicts between image and frequency domains, the forgery-aware mutual module is developed to further enable the effective interaction of disparate image and frequency features, resulting in aligned and comprehensive visual forgery representations. Finally, based on visual and textual forgery features, we propose a unified decoder that comprises two symmetric cross-modal interaction modules responsible for gathering modality-specific forgery information, along with a fusing interaction module for aggregation of both modalities. The proposed unified decoder formulates our UFAFormer as a unified framework, ultimately simplifying the overall architecture and facilitating the optimization process. Experimental results on the DGM^4 dataset, containing several perturbations, demonstrate the superior performance of our framework compared to previous methods, setting a new benchmark in the field.

Distributed course allocation with asymmetric friendships

Authors: Authors: Ilya Khakhiashvili, Lihi Dery, Tal Grinshpoun
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2309.09684
Pdf link: https://arxiv.org/pdf/2309.09684
Abstract Students' decisions on whether to take a class are strongly affected by whether their friends plan to take the class with them. A student may prefer to be assigned to a course they likes less, just to be with their friends, rather than taking a more preferred class alone. It has been shown that taking classes with friends positively affects academic performance. Thus, academic institutes should prioritize friendship relations when assigning course seats. The introduction of friendship relations results in several non-trivial changes to current course allocation methods. This paper explores how course allocation mechanisms can account for friendships between students and provide a unique, distributed solution. In particular, we model the problem as an asymmetric distributed constraint optimization problem and develop a new dedicated algorithm. Our extensive evaluation includes both simulated data and data derived from a user study on 177 students' preferences over courses and friends. The results show that our algorithm obtains high utility for the students while keeping the solution fair and observing courses' seat capacity limitations.

Securing Fixed Neural Network Steganography

Authors: Authors: Zicong Luo, Sheng Li, Guobiao Li, Zhenxing Qian, Xinpeng Zhang
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.09700
Pdf link: https://arxiv.org/pdf/2309.09700
Abstract Image steganography is the art of concealing secret information in images in a way that is imperceptible to unauthorized parties. Recent advances show that is possible to use a fixed neural network (FNN) for secret embedding and extraction. Such fixed neural network steganography (FNNS) achieves high steganographic performance without training the networks, which could be more useful in real-world applications. However, the existing FNNS schemes are vulnerable in the sense that anyone can extract the secret from the stego-image. To deal with this issue, we propose a key-based FNNS scheme to improve the security of the FNNS, where we generate key-controlled perturbations from the FNN for data embedding. As such, only the receiver who possesses the key is able to correctly extract the secret from the stego-image using the FNN. In order to improve the visual quality and undetectability of the stego-image, we further propose an adaptive perturbation optimization strategy by taking the perturbation cost into account. Experimental results show that our proposed scheme is capable of preventing unauthorized secret extraction from the stego-images. Furthermore, our scheme is able to generate stego-images with higher visual quality than the state-of-the-art FNNS scheme, especially when the FNN is a neural network for ordinary learning tasks.

Multi-Dictionary Tensor Decomposition

Authors: Authors: Maxwell McNeil, Petko Bogdanov
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.09717
Pdf link: https://arxiv.org/pdf/2309.09717
Abstract Tensor decomposition methods are popular tools for analysis of multi-way datasets from social media, healthcare, spatio-temporal domains, and others. Widely adopted models such as Tucker and canonical polyadic decomposition (CPD) follow a data-driven philosophy: they decompose a tensor into factors that approximate the observed data well. In some cases side information is available about the tensor modes. For example, in a temporal user-item purchases tensor a user influence graph, an item similarity graph, and knowledge about seasonality or trends in the temporal mode may be available. Such side information may enable more succinct and interpretable tensor decomposition models and improved quality in downstream tasks. We propose a framework for Multi-Dictionary Tensor Decomposition (MDTD) which takes advantage of prior structural information about tensor modes in the form of coding dictionaries to obtain sparsely encoded tensor factors. We derive a general optimization algorithm for MDTD that handles both complete input and input with missing values. Our framework handles large sparse tensors typical to many real-world application domains. We demonstrate MDTD's utility via experiments with both synthetic and real-world datasets. It learns more concise models than dictionary-free counterparts and improves (i) reconstruction quality ($60%$ fewer non-zero coefficients coupled with smaller error); (ii) missing values imputation quality (two-fold MSE reduction with up to orders of magnitude time savings) and (iii) the estimation of the tensor rank. MDTD's quality improvements do not come with a running time premium: it can decompose $19GB$ datasets in less than a minute. It can also impute missing values in sparse billion-entry tensors more accurately and scalably than state-of-the-art competitors.

Learning Covariances for Estimation with Constrained Bilevel Optimization

Authors: Authors: Mohamad Qadri, Zachary Manchester, Michael Kaess
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09718
Pdf link: https://arxiv.org/pdf/2309.09718
Abstract We consider the problem of learning error covariance matrices for robotic state estimation. The convergence of a state estimator to the correct belief over the robot state is dependent on the proper tuning of noise models. During inference, these models are used to weigh different blocks of the Jacobian and error vector resulting from linearization and hence, additionally affect the stability and convergence of the non-linear system. We propose a gradient-based method to estimate well-conditioned covariance matrices by formulating the learning process as a constrained bilevel optimization problem over factor graphs. We evaluate our method against baselines across a range of simulated and real-world tasks and demonstrate that our technique converges to model estimates that lead to better solutions as evidenced by the improved tracking accuracy on unseen test trajectories.

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

Authors: Authors: Hao Sun, Li Shen, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.09719
Pdf link: https://arxiv.org/pdf/2309.09719
Abstract Federated learning is an emerging distributed machine learning method, enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same learning rate across different clients. But they are not efficient when data is heterogeneous. To maximize the performance of optimization methods, the main challenge is how to adjust the learning rate without hurting the convergence. In this paper, we propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate based on local historical gradient squares and synchronized learning rates. Theoretical analysis shows that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients, which enables promising scalability in federated optimization. We also empirically compare our method with several communication-efficient federated optimization methods. Extensive experimental results on Computer Vision (CV) tasks and Natural Language Processing (NLP) task show the efficacy of our proposed FedLALR method and also coincides with our theoretical findings.

Significant improvement of lossy compression rate and speed of HPC data using perceptron parallelized compression

Authors: Authors: Xinzhe Chen, Jianjiang Li
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2309.09778
Pdf link: https://arxiv.org/pdf/2309.09778
Abstract The escalating surge in data generation presents formidable challenges to information technology, necessitating advancements in storage, retrieval, and utilization. With the proliferation of artificial intelligence and big data, the "Data Age 2025" report forecasts an exponential increase in global data production. The escalating data volumes raise concerns about efficient data processing. The paper addresses the predicament of achieving a lower compression ratio while maintaining or surpassing the compression performance of state-of-the-art techniques. This paper introduces a lossy compression framework grounded in the perceptron model for data prediction, striving for high compression quality. The contributions of this study encompass the introduction of positive and negative factors within the relative-to-absolute domain transformation algorithm, the utilization of a three-layer perceptron for improved predictive accuracy, and data selection rule modifications for parallelized compression within compression blocks. Comparative experiments with SZ2.1's PW_REL mode demonstrate a maximum compression ratio reduction of 17.78%. The article is structured as follows: the introduction highlights the data explosion challenge; related work delves into existing solutions; optimization of mapping algorithms in the relative and absolute domains is expounded in Section 3,the design of the new compression framework is detailed in Section 4,In Section 5 we describe the whole process and give pseudo-code, and in Section 6, our solution is evaluated. Finally, in Section 7, we provide an outlook for future work.

DFL-TORO: A One-Shot Demonstration Framework for Learning Time-Optimal Robotic Manufacturing Tasks

Authors: Authors: Alireza Barekatain, Hamed Habibi, Holger Voos
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09802
Pdf link: https://arxiv.org/pdf/2309.09802
Abstract This paper presents DFL-TORO, a novel Demonstration Framework for Learning Time-Optimal Robotic tasks via One-shot kinesthetic demonstration. It aims at optimizing the process of Learning from Demonstration (LfD), applied in the manufacturing sector. As the effectiveness of LfD is challenged by the quality and efficiency of human demonstrations, our approach offers a streamlined method to intuitively capture task requirements from human teachers, by reducing the need for multiple demonstrations. Furthermore, we propose an optimization-based smoothing algorithm that ensures time-optimal and jerk-regulated demonstration trajectories, while also adhering to the robot's kinematic constraints. The result is a significant reduction in noise, thereby boosting the robot's operation efficiency. Evaluations using a Franka Emika Research 3 (FR3) robot for a reaching task further substantiate the efficacy of our framework, highlighting its potential to transform kinesthetic demonstrations in contemporary manufacturing environments.

Energy Management of Hydrogen Hybrid Electric Vehicles -- A Potential Study

Authors: Authors: David Theodor Machacek, Nazim Ozan Yazar, Thomas Huber, Christopher Harald Onder
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.09804
Pdf link: https://arxiv.org/pdf/2309.09804
Abstract The hydrogen combustion engine (H$_2$ICE) is known to be able to burn H$_2$ under ultra-lean conditions, while producing no CO$_2$ emissions and extremely low engine-out NO$_x^{\mathrm{eo}}$ emissions. Immediate goals, as for instance the upcoming EURO 7 NO$_x$ limitations, can be reached more easily as extremely low engine-out NO$_x^{\mathrm{eo}}$ emissions facilitate the reduction of the overall tailpipe NO$_x^{\mathrm{tp}}$ emissions. In this work, the feasibility of achieving consistent reductions in NO$_x^{\mathrm{eo}}$ emissions through the implementation of electric hybridization of an H$_2$ICE-equipped passenger car (H$_2$-HEV), combined with a dedicated energy management strategy (EMS) is discussed. In particular, the mixed H$_2$-HEV architecture is investigated and compared to a series H$_2$-HEV, a parallel H$_2$-HEV, and a base H$_2$-vehicle, which is only equipped with an H$_2$ICE. For hybrid vehicles, a low H$_2$ consumption and low NO$_x^{\mathrm{eo}}$ emissions are conflicting objectives, the trade-off of which depends on the EMS and can be represented as a Pareto front. Overall, through the utilization of a dedicated energy management calibration, the mixed H$_2$-HEV demonstrates the capability to consistently achieve extremely low engine-out NO$_x^{\mathrm{eo}}$ emissions. For a broad range of driving missions, the mixed H$2$-HEV is able to decrease the engine-out NO$\mathrm{x}^\mathrm{eo}$ emissions by more than 90%, while, at the same time, the H$_2$ consumption is decreased by over 16%, compared to a comparable non-hybridized H$_2$-vehicle. These significant emission reductions are possible without having to modify the exhaust-gas aftertreatment system, or the optimization of any of the individual drivetrain components, but solely by setting the EMS calibration accordingly.

Coco-LIC: Continuous-Time Tightly-Coupled LiDAR-Inertial-Camera Odometry using Non-Uniform B-spline

Authors: Authors: Xiaolei Lang, Chao Chen, Kai Tang, Yukai Ma, Jiajun Lv, Yong Liu, Xingxing Zuo
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09808
Pdf link: https://arxiv.org/pdf/2309.09808
Abstract In this paper, we propose an efficient continuous-time LiDAR-Inertial-Camera Odometry, utilizing non-uniform B-splines to tightly couple measurements from the LiDAR, IMU, and camera. In contrast to uniform B-spline-based continuous-time methods, our non-uniform B-spline approach offers significant advantages in terms of achieving real-time efficiency and high accuracy. This is accomplished by dynamically and adaptively placing control points, taking into account the varying dynamics of the motion. To enable efficient fusion of heterogeneous LiDAR-Inertial-Camera data within a short sliding-window optimization, we assign depth to visual pixels using corresponding map points from a global LiDAR map, and formulate frame-to-map reprojection factors for the associated pixels in the current image frame. This way circumvents the necessity for depth optimization of visual pixels, which typically entails a lengthy sliding window with numerous control points for continuous-time trajectory estimation. We conduct dedicated experiments on real-world datasets to demonstrate the advantage and efficacy of adopting non-uniform continuous-time trajectory representation. Our LiDAR-Inertial-Camera odometry system is also extensively evaluated on both challenging scenarios with sensor degenerations and large-scale scenarios, and has shown comparable or higher accuracy than the state-of-the-art methods. The codebase of this paper will also be open-sourced at https://github.com/APRIL-ZJU/Coco-LIC.

Learning Inertial Parameter Identification of Unknown Object with Humanoid Robot using Sim-to-Real Adaptation

Authors: Authors: Donghoon Baek, Bo Peng, Saurabh Gupta, Joao Ramos
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09810
Pdf link: https://arxiv.org/pdf/2309.09810
Abstract Understanding the dynamics of unknown object is crucial for collaborative robots including humanoids to more safely and accurately interact with humans. Most relevant literature leverage a force/torque sensor, prior knowledge of object, vision system, and a long-horizon trajectory which are often impractical. Moreover, these methods often entail solving non-linear optimization problem, sometimes yielding physically inconsistent results. In this work, we propose a fast learningbased inertial parameter estimation as more practical manner. We acquire a reliable dataset in a high-fidelity simulation and train a time-series data-driven regression model (e.g., LSTM) to estimate the inertial parameter of unknown objects. We also introduce a novel sim-to-real adaptation method combining Robot System Identification and Gaussian Processes to directly transfer the trained model to real-world application. We demonstrate our method with a 4-DOF single manipulator of physical wheeled humanoid robot, SATYRR. Results show that our method can identify the inertial parameters of various unknown objects faster and more accurately than conventional methods.

DynaPix SLAM: A Pixel-Based Dynamic SLAM Approach

Authors: Authors: Chenghao Xu, Elia Bonetto, Aamir Ahmad
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09879
Pdf link: https://arxiv.org/pdf/2309.09879
Abstract In static environments, visual simultaneous localization and mapping (V-SLAM) methods achieve remarkable performance. However, moving objects severely affect core modules of such systems like state estimation and loop closure detection. To address this, dynamic SLAM approaches often use semantic information, geometric constraints, or optical flow to mask features associated with dynamic entities. These are limited by various factors such as a dependency on the quality of the underlying method, poor generalization to unknown or unexpected moving objects, and often produce noisy results, e.g. by masking static but movable objects or making use of predefined thresholds. In this paper, to address these trade-offs, we introduce a novel visual SLAM system, DynaPix, based on per-pixel motion probability values. Our approach consists of a new semantic-free probabilistic pixel-wise motion estimation module and an improved pose optimization process. Our per-pixel motion probability estimation combines a novel static background differencing method on both images and optical flows from splatted frames. DynaPix fully integrates those motion probabilities into both map point selection and weighted bundle adjustment within the tracking and optimization modules of ORB-SLAM2. We evaluate DynaPix against ORB-SLAM2 and DynaSLAM on both GRADE and TUM-RGBD datasets, obtaining lower errors and longer trajectory tracking times. We will release both source code and data upon acceptance of this work.

Differentiable Boustrophedon Path Plans

Authors: Authors: Thomas Manzini, Robin Murphy
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09882
Pdf link: https://arxiv.org/pdf/2309.09882
Abstract This paper introduces a differentiable representation for optimization of boustrophedon path plans in convex polygons, explores an additional parameter of these path plans that can be optimized, discusses the properties of this representation that can be leveraged during the optimization process, and shows that the previously published attempt at optimization of these path plans was too coarse to be practically useful. Experiments were conducted to show that this differentiable representation can reproduce the same scores from transitional discrete representations of boustrophedon path plans with high fidelity. Finally, optimization via gradient descent was attempted, but found to fail because the search space is far more non-convex than was previously considered in the literature. The wide range of applications for boustrophedon path plans means that this work has the potential to improve path planning efficiency in numerous areas of robotics including mapping and search tasks using uncrewed aerial systems, environmental sampling tasks using uncrewed marine vehicles, and agricultural tasks using ground vehicles, among numerous others applications.

Recycling Krylov Subspaces for Efficient Partitioned Solution of Aerostructural Adjoint Systems

Authors: Authors: Christophe Blondeau
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2309.09925
Pdf link: https://arxiv.org/pdf/2309.09925
Abstract Robust and efficient solvers for coupled-adjoint linear systems are crucial to successful aerostructural optimization. Monolithic and partitioned strategies can be applied. The monolithic approach is expected to offer better robustness and efficiency for strong fluid-structure interactions. However, it requires a high implementation cost and convergence may depend on appropriate scaling and initialization strategies. On the other hand, the modularity of the partitioned method enables a straightforward implementation while its convergence may require relaxation. In addition, a partitioned solver leads to a higher number of iterations to get the same level of convergence as the monolithic one. The objective of this paper is to accelerate the fluid-structure coupled-adjoint partitioned solver by considering techniques borrowed from approximate invariant subspace recycling strategies adapted to sequences of linear systems with varying right-hand sides. Indeed, in a partitioned framework, the structural source term attached to the fluid block of equations affects the right-hand side with the nice property of quickly converging to a constant value. We also consider deflation of approximate eigenvectors in conjunction with advanced inner-outer Krylov solvers for the fluid block equations. We demonstrate the benefit of these techniques by computing the coupled derivatives of an aeroelastic configuration of the ONERA-M6 fixed wing in transonic flow. For this exercise the fluid grid was coupled to a structural model specifically designed to exhibit a high flexibility. All computations are performed using RANS flow modeling and a fully linearized one-equation Spalart-Allmaras turbulence model. Numerical simulations show up to 39% reduction in matrix-vector products for GCRO-DR and up to 19% for the nested FGCRO-DR solver.

Hierarchical Attention and Graph Neural Networks: Toward Drift-Free Pose Estimation

Authors: Authors: Kathia Melbouci, Fawzi Nashashibi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09934
Pdf link: https://arxiv.org/pdf/2309.09934
Abstract The most commonly used method for addressing 3D geometric registration is the iterative closet-point algorithm, this approach is incremental and prone to drift over multiple consecutive frames. The Common strategy to address the drift is the pose graph optimization subsequent to frame-to-frame registration, incorporating a loop closure process that identifies previously visited places. In this paper, we explore a framework that replaces traditional geometric registration and pose graph optimization with a learned model utilizing hierarchical attention mechanisms and graph neural networks. We propose a strategy to condense the data flow, preserving essential information required for the precise estimation of rigid poses. Our results, derived from tests on the KITTI Odometry dataset, demonstrate a significant improvement in pose estimation accuracy. This improvement is especially notable in determining rotational components when compared with results obtained through conventional multi-way registration via pose graph optimization. The code will be made available upon completion of the review process.

Keyword: adam

High-order BDF convolution quadrature for fractional evolution equations with hyper-singular source term

Authors: Authors: Jiankang Shi, Minghua Chen, Jianxiong Cao
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2309.09664
Pdf link: https://arxiv.org/pdf/2309.09664
Abstract Anomalous diffusion in the presence or absence of an external force field is often modelled in terms of the fractional evolution equations, which can involve the hyper-singular source term. For this case, conventional time stepping methods may exhibit a severe order reduction. Although a second-order numerical algorithm is provided for the subdiffusion model with a simple hyper-singular source term $t^{\mu}$, $-2<\mu<-1$ in [arXiv:2207.08447], the convergence analysis remain to be proved. To fill in these gaps, we present a simple and robust smoothing method for the hyper-singular source term, where the Hadamard finite-part integral is introduced. This method is based on the smoothing/ID$m$-BDF$k$ method proposed by the authors [Shi and Chen, SIAM J. Numer. Anal., to appear] for subdiffusion equation with a weakly singular source term. We prove that the $k$th-order convergence rate can be restored for the diffusion-wave case $\gamma \in (1,2)$ and sketch the proof for the subdiffusion case $\gamma \in (0,1)$, even if the source term is hyper-singular and the initial data is not compatible. Numerical experiments are provided to confirm the theoretical results.

Multi-turn Dialogue Comprehension from a Topic-aware Perspective

Authors: Authors: Xinbei Ma, Yi Xu, Hai Zhao, Zhuosheng Zhang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.09666
Pdf link: https://arxiv.org/pdf/2309.09666
Abstract Dialogue related Machine Reading Comprehension requires language models to effectively decouple and model multi-turn dialogue passages. As a dialogue development goes after the intentions of participants, its topic may not keep constant through the whole passage. Hence, it is non-trivial to detect and leverage the topic shift in dialogue modeling. Topic modeling, although has been widely studied in plain text, deserves far more utilization in dialogue reading comprehension. This paper proposes to model multi-turn dialogues from a topic-aware perspective. We start with a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way. Then we use these fragments as topic-aware language processing units in further dialogue comprehension. On one hand, the split segments indict specific topics rather than mixed intentions, thus showing convenient on in-domain topic detection and location. For this task, we design a clustering system with a self-training auto-encoder, and we build two constructed datasets for evaluation. On the other hand, the split segments are an appropriate element of multi-turn dialogue response selection. For this purpose, we further present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements and matches response candidates with a dual cross-attention. Empirical studies on three public benchmarks show great improvements over baselines. Our work continues the previous studies on document topic, and brings the dialogue modeling to a novel topic-aware perspective with exhaustive experiments and analyses.

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

Authors: Authors: Hao Sun, Li Shen, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.09719
Pdf link: https://arxiv.org/pdf/2309.09719
Abstract Federated learning is an emerging distributed machine learning method, enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same learning rate across different clients. But they are not efficient when data is heterogeneous. To maximize the performance of optimization methods, the main challenge is how to adjust the learning rate without hurting the convergence. In this paper, we propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate based on local historical gradient squares and synchronized learning rates. Theoretical analysis shows that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients, which enables promising scalability in federated optimization. We also empirically compare our method with several communication-efficient federated optimization methods. Extensive experimental results on Computer Vision (CV) tasks and Natural Language Processing (NLP) task show the efficacy of our proposed FedLALR method and also coincides with our theoretical findings.

Keyword: gradient

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

Authors: Authors: Yi Shen, Pan Xu, Michael M. Zavlanos
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.08748
Pdf link: https://arxiv.org/pdf/2309.08748
Abstract Without direct interaction with the environment. Often, the environment in which the data are collected differs from the environment in which the learned policy is applied. To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed that compute worst-case bounds on the policy values assuming that the distribution of the new environment lies within an uncertainty set. Typically, this uncertainty set is defined based on the KL divergence around the empirical distribution computed from the logging dataset. However, the KL uncertainty set fails to encompass distributions with varying support and lacks awareness of the geometry of the distribution support. As a result, KL approaches fall short in addressing practical environment mismatches and lead to over-fitting to worst-case scenarios. To overcome these limitations, we propose a novel DRO approach that employs the Wasserstein distance instead. While Wasserstein DRO is generally computationally more expensive compared to KL DRO, we present a regularized method and a practical (biased) stochastic gradient descent method to optimize the policy efficiently. We also provide a theoretical analysis of the finite sample complexity and iteration complexity for our proposed method. We further validate our approach using a public dataset that was recorded in a randomized stoke trial.

Control Barrier Function for Linearizable Systems with High Relative Degrees from Signal Temporal Logics: A Reference Governor Approach

Authors: Authors: Kaier Liang, Mingyu Cai, Cristian-Ioan Vasile
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.08813
Pdf link: https://arxiv.org/pdf/2309.08813
Abstract This paper considers the safety-critical navigation problem with Signal Temporal Logic (STL) tasks. We developed an explicit reference governor-guided control barrier function (ERG-guided CBF) method that enables the application of first-order CBFs to high-order linearizable systems. This method significantly reduces the conservativeness of the existing CBF approaches for high-order systems. Furthermore, our framework provides safety-critical guarantees in the sense of obstacle avoidance by constructing the margin of safety and updating direction of safe evolution in the agent's state space. To improve control performance and enhance STL satisfaction, we employ efficient gradient-based methods for iteratively learning optimal parameters of ERG-guided CBF. We validate the algorithm through both high-order linear and nonlinear systems. A video demonstration can be found on: \url{https://youtu.be/ZRmsA2FeFR4}

GRaCE: Optimizing Grasps to Satisfy Ranked Criteria in Complex Scenario

Authors: Authors: Tasbolat Taunyazov, Kelvin Lin, Harold Soh
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08887
Pdf link: https://arxiv.org/pdf/2309.08887
Abstract This paper addresses the multi-faceted problem of robot grasping, where multiple criteria may conflict and differ in importance. We introduce Grasp Ranking and Criteria Evaluation (GRaCE), a novel approach that employs hierarchical rule-based logic and a rank-preserving utility function to optimize grasps based on various criteria such as stability, kinematic constraints, and goal-oriented functionalities. Additionally, we propose GRaCE-OPT, a hybrid optimization strategy that combines gradient-based and gradient-free methods to effectively navigate the complex, non-convex utility function. Experimental results in both simulated and real-world scenarios show that GRaCE requires fewer samples to achieve comparable or superior performance relative to existing methods. The modular architecture of GRaCE allows for easy customization and adaptation to specific application needs.

GCL: Gradient-Guided Contrastive Learning for Medical Image Segmentation with Multi-Perspective Meta Labels

Authors: Authors: Yixuan Wu, Jintai Chen, Jiahuan Yan, Yiheng Zhu, Danny Z. Chen, Jian Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.08888
Pdf link: https://arxiv.org/pdf/2309.08888
Abstract Since annotating medical images for segmentation tasks commonly incurs expensive costs, it is highly desirable to design an annotation-efficient method to alleviate the annotation burden. Recently, contrastive learning has exhibited a great potential in learning robust representations to boost downstream tasks with limited labels. In medical imaging scenarios, ready-made meta labels (i.e., specific attribute information of medical images) inherently reveal semantic relationships among images, which have been used to define positive pairs in previous work. However, the multi-perspective semantics revealed by various meta labels are usually incompatible and can incur intractable "semantic contradiction" when combining different meta labels. In this paper, we tackle the issue of "semantic contradiction" in a gradient-guided manner using our proposed Gradient Mitigator method, which systematically unifies multi-perspective meta labels to enable a pre-trained model to attain a better high-level semantic recognition ability. Moreover, we emphasize that the fine-grained discrimination ability is vital for segmentation-oriented pre-training, and develop a novel method called Gradient Filter to dynamically screen pixel pairs with the most discriminating power based on the magnitude of gradients. Comprehensive experiments on four medical image segmentation datasets verify that our new method GCL: (1) learns informative image representations and considerably boosts segmentation performance with limited labels, and (2) shows promising generalizability on out-of-distribution datasets.

Efficient Methods for Non-stationary Online Learning

Authors: Authors: Peng Zhao, Yan-Feng Xie, Lijun Zhang, Zhi-Hua Zhou
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.08911
Pdf link: https://arxiv.org/pdf/2309.08911
Abstract Non-stationary online learning has drawn much attention in recent years. In particular, dynamic regret and adaptive regret are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of the non-stationarity, in which a group of base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises the concern about the computational complexity -- those methods typically maintain $\mathcal{O}(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$. Moreover, our obtained algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods. Empirical studies verify our theoretical findings.

Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors

Authors: Authors: Junren Chen, Shuai Huang, Michael K. Ng, Zhaoqiang Liu
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.09032
Pdf link: https://arxiv.org/pdf/2309.09032
Abstract The problem of recovering a signal $\boldsymbol{x} \in \mathbb{R}^n$ from a quadratic system ${y_i=\boldsymbol{x}^\top\boldsymbol{A}_i\boldsymbol{x},\ i=1,\ldots,m}$ with full-rank matrices $\boldsymbol{A}_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging. With i.i.d. standard Gaussian matrices $\boldsymbol{A}_i$, this paper addresses the high-dimensional case where $m\ll n$ by incorporating prior knowledge of $\boldsymbol{x}$. First, we consider a $k$-sparse $\boldsymbol{x}$ and introduce the thresholded Wirtinger flow (TWF) algorithm that does not require the sparsity level $k$. TWF comprises two steps: the spectral initialization that identifies a point sufficiently close to $\boldsymbol{x}$ (up to a sign flip) when $m=O(k^2\log n)$, and the thresholded gradient descent (with a good initialization) that produces a sequence linearly converging to $\boldsymbol{x}$ with $m=O(k\log n)$ measurements. Second, we explore the generative prior, assuming that $\boldsymbol{x}$ lies in the range of an $L$-Lipschitz continuous generative model with $k$-dimensional inputs in an $\ell_2$-ball of radius $r$. We develop the projected gradient descent (PGD) algorithm that also comprises two steps: the projected power method that provides an initial vector with $O\big(\sqrt{\frac{k \log L}{m}}\big)$ $\ell_2$-error given $m=O(k\log(Lnr))$ measurements, and the projected gradient descent that refines the $\ell_2$-error to $O(\delta)$ at a geometric rate when $m=O(k\log\frac{Lrn}{\delta^2})$. Experimental results corroborate our theoretical findings and show that: (i) our approach for the sparse case notably outperforms the existing provable algorithm sparse power factorization; (ii) leveraging the generative prior allows for precise image recovery in the MNIST dataset from a small number of quadratic measurements.

Neural Gradient Learning and Optimization for Oriented Point Normal Estimation

Authors: Authors: Qing Li, Huifang Feng, Kanle Shi, Yi Fang, Yu-Shen Liu, Zhizhong Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.09211
Pdf link: https://arxiv.org/pdf/2309.09211
Abstract We propose Neural Gradient Learning (NGL), a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation. It has excellent gradient approximation properties for the underlying geometry of the data. We utilize a simple neural network to parameterize the objective function to produce gradients at points using a global implicit representation. However, the derived gradients usually drift away from the ground-truth oriented normals due to the lack of local detail descriptions. Therefore, we introduce Gradient Vector Optimization (GVO) to learn an angular distance field based on local plane geometry to refine the coarse gradient vectors. Finally, we formulate our method with a two-phase pipeline of coarse estimation followed by refinement. Moreover, we integrate two weighting functions, i.e., anisotropic kernel and inlier score, into the optimization to improve the robust and detail-preserving performance. Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description. This leads to a state-of-the-art normal estimator that is robust to noise, outliers and point density variations. Extensive evaluations show that our method outperforms previous works in both unoriented and oriented normal estimation on widely used benchmarks. The source code and pre-trained models are available at https://github.com/LeoQLi/NGLO.

Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems

Authors: Authors: Huayu Wang, Chen Luo, Taofeng Xie, Qiyu Jin, Guoqing Chen, Zhuo-Xu Cui, Dong Liang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2309.09250
Pdf link: https://arxiv.org/pdf/2309.09250
Abstract Recently, data-driven techniques have demonstrated remarkable effectiveness in addressing challenges related to MR imaging inverse problems. However, these methods still exhibit certain limitations in terms of interpretability and robustness. In response, we introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm. CLEAR represents a fusion of deep learning (DL) and variational regularization. Specifically, we employ a latent optimization technique to adversarially train an input convex neural network, and its set of minima can fully represent the real data manifold. We utilize it as a convex regularizer to formulate a CLEAR-informed variational regularization model that guides the solution of the imaging inverse problem on the real data manifold. Leveraging its inherent convexity, we have established the convergence of the projected subgradient descent algorithm for the CLEAR-informed regularization model. This convergence guarantees the attainment of a unique solution to the imaging inverse problem, subject to certain assumptions. Furthermore, we have demonstrated the robustness of our CLEAR-informed model, explicitly showcasing its capacity to achieve stable reconstruction even in the presence of measurement interference. Finally, we illustrate the superiority of our approach using MRI reconstruction as an example. Our method consistently outperforms conventional data-driven techniques and traditional regularization approaches, excelling in both reconstruction quality and robustness.

A Distributed Strategy to Maximize Coverage in a Heterogeneous Sensor Network in the Presence of Obstacles

Authors: Authors: Hesam Mosalli, Amir G. Aghdam
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.09363
Pdf link: https://arxiv.org/pdf/2309.09363
Abstract In this paper, an efficient deployment strategy is proposed for a network of mobile and static sensors with nonidentical sensing and communication radii. The multiplicatively weighted Voronoi (MW-Voronoi) diagram is used to partition the field and assign the underlying coverage task to each mobile sensor. A gradient-based method is applied to find the best candidate point based on the detected coverage holes and the coverage priority considering the relative distance of the mobile sensor from the static ones and the obstacles in the field. The sensors move to a new position if such a relocation increases their local coverage. The efficiency of the proposed strategy in different scenarios is demonstrated by simulations.

Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study Of Sentence Embeddings

Authors: Authors: Stephen Fitz
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2309.09397
Pdf link: https://arxiv.org/pdf/2309.09397
Abstract As Large Language Models are deployed within Artificial Intelligence systems, that are increasingly integrated with human society, it becomes more important than ever to study their internal structures. Higher level abilities of LLMs such as GPT-3.5 emerge in large part due to informative language representations they induce from raw text data during pre-training on trillions of words. These embeddings exist in vector spaces of several thousand dimensions, and their processing involves mapping between multiple vector spaces, with total number of parameters on the order of trillions. Furthermore, these language representations are induced by gradient optimization, resulting in a black box system that is hard to interpret. In this paper, we take a look at the topological structure of neuronal activity in the "brain" of Chat-GPT's foundation language model, and analyze it with respect to a metric representing the notion of fairness. We develop a novel approach to visualize GPT's moral dimensions. We first compute a fairness metric, inspired by social psychology literature, to identify factors that typically influence fairness assessments in humans, such as legitimacy, need, and responsibility. Subsequently, we summarize the manifold's shape using a lower-dimensional simplicial complex, whose topology is derived from this metric. We color it with a heat map associated with this fairness metric, producing human-readable visualizations of the high-dimensional sentence manifold. Our results show that sentence embeddings based on GPT-3.5 can be decomposed into two submanifolds corresponding to fair and unfair moral judgments. This indicates that GPT-based language models develop a moral dimension within their representation spaces and induce an understanding of fairness during their training process.

Reducing Adversarial Training Cost with Gradient Approximation

Authors: Authors: Huihui Gong, Shuo Yang, Siqi Ma, Seyit Camtepe, Surya Nepal, Chang Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.09464
Pdf link: https://arxiv.org/pdf/2309.09464
Abstract Deep learning models have achieved state-of-the-art performances in various domains, while they are vulnerable to the inputs with well-crafted but small perturbations, which are named after adversarial examples (AEs). Among many strategies to improve the model robustness against AEs, Projected Gradient Descent (PGD) based adversarial training is one of the most effective methods. Unfortunately, the prohibitive computational overhead of generating strong enough AEs, due to the maximization of the loss function, sometimes makes the regular PGD adversarial training impractical when using larger and more complicated models. In this paper, we propose that the adversarial loss can be approximated by the partial sum of Taylor series. Furthermore, we approximate the gradient of adversarial loss and propose a new and efficient adversarial training method, adversarial training with gradient approximation (GAAT), to reduce the cost of building up robust models. Additionally, extensive experiments demonstrate that this efficiency improvement can be achieved without any or with very little loss in accuracy on natural and adversarial examples, which show that our proposed method saves up to 60% of the training time with comparable model test accuracy on MNIST, CIFAR-10 and CIFAR-100 datasets.

Gradpaint: Gradient-Guided Inpainting with Diffusion Models

Authors: Authors: Asya Grechka, Guillaume Couairon, Matthieu Cord
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.09614
Pdf link: https://arxiv.org/pdf/2309.09614
Abstract Denoising Diffusion Probabilistic Models (DDPMs) have recently achieved remarkable results in conditional and unconditional image generation. The pre-trained models can be adapted without further training to different downstream tasks, by guiding their iterative denoising process at inference time to satisfy additional constraints. For the specific task of image inpainting, the current guiding mechanism relies on copying-and-pasting the known regions from the input image at each denoising step. However, diffusion models are strongly conditioned by the initial random noise, and therefore struggle to harmonize predictions inside the inpainting mask with the real parts of the input image, often producing results with unnatural artifacts. Our method, dubbed GradPaint, steers the generation towards a globally coherent image. At each step in the denoising process, we leverage the model's "denoised image estimation" by calculating a custom loss measuring its coherence with the masked input image. Our guiding mechanism uses the gradient obtained from backpropagating this loss through the diffusion model itself. GradPaint generalizes well to diffusion models trained on various datasets, improving upon current state-of-the-art supervised and unsupervised methods.

Two-Stage Learning of Highly Dynamic Motions with Rigid and Articulated Soft Quadrupeds

Authors: Authors: Francecso Vezzi, Jiatao Ding, Antonin Raffin, Jens Kober, Cosimo Della Santina
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09682
Pdf link: https://arxiv.org/pdf/2309.09682
Abstract Controlled execution of dynamic motions in quadrupedal robots, especially those with articulated soft bodies, presents a unique set of challenges that traditional methods struggle to address efficiently. In this study, we tackle these issues by relying on a simple yet effective two-stage learning framework to generate dynamic motions for quadrupedal robots. First, a gradient-free evolution strategy is employed to discover simply represented control policies, eliminating the need for a predefined reference motion. Then, we refine these policies using deep reinforcement learning. Our approach enables the acquisition of complex motions like pronking and back-flipping, effectively from scratch. Additionally, our method simplifies the traditionally labour-intensive task of reward shaping, boosting the efficiency of the learning process. Importantly, our framework proves particularly effective for articulated soft quadrupeds, whose inherent compliance and adaptability make them ideal for dynamic tasks but also introduce unique control challenges.

Learning Covariances for Estimation with Constrained Bilevel Optimization

Authors: Authors: Mohamad Qadri, Zachary Manchester, Michael Kaess
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09718
Pdf link: https://arxiv.org/pdf/2309.09718
Abstract We consider the problem of learning error covariance matrices for robotic state estimation. The convergence of a state estimator to the correct belief over the robot state is dependent on the proper tuning of noise models. During inference, these models are used to weigh different blocks of the Jacobian and error vector resulting from linearization and hence, additionally affect the stability and convergence of the non-linear system. We propose a gradient-based method to estimate well-conditioned covariance matrices by formulating the learning process as a constrained bilevel optimization problem over factor graphs. We evaluate our method against baselines across a range of simulated and real-world tasks and demonstrate that our technique converges to model estimates that lead to better solutions as evidenced by the improved tracking accuracy on unseen test trajectories.

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

Authors: Authors: Hao Sun, Li Shen, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.09719
Pdf link: https://arxiv.org/pdf/2309.09719
Abstract Federated learning is an emerging distributed machine learning method, enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same learning rate across different clients. But they are not efficient when data is heterogeneous. To maximize the performance of optimization methods, the main challenge is how to adjust the learning rate without hurting the convergence. In this paper, we propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate based on local historical gradient squares and synchronized learning rates. Theoretical analysis shows that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients, which enables promising scalability in federated optimization. We also empirically compare our method with several communication-efficient federated optimization methods. Extensive experimental results on Computer Vision (CV) tasks and Natural Language Processing (NLP) task show the efficacy of our proposed FedLALR method and also coincides with our theoretical findings.

Differentiable Boustrophedon Path Plans

Authors: Authors: Thomas Manzini, Robin Murphy
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.09882
Pdf link: https://arxiv.org/pdf/2309.09882
Abstract This paper introduces a differentiable representation for optimization of boustrophedon path plans in convex polygons, explores an additional parameter of these path plans that can be optimized, discusses the properties of this representation that can be leveraged during the optimization process, and shows that the previously published attempt at optimization of these path plans was too coarse to be practically useful. Experiments were conducted to show that this differentiable representation can reproduce the same scores from transitional discrete representations of boustrophedon path plans with high fidelity. Finally, optimization via gradient descent was attempted, but found to fail because the search space is far more non-convex than was previously considered in the literature. The wide range of applications for boustrophedon path plans means that this work has the potential to improve path planning efficiency in numerous areas of robotics including mapping and search tasks using uncrewed aerial systems, environmental sampling tasks using uncrewed marine vehicles, and agricultural tasks using ground vehicles, among numerous others applications.

Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees

Authors: Authors: Alexia Jolicoeur-Martineau, Kilian Fatras, Tal Kachman
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.09968
Pdf link: https://arxiv.org/pdf/2309.09968
Abstract Tabular data is hard to acquire and is subject to missing values. This paper proposes a novel approach to generate and impute mixed-type (continuous and categorical) tabular data using score-based diffusion and conditional flow matching. Contrary to previous work that relies on neural networks as function approximators, we instead utilize XGBoost, a popular Gradient-Boosted Tree (GBT) method. In addition to being elegant, we empirically show on various datasets that our method i) generates highly realistic synthetic data when the training dataset is either clean or tainted by missing data and ii) generates diverse plausible data imputations. Our method often outperforms deep-learning generation methods and can trained in parallel using CPUs without the need for a GPU. To make it easily accessible, we release our code through a Python library on PyPI and an R package on CRAN.

Keyword: super-resolution

Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution

Authors: Authors: Wenyu Zhang, Xin Deng, Baojun Jia, Xingtong Yu, Yifan Chen, jin Ma, Qing Ding, Xinming Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08919
Pdf link: https://arxiv.org/pdf/2309.08919
Abstract Current Scene text image super-resolution approaches primarily focus on extracting robust features, acquiring text information, and complex training strategies to generate super-resolution images. However, the upsampling module, which is crucial in the process of converting low-resolution images to high-resolution ones, has received little attention in existing works. To address this issue, we propose the Pixel Adapter Module (PAM) based on graph attention to address pixel distortion caused by upsampling. The PAM effectively captures local structural information by allowing each pixel to interact with its neighbors and update features. Unlike previous graph attention mechanisms, our approach achieves 2-3 orders of magnitude improvement in efficiency and memory utilization by eliminating the dependency on sparse adjacency matrices and introducing a sliding window approach for efficient parallel computation. Additionally, we introduce the MLP-based Sequential Residual Block (MSRB) for robust feature extraction from text images, and a Local Contour Awareness loss ($\mathcal{L}_{lca}$) to enhance the model's perception of details. Comprehensive experiments on TextZoom demonstrate that our proposed method generates high-quality super-resolution images, surpassing existing methods in recognition accuracy. For single-stage and multi-stage strategies, we achieved improvements of 0.7% and 2.6%, respectively, increasing the performance from 52.6% and 53.7% to 53.3% and 56.3%. The code is available at https://github.com/wenyu1009/RTSRN.

Sep 19 '23 06:09 zoq

arxiv-updates arxiv-updates copied to clipboard

New submissions for Tue, 19 Sep 23

Keyword: sgd

Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets

Keyword: optimization

Maneuver Decision-Making Through Proximal Policy Optimization And Monte Carlo Tree Search

A Stochastic Online Forecast-and-Optimize Framework for Real-Time Energy Dispatch in Virtual Power Plants under Uncertainty

Cure the headache of Transformers via Collinear Constrained Attention

Speeding up charge exchange recombination spectroscopy analysis in support of NERSC/DIII-D realtime workflow

Probabilistic Constellation Shaping With Denoising Diffusion Probabilistic Models: A Novel Approach

Wasserstein Distributionally Robust Control Barrier Function using Conditional Value-at-Risk with Differentiable Convex Programming

Clustered Multi-Agent Linear Bandits

RoSSO: A High-Performance Python Package for Robotic Surveillance Strategy Optimization Using JAX

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

A Control Approach for Nonlinear Stochastic State Uncertain Systems with Probabilistic Safety Guarantees

SHAPNN: Shapley Value Regularized Tabular Neural Network

Geometric Projectors: Geometric Constraints based Optimization for Robot Behaviors

Distributionally Robust CVaR-Based Safety Filtering for Motion Planning in Uncertain Environments

Distributionally Robust Post-hoc Classifiers under Prior Shifts

Intention-Aware Planner for Robust and Safe Aerial Tracking

Towards Geometric Motion Planning for High-Dimensional Systems: Gait-Based Coordinate Optimization and Local Metrics

Semantic Information Extraction for Text Data with Probability Graph

GRaCE: Optimizing Grasps to Satisfy Ranked Criteria in Complex Scenario

Efficient Methods for Non-stationary Online Learning

Delving into Multimodal Prompting for Fine-grained Visual Classification

Bidirectional Graph GAN: Representing Brain Structure-Function Connections for Alzheimer's Disease

Inverse classification with logistic and softmax classifiers: efficient optimization

ExBluRF: Efficient Radiance Fields for Extreme Motion Blurred Images

FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization

Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations

QTOS: An Open-Source Quadruped Trajectory Optimization Stack

Achieving Ultra-Reliable Low-Latency Communication (URLLC) in Next-Generation Cellular Networks with Programmable Data Planes

CppFlow: Generative Inverse Kinematics for Efficient and Robust Cartesian Path Planning

Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors

Conditional Mutual Information Constrained Deep Learning for Classification

A Contracting Dynamical System Perspective toward Interval Markov Decision Processes

Consensus-Based Leader-Follower Formation Tracking for Control-Affine Nonlinear Multiagent Systems

Spline-Based Minimum-Curvature Trajectory Optimization for Autonomous Racing

MFRL-BI: Design of a Model-free Reinforcement Learning Process Control Scheme by Using Bayesian Inference

Neural Gradient Learning and Optimization for Oriented Point Normal Estimation

Fresh Multiple Access: A Unified Framework Based on Large Models and Mean-Field Approximations

Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems

User Assignment and Resource Allocation for Hierarchical Federated Learning over Wireless Networks

RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study Of Sentence Embeddings

A Schedule of Duties in the Cloud Space Using a Modified Salp Swarm Algorithm

Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles

Stealthy Physical Masked Face Recognition Attack via Adversarial Style Optimization

LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

Pruning Large Language Models via Accuracy Predictor

Proof-of-Prospect-Theory: A Novel Game-based Consensus Mechanism for Blockchain

Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation

Distributed course allocation with asymmetric friendships

Securing Fixed Neural Network Steganography

Multi-Dictionary Tensor Decomposition

Learning Covariances for Estimation with Constrained Bilevel Optimization

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

Significant improvement of lossy compression rate and speed of HPC data using perceptron parallelized compression

DFL-TORO: A One-Shot Demonstration Framework for Learning Time-Optimal Robotic Manufacturing Tasks

Energy Management of Hydrogen Hybrid Electric Vehicles -- A Potential Study

Coco-LIC: Continuous-Time Tightly-Coupled LiDAR-Inertial-Camera Odometry using Non-Uniform B-spline

Learning Inertial Parameter Identification of Unknown Object with Humanoid Robot using Sim-to-Real Adaptation

DynaPix SLAM: A Pixel-Based Dynamic SLAM Approach

Differentiable Boustrophedon Path Plans

Recycling Krylov Subspaces for Efficient Partitioned Solution of Aerostructural Adjoint Systems

Hierarchical Attention and Graph Neural Networks: Toward Drift-Free Pose Estimation

Keyword: adam

High-order BDF convolution quadrature for fractional evolution equations with hyper-singular source term

Multi-turn Dialogue Comprehension from a Topic-aware Perspective

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

Keyword: gradient

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

Control Barrier Function for Linearizable Systems with High Relative Degrees from Signal Temporal Logics: A Reference Governor Approach

GRaCE: Optimizing Grasps to Satisfy Ranked Criteria in Complex Scenario

GCL: Gradient-Guided Contrastive Learning for Medical Image Segmentation with Multi-Perspective Meta Labels

Efficient Methods for Non-stationary Online Learning

Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors

Neural Gradient Learning and Optimization for Oriented Point Normal Estimation

Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems

arxiv-updates
arxiv-updates copied to clipboard