
New submissions for Tue, 19 Sep 23

Open zoq opened this issue 1 year ago • 0 comments

Keyword: sgd

Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets

  • Authors: Pulkit Gopalani, Samyak Jha, Anirbit Mukherjee
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2309.09258
  • Pdf link: https://arxiv.org/pdf/2309.09258
  • Abstract In this note, we demonstrate a first-of-its-kind provable convergence of SGD to the global minima of appropriately regularized logistic empirical risk of depth $2$ nets -- for arbitrary data and with any number of gates with adequately smooth and bounded activations like sigmoid and tanh. We also prove an exponentially fast convergence rate for continuous time SGD that also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius norm regularized logistic loss functions on constant-sized neural nets which are "Villani functions" and thus be able to build on recent progress with analyzing SGD on such objectives.

Keyword: optimization

Maneuver Decision-Making Through Proximal Policy Optimization And Monte Carlo Tree Search

  • Authors: Zhang Hong-Peng
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2309.08611
  • Pdf link: https://arxiv.org/pdf/2309.08611
  • Abstract Maneuver decision-making can be regarded as a Markov decision process and can be addressed by reinforcement learning. However, original reinforcement learning algorithms can hardly solve the maneuvering decision-making problem. One reason is that agents use random actions in the early stages of training, which makes it difficult to get rewards and learn how to make effective decisions. To address this issue, a method based on proximal policy optimization and Monte Carlo tree search is proposed. The method uses proximal policy optimization to train the agent, and regards the results of air combat as targets to train the value network. Then, based on the value network and the visit count of each node, Monte Carlo tree search is used to find the actions with more expected returns than random actions, which can improve the training performance. The ablation studies and simulation experiments indicate that agents trained by the proposed method can make different decisions according to different states, which demonstrates that the method can solve the maneuvering decision problem that the original reinforcement learning algorithm cannot solve.

A Stochastic Online Forecast-and-Optimize Framework for Real-Time Energy Dispatch in Virtual Power Plants under Uncertainty

  • Authors: Wei Jiang, Zhongkai Yi, Li Wang, Hanwei Zhang, Jihai Zhang, Fangquan Lin, Cheng Yang
  • Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
  • Arxiv link: https://arxiv.org/abs/2309.08642
  • Pdf link: https://arxiv.org/pdf/2309.08642
  • Abstract Aggregating distributed energy resources in power systems significantly increases uncertainties, in particular caused by the fluctuation of renewable energy generation. This issue has driven the necessity of widely exploiting advanced predictive control techniques under uncertainty to ensure long-term economics and decarbonization. In this paper, we propose a real-time uncertainty-aware energy dispatch framework, which is composed of two key elements: (i) A hybrid forecast-and-optimize sequential task, integrating deep learning-based forecasting and stochastic optimization, where these two stages are connected by the uncertainty estimation at multiple temporal resolutions; (ii) An efficient online data augmentation scheme, jointly involving model pre-training and online fine-tuning stages. In this way, the proposed framework is capable of rapidly adapting to the real-time data distribution, targeting uncertainties caused by data drift, model discrepancy and environment perturbations in the control process, and finally realizing an optimal and robust dispatch solution. The proposed framework won the championship in CityLearn Challenge 2022, which provided an influential opportunity to investigate the potential of AI application in the energy domain. In addition, comprehensive experiments are conducted to interpret its effectiveness in the real-life scenario of smart building energy management.

Cure the headache of Transformers via Collinear Constrained Attention

  • Authors: Shiyi Zhu, Jing Ye, Wei Jiang, Qi Zhang, Yifan Wu, Jianguo Li
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2309.08646
  • Pdf link: https://arxiv.org/pdf/2309.08646
  • Abstract As the rapid progression of practical applications based on Large Language Models continues, the importance of extrapolating performance has grown exponentially in the research domain. In our study, we identified an anomalous behavior in Transformer models that had been previously overlooked, leading to chaos around the closest tokens, which carry the most important information. We've coined this discovery the "headache of Transformers". To address this at its core, we introduced a novel self-attention structure named Collinear Constrained Attention (CoCA). This structure can be seamlessly integrated with existing extrapolation, interpolation methods, and other optimization strategies designed for traditional Transformer models. We have achieved excellent extrapolating performance even for 16 to 24 times the sequence length during inference without any fine-tuning on our model. We have also enhanced CoCA's computational and spatial efficiency to ensure its practicality. We plan to open-source CoCA shortly. In the meantime, we've made our code available in the appendix for reproducing the experiments.

Speeding up charge exchange recombination spectroscopy analysis in support of NERSC/DIII-D realtime workflow

  • Authors: Aarushi Jain, Laurie Stephey, Erik Linsenmayer, Colin Chrystal, Jonathan Dursi, Hannah Ross
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Plasma Physics (physics.plasm-ph)
  • Arxiv link: https://arxiv.org/abs/2309.08687
  • Pdf link: https://arxiv.org/pdf/2309.08687
  • Abstract We report optimization work made in support of the development of a realtime Superfacility workflow between DIII-D and NERSC. At DIII-D, the ion properties measured by charge exchange recombination (CER) spectroscopy are required inputs for a Superfacility realtime workflow that computes the full plasma kinetic equilibrium. In this workflow, minutes matter since the results must be ready during the brief 10-15 minute pause between plasma discharges. Prior to this work, a sample CERFIT analysis took approximately 15 minutes. Because the problem consists of many calculations that can be done independently, we were able to restructure the CERFIT code to leverage this parallelism with Slurm job arrays. We reduced the runtime to approximately 51 seconds -- a speedup of roughly 20x, saving valuable time for both the scientists interested in the CER results and also for the larger equilibrium reconstruction workflow.
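
As a quick check of the figures quoted in the abstract above, the speedup can be recomputed from the two approximate runtimes. This is only arithmetic on the rounded numbers the abstract gives; the variable names are illustrative and not taken from the paper.

```python
# Runtimes as quoted in the abstract (both are stated as approximate).
before_s = 15 * 60            # "approximately 15 minutes"
after_s = 51                  # "approximately 51 seconds"

# The pause between plasma discharges is quoted as 10-15 minutes.
shortest_pause_s = 10 * 60

speedup = before_s / after_s
fits_in_pause = after_s < shortest_pause_s

print(f"speedup = {speedup:.1f}x")   # prints "speedup = 17.6x"
print(f"fits in the shortest pause: {fits_in_pause}")
```

With these rounded inputs the ratio comes out near 17.6x, which is consistent with the abstract's "roughly 20x" given that both runtimes are approximate.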

Probabilistic Constellation Shaping With Denoising Diffusion Probabilistic Models: A Novel Approach

  • Authors: Mehdi Letafati, Samad Ali, Matti Latva-aho
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2309.08688
  • Pdf link: https://arxiv.org/pdf/2309.08688
  • Abstract With the incredible results achieved from generative pre-trained transformers (GPT) and diffusion models, generative AI (GenAI) is envisioned to yield remarkable breakthroughs in various industrial and academic domains. In this paper, we utilize denoising diffusion probabilistic models (DDPM), as one of the state-of-the-art generative models, for probabilistic constellation shaping in wireless communications. While the geometry of constellations is predetermined by the networking standards, probabilistic constellation shaping can help enhance the information rate and communication performance by designing the probability of occurrence (generation) of constellation symbols. Unlike conventional methods that deal with an optimization problem over the discrete distribution of constellations, we take a radically different approach. Exploiting the "denoise-and-generate" characteristic of DDPMs, the key idea is to learn how to generate constellation symbols out of noise, "mimicking" the way the receiver performs symbol reconstruction. By doing so, we make the constellation symbols sent by the transmitter, and what is inferred (reconstructed) at the receiver become as similar as possible. Our simulations show that the proposed scheme outperforms a deep neural network (DNN)-based benchmark and uniform shaping, while providing network resilience as well as robust out-of-distribution performance under low-SNR regimes and non-Gaussian noise. Notably, a threefold improvement in terms of mutual information is achieved compared to the DNN-based approach for 64-QAM geometry.

Wasserstein Distributionally Robust Control Barrier Function using Conditional Value-at-Risk with Differentiable Convex Programming

  • Authors: Alaa Eddine Chriat, Chuangchuang Sun
  • Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2309.08700
  • Pdf link: https://arxiv.org/pdf/2309.08700
  • Abstract Control Barrier functions (CBFs) have attracted extensive attention for designing safe controllers for their deployment in real-world safety-critical systems. However, the perception of the surrounding environment is often subject to stochasticity and further distributional shift from the nominal one. In this paper, we present distributional robust CBF (DR-CBF) to achieve resilience under distributional shift while keeping the advantages of CBF, such as computational efficacy and forward invariance. To achieve this goal, we first propose a single-level convex reformulation to estimate the conditional value at risk (CVaR) of the safety constraints under distributional shift measured by a Wasserstein metric, which is by nature tri-level programming. Moreover, to construct a control barrier condition to enforce the forward invariance of the CVaR, the technique of differentiable convex programming is applied to enable differentiation through the optimization layer of CVaR estimation. We also provide an approximate variant of DR-CBF for higher-order systems. Simulation results are presented to validate the chance-constrained safety guarantee under the distributional shift in both first and second-order systems.
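
For readers unfamiliar with the risk measure named in the abstract above, the following is a minimal sketch of an empirical CVaR estimate using the standard Rockafellar-Uryasev formula, CVaR_alpha(L) = min over t of t + E[(L - t)^+] / (1 - alpha). It is not the paper's distributionally robust reformulation; the function name and the sample data are hypothetical and chosen only for illustration.

```python
def empirical_cvar(losses, alpha):
    """Empirical CVaR at level alpha (0 <= alpha < 1) of a list of loss samples.

    Uses the Rockafellar-Uryasev objective t + mean((L - t)^+) / (1 - alpha).
    The objective is convex and piecewise linear in t with breakpoints at the
    samples, so its minimum is attained at one of the sample values.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(losses)
    tail = 1.0 - alpha

    def objective(t):
        return t + sum(max(x - t, 0.0) for x in losses) / (n * tail)

    return min(objective(t) for t in losses)


# Hypothetical samples: the worst 20% of the losses 1..10 are 9 and 10.
samples = [float(v) for v in range(1, 11)]
cvar_80 = empirical_cvar(samples, 0.8)   # about 9.5, the mean of 9 and 10
```

At alpha = 0 the estimate reduces to the sample mean, which is a convenient sanity check.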

Clustered Multi-Agent Linear Bandits

  • Authors: Hamza Cherkaoui, Merwan Barlier, Igor Colin
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2309.08710
  • Pdf link: https://arxiv.org/pdf/2309.08710
  • Abstract We address in this paper a particular instance of the multi-agent linear stochastic bandit problem, called clustered multi-agent linear bandits. In this setting, we propose a novel algorithm leveraging an efficient collaboration between the agents in order to accelerate the overall optimization problem. In this contribution, a network controller is responsible for estimating the underlying cluster structure of the network and optimizing the experiences sharing among agents within the same groups. We provide a theoretical analysis for both the regret minimization problem and the clustering quality. Through empirical evaluation against state-of-the-art algorithms on both synthetic and real data, we demonstrate the effectiveness of our approach: our algorithm significantly improves regret minimization while managing to recover the true underlying cluster partitioning.

RoSSO: A High-Performance Python Package for Robotic Surveillance Strategy Optimization Using JAX

  • Authors: Yohan John, Connor Hughes, Gilberto Diaz-Garcia, Jason R. Marden, Francesco Bullo
  • Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2309.08742
  • Pdf link: https://arxiv.org/pdf/2309.08742
  • Abstract To enable the computation of effective randomized patrol routes for single- or multi-robot teams, we present RoSSO, a Python package designed for solving Markov chain optimization problems. We exploit machine-learning techniques such as reverse-mode automatic differentiation and constraint parametrization to achieve superior efficiency compared to general-purpose nonlinear programming solvers. Additionally, we supplement a game-theoretic stochastic surveillance formulation in the literature with a novel greedy algorithm and multi-robot extension. We close with numerical results for a police district in downtown San Francisco that demonstrate RoSSO's capabilities on our new formulations and the prior work.

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

  • Authors: Yi Shen, Pan Xu, Michael M. Zavlanos
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2309.08748
  • Pdf link: https://arxiv.org/pdf/2309.08748
  • Abstract Without direct interaction with the environment. Often, the environment in which the data are collected differs from the environment in which the learned policy is applied. To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed that compute worst-case bounds on the policy values assuming that the distribution of the new environment lies within an uncertainty set. Typically, this uncertainty set is defined based on the KL divergence around the empirical distribution computed from the logging dataset. However, the KL uncertainty set fails to encompass distributions with varying support and lacks awareness of the geometry of the distribution support. As a result, KL approaches fall short in addressing practical environment mismatches and lead to over-fitting to worst-case scenarios. To overcome these limitations, we propose a novel DRO approach that employs the Wasserstein distance instead. While Wasserstein DRO is generally computationally more expensive compared to KL DRO, we present a regularized method and a practical (biased) stochastic gradient descent method to optimize the policy efficiently. We also provide a theoretical analysis of the finite sample complexity and iteration complexity for our proposed method. We further validate our approach using a public dataset that was recorded in a randomized stroke trial.

A Control Approach for Nonlinear Stochastic State Uncertain Systems with Probabilistic Safety Guarantees

  • Authors: Mohammad S. Ramadan, Mohammad Alsuwaidan, Ahmed Atallah, Sylvia Herbert
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2309.08767
  • Pdf link: https://arxiv.org/pdf/2309.08767
  • Abstract This paper presents an algorithm to apply nonlinear control design approaches in the case of stochastic systems with partial state observation. Deterministic nonlinear control approaches are formulated under the assumption of full state access and, often, relative degree one. We propose a control design approach that first generates a control policy for nonlinear deterministic models with full state observation. The resulting control policy is then used to build an importance-like probability distribution over the space of control sequences which are to be evaluated for the true stochastic and state-uncertain dynamics. This distribution serves in the sampling step within a random search control optimization procedure, to focus the exploration effort on certain regions of the control space. The sampled control sequences are assigned costs determined by a prescribed finite-horizon performance and safety measure, which is based on the stochastic dynamics. This sampling algorithm is parallelizable and shown to have computational complexity indifferent to the state dimension, and to be able to guarantee safety over the prescribed prediction horizon. A numerical simulation is provided to test the applicability and effectiveness of the presented approach and compare it to a certainty equivalence controller.

SHAPNN: Shapley Value Regularized Tabular Neural Network

  • Authors: Qisen Cheng, Shuhui Qu, Janghwan Lee
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2309.08799
  • Pdf link: https://arxiv.org/pdf/2309.08799
  • Abstract We present SHAPNN, a novel deep tabular data modeling architecture designed for supervised learning. Our approach leverages Shapley values, a well-established technique for explaining black-box models. Our neural network is trained using standard backward propagation optimization methods, and is regularized with realtime estimated Shapley values. Our method offers several advantages, including the ability to provide valid explanations with no computational overhead for data instances and datasets. Additionally, prediction with explanation serves as a regularizer, which improves the model's performance. Moreover, the regularized prediction enhances the model's capability for continual learning. We evaluate our method on various publicly available datasets and compare it with state-of-the-art deep neural network models, demonstrating the superior performance of SHAPNN in terms of AUROC, transparency, as well as robustness to streaming data.

Geometric Projectors: Geometric Constraints based Optimization for Robot Behaviors

  • Authors: Xuemin Chi, Tobias Löw, Yiming Li, Zhitao Liu, Sylvain Calinon
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.08802
  • Pdf link: https://arxiv.org/pdf/2309.08802
  • Abstract Generating motion for robots that interact with objects of various shapes is a complex challenge, further complicated when the robot's own geometry and multiple desired behaviors are considered. To address this issue, we introduce a new framework based on Geometric Projectors (GeoPro) for constrained optimization. This novel framework allows for the generation of task-agnostic behaviors that are compliant with geometric constraints. GeoPro streamlines the design of behaviors in both task and configuration spaces, offering diverse functionalities such as collision avoidance and goal-reaching, while maintaining high computational efficiency. We validate the efficacy of our work through simulations and Franka Emika robotic experiments, comparing its performance against state-of-the-art methodologies. This comprehensive evaluation highlights GeoPro's versatility in accommodating robots with varying dynamics and precise geometric shapes. For additional materials, please visit: https://www.xueminchi.com/publications/geopro

Distributionally Robust CVaR-Based Safety Filtering for Motion Planning in Uncertain Environments

  • Authors: Sleiman Safaoui, Tyler H. Summers
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2309.08821
  • Pdf link: https://arxiv.org/pdf/2309.08821
  • Abstract Safety is a core challenge of autonomous robot motion planning, especially in the presence of dynamic and uncertain obstacles. Many recent results use learning and deep learning-based motion planners and prediction modules to predict multiple possible obstacle trajectories and generate obstacle-aware ego robot plans. However, planners that ignore the inherent uncertainties in such predictions incur collision risks and lack formal safety guarantees. In this paper, we present a computationally efficient safety filtering solution to reduce the collision risk of ego robot motion plans using multiple samples of obstacle trajectory predictions. The proposed approach reformulates the collision avoidance problem by computing safe halfspaces based on obstacle sample trajectories using distributionally robust optimization (DRO) techniques. The safe halfspaces are used in a model predictive control (MPC)-like safety filter to apply corrections to the reference ego trajectory thereby promoting safer planning. The efficacy and computational efficiency of our approach are demonstrated through numerical simulations.

Distributionally Robust Post-hoc Classifiers under Prior Shifts

  • Authors: Jiaheng Wei, Harikrishna Narasimhan, Ehsan Amid, Wen-Sheng Chu, Yang Liu, Abhishek Kumar
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2309.08825
  • Pdf link: https://arxiv.org/pdf/2309.08825
  • Abstract The generalization ability of machine learning models degrades significantly when the test distribution shifts away from the training distribution. We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors. The presence of skewed training priors can often lead to the models overfitting to spurious features. Unlike existing methods, which optimize for either the worst or the average performance over classes or groups, our work is motivated by the need for finer control over the robustness properties of the model. We present an extremely lightweight post-hoc approach that performs scaling adjustments to predictions from a pre-trained model, with the goal of minimizing a distributionally robust loss around a chosen target distribution. These adjustments are computed by solving a constrained optimization problem on a validation set and applied to the model during test time. Our constrained optimization objective is inspired by a natural notion of robustness to controlled distribution shifts. Our method comes with provable guarantees and empirically makes a strong case for distributional robust post-hoc classifiers. An empirical implementation is available at https://github.com/weijiaheng/Drops.

Intention-Aware Planner for Robust and Safe Aerial Tracking

  • Authors: Qiuyu Ren, Huan Yu, Jiajun Dai, Zhi Zheng, Jun Meng, Li Xu
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.08854
  • Pdf link: https://arxiv.org/pdf/2309.08854
  • Abstract The intention of the target can help us to estimate its future motion state more accurately. This paper proposes an intention-aware planner to enhance safety and robustness in aerial tracking applications. Firstly, we utilize the Mediapipe framework to estimate the target's pose. A risk assessment function and a state observation function are designed to predict the target intention. Afterwards, an intention-driven hybrid A* method is proposed for target motion prediction, ensuring that the target's future positions align with its intention. Finally, an intention-aware optimization approach, in conjunction with particular penalty formulations, is designed to generate a spatial-temporal optimal trajectory. Benchmark comparisons validate the superior performance of our proposed methodology across diverse scenarios. This is attributed to the integration of the target intention into the planner through coupled formulations.

Towards Geometric Motion Planning for High-Dimensional Systems: Gait-Based Coordinate Optimization and Local Metrics

  • Authors: Yanhao Yang, Ross L. Hatton
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.08871
  • Pdf link: https://arxiv.org/pdf/2309.08871
  • Abstract Geometric motion planning offers effective and interpretable gait analysis and optimization tools for locomoting systems. However, due to the curse of dimensionality in coordinate optimization, a key component of geometric motion planning, it is almost infeasible to apply current geometric motion planning to high-dimensional systems. In this paper, we propose a gait-based coordinate optimization method that overcomes the curse of dimensionality. We also identify a unified geometric representation of locomotion by generalizing various nonholonomic constraints into local metrics. By combining these two approaches, we take a step towards geometric motion planning for high-dimensional systems. We test our method in two classes of high-dimensional systems - low Reynolds number swimmers and free-falling Cassie - with up to 11-dimensional shape variables. The resulting optimal gait in the high-dimensional system shows better efficiency compared to that of the reduced-order model. Furthermore, we provide a geometric optimality interpretation of the optimal gait.

Semantic Information Extraction for Text Data with Probability Graph

  • Authors: Zhouxiang Zhao, Zhaohui Yang, Ye Hu, Licheng Lin, Zhaoyang Zhang
  • Subjects: Computation and Language (cs.CL); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2309.08879
  • Pdf link: https://arxiv.org/pdf/2309.08879
  • Abstract In this paper, the problem of semantic information extraction for resource constrained text data transmission is studied. In the considered model, a sequence of text data needs to be transmitted within a communication resource-constrained network, which only allows limited data transmission. Thus, at the transmitter, the original text data is extracted with natural language processing techniques. Then, the extracted semantic information is captured in a knowledge graph. An additional probability dimension is introduced in this graph to capture the importance of each piece of information. This semantic information extraction problem is posed as an optimization framework whose goal is to extract the most important semantic information for transmission. To find an optimal solution for this problem, a Floyd's algorithm based solution coupled with an efficient sorting mechanism is proposed. Numerical results testify to the effectiveness of the proposed algorithm with regard to two novel performance metrics including semantic uncertainty and semantic similarity.
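
The abstract above mentions a "Floyd's algorithm based solution" without further detail. Assuming this refers to the textbook Floyd-Warshall all-pairs shortest-path algorithm, a minimal generic sketch looks as follows. How the paper builds its graph, and how edge weights relate to the probability dimension, is not stated in the abstract; the example graph here is purely hypothetical.

```python
import math


def floyd_warshall(weights):
    """All-pairs shortest path lengths for a dense weight matrix.

    weights[i][j] is the cost of edge i -> j, math.inf if there is no edge,
    and 0 on the diagonal. Returns a new matrix of shortest path costs.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):                  # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Hypothetical 3-node graph: going 0 -> 2 -> 1 (cost 3) beats 0 -> 1 (cost 4).
INF = math.inf
example = [
    [0, 4, 1],
    [INF, 0, INF],
    [INF, 2, 0],
]
shortest = floyd_warshall(example)
```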

GRaCE: Optimizing Grasps to Satisfy Ranked Criteria in Complex Scenario

  • Authors: Tasbolat Taunyazov, Kelvin Lin, Harold Soh
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.08887
  • Pdf link: https://arxiv.org/pdf/2309.08887
  • Abstract This paper addresses the multi-faceted problem of robot grasping, where multiple criteria may conflict and differ in importance. We introduce Grasp Ranking and Criteria Evaluation (GRaCE), a novel approach that employs hierarchical rule-based logic and a rank-preserving utility function to optimize grasps based on various criteria such as stability, kinematic constraints, and goal-oriented functionalities. Additionally, we propose GRaCE-OPT, a hybrid optimization strategy that combines gradient-based and gradient-free methods to effectively navigate the complex, non-convex utility function. Experimental results in both simulated and real-world scenarios show that GRaCE requires fewer samples to achieve comparable or superior performance relative to existing methods. The modular architecture of GRaCE allows for easy customization and adaptation to specific application needs.

Efficient Methods for Non-stationary Online Learning

  • Authors: Peng Zhao, Yan-Feng Xie, Lijun Zhang, Zhi-Hua Zhou
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2309.08911
  • Pdf link: https://arxiv.org/pdf/2309.08911
  • Abstract Non-stationary online learning has drawn much attention in recent years. In particular, dynamic regret and adaptive regret are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of the non-stationarity, in which a group of base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises the concern about the computational complexity -- those methods typically maintain $\mathcal{O}(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$. Moreover, our obtained algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods. Empirical studies verify our theoretical findings.

Delving into Multimodal Prompting for Fine-grained Visual Classification

  • Authors: Xin Jiang, Hao Tang, Junyao Gao, Xiaoyu Du, Shengfeng He, Zechao Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2309.08912
  • Pdf link: https://arxiv.org/pdf/2309.08912
  • Abstract Fine-grained visual classification (FGVC) involves categorizing fine subdivisions within a broader category, which poses challenges due to subtle inter-class discrepancies and large intra-class variations. However, prevailing approaches primarily focus on uni-modal visual concepts. Recent advancements in pre-trained vision-language models have demonstrated remarkable performance in various high-level vision tasks, yet the applicability of such models to FGVC tasks remains uncertain. In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the contrastive language-image pre-training (CLIP) model. Our MP-FGVC comprises a multimodal prompts scheme and a multimodal adaptation scheme. The former includes Subcategory-specific Vision Prompt (SsVP) and Discrepancy-aware Text Prompt (DaTP), which explicitly highlight the subcategory-specific discrepancies from the perspectives of both vision and language. The latter aligns the vision and text prompting elements in a common semantic space, facilitating cross-modal collaborative reasoning through a Vision-Language Fusion Module (VLFM) for further improvement on FGVC. Moreover, we tailor a two-stage optimization strategy for MP-FGVC to fully leverage the pre-trained CLIP model and expedite efficient adaptation for FGVC. Extensive experiments conducted on four FGVC datasets demonstrate the effectiveness of our MP-FGVC.

Bidirectional Graph GAN: Representing Brain Structure-Function Connections for Alzheimer's Disease

  • Authors: Shuqiang Wang, Chen Ding
  • Subjects: Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
  • Arxiv link: https://arxiv.org/abs/2309.08916
  • Pdf link: https://arxiv.org/pdf/2309.08916
  • Abstract The relationship between brain structure and function is critical for revealing the pathogenesis of brain disease, including Alzheimer's disease (AD). However, it is a great challenge to map brain structure-function connections due to various reasons. In this work, a bidirectional graph generative adversarial network (BGGAN) is proposed to represent brain structure-function connections. Specifically, by designing a module incorporating inner graph convolution network (InnerGCN), the generators of BGGAN can employ features of direct and indirect brain regions to learn the mapping function between structural domain and functional domain. Besides, a new module named Balancer is designed to counterpoise the optimization between generators and discriminators. By introducing the Balancer into BGGAN, both the structural generator and functional generator can not only alleviate the issue of mode collapse but also learn complementarity of structural and functional features. Experimental results using ADNI datasets show that both the generated structure connections and generated function connections can improve the identification accuracy of AD. More importantly, based on the proposed model, it is found that the relationship between brain structure and function is not a complete one-to-one correspondence. Brain structure is the basis of brain function. The strong structural connections are almost accompanied by strong functional connections.

Inverse classification with logistic and softmax classifiers: efficient optimization

  • Authors: Miguel Á. Carreira-Perpiñán, Suryabhan Singh Hada
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2309.08945
  • Pdf link: https://arxiv.org/pdf/2309.08945
  • Abstract In recent years, a certain type of problem has become of interest where one wants to query a trained classifier. Specifically, one wants to find the closest instance to a given input instance such that the classifier's predicted label is changed in a desired way. Examples of these "inverse classification" problems are counterfactual explanations, adversarial examples and model inversion. All of them are fundamentally optimization problems over the input instance vector involving a fixed classifier, and it is of interest to achieve a fast solution for interactive or real-time applications. We focus on solving this problem efficiently for two of the most widely used classifiers: logistic regression and softmax classifiers. Owing to special properties of these models, we show that the optimization can be solved in closed form for logistic regression, and iteratively but extremely fast for the softmax classifier. This allows us to solve either case exactly (to nearly machine precision) in a runtime of milliseconds to around a second even for very high-dimensional instances and many classes.
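
To make the closed-form claim for logistic regression concrete, here is a minimal sketch of one special case: the distance is plain Euclidean and the target is a fixed predicted probability. Under those assumptions the closest instance is the orthogonal projection of the input onto the hyperplane where the logit equals the target value. The function name `closest_point_with_logit` and the toy weights are hypothetical and not taken from the paper, whose formulation may use a different distance or constraints.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def closest_point_with_logit(x, w, b, target_logit):
    """Euclidean projection of x onto the hyperplane w.x + b = target_logit.

    Assumption: squared Euclidean distance and an equality constraint on the
    logit. This is an illustrative special case, not the paper's algorithm.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    gap = target_logit - (w @ x + b)   # how far the current logit is from the target
    return x + gap * w / (w @ w)       # move along w, the shortest direction

# Toy logistic-regression model (hypothetical numbers).
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([-1.0, 1.0])              # current logit: -2.5, i.e. class 0

target_prob = 0.9
target_logit = np.log(target_prob / (1.0 - target_prob))
x_cf = closest_point_with_logit(x, w, b, target_logit)
print(x_cf, sigmoid(w @ x_cf + b))
```

The displacement `x_cf - x` is always a multiple of `w`, which is why no iterative solver is needed in this special case.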

ExBluRF: Efficient Radiance Fields for Extreme Motion Blurred Images

  • Authors: Dongwoo Lee, Jeongtaek Oh, Jaesung Lim, Sunghyun Cho, Kyoung Mu Lee
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2309.08957
  • Pdf link: https://arxiv.org/pdf/2309.08957
  • Abstract We present ExBluRF, a novel view synthesis method for extreme motion blurred images based on efficient radiance fields optimization. Our approach consists of two main components: 6-DOF camera trajectory-based motion blur formulation and voxel-based radiance fields. From extremely blurred images, we optimize the sharp radiance fields by jointly estimating the camera trajectories that generate the blurry images. In training, multiple rays along the camera trajectory are accumulated to reconstruct a single blurry color, which is equivalent to the physical motion blur operation. We minimize the photo-consistency loss on blurred image space and obtain the sharp radiance fields with camera trajectories that explain the blur of all images. The joint optimization on the blurred image space demands painfully increasing computation and resources proportional to the blur size. Our method solves this problem by replacing the MLP-based framework with low-dimensional 6-DOF camera poses and voxel-based radiance fields. Compared with the existing works, our approach restores much sharper 3D scenes from challenging motion blurred views with the order of 10 times less training time and GPU memory consumption.

FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization

  • Authors: Nan Ma, Mohan Wang, Yiheng Han, Yong-Jin Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2309.08966
  • Pdf link: https://arxiv.org/pdf/2309.08966
  • Abstract Cross-modality point cloud registration is confronted with significant challenges due to inherent differences in modalities between different sensors. We propose a cross-modality point cloud registration framework FF-LOGO: a cross-modality point cloud registration method with feature filtering and local-global optimization. The cross-modality feature correlation filtering module extracts geometric transformation-invariant features from cross-modality point clouds and achieves point selection by feature matching. We also introduce a cross-modality optimization process, including a local adaptive key region aggregation module and a global modality consistency fusion optimization module. Experimental results demonstrate that our two-stage optimization significantly improves the registration accuracy of the feature association and selection module. Our method achieves a substantial increase in recall rate compared to the current state-of-the-art methods on the 3DCSR dataset, improving from 40.59% to 75.74%. Our code will be available at https://github.com/wangmohan17/FFLOGO.

Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations

  • Authors: Fucheng Jia, Shiqi Jiang, Ting Cao, Wei Cui, Tianrui Xia, Xu Cao, Yuanchun Li, Deyu Zhang, Ju Ren, Yunxin Liu, Lili Qiu, Mao Yang
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2309.08978
  • Pdf link: https://arxiv.org/pdf/2309.08978
  • Abstract Web applications are increasingly becoming the primary platform for AI service delivery, making in-browser deep learning (DL) inference more prominent. However, current in-browser inference systems fail to effectively utilize advanced web programming techniques and customize kernels for various client devices, leading to suboptimal performance. To address these issues, this paper presents the first in-browser inference system, nn-JIT.web, which enables just-in-time (JIT) auto-generation of optimized kernels for both CPUs and GPUs during inference. The system achieves this by using two novel web programming techniques that can significantly reduce kernel generation time, compared to other tensor compilers such as TVM, while maintaining or even improving performance. The first technique, Tensor-Web Compiling Co-Design, lowers compiling costs by unifying tensor and web compiling and eliminating redundant and ineffective compiling passes. The second technique, Web-Specific Lite Kernel Optimization Space Design, reduces kernel tuning costs by focusing on web programming requirements and efficient hardware resource utilization, limiting the optimization space to only dozens. nn-JIT.web is evaluated for modern transformer models on a range of client devices, including the mainstream CPUs and GPUs from ARM, Intel, AMD and Nvidia. Results show that nn-JIT.web can achieve up to an 8.2x speedup within 30 seconds compared to the baselines across various models.

QTOS: An Open-Source Quadruped Trajectory Optimization Stack

  • Authors: Alexy Skoutnev, Andrew Cinar, Praful Sigdel, Forrest Laine
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09058
  • Pdf link: https://arxiv.org/pdf/2309.09058
  • Abstract We introduce a new open-source framework, Quadruped Trajectory Optimization Stack (QTOS), which integrates a global planner, local planner, simulator, controller, and robot interface into a single package. QTOS serves as a full-stack interface, simplifying continuous motion planning on an open-source quadruped platform by bridging the gap between middleware and gait planning. It empowers users to effortlessly translate high-level navigation objectives into low-level robot commands. Furthermore, QTOS enhances the stability and adaptability of long-distance gait planning across challenging terrain.

Achieving Ultra-Reliable Low-Latency Communication (URLLC) in Next-Generation Cellular Networks with Programmable Data Planes

  • Authors: Kerim Gökarslan
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2309.09079
  • Pdf link: https://arxiv.org/pdf/2309.09079
  • Abstract Recent advancements in wireless technologies towards the next-generation cellular networks have brought a new era that made it possible to apply cellular technology on traditionally-wired networks with tighter requirements, such as industrial networks. The next-generation cellular technologies (e.g., 5G and Beyond) introduce the concept of ultra-reliable low-latency communications (URLLC). This thesis presents a Software-Defined Networking (SDN) architecture with programmable data planes for the next-generation cellular networks to achieve URLLC. Our design deploys programmable switches between the cellular core and Radio Access Networks (RAN) to monitor and modify data traffic at line speed. We introduce the concept of intra-cellular optimization, a relaxation in cellular networks to allow pre-authorized in-network devices to communicate without being required to signal the core network. We also present a control structure, Unified Control Plane (UCP), containing a novel Ethernet Layer control protocol and an adapted version of link-state routing information distribution among the programmable switches. Our implementation uses P4 with a 5G implementation (Open5Gs) and a UE/RAN simulator. We implement a Python simulator to evaluate the performance of our system on multi-switch topologies by simulating the switch behavior. Our evaluation indicates latency reduction up to 2x with intra-cellular optimization compared to the conventional architecture. We show that our design has a ten-millisecond level of control latency, and achieves fine-grained network security and monitoring.

CppFlow: Generative Inverse Kinematics for Efficient and Robust Cartesian Path Planning

  • Authors: Jeremy Morgan, David Millard, Gaurav S. Sukhatme
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09102
  • Pdf link: https://arxiv.org/pdf/2309.09102
  • Abstract In this work we present CppFlow - a novel and performant planner for the Cartesian Path Planning problem, which finds valid trajectories up to 129x faster than current methods, while also succeeding on more difficult problems where others fail. At the core of the proposed algorithm is the use of a learned, generative Inverse Kinematics solver, which is able to efficiently produce promising entire candidate solution trajectories on the GPU. Precise, valid solutions are then found through classical approaches such as differentiable programming, global search, and optimization. In combining approaches from these two paradigms we get the best of both worlds - efficient approximate solutions from generative AI which are made exact using the guarantees of traditional planning and optimization. We evaluate our system against other state of the art methods on a set of established baselines as well as new ones introduced in this work and find that our method significantly outperforms others in terms of the time to find a valid solution and planning success rate, and performs comparably in terms of trajectory length over time. The work is made open source and available for use upon acceptance.

Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors

  • Authors: Ziwei Liao, Jun Yang, Jingxing Qian, Angela P. Schoellig, Steven L. Waslander
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09118
  • Pdf link: https://arxiv.org/pdf/2309.09118
  • Abstract 3D object-level mapping is a fundamental problem in robotics, which is especially challenging when object CAD models are unavailable during inference. In this work, we propose a framework that can reconstruct high-quality object-level maps for unknown objects. Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected objects. The core idea of our approach is to leverage a learnt generative model for shape categories as a prior and to formulate a probabilistic, uncertainty-aware optimization framework for 3D reconstruction. We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions. Unlike current state-of-the-art approaches, we explicitly model the uncertainty of the object shapes and poses during our optimization, resulting in a high-quality object-level mapping system. Moreover, the resulting shape and pose uncertainties, which we demonstrate can accurately reflect the true errors of our object maps, can also be useful for downstream robotics tasks such as active vision. We perform extensive evaluations on indoor and outdoor real-world datasets, achieving substantial improvements over state-of-the-art methods. Our code will be available at https://github.com/TRAILab/UncertainShapePose.

Conditional Mutual Information Constrained Deep Learning for Classification

  • Authors: En-Hui Yang, Shayan Mohajer Hamidi, Linfeng Ye, Renhao Tan, Beverly Yang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2309.09123
  • Pdf link: https://arxiv.org/pdf/2309.09123
  • Abstract The concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN) in the output probability distribution space of the DNN, where CMI and the ratio between CMI and NCMI represent the intra-class concentration and inter-class separation of the DNN, respectively. By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies over the ImageNet validation data set are more or less inversely proportional to their NCMI values. Based on this observation, the standard deep learning (DL) framework is further modified to minimize the standard cross entropy function subject to an NCMI constraint, yielding CMI constrained deep learning (CMIC-DL). A novel alternating learning algorithm is proposed to solve such a constrained optimization problem. Extensive experimental results show that DNNs trained within CMIC-DL outperform the state-of-the-art models trained within the standard DL and other loss functions in the literature in terms of both accuracy and robustness against adversarial attacks. In addition, visualizing the evolution of learning process through the lens of CMI and NCMI is also advocated.

A Contracting Dynamical System Perspective toward Interval Markov Decision Processes

  • Authors: Saber Jafarpour, Samuel Coogan
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2309.09146
  • Pdf link: https://arxiv.org/pdf/2309.09146
  • Abstract Interval Markov decision processes are a class of Markov models where the transition probabilities between the states belong to intervals. In this paper, we study the problem of efficient estimation of the optimal policies in Interval Markov Decision Processes (IMDPs) with continuous action-space. Given an IMDP, we show that the pessimistic (resp. the optimistic) value iterations, i.e., the value iterations under the assumption of a competitive adversary (resp. cooperative agent), are monotone dynamical systems and are contracting with respect to the $\ell_{\infty}$-norm. Inspired by this dynamical system viewpoint, we introduce another IMDP, called the action-space relaxation IMDP. We show that the action-space relaxation IMDP has two key features: (i) its optimal value is an upper bound for the optimal value of the original IMDP, and (ii) its value iterations can be efficiently solved using tools and techniques from convex optimization. We then consider the policy optimization problems at each step of the value iterations as a feedback controller of the value function. Using this system-theoretic perspective, we propose an iteration-distributed implementation of the value iterations for approximating the optimal value of the action-space relaxation IMDP.
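
As a rough illustration of the pessimistic value iteration mentioned in the abstract, the sketch below handles the simpler finite-action case, which is an assumption made here for brevity (the paper studies continuous action spaces). The adversary picks, inside the transition intervals, the distribution that minimizes the expected next-state value, which can be done greedily by pushing probability mass onto the lowest-valued successor states first. All function names, array shapes, and the discount factor `gamma` are hypothetical.

```python
import numpy as np

def worst_case_expectation(lower, upper, values):
    """Minimise sum_j p[j] * values[j] subject to lower <= p <= upper, sum(p) = 1.

    Assumes the intervals are consistent: lower.sum() <= 1 <= upper.sum().
    """
    p = lower.copy()
    budget = 1.0 - p.sum()                 # mass still to be distributed
    for j in np.argsort(values):           # lowest-valued successors first
        add = min(upper[j] - p[j], budget)
        p[j] += add
        budget -= add
    return float(p @ values)

def pessimistic_value_iteration(lower, upper, reward, gamma=0.9,
                                tol=1e-10, max_iter=10_000):
    """Robust value iteration for a finite interval MDP.

    lower, upper: arrays of shape (S, A, S) bounding transition probabilities.
    reward: array of shape (S, A). Returns the pessimistic value vector.
    """
    n_states, n_actions, _ = lower.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        q = np.array([[reward[s, a] + gamma * worst_case_expectation(
                           lower[s, a], upper[s, a], v)
                       for a in range(n_actions)]
                      for s in range(n_states)])
        v_new = q.max(axis=1)              # agent maximises, adversary minimises
        if np.max(np.abs(v_new - v)) < tol:    # sup-norm stopping rule
            return v_new
        v = v_new
    return v
```

The stopping rule uses the sup norm, matching the $\ell_{\infty}$-contraction property stated in the abstract; with discounting, each sweep shrinks the error by at least a factor of `gamma`.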

Consensus-Based Leader-Follower Formation Tracking for Control-Affine Nonlinear Multiagent Systems

  • Authors: Clinton Enwerem, John S. Baras
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2309.09156
  • Pdf link: https://arxiv.org/pdf/2309.09156
  • Abstract In the typical multiagent formation tracking problem centered on consensus, the prevailing assumption in the literature is that the agents' nonlinear models can be approximated by integrator systems, by their feedback-linearized equivalents, or by dynamics composed of deterministic linear and nonlinear terms. The resulting approaches associated with such assumptions, however, are hardly applicable to general nonlinear systems. To this end, we present consensus-based control laws for multiagent formation tracking in finite-dimensional state space, with the agents represented by a more general class of dynamics: control-affine nonlinear systems. The agents also exchange information via a leader-follower communication topology modeled as an undirected and connected graph with a single leader node. By leveraging standard tools from algebraic graph theory and Lyapunov analysis, we first derive a locally asymptotically stabilizing formation tracking law. Next, to demonstrate the effectiveness of our approach, we present results from numerical simulations of an example in robotics. These results -- together with a comparison of the formation errors obtained with our approach and those realized via an optimization-based method -- further validate our theoretical propositions.

Spline-Based Minimum-Curvature Trajectory Optimization for Autonomous Racing

  • Authors: Haoru Xue, Tianwei Yue, John M. Dolan
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2309.09186
  • Pdf link: https://arxiv.org/pdf/2309.09186
  • Abstract We propose a novel B-spline trajectory optimization method for autonomous racing. We consider the unavailability of sophisticated race car and race track dynamics in early-stage autonomous motorsports development and derive methods that work with limited dynamics data and additional conservative constraints. We formulate a minimum-curvature optimization problem with only the spline control points as optimization variables. We then compare the current state-of-the-art method with our optimization result, which achieves a similar level of optimality with a 90% reduction in the decision variable dimension, and in addition offers a mathematical smoothness guarantee and flexible manipulation options. We concurrently reduce the problem computation time from seconds to milliseconds for a long race track, enabling future online adaptation of the previously offline technique.

MFRL-BI: Design of a Model-free Reinforcement Learning Process Control Scheme by Using Bayesian Inference

  • Authors: Yanrong Li, Juan Du, Wei Jiang
  • Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2309.09205
  • Pdf link: https://arxiv.org/pdf/2309.09205
  • Abstract The design of a process control scheme is critical for quality assurance to reduce variations in manufacturing systems. Taking semiconductor manufacturing as an example, extensive literature focuses on control optimization based on certain process models (usually linear models), which are obtained by experiments before a manufacturing process starts. However, in real applications, pre-defined models may not be accurate, especially for a complex manufacturing system. To tackle model inaccuracy, we propose a model-free reinforcement learning (MFRL) approach to conduct experiments and optimize control simultaneously according to real-time data. Specifically, we design a novel MFRL control scheme by updating the distribution of disturbances using Bayesian inference to reduce their large variations during manufacturing processes. As a result, the proposed MFRL controller is demonstrated to perform well in a nonlinear chemical mechanical planarization (CMP) process when the process model is unknown. Theoretical properties are also guaranteed when disturbances are additive. The numerical studies also demonstrate the effectiveness and efficiency of our methodology.

Neural Gradient Learning and Optimization for Oriented Point Normal Estimation

  • Authors: Qing Li, Huifang Feng, Kanle Shi, Yi Fang, Yu-Shen Liu, Zhizhong Han
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2309.09211
  • Pdf link: https://arxiv.org/pdf/2309.09211
  • Abstract We propose Neural Gradient Learning (NGL), a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation. It has excellent gradient approximation properties for the underlying geometry of the data. We utilize a simple neural network to parameterize the objective function to produce gradients at points using a global implicit representation. However, the derived gradients usually drift away from the ground-truth oriented normals due to the lack of local detail descriptions. Therefore, we introduce Gradient Vector Optimization (GVO) to learn an angular distance field based on local plane geometry to refine the coarse gradient vectors. Finally, we formulate our method with a two-phase pipeline of coarse estimation followed by refinement. Moreover, we integrate two weighting functions, i.e., anisotropic kernel and inlier score, into the optimization to improve robustness and detail preservation. Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description. This leads to a state-of-the-art normal estimator that is robust to noise, outliers and point density variations. Extensive evaluations show that our method outperforms previous works in both unoriented and oriented normal estimation on widely used benchmarks. The source code and pre-trained models are available at https://github.com/LeoQLi/NGLO.

Fresh Multiple Access: A Unified Framework Based on Large Models and Mean-Field Approximations

  • Authors: Haiming Hui, Shuqi Wei, Wei Chen
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2309.09226
  • Pdf link: https://arxiv.org/pdf/2309.09226
  • Abstract Information freshness has attracted increasing attention in the past decade as it plays a critical role in the emerging real-time applications. Age of information (AoI) holds the promise of effectively characterizing the information freshness, and is hence widely considered as a fundamental performance metric. However, in multiple-device scenarios, most existing works focus on the analysis and optimization of AoI based on queueing systems. The study of a unified approach for general multiple access control schemes in freshness-oriented scenarios remains open. In this paper, we take into consideration the combination of the fundamental freshness metric AoI and multiple access control schemes to achieve efficient cross-layer analysis and optimization in freshness-oriented scenarios, which is referred to as fresh multiple access. To this end, we build a unified framework with a discrete-time tandem queue model for fresh multiple access. The unified framework enables the analysis and optimization for general multiple access protocols in fresh multiple access. To handle the high-dimensional framework embedded in fresh multiple access, we introduce large model approaches for the Markov chain formulation in AoI oriented scenarios. Two typical AoI-based metrics are studied, including age of incorrect information (AoII) and peak AoII. Moreover, to address the computational complexity of the large model, we present mean-field approximations which significantly reduce the dimension of the Markov chain model by approximating the integral effect of massive devices in fresh multiple access.

Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems

  • Authors: Huayu Wang, Chen Luo, Taofeng Xie, Qiyu Jin, Guoqing Chen, Zhuo-Xu Cui, Dong Liang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2309.09250
  • Pdf link: https://arxiv.org/pdf/2309.09250
  • Abstract Recently, data-driven techniques have demonstrated remarkable effectiveness in addressing challenges related to MR imaging inverse problems. However, these methods still exhibit certain limitations in terms of interpretability and robustness. In response, we introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm. CLEAR represents a fusion of deep learning (DL) and variational regularization. Specifically, we employ a latent optimization technique to adversarially train an input convex neural network, and its set of minima can fully represent the real data manifold. We utilize it as a convex regularizer to formulate a CLEAR-informed variational regularization model that guides the solution of the imaging inverse problem on the real data manifold. Leveraging its inherent convexity, we have established the convergence of the projected subgradient descent algorithm for the CLEAR-informed regularization model. This convergence guarantees the attainment of a unique solution to the imaging inverse problem, subject to certain assumptions. Furthermore, we have demonstrated the robustness of our CLEAR-informed model, explicitly showcasing its capacity to achieve stable reconstruction even in the presence of measurement interference. Finally, we illustrate the superiority of our approach using MRI reconstruction as an example. Our method consistently outperforms conventional data-driven techniques and traditional regularization approaches, excelling in both reconstruction quality and robustness.

User Assignment and Resource Allocation for Hierarchical Federated Learning over Wireless Networks

  • Authors: Tinghao Zhang, Kwok-Yan Lam, Jun Zhao
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2309.09253
  • Pdf link: https://arxiv.org/pdf/2309.09253
  • Abstract The large population of wireless users is a key driver of data-crowdsourced Machine Learning (ML). However, data privacy remains a significant concern. Federated Learning (FL) encourages data sharing in ML without requiring data to leave users' devices but imposes heavy computation and communications overheads on mobile devices. Hierarchical FL (HFL) alleviates this problem by performing partial model aggregation at edge servers. HFL can effectively reduce energy consumption and latency through effective resource allocation and appropriate user assignment. Nevertheless, resource allocation in HFL involves optimizing multiple variables, and the objective function should consider both energy consumption and latency, making the development of resource allocation algorithms very complicated. Moreover, it is challenging to perform user assignment, which is a combinatorial optimization problem in a large search space. This article proposes a spectrum resource optimization algorithm (SROA) and a two-stage iterative algorithm (TSIA) for HFL. Given an arbitrary user assignment pattern, SROA optimizes CPU frequency, transmit power, and bandwidth to minimize system cost. TSIA aims to find a user assignment pattern that considerably reduces the total system cost. Experimental results demonstrate the superiority of the proposed HFL framework over existing studies in energy and latency reduction.

RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation

  • Authors: Lijun Li, Linrui Tian, Xindi Zhang, Qi Wang, Bang Zhang, Liefeng Bo, Mengyuan Liu, Chen Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2309.09301
  • Pdf link: https://arxiv.org/pdf/2309.09301
  • Abstract The current interacting hand (IH) datasets are relatively simplistic in terms of background and texture, with hand joints being annotated by a machine annotator, which may result in inaccuracies, and the diversity of pose distribution is limited. However, the variability of background, pose distribution, and texture can greatly influence the generalization ability. Therefore, we present a large-scale synthetic dataset RenderIH for interacting hands with accurate and diverse pose annotations. The dataset contains 1M photo-realistic images with varied backgrounds, perspectives, and hand textures. To generate natural and diverse interacting poses, we propose a new pose optimization algorithm. Additionally, for better pose estimation accuracy, we introduce a transformer-based pose estimation network, TransHand, to leverage the correlation between interacting hands and verify the effectiveness of RenderIH in improving results. Our dataset is model-agnostic and can improve the accuracy of any hand pose estimation method more than other real or synthetic datasets. Experiments have shown that pretraining on our synthetic data can significantly decrease the error from 6.76mm to 5.79mm, and our TransHand surpasses contemporary methods. Our dataset and code are available at https://github.com/adwardlee/RenderIH.

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

  • Authors: Yuxi Ren, Jie Wu, Peng Zhang, Manlin Zhang, Xuefeng Xiao, Qian He, Rui Wang, Min Zheng, Xin Pan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2309.09310
  • Pdf link: https://arxiv.org/pdf/2309.09310
  • Abstract Recent years have witnessed the prevailing progress of Generative Adversarial Networks (GANs) in image-to-image translation. However, the success of these GAN models hinges on ponderous computational costs and labor-expensive training data. Current efficient GAN learning techniques often fall into two orthogonal aspects: i) model slimming via reduced calculation costs; ii) data/label-efficient learning with fewer training data/labels. To combine the best of both worlds, we propose a new learning paradigm, Unified GAN Compression (UGC), with a unified optimization objective to seamlessly prompt the synergy of model-efficient and label-efficient learning. UGC sets up semi-supervised-driven network architecture search and adaptive online semi-supervised distillation stages sequentially, which formulates a heterogeneous mutual learning scheme to obtain an architecture-flexible, label-efficient, and performance-excellent model.

Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study Of Sentence Embeddings

  • Authors: Stephen Fitz
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2309.09397
  • Pdf link: https://arxiv.org/pdf/2309.09397
  • Abstract As Large Language Models are deployed within Artificial Intelligence systems that are increasingly integrated with human society, it becomes more important than ever to study their internal structures. Higher level abilities of LLMs such as GPT-3.5 emerge in large part due to informative language representations they induce from raw text data during pre-training on trillions of words. These embeddings exist in vector spaces of several thousand dimensions, and their processing involves mapping between multiple vector spaces, with a total number of parameters on the order of trillions. Furthermore, these language representations are induced by gradient optimization, resulting in a black box system that is hard to interpret. In this paper, we take a look at the topological structure of neuronal activity in the "brain" of Chat-GPT's foundation language model, and analyze it with respect to a metric representing the notion of fairness. We develop a novel approach to visualize GPT's moral dimensions. We first compute a fairness metric, inspired by social psychology literature, to identify factors that typically influence fairness assessments in humans, such as legitimacy, need, and responsibility. Subsequently, we summarize the manifold's shape using a lower-dimensional simplicial complex, whose topology is derived from this metric. We color it with a heat map associated with this fairness metric, producing human-readable visualizations of the high-dimensional sentence manifold. Our results show that sentence embeddings based on GPT-3.5 can be decomposed into two submanifolds corresponding to fair and unfair moral judgments. This indicates that GPT-based language models develop a moral dimension within their representation spaces and induce an understanding of fairness during their training process.

A Schedule of Duties in the Cloud Space Using a Modified Salp Swarm Algorithm

  • Authors: Hossein Jamali, Ponkoj Chandra Shill, David Feil-Seifer, Frederick C. Harris, Jr., Sergiu M. Dascalu
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2309.09441
  • Pdf link: https://arxiv.org/pdf/2309.09441
  • Abstract Cloud computing is a concept introduced in the information technology era, with the main components being the grid, distributed, and valuable computing. The cloud is being developed continuously and, naturally, comes up with many challenges, one of which is scheduling. A schedule or timeline is a mechanism used to optimize the time for performing a duty or set of duties. A scheduling process is accountable for choosing the best resources for performing a duty. The main goal of a scheduling algorithm is to improve the efficiency and quality of the service while at the same time ensuring the acceptability and effectiveness of the targets. The task scheduling problem is one of the most important NP-hard issues in the cloud domain and, so far, many techniques have been proposed as solutions, including genetic algorithms (GAs), particle swarm optimization (PSO), and ant colony optimization (ACO). To address this problem, in this paper, one of the collective intelligence algorithms, called the Salp Swarm Algorithm (SSA), has been expanded, improved, and applied. The performance of the proposed algorithm has been compared with that of GAs, PSO, continuous ACO, and the basic SSA. The results show that our algorithm has generally higher performance than the other algorithms. For example, compared to the basic SSA, the proposed method has an average reduction of approximately 21% in makespan.
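
The makespan mentioned in the abstract above is the finishing time of the busiest machine under a given task assignment. The sketch below only illustrates that objective; it is not the authors' modified Salp Swarm Algorithm. The function name `makespan`, the task lengths, and the VM speeds are hypothetical values chosen for illustration.

```python
from typing import Sequence


def makespan(
    task_lengths: Sequence[float],
    vm_speeds: Sequence[float],
    assignment: Sequence[int],
) -> float:
    """Return the finishing time of the busiest VM.

    assignment[i] is the index of the VM that runs task i.
    A task of length L on a VM of speed s takes L / s time units.
    """
    if len(assignment) != len(task_lengths):
        raise ValueError("assignment must have one entry per task")
    busy = [0.0] * len(vm_speeds)
    for length, vm in zip(task_lengths, assignment):
        busy[vm] += length / vm_speeds[vm]
    return max(busy)


# Hypothetical workload: four tasks, two VMs (the first is twice as fast).
tasks = [4.0, 2.0, 6.0, 8.0]
speeds = [2.0, 1.0]

print(makespan(tasks, speeds, [0, 1, 0, 0]))  # VM 0 runs three tasks
print(makespan(tasks, speeds, [0, 1, 1, 0]))  # load is spread more evenly
```

A scheduler of the kind the abstract describes would search over `assignment` vectors to make this value small.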

Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles

  • Authors: Noah Golowich, Dhruv Rohatgi, Ankur Moitra
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2309.09457
  • Pdf link: https://arxiv.org/pdf/2309.09457
  • Abstract The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map $\phi(x, a)$ that maps state-action pairs to $d$-dimensional vectors, and that the rewards and transitions are linear functions in this representation. But where do these features come from? In the absence of expert domain knowledge, a tempting strategy is to use the "kitchen sink" approach and hope that the true features are included in a much larger set of potential features. In this paper we revisit linear MDPs from the perspective of feature selection. In a $k$-sparse linear MDP, there is an unknown subset $S \subset [d]$ of size $k$ containing all the relevant features, and the goal is to learn a near-optimal policy in only poly$(k,\log d)$ interactions with the environment. Our main result is the first polynomial-time algorithm for this problem. In contrast, earlier works either made prohibitively strong assumptions that obviated the need for exploration, or required solving computationally intractable optimization problems. Along the way we introduce the notion of an emulator: a succinct approximate representation of the transitions that suffices for computing certain Bellman backups. Since linear MDPs are a non-parametric model, it is not even obvious whether polynomial-sized emulators exist. We show that they do exist and can be computed efficiently via convex programming. As a corollary of our main result, we give an algorithm for learning a near-optimal policy in block MDPs whose decoding function is a low-depth decision tree; the algorithm runs in quasi-polynomial time and takes a polynomial number of samples. This can be seen as a reinforcement learning analogue of classic results in computational learning theory. Furthermore, it gives a natural model where improving the sample complexity via representation learning is computationally feasible.

Stealthy Physical Masked Face Recognition Attack via Adversarial Style Optimization

  • Authors: Huihui Gong, Minjing Dong, Siqi Ma, Seyit Camtepe, Surya Nepal, Chang Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2309.09480
  • Pdf link: https://arxiv.org/pdf/2309.09480
  • Abstract Deep neural networks (DNNs) have achieved state-of-the-art performance on face recognition (FR) tasks in the last decade. In real scenarios, the deployment of DNNs requires taking various face accessories into consideration, like glasses, hats, and masks. In the COVID-19 pandemic era, wearing face masks is one of the most effective ways to defend against the novel coronavirus. However, DNNs are known to be vulnerable to adversarial examples with a small but elaborated perturbation. Thus, a facial mask with adversarial perturbations may pose a great threat to the widely used deep learning-based FR models. In this paper, we consider a challenging adversarial setting: targeted attack against FR models. We propose a new stealthy physical masked FR attack via adversarial style optimization. Specifically, we train an adversarial style mask generator that hides adversarial perturbations inside style masks. Moreover, to ameliorate the phenomenon of sub-optimization with one fixed style, we propose to discover the optimal style given a target through style optimization in a continuous relaxation manner. We simultaneously optimize the generator and the style selection for generating strong and stealthy adversarial style masks. We evaluated the effectiveness and transferability of our proposed method via extensive white-box and black-box digital experiments. Furthermore, we also conducted physical attack experiments against local FR models and online platforms.

LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

  • Authors: Zecheng Tang, Chenfei Wu, Juntao Li, Nan Duan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2309.09506
  • Pdf link: https://arxiv.org/pdf/2309.09506
  • Abstract Graphic layout generation, a growing research field, plays a significant role in user engagement and information perception. Existing methods primarily treat layout generation as a numerical optimization task, focusing on quantitative aspects while overlooking the semantic information of layout, such as the relationship between each layout element. In this paper, we propose LayoutNUWA, the first model that treats layout generation as a code generation task to enhance semantic information and harness the hidden layout expertise of large language models (LLMs). More concretely, we develop a Code Instruct Tuning (CIT) approach comprising three interconnected modules: 1) the Code Initialization (CI) module quantifies the numerical conditions and initializes them as HTML code with strategically placed masks; 2) the Code Completion (CC) module employs the formatting knowledge of LLMs to fill in the masked portions within the HTML code; 3) the Code Rendering (CR) module transforms the completed code into the final layout output, ensuring a highly interpretable and transparent layout generation procedure that directly maps code to a visualized layout. We attain significant state-of-the-art performance (even over 50% improvements) on multiple datasets, showcasing the strong capabilities of LayoutNUWA. Our code is available at https://github.com/ProjectNUWA/LayoutNUWA.

Pruning Large Language Models via Accuracy Predictor

  • Authors: Yupeng Ji, Yibo Cao, Jiucai Liu
  • Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2309.09507
  • Pdf link: https://arxiv.org/pdf/2309.09507
  • Abstract Large language models (LLMs) containing tens of billions of parameters (or even more) have demonstrated impressive capabilities in various NLP tasks. However, substantial model size poses challenges to training, inference, and deployment, so it is necessary to compress the model. At present, most model compression for LLMs requires manual design of pruning features, which has problems such as a complex optimization pipeline and difficulty in retaining the capabilities of certain parts of the model. Therefore, we propose a novel pruning approach: first, a training set of a certain number of architecture-accuracy pairs is established, and then a non-neural model is trained as an accuracy predictor. Using the accuracy predictor to further optimize the search space and search, the optimal model can be automatically selected. Experiments show that our proposed approach is effective and efficient. Compared with the baseline, the perplexity (PPL) on Wikitext2 and PTB dropped by 9.48% and 5.76% respectively, and the average accuracy of MMLU increased by 6.28%.

Proof-of-Prospect-Theory: A Novel Game-based Consensus Mechanism for Blockchain

  • Authors: Yuqi Xie, Changbing Tang, Feilong Lin, Guanrong Chen, Zhao Zhang, Zhonglong Zheng
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2309.09529
  • Pdf link: https://arxiv.org/pdf/2309.09529
  • Abstract Blockchain technology is a breakthrough in changing the ways of business and organization operations, in which the consensus problem is challenging with practical constraints, such as computational power and consensus standard. In this paper, a novel consensus mechanism named Proof-of-Prospect-Theory (PoPT) is designed from the view of game theory, where the game prospect value is considered as an important election criterion of the block-recorder. PoPT portrays the popularity of a node in the network as an attribute, which is constituted by the subjective sensibilities of nodes. Furthermore, the performances of the PoPT and the willingness of ordinary nodes to participate in the consensus are analyzed, exploring fairness, decentralization, credibility, and the motivating ability of the consensus mechanism. Finally, numerical simulations with optimization of the PoPT consensus mechanism are demonstrated in the scenario of a smart grid system to illustrate the effectiveness of the PoPT.

Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation

  • Authors: Huan Liu, Zichang Tan, Qiang Chen, Yunchao Wei, Yao Zhao, Jingdong Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2309.09667
  • Pdf link: https://arxiv.org/pdf/2309.09667
  • Abstract Detecting and grounding multi-modal media manipulation (DGM^4) has become increasingly crucial due to the widespread dissemination of face forgery and text misinformation. In this paper, we present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM^4 problem. Unlike previous state-of-the-art methods that solely focus on the image (RGB) domain to describe visual forgery features, we additionally introduce the frequency domain as a complementary viewpoint. By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts. Then, our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands. Moreover, to address the semantic conflicts between image and frequency domains, the forgery-aware mutual module is developed to further enable the effective interaction of disparate image and frequency features, resulting in aligned and comprehensive visual forgery representations. Finally, based on visual and textual forgery features, we propose a unified decoder that comprises two symmetric cross-modal interaction modules responsible for gathering modality-specific forgery information, along with a fusing interaction module for aggregation of both modalities. The proposed unified decoder formulates our UFAFormer as a unified framework, ultimately simplifying the overall architecture and facilitating the optimization process. Experimental results on the DGM^4 dataset, containing several perturbations, demonstrate the superior performance of our framework compared to previous methods, setting a new benchmark in the field.
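
The abstract above relies on a discrete wavelet transform to split an image into frequency sub-bands. As a rough illustration of what such a decomposition produces, here is a one-level 2D Haar transform in plain Python. The choice of the Haar wavelet, the function name `haar_dwt2`, and the sub-band labels are assumptions made for this sketch; the abstract does not say which wavelet or implementation the paper uses.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One-level orthonormal 2D Haar transform of a grayscale image.

    Returns four half-resolution sub-bands computed from each 2x2 block:
    approximation, column difference, row difference, diagonal difference.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2)  # low-pass in both directions
            c_row.append((tl - tr + bl - br) / 2)  # left column minus right column
            r_row.append((tl + tr - bl - br) / 2)  # top row minus bottom row
            d_row.append((tl - tr - bl + br) / 2)  # diagonal detail
        approx.append(a_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag.append(d_row)
    return approx, col_diff, row_diff, diag


# A 2x2 image whose rows differ but whose columns are identical.
approx, col_diff, row_diff, diag = haar_dwt2([[1.0, 1.0], [3.0, 3.0]])
print(approx, col_diff, row_diff, diag)
```

Only the approximation and row-difference bands are non-zero for this input, which is the kind of directional separation a frequency encoder can then attend over.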

Distributed course allocation with asymmetric friendships

  • Authors: Ilya Khakhiashvili, Lihi Dery, Tal Grinshpoun
  • Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2309.09684
  • Pdf link: https://arxiv.org/pdf/2309.09684
  • Abstract Students' decisions on whether to take a class are strongly affected by whether their friends plan to take the class with them. A student may prefer to be assigned to a course they like less, just to be with their friends, rather than taking a more preferred class alone. It has been shown that taking classes with friends positively affects academic performance. Thus, academic institutes should prioritize friendship relations when assigning course seats. The introduction of friendship relations results in several non-trivial changes to current course allocation methods. This paper explores how course allocation mechanisms can account for friendships between students and provide a unique, distributed solution. In particular, we model the problem as an asymmetric distributed constraint optimization problem and develop a new dedicated algorithm. Our extensive evaluation includes both simulated data and data derived from a user study on 177 students' preferences over courses and friends. The results show that our algorithm obtains high utility for the students while keeping the solution fair and observing courses' seat capacity limitations.

Securing Fixed Neural Network Steganography

  • Authors: Zicong Luo, Sheng Li, Guobiao Li, Zhenxing Qian, Xinpeng Zhang
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2309.09700
  • Pdf link: https://arxiv.org/pdf/2309.09700
  • Abstract Image steganography is the art of concealing secret information in images in a way that is imperceptible to unauthorized parties. Recent advances show that it is possible to use a fixed neural network (FNN) for secret embedding and extraction. Such fixed neural network steganography (FNNS) achieves high steganographic performance without training the networks, which could be more useful in real-world applications. However, the existing FNNS schemes are vulnerable in the sense that anyone can extract the secret from the stego-image. To deal with this issue, we propose a key-based FNNS scheme to improve the security of the FNNS, where we generate key-controlled perturbations from the FNN for data embedding. As such, only the receiver who possesses the key is able to correctly extract the secret from the stego-image using the FNN. In order to improve the visual quality and undetectability of the stego-image, we further propose an adaptive perturbation optimization strategy by taking the perturbation cost into account. Experimental results show that our proposed scheme is capable of preventing unauthorized secret extraction from the stego-images. Furthermore, our scheme is able to generate stego-images with higher visual quality than the state-of-the-art FNNS scheme, especially when the FNN is a neural network for ordinary learning tasks.

Multi-Dictionary Tensor Decomposition

  • Authors: Maxwell McNeil, Petko Bogdanov
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2309.09717
  • Pdf link: https://arxiv.org/pdf/2309.09717
  • Abstract Tensor decomposition methods are popular tools for analysis of multi-way datasets from social media, healthcare, spatio-temporal domains, and others. Widely adopted models such as Tucker and canonical polyadic decomposition (CPD) follow a data-driven philosophy: they decompose a tensor into factors that approximate the observed data well. In some cases side information is available about the tensor modes. For example, in a temporal user-item purchases tensor a user influence graph, an item similarity graph, and knowledge about seasonality or trends in the temporal mode may be available. Such side information may enable more succinct and interpretable tensor decomposition models and improved quality in downstream tasks. We propose a framework for Multi-Dictionary Tensor Decomposition (MDTD) which takes advantage of prior structural information about tensor modes in the form of coding dictionaries to obtain sparsely encoded tensor factors. We derive a general optimization algorithm for MDTD that handles both complete input and input with missing values. Our framework handles large sparse tensors typical to many real-world application domains. We demonstrate MDTD's utility via experiments with both synthetic and real-world datasets. It learns more concise models than dictionary-free counterparts and improves (i) reconstruction quality ($60%$ fewer non-zero coefficients coupled with smaller error); (ii) missing values imputation quality (two-fold MSE reduction with up to orders of magnitude time savings) and (iii) the estimation of the tensor rank. MDTD's quality improvements do not come with a running time premium: it can decompose $19GB$ datasets in less than a minute. It can also impute missing values in sparse billion-entry tensors more accurately and scalably than state-of-the-art competitors.

Learning Covariances for Estimation with Constrained Bilevel Optimization

  • Authors: Mohamad Qadri, Zachary Manchester, Michael Kaess
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09718
  • Pdf link: https://arxiv.org/pdf/2309.09718
  • Abstract We consider the problem of learning error covariance matrices for robotic state estimation. The convergence of a state estimator to the correct belief over the robot state is dependent on the proper tuning of noise models. During inference, these models are used to weigh different blocks of the Jacobian and error vector resulting from linearization and hence, additionally affect the stability and convergence of the non-linear system. We propose a gradient-based method to estimate well-conditioned covariance matrices by formulating the learning process as a constrained bilevel optimization problem over factor graphs. We evaluate our method against baselines across a range of simulated and real-world tasks and demonstrate that our technique converges to model estimates that lead to better solutions as evidenced by the improved tracking accuracy on unseen test trajectories.

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

  • Authors: Hao Sun, Li Shen, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2309.09719
  • Pdf link: https://arxiv.org/pdf/2309.09719
  • Abstract Federated learning is an emerging distributed machine learning method that enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same learning rate across different clients, but they are not efficient when data is heterogeneous. To maximize the performance of optimization methods, the main challenge is how to adjust the learning rate without hurting the convergence. In this paper, we propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate based on local historical gradient squares and synchronized learning rates. Theoretical analysis shows that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients, which enables promising scalability in federated optimization. We also empirically compare our method with several communication-efficient federated optimization methods. Extensive experimental results on Computer Vision (CV) tasks and Natural Language Processing (NLP) task show the efficacy of our proposed FedLALR method and also coincide with our theoretical findings.
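
FedLALR is described above as a local variant of AMSGrad, where the step size depends on a running maximum of squared-gradient averages. The sketch below shows only a plain, single-machine AMSGrad step (without bias correction) to make that mechanism concrete. It is not FedLALR: the client-side scheduling and synchronization from the paper are not reproduced, and the function name, state layout, and hyperparameter values are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

State = Dict[str, List[float]]


def amsgrad_step(
    x: List[float],
    grad: List[float],
    state: State,
    lr: float = 0.05,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], State]:
    """Apply one AMSGrad update and return the new parameters and state."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(state["m"], grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(state["v"], grad)]
    # AMSGrad keeps the element-wise maximum, so the step size never grows.
    v_hat = [max(old, new) for old, new in zip(state["v_hat"], v)]
    new_x = [xi - lr * mi / (math.sqrt(vh) + eps) for xi, mi, vh in zip(x, m, v_hat)]
    return new_x, {"m": m, "v": v, "v_hat": v_hat}


# Toy problem: minimize f(x) = x^2, whose gradient is 2x.
x = [1.0]
state: State = {"m": [0.0], "v": [0.0], "v_hat": [0.0]}
for _ in range(500):
    x, state = amsgrad_step(x, [2.0 * x[0]], state)
print(x[0])  # should end up close to the minimizer at 0
```

In a federated setting each client would hold its own `state`, which is where a per-client learning rate of the kind the abstract describes would come from.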

Significant improvement of lossy compression rate and speed of HPC data using perceptron parallelized compression

  • Authors: Xinzhe Chen, Jianjiang Li
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2309.09778
  • Pdf link: https://arxiv.org/pdf/2309.09778
  • Abstract The escalating surge in data generation presents formidable challenges to information technology, necessitating advancements in storage, retrieval, and utilization. With the proliferation of artificial intelligence and big data, the "Data Age 2025" report forecasts an exponential increase in global data production. The escalating data volumes raise concerns about efficient data processing. The paper addresses the predicament of achieving a lower compression ratio while maintaining or surpassing the compression performance of state-of-the-art techniques. This paper introduces a lossy compression framework grounded in the perceptron model for data prediction, striving for high compression quality. The contributions of this study encompass the introduction of positive and negative factors within the relative-to-absolute domain transformation algorithm, the utilization of a three-layer perceptron for improved predictive accuracy, and data selection rule modifications for parallelized compression within compression blocks. Comparative experiments with SZ2.1's PW_REL mode demonstrate a maximum compression ratio reduction of 17.78%. The article is structured as follows: the introduction highlights the data explosion challenge; related work delves into existing solutions; optimization of mapping algorithms in the relative and absolute domains is expounded in Section 3; the design of the new compression framework is detailed in Section 4; in Section 5, we describe the whole process and give pseudo-code; and in Section 6, our solution is evaluated. Finally, in Section 7, we provide an outlook for future work.

DFL-TORO: A One-Shot Demonstration Framework for Learning Time-Optimal Robotic Manufacturing Tasks

  • Authors: Alireza Barekatain, Hamed Habibi, Holger Voos
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09802
  • Pdf link: https://arxiv.org/pdf/2309.09802
  • Abstract This paper presents DFL-TORO, a novel Demonstration Framework for Learning Time-Optimal Robotic tasks via One-shot kinesthetic demonstration. It aims at optimizing the process of Learning from Demonstration (LfD), applied in the manufacturing sector. As the effectiveness of LfD is challenged by the quality and efficiency of human demonstrations, our approach offers a streamlined method to intuitively capture task requirements from human teachers, by reducing the need for multiple demonstrations. Furthermore, we propose an optimization-based smoothing algorithm that ensures time-optimal and jerk-regulated demonstration trajectories, while also adhering to the robot's kinematic constraints. The result is a significant reduction in noise, thereby boosting the robot's operation efficiency. Evaluations using a Franka Emika Research 3 (FR3) robot for a reaching task further substantiate the efficacy of our framework, highlighting its potential to transform kinesthetic demonstrations in contemporary manufacturing environments.

Energy Management of Hydrogen Hybrid Electric Vehicles -- A Potential Study

  • Authors: David Theodor Machacek, Nazim Ozan Yazar, Thomas Huber, Christopher Harald Onder
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2309.09804
  • Pdf link: https://arxiv.org/pdf/2309.09804
  • Abstract The hydrogen combustion engine (H$_2$ICE) is known to be able to burn H$_2$ under ultra-lean conditions, while producing no CO$_2$ emissions and extremely low engine-out NO$_x^{\mathrm{eo}}$ emissions. Immediate goals, as for instance the upcoming EURO 7 NO$_x$ limitations, can be reached more easily as extremely low engine-out NO$_x^{\mathrm{eo}}$ emissions facilitate the reduction of the overall tailpipe NO$_x^{\mathrm{tp}}$ emissions. In this work, the feasibility of achieving consistent reductions in NO$_x^{\mathrm{eo}}$ emissions through the implementation of electric hybridization of an H$_2$ICE-equipped passenger car (H$_2$-HEV), combined with a dedicated energy management strategy (EMS) is discussed. In particular, the mixed H$_2$-HEV architecture is investigated and compared to a series H$_2$-HEV, a parallel H$_2$-HEV, and a base H$_2$-vehicle, which is only equipped with an H$_2$ICE. For hybrid vehicles, a low H$_2$ consumption and low NO$_x^{\mathrm{eo}}$ emissions are conflicting objectives, the trade-off of which depends on the EMS and can be represented as a Pareto front. Overall, through the utilization of a dedicated energy management calibration, the mixed H$_2$-HEV demonstrates the capability to consistently achieve extremely low engine-out NO$_x^{\mathrm{eo}}$ emissions. For a broad range of driving missions, the mixed H$_2$-HEV is able to decrease the engine-out NO$_x^{\mathrm{eo}}$ emissions by more than 90%, while, at the same time, the H$_2$ consumption is decreased by over 16%, compared to a comparable non-hybridized H$_2$-vehicle. These significant emission reductions are possible without having to modify the exhaust-gas aftertreatment system, or the optimization of any of the individual drivetrain components, but solely by setting the EMS calibration accordingly.
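
The abstract above describes fuel consumption and engine-out emissions as conflicting objectives whose trade-off forms a Pareto front. A minimal sketch of how such a front can be extracted from a set of candidate calibrations follows. The function name `pareto_front` and the sample points are hypothetical, unitless values invented for illustration; they are not results from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points when both objectives are minimized.

    A point q dominates p if q is no worse in both objectives and
    strictly better in at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


# Hypothetical (consumption, emissions) pairs, one per candidate calibration.
candidates = [(1.0, 9.0), (1.2, 4.0), (1.5, 4.5), (2.0, 1.0), (1.1, 9.5)]
print(pareto_front(candidates))
# → [(1.0, 9.0), (1.2, 4.0), (2.0, 1.0)]
```

Each point left on the front represents a calibration where lowering one objective further would raise the other.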

Coco-LIC: Continuous-Time Tightly-Coupled LiDAR-Inertial-Camera Odometry using Non-Uniform B-spline

  • Authors: Xiaolei Lang, Chao Chen, Kai Tang, Yukai Ma, Jiajun Lv, Yong Liu, Xingxing Zuo
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09808
  • Pdf link: https://arxiv.org/pdf/2309.09808
  • Abstract In this paper, we propose an efficient continuous-time LiDAR-Inertial-Camera Odometry, utilizing non-uniform B-splines to tightly couple measurements from the LiDAR, IMU, and camera. In contrast to uniform B-spline-based continuous-time methods, our non-uniform B-spline approach offers significant advantages in terms of achieving real-time efficiency and high accuracy. This is accomplished by dynamically and adaptively placing control points, taking into account the varying dynamics of the motion. To enable efficient fusion of heterogeneous LiDAR-Inertial-Camera data within a short sliding-window optimization, we assign depth to visual pixels using corresponding map points from a global LiDAR map, and formulate frame-to-map reprojection factors for the associated pixels in the current image frame. This approach circumvents the necessity for depth optimization of visual pixels, which typically entails a lengthy sliding window with numerous control points for continuous-time trajectory estimation. We conduct dedicated experiments on real-world datasets to demonstrate the advantage and efficacy of adopting a non-uniform continuous-time trajectory representation. Our LiDAR-Inertial-Camera odometry system is also extensively evaluated on both challenging scenarios with sensor degenerations and large-scale scenarios, and has shown comparable or higher accuracy than the state-of-the-art methods. The codebase of this paper will also be open-sourced at https://github.com/APRIL-ZJU/Coco-LIC.

Learning Inertial Parameter Identification of Unknown Object with Humanoid Robot using Sim-to-Real Adaptation

  • Authors: Donghoon Baek, Bo Peng, Saurabh Gupta, Joao Ramos
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09810
  • Pdf link: https://arxiv.org/pdf/2309.09810
  • Abstract Understanding the dynamics of unknown objects is crucial for collaborative robots, including humanoids, to interact with humans more safely and accurately. Most of the relevant literature leverages a force/torque sensor, prior knowledge of the object, a vision system, and a long-horizon trajectory, which are often impractical. Moreover, these methods often entail solving a non-linear optimization problem, sometimes yielding physically inconsistent results. In this work, we propose a fast learning-based inertial parameter estimation as a more practical approach. We acquire a reliable dataset in a high-fidelity simulation and train a time-series data-driven regression model (e.g., LSTM) to estimate the inertial parameters of unknown objects. We also introduce a novel sim-to-real adaptation method combining Robot System Identification and Gaussian Processes to directly transfer the trained model to real-world application. We demonstrate our method with a 4-DOF single manipulator of a physical wheeled humanoid robot, SATYRR. Results show that our method can identify the inertial parameters of various unknown objects faster and more accurately than conventional methods.

DynaPix SLAM: A Pixel-Based Dynamic SLAM Approach

  • Authors: Chenghao Xu, Elia Bonetto, Aamir Ahmad
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09879
  • Pdf link: https://arxiv.org/pdf/2309.09879
  • Abstract In static environments, visual simultaneous localization and mapping (V-SLAM) methods achieve remarkable performance. However, moving objects severely affect core modules of such systems like state estimation and loop closure detection. To address this, dynamic SLAM approaches often use semantic information, geometric constraints, or optical flow to mask features associated with dynamic entities. These are limited by various factors such as a dependency on the quality of the underlying method, poor generalization to unknown or unexpected moving objects, and often produce noisy results, e.g. by masking static but movable objects or making use of predefined thresholds. In this paper, to address these trade-offs, we introduce a novel visual SLAM system, DynaPix, based on per-pixel motion probability values. Our approach consists of a new semantic-free probabilistic pixel-wise motion estimation module and an improved pose optimization process. Our per-pixel motion probability estimation combines a novel static background differencing method on both images and optical flows from splatted frames. DynaPix fully integrates those motion probabilities into both map point selection and weighted bundle adjustment within the tracking and optimization modules of ORB-SLAM2. We evaluate DynaPix against ORB-SLAM2 and DynaSLAM on both GRADE and TUM-RGBD datasets, obtaining lower errors and longer trajectory tracking times. We will release both source code and data upon acceptance of this work.

Differentiable Boustrophedon Path Plans

  • Authors: Thomas Manzini, Robin Murphy
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09882
  • Pdf link: https://arxiv.org/pdf/2309.09882
  • Abstract This paper introduces a differentiable representation for optimization of boustrophedon path plans in convex polygons, explores an additional parameter of these path plans that can be optimized, discusses the properties of this representation that can be leveraged during the optimization process, and shows that the previously published attempt at optimization of these path plans was too coarse to be practically useful. Experiments were conducted to show that this differentiable representation can reproduce the same scores from transitional discrete representations of boustrophedon path plans with high fidelity. Finally, optimization via gradient descent was attempted, but found to fail because the search space is far more non-convex than was previously considered in the literature. The wide range of applications for boustrophedon path plans means that this work has the potential to improve path planning efficiency in numerous areas of robotics including mapping and search tasks using uncrewed aerial systems, environmental sampling tasks using uncrewed marine vehicles, and agricultural tasks using ground vehicles, among numerous other applications.
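
For readers unfamiliar with the term, a boustrophedon plan sweeps an area in back-and-forth passes, like mowing a lawn. The sketch below generates such waypoints for the simplest case, an axis-aligned rectangle with a fixed pass spacing. It is a plain discrete construction meant only to illustrate the path shape; it is not the differentiable representation the paper introduces, and the function name and parameters are hypothetical.

```python
import math
from typing import List, Tuple

Waypoint = Tuple[float, float]


def boustrophedon_waypoints(width: float, height: float, spacing: float) -> List[Waypoint]:
    """Return back-and-forth sweep waypoints covering a width x height rectangle.

    Passes run parallel to the x-axis, start at y = 0, and are `spacing` apart.
    The sweep direction alternates on every pass.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    passes = int(math.floor(height / spacing + 1e-9)) + 1
    waypoints: List[Waypoint] = []
    for i in range(passes):
        y = i * spacing
        if i % 2 == 0:
            waypoints.extend([(0.0, y), (width, y)])  # left to right
        else:
            waypoints.extend([(width, y), (0.0, y)])  # right to left
    return waypoints


print(boustrophedon_waypoints(4.0, 2.0, 1.0))
# → [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0), (0.0, 2.0), (4.0, 2.0)]
```

Parameters such as the pass spacing and sweep orientation are the kind of quantities an optimizer over these plans would adjust.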

Recycling Krylov Subspaces for Efficient Partitioned Solution of Aerostructural Adjoint Systems

  • Authors: Christophe Blondeau
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2309.09925
  • Pdf link: https://arxiv.org/pdf/2309.09925
  • Abstract Robust and efficient solvers for coupled-adjoint linear systems are crucial to successful aerostructural optimization. Monolithic and partitioned strategies can be applied. The monolithic approach is expected to offer better robustness and efficiency for strong fluid-structure interactions. However, it requires a high implementation cost and convergence may depend on appropriate scaling and initialization strategies. On the other hand, the modularity of the partitioned method enables a straightforward implementation while its convergence may require relaxation. In addition, a partitioned solver leads to a higher number of iterations to get the same level of convergence as the monolithic one. The objective of this paper is to accelerate the fluid-structure coupled-adjoint partitioned solver by considering techniques borrowed from approximate invariant subspace recycling strategies adapted to sequences of linear systems with varying right-hand sides. Indeed, in a partitioned framework, the structural source term attached to the fluid block of equations affects the right-hand side with the nice property of quickly converging to a constant value. We also consider deflation of approximate eigenvectors in conjunction with advanced inner-outer Krylov solvers for the fluid block equations. We demonstrate the benefit of these techniques by computing the coupled derivatives of an aeroelastic configuration of the ONERA-M6 fixed wing in transonic flow. For this exercise the fluid grid was coupled to a structural model specifically designed to exhibit a high flexibility. All computations are performed using RANS flow modeling and a fully linearized one-equation Spalart-Allmaras turbulence model. Numerical simulations show up to 39% reduction in matrix-vector products for GCRO-DR and up to 19% for the nested FGCRO-DR solver.

Hierarchical Attention and Graph Neural Networks: Toward Drift-Free Pose Estimation

  • Authors: Kathia Melbouci, Fawzi Nashashibi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09934
  • Pdf link: https://arxiv.org/pdf/2309.09934
  • Abstract The most commonly used method for addressing 3D geometric registration is the iterative closest point algorithm; this approach is incremental and prone to drift over multiple consecutive frames. The common strategy to address the drift is pose graph optimization subsequent to frame-to-frame registration, incorporating a loop closure process that identifies previously visited places. In this paper, we explore a framework that replaces traditional geometric registration and pose graph optimization with a learned model utilizing hierarchical attention mechanisms and graph neural networks. We propose a strategy to condense the data flow, preserving essential information required for the precise estimation of rigid poses. Our results, derived from tests on the KITTI Odometry dataset, demonstrate a significant improvement in pose estimation accuracy. This improvement is especially notable in determining rotational components when compared with results obtained through conventional multi-way registration via pose graph optimization. The code will be made available upon completion of the review process.

Keyword: adam

High-order BDF convolution quadrature for fractional evolution equations with hyper-singular source term

  • Authors: Jiankang Shi, Minghua Chen, Jianxiong Cao
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2309.09664
  • Pdf link: https://arxiv.org/pdf/2309.09664
  • Abstract Anomalous diffusion in the presence or absence of an external force field is often modelled in terms of the fractional evolution equations, which can involve the hyper-singular source term. For this case, conventional time stepping methods may exhibit a severe order reduction. Although a second-order numerical algorithm is provided for the subdiffusion model with a simple hyper-singular source term $t^{\mu}$, $-2<\mu<-1$ in [arXiv:2207.08447], the convergence analysis remains to be proved. To fill in these gaps, we present a simple and robust smoothing method for the hyper-singular source term, where the Hadamard finite-part integral is introduced. This method is based on the smoothing/ID$m$-BDF$k$ method proposed by the authors [Shi and Chen, SIAM J. Numer. Anal., to appear] for subdiffusion equation with a weakly singular source term. We prove that the $k$th-order convergence rate can be restored for the diffusion-wave case $\gamma \in (1,2)$ and sketch the proof for the subdiffusion case $\gamma \in (0,1)$, even if the source term is hyper-singular and the initial data is not compatible. Numerical experiments are provided to confirm the theoretical results.

Multi-turn Dialogue Comprehension from a Topic-aware Perspective

  • Authors: Xinbei Ma, Yi Xu, Hai Zhao, Zhuosheng Zhang
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2309.09666
  • Pdf link: https://arxiv.org/pdf/2309.09666
  • Abstract Dialogue related Machine Reading Comprehension requires language models to effectively decouple and model multi-turn dialogue passages. As a dialogue develops according to the intentions of its participants, its topic may not stay constant through the whole passage. Hence, it is non-trivial to detect and leverage the topic shift in dialogue modeling. Topic modeling, although widely studied in plain text, deserves far more utilization in dialogue reading comprehension. This paper proposes to model multi-turn dialogues from a topic-aware perspective. We start with a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way. Then we use these fragments as topic-aware language processing units in further dialogue comprehension. On one hand, the split segments indicate specific topics rather than mixed intentions, and are thus convenient for in-domain topic detection and location. For this task, we design a clustering system with a self-training auto-encoder, and we build two constructed datasets for evaluation. On the other hand, the split segments are an appropriate element of multi-turn dialogue response selection. For this purpose, we further present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements and matches response candidates with a dual cross-attention. Empirical studies on three public benchmarks show great improvements over baselines. Our work continues the previous studies on document topic, and brings the dialogue modeling to a novel topic-aware perspective with exhaustive experiments and analyses.

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

  • Authors: Hao Sun, Li Shen, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2309.09719
  • Pdf link: https://arxiv.org/pdf/2309.09719
  • Abstract Federated learning is an emerging distributed machine learning method that enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same learning rate across different clients. But they are not efficient when data is heterogeneous. To maximize the performance of optimization methods, the main challenge is how to adjust the learning rate without hurting the convergence. In this paper, we propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate based on local historical gradient squares and synchronized learning rates. Theoretical analysis shows that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients, which enables promising scalability in federated optimization. We also empirically compare our method with several communication-efficient federated optimization methods. Extensive experimental results on Computer Vision (CV) tasks and Natural Language Processing (NLP) tasks show the efficacy of our proposed FedLALR method and also coincide with our theoretical findings.
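
For context, the standard single-machine AMSGrad update that FedLALR is described as building on can be sketched as follows. This is the textbook update without bias correction, not the federated algorithm from the paper; the scalar objective, the function name `amsgrad_minimize`, and the hyperparameter defaults are assumptions chosen for illustration.

```python
import math


def amsgrad_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function given its gradient, using plain AMSGrad."""
    x = float(x0)
    m = 0.0      # first-moment (momentum) estimate
    v = 0.0      # second-moment estimate of squared gradients
    v_hat = 0.0  # running maximum of v; this is what distinguishes AMSGrad from Adam
    for _ in range(steps):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = max(v_hat, v)
        # The effective step size lr / sqrt(v_hat) can only shrink over time.
        x -= lr * m / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 (x - 3).
x_star = amsgrad_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_star)
```

The per-coordinate scaling by `sqrt(v_hat)` is the "historical gradient squares" ingredient the abstract refers to; how clients synchronize it is specific to the paper and not shown here.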

Keyword: gradient

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

  • Authors: Yi Shen, Pan Xu, Michael M. Zavlanos
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2309.08748
  • Pdf link: https://arxiv.org/pdf/2309.08748
  • Abstract Without direct interaction with the environment. Often, the environment in which the data are collected differs from the environment in which the learned policy is applied. To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed that compute worst-case bounds on the policy values assuming that the distribution of the new environment lies within an uncertainty set. Typically, this uncertainty set is defined based on the KL divergence around the empirical distribution computed from the logging dataset. However, the KL uncertainty set fails to encompass distributions with varying support and lacks awareness of the geometry of the distribution support. As a result, KL approaches fall short in addressing practical environment mismatches and lead to over-fitting to worst-case scenarios. To overcome these limitations, we propose a novel DRO approach that employs the Wasserstein distance instead. While Wasserstein DRO is generally computationally more expensive compared to KL DRO, we present a regularized method and a practical (biased) stochastic gradient descent method to optimize the policy efficiently. We also provide a theoretical analysis of the finite sample complexity and iteration complexity for our proposed method. We further validate our approach using a public dataset that was recorded in a randomized stroke trial.

Control Barrier Function for Linearizable Systems with High Relative Degrees from Signal Temporal Logics: A Reference Governor Approach

  • Authors: Kaier Liang, Mingyu Cai, Cristian-Ioan Vasile
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2309.08813
  • Pdf link: https://arxiv.org/pdf/2309.08813
  • Abstract This paper considers the safety-critical navigation problem with Signal Temporal Logic (STL) tasks. We developed an explicit reference governor-guided control barrier function (ERG-guided CBF) method that enables the application of first-order CBFs to high-order linearizable systems. This method significantly reduces the conservativeness of the existing CBF approaches for high-order systems. Furthermore, our framework provides safety-critical guarantees in the sense of obstacle avoidance by constructing the margin of safety and updating direction of safe evolution in the agent's state space. To improve control performance and enhance STL satisfaction, we employ efficient gradient-based methods for iteratively learning optimal parameters of ERG-guided CBF. We validate the algorithm through both high-order linear and nonlinear systems. A video demonstration can be found at https://youtu.be/ZRmsA2FeFR4

GRaCE: Optimizing Grasps to Satisfy Ranked Criteria in Complex Scenario

  • Authors: Tasbolat Taunyazov, Kelvin Lin, Harold Soh
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.08887
  • Pdf link: https://arxiv.org/pdf/2309.08887
  • Abstract This paper addresses the multi-faceted problem of robot grasping, where multiple criteria may conflict and differ in importance. We introduce Grasp Ranking and Criteria Evaluation (GRaCE), a novel approach that employs hierarchical rule-based logic and a rank-preserving utility function to optimize grasps based on various criteria such as stability, kinematic constraints, and goal-oriented functionalities. Additionally, we propose GRaCE-OPT, a hybrid optimization strategy that combines gradient-based and gradient-free methods to effectively navigate the complex, non-convex utility function. Experimental results in both simulated and real-world scenarios show that GRaCE requires fewer samples to achieve comparable or superior performance relative to existing methods. The modular architecture of GRaCE allows for easy customization and adaptation to specific application needs.

GCL: Gradient-Guided Contrastive Learning for Medical Image Segmentation with Multi-Perspective Meta Labels

  • Authors: Yixuan Wu, Jintai Chen, Jiahuan Yan, Yiheng Zhu, Danny Z. Chen, Jian Wu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2309.08888
  • Pdf link: https://arxiv.org/pdf/2309.08888
  • Abstract Since annotating medical images for segmentation tasks commonly incurs expensive costs, it is highly desirable to design an annotation-efficient method to alleviate the annotation burden. Recently, contrastive learning has exhibited a great potential in learning robust representations to boost downstream tasks with limited labels. In medical imaging scenarios, ready-made meta labels (i.e., specific attribute information of medical images) inherently reveal semantic relationships among images, which have been used to define positive pairs in previous work. However, the multi-perspective semantics revealed by various meta labels are usually incompatible and can incur intractable "semantic contradiction" when combining different meta labels. In this paper, we tackle the issue of "semantic contradiction" in a gradient-guided manner using our proposed Gradient Mitigator method, which systematically unifies multi-perspective meta labels to enable a pre-trained model to attain a better high-level semantic recognition ability. Moreover, we emphasize that the fine-grained discrimination ability is vital for segmentation-oriented pre-training, and develop a novel method called Gradient Filter to dynamically screen pixel pairs with the most discriminating power based on the magnitude of gradients. Comprehensive experiments on four medical image segmentation datasets verify that our new method GCL: (1) learns informative image representations and considerably boosts segmentation performance with limited labels, and (2) shows promising generalizability on out-of-distribution datasets.

Efficient Methods for Non-stationary Online Learning

  • Authors: Peng Zhao, Yan-Feng Xie, Lijun Zhang, Zhi-Hua Zhou
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2309.08911
  • Pdf link: https://arxiv.org/pdf/2309.08911
  • Abstract Non-stationary online learning has drawn much attention in recent years. In particular, dynamic regret and adaptive regret are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of the non-stationarity, in which a group of base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises the concern about the computational complexity -- those methods typically maintain $\mathcal{O}(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$. Moreover, our obtained algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods. Empirical studies verify our theoretical findings.

Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors

  • Authors: Junren Chen, Shuai Huang, Michael K. Ng, Zhaoqiang Liu
  • Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2309.09032
  • Pdf link: https://arxiv.org/pdf/2309.09032
  • Abstract The problem of recovering a signal $\boldsymbol{x} \in \mathbb{R}^n$ from a quadratic system $\{y_i=\boldsymbol{x}^\top\boldsymbol{A}_i\boldsymbol{x},\ i=1,\ldots,m\}$ with full-rank matrices $\boldsymbol{A}_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging. With i.i.d. standard Gaussian matrices $\boldsymbol{A}_i$, this paper addresses the high-dimensional case where $m\ll n$ by incorporating prior knowledge of $\boldsymbol{x}$. First, we consider a $k$-sparse $\boldsymbol{x}$ and introduce the thresholded Wirtinger flow (TWF) algorithm that does not require the sparsity level $k$. TWF comprises two steps: the spectral initialization that identifies a point sufficiently close to $\boldsymbol{x}$ (up to a sign flip) when $m=O(k^2\log n)$, and the thresholded gradient descent (with a good initialization) that produces a sequence linearly converging to $\boldsymbol{x}$ with $m=O(k\log n)$ measurements. Second, we explore the generative prior, assuming that $\boldsymbol{x}$ lies in the range of an $L$-Lipschitz continuous generative model with $k$-dimensional inputs in an $\ell_2$-ball of radius $r$. We develop the projected gradient descent (PGD) algorithm that also comprises two steps: the projected power method that provides an initial vector with $O\big(\sqrt{\frac{k \log L}{m}}\big)$ $\ell_2$-error given $m=O(k\log(Lnr))$ measurements, and the projected gradient descent that refines the $\ell_2$-error to $O(\delta)$ at a geometric rate when $m=O(k\log\frac{Lrn}{\delta^2})$. Experimental results corroborate our theoretical findings and show that: (i) our approach for the sparse case notably outperforms the existing provable algorithm sparse power factorization; (ii) leveraging the generative prior allows for precise image recovery in the MNIST dataset from a small number of quadratic measurements.

Neural Gradient Learning and Optimization for Oriented Point Normal Estimation

  • Authors: Qing Li, Huifang Feng, Kanle Shi, Yi Fang, Yu-Shen Liu, Zhizhong Han
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2309.09211
  • Pdf link: https://arxiv.org/pdf/2309.09211
  • Abstract We propose Neural Gradient Learning (NGL), a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation. It has excellent gradient approximation properties for the underlying geometry of the data. We utilize a simple neural network to parameterize the objective function to produce gradients at points using a global implicit representation. However, the derived gradients usually drift away from the ground-truth oriented normals due to the lack of local detail descriptions. Therefore, we introduce Gradient Vector Optimization (GVO) to learn an angular distance field based on local plane geometry to refine the coarse gradient vectors. Finally, we formulate our method with a two-phase pipeline of coarse estimation followed by refinement. Moreover, we integrate two weighting functions, i.e., anisotropic kernel and inlier score, into the optimization to improve robustness and detail preservation. Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description. This leads to a state-of-the-art normal estimator that is robust to noise, outliers and point density variations. Extensive evaluations show that our method outperforms previous works in both unoriented and oriented normal estimation on widely used benchmarks. The source code and pre-trained models are available at https://github.com/LeoQLi/NGLO.

Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems

  • Authors: Huayu Wang, Chen Luo, Taofeng Xie, Qiyu Jin, Guoqing Chen, Zhuo-Xu Cui, Dong Liang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2309.09250
  • Pdf link: https://arxiv.org/pdf/2309.09250
  • Abstract Recently, data-driven techniques have demonstrated remarkable effectiveness in addressing challenges related to MR imaging inverse problems. However, these methods still exhibit certain limitations in terms of interpretability and robustness. In response, we introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm. CLEAR represents a fusion of deep learning (DL) and variational regularization. Specifically, we employ a latent optimization technique to adversarially train an input convex neural network, and its set of minima can fully represent the real data manifold. We utilize it as a convex regularizer to formulate a CLEAR-informed variational regularization model that guides the solution of the imaging inverse problem on the real data manifold. Leveraging its inherent convexity, we have established the convergence of the projected subgradient descent algorithm for the CLEAR-informed regularization model. This convergence guarantees the attainment of a unique solution to the imaging inverse problem, subject to certain assumptions. Furthermore, we have demonstrated the robustness of our CLEAR-informed model, explicitly showcasing its capacity to achieve stable reconstruction even in the presence of measurement interference. Finally, we illustrate the superiority of our approach using MRI reconstruction as an example. Our method consistently outperforms conventional data-driven techniques and traditional regularization approaches, excelling in both reconstruction quality and robustness.

A Distributed Strategy to Maximize Coverage in a Heterogeneous Sensor Network in the Presence of Obstacles

  • Authors: Hesam Mosalli, Amir G. Aghdam
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2309.09363
  • Pdf link: https://arxiv.org/pdf/2309.09363
  • Abstract In this paper, an efficient deployment strategy is proposed for a network of mobile and static sensors with nonidentical sensing and communication radii. The multiplicatively weighted Voronoi (MW-Voronoi) diagram is used to partition the field and assign the underlying coverage task to each mobile sensor. A gradient-based method is applied to find the best candidate point based on the detected coverage holes and the coverage priority considering the relative distance of the mobile sensor from the static ones and the obstacles in the field. The sensors move to a new position if such a relocation increases their local coverage. The efficiency of the proposed strategy in different scenarios is demonstrated by simulations.
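
The partitioning idea behind a multiplicatively weighted Voronoi diagram can be sketched in a few lines: a point belongs to the sensor that minimizes distance divided by weight. The sketch below assumes the weight is the sensing radius and uses a hypothetical `(x, y, radius)` tuple layout; it does not model obstacles, communication radii, or the paper's gradient-based relocation step.

```python
import math


def mw_voronoi_owner(point, sensors):
    """Index of the sensor whose MW-Voronoi region contains `point`.

    `sensors` is a list of (x, y, radius) tuples (assumed layout); the owner
    minimizes Euclidean distance divided by sensing radius.
    """
    if not sensors:
        raise ValueError("need at least one sensor")
    return min(
        range(len(sensors)),
        key=lambda i: math.dist(point, sensors[i][:2]) / sensors[i][2],
    )


# Two sensors on the x-axis; the second has three times the sensing radius.
sensors = [(0.0, 0.0, 1.0), (10.0, 0.0, 3.0)]
for x in (2.0, 2.4, 2.6, 4.0):
    print(x, mw_voronoi_owner((x, 0.0), sensors))
```

With equal radii the boundary between the two sensors would sit at `x = 5`; with radii 1 and 3 it moves to `x = 2.5`, so the stronger sensor is assigned a larger share of the field.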

Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study Of Sentence Embeddings

  • Authors: Stephen Fitz
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2309.09397
  • Pdf link: https://arxiv.org/pdf/2309.09397
  • Abstract As Large Language Models are deployed within Artificial Intelligence systems that are increasingly integrated with human society, it becomes more important than ever to study their internal structures. Higher level abilities of LLMs such as GPT-3.5 emerge in large part due to informative language representations they induce from raw text data during pre-training on trillions of words. These embeddings exist in vector spaces of several thousand dimensions, and their processing involves mapping between multiple vector spaces, with a total number of parameters on the order of trillions. Furthermore, these language representations are induced by gradient optimization, resulting in a black box system that is hard to interpret. In this paper, we take a look at the topological structure of neuronal activity in the "brain" of Chat-GPT's foundation language model, and analyze it with respect to a metric representing the notion of fairness. We develop a novel approach to visualize GPT's moral dimensions. We first compute a fairness metric, inspired by social psychology literature, to identify factors that typically influence fairness assessments in humans, such as legitimacy, need, and responsibility. Subsequently, we summarize the manifold's shape using a lower-dimensional simplicial complex, whose topology is derived from this metric. We color it with a heat map associated with this fairness metric, producing human-readable visualizations of the high-dimensional sentence manifold. Our results show that sentence embeddings based on GPT-3.5 can be decomposed into two submanifolds corresponding to fair and unfair moral judgments. This indicates that GPT-based language models develop a moral dimension within their representation spaces and induce an understanding of fairness during their training process.

Reducing Adversarial Training Cost with Gradient Approximation

  • Authors: Huihui Gong, Shuo Yang, Siqi Ma, Seyit Camtepe, Surya Nepal, Chang Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2309.09464
  • Pdf link: https://arxiv.org/pdf/2309.09464
  • Abstract Deep learning models have achieved state-of-the-art performance in various domains, while they are vulnerable to inputs with well-crafted but small perturbations, which are known as adversarial examples (AEs). Among many strategies to improve the model robustness against AEs, Projected Gradient Descent (PGD) based adversarial training is one of the most effective methods. Unfortunately, the prohibitive computational overhead of generating strong enough AEs, due to the maximization of the loss function, sometimes makes the regular PGD adversarial training impractical when using larger and more complicated models. In this paper, we propose that the adversarial loss can be approximated by the partial sum of Taylor series. Furthermore, we approximate the gradient of adversarial loss and propose a new and efficient adversarial training method, adversarial training with gradient approximation (GAAT), to reduce the cost of building up robust models. Additionally, extensive experiments demonstrate that this efficiency improvement can be achieved without any or with very little loss in accuracy on natural and adversarial examples, showing that our proposed method saves up to 60% of the training time with comparable model test accuracy on MNIST, CIFAR-10 and CIFAR-100 datasets.
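
To make the Taylor-series idea concrete, here is the simplest instance: a first-order expansion of the loss around the clean input, maximized over an $\ell_\infty$ ball. The abstract does not state which expansion order or norm the paper uses, so the truncation after the linear term, the $\ell_\infty$ constraint, and the symbols $\ell$, $\delta$, $\epsilon$ are assumptions for illustration only.

```latex
\ell(x+\delta) \;\approx\; \ell(x) + \nabla_x \ell(x)^\top \delta,
\qquad
\max_{\|\delta\|_\infty \le \epsilon} \nabla_x \ell(x)^\top \delta
\;=\; \epsilon \,\bigl\|\nabla_x \ell(x)\bigr\|_1,
\qquad
\delta^\star = \epsilon \,\operatorname{sign}\!\bigl(\nabla_x \ell(x)\bigr).
```

Under this approximation the inner maximization has a closed form, which is why a truncated expansion can avoid the multi-step inner loop that makes PGD-based training expensive.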

Gradpaint: Gradient-Guided Inpainting with Diffusion Models

  • Authors: Asya Grechka, Guillaume Couairon, Matthieu Cord
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2309.09614
  • Pdf link: https://arxiv.org/pdf/2309.09614
  • Abstract Denoising Diffusion Probabilistic Models (DDPMs) have recently achieved remarkable results in conditional and unconditional image generation. The pre-trained models can be adapted without further training to different downstream tasks, by guiding their iterative denoising process at inference time to satisfy additional constraints. For the specific task of image inpainting, the current guiding mechanism relies on copying-and-pasting the known regions from the input image at each denoising step. However, diffusion models are strongly conditioned by the initial random noise, and therefore struggle to harmonize predictions inside the inpainting mask with the real parts of the input image, often producing results with unnatural artifacts. Our method, dubbed GradPaint, steers the generation towards a globally coherent image. At each step in the denoising process, we leverage the model's "denoised image estimation" by calculating a custom loss measuring its coherence with the masked input image. Our guiding mechanism uses the gradient obtained from backpropagating this loss through the diffusion model itself. GradPaint generalizes well to diffusion models trained on various datasets, improving upon current state-of-the-art supervised and unsupervised methods.

Two-Stage Learning of Highly Dynamic Motions with Rigid and Articulated Soft Quadrupeds

  • Authors: Francecso Vezzi, Jiatao Ding, Antonin Raffin, Jens Kober, Cosimo Della Santina
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09682
  • Pdf link: https://arxiv.org/pdf/2309.09682
  • Abstract Controlled execution of dynamic motions in quadrupedal robots, especially those with articulated soft bodies, presents a unique set of challenges that traditional methods struggle to address efficiently. In this study, we tackle these issues by relying on a simple yet effective two-stage learning framework to generate dynamic motions for quadrupedal robots. First, a gradient-free evolution strategy is employed to discover simply represented control policies, eliminating the need for a predefined reference motion. Then, we refine these policies using deep reinforcement learning. Our approach enables the acquisition of complex motions like pronking and back-flipping, effectively from scratch. Additionally, our method simplifies the traditionally labour-intensive task of reward shaping, boosting the efficiency of the learning process. Importantly, our framework proves particularly effective for articulated soft quadrupeds, whose inherent compliance and adaptability make them ideal for dynamic tasks but also introduce unique control challenges.

Learning Covariances for Estimation with Constrained Bilevel Optimization

  • Authors: Mohamad Qadri, Zachary Manchester, Michael Kaess
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09718
  • Pdf link: https://arxiv.org/pdf/2309.09718
  • Abstract We consider the problem of learning error covariance matrices for robotic state estimation. The convergence of a state estimator to the correct belief over the robot state is dependent on the proper tuning of noise models. During inference, these models are used to weigh different blocks of the Jacobian and error vector resulting from linearization and hence, additionally affect the stability and convergence of the non-linear system. We propose a gradient-based method to estimate well-conditioned covariance matrices by formulating the learning process as a constrained bilevel optimization problem over factor graphs. We evaluate our method against baselines across a range of simulated and real-world tasks and demonstrate that our technique converges to model estimates that lead to better solutions as evidenced by the improved tracking accuracy on unseen test trajectories.
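
Why noise models matter for the estimate can be seen in the smallest possible case: fusing scalar measurements of one quantity by weighted least squares, where each residual is weighted by the inverse of its variance. This is a generic textbook sketch, not the paper's bilevel method; the function name and the scalar setting are assumptions.

```python
def fuse_measurements(values, variances):
    """Weighted least-squares estimate of a scalar from noisy measurements.

    Minimizes sum_i (x - values[i])**2 / variances[i]; the minimizer is the
    inverse-variance weighted mean.
    """
    if len(values) != len(variances) or not values:
        raise ValueError("values and variances must be non-empty and the same length")
    if any(var <= 0 for var in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / var for var in variances]
    return sum(w * z for w, z in zip(weights, values)) / sum(weights)


# Same two measurements, different assumed noise levels.
print(fuse_measurements([0.0, 10.0], [1.0, 1.0]))  # equal trust in both sensors
print(fuse_measurements([0.0, 10.0], [1.0, 9.0]))  # second sensor assumed noisier
```

Changing only the assumed variances moves the estimate from the midpoint toward the measurement that is trusted more, which is the sensitivity that motivates learning the covariances rather than hand-tuning them.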

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

  • Authors: Hao Sun, Li Shen, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2309.09719
  • Pdf link: https://arxiv.org/pdf/2309.09719
  • Abstract Federated learning is an emerging distributed machine learning method that enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same learning rate across different clients. But they are not efficient when data is heterogeneous. To maximize the performance of optimization methods, the main challenge is how to adjust the learning rate without hurting the convergence. In this paper, we propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate based on local historical gradient squares and synchronized learning rates. Theoretical analysis shows that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients, which enables promising scalability in federated optimization. We also empirically compare our method with several communication-efficient federated optimization methods. Extensive experimental results on Computer Vision (CV) and Natural Language Processing (NLP) tasks show the efficacy of our proposed FedLALR method and coincide with our theoretical findings.
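
As a rough illustration of the "local historical gradient squares" idea, here is a minimal sketch of a plain AMSGrad update applied independently on two hypothetical clients. It is not the FedLALR algorithm: the synchronization of learning rates across clients described in the abstract is omitted, and the names `amsgrad_step`, `init_state`, `effective_lr`, and `run_client`, the toy quadratic objective, and all hyperparameter values are assumptions made for this example.

```python
import math

def init_state(dim):
    """Zero-initialized AMSGrad buffers for a parameter vector of size dim."""
    return {"m": [0.0] * dim, "v": [0.0] * dim, "v_hat": [0.0] * dim}

def amsgrad_step(w, grad, state, lr=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    """One AMSGrad update on a list of floats; returns (new_w, new_state)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(state["m"], grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(state["v"], grad)]
    # AMSGrad keeps the running maximum of v, so the denominator never shrinks.
    v_hat = [max(old, new) for old, new in zip(state["v_hat"], v)]
    new_w = [wi - lr * mi / (math.sqrt(vh) + eps)
             for wi, mi, vh in zip(w, m, v_hat)]
    return new_w, {"m": m, "v": v, "v_hat": v_hat}

def effective_lr(state, lr=0.01, eps=1e-8):
    """Per-coordinate step scale implied by a client's gradient history."""
    return [lr / (math.sqrt(vh) + eps) for vh in state["v_hat"]]

def run_client(scale, steps=5):
    """Hypothetical client minimizing 0.5 * scale * w^2 from w = 1."""
    w, state = [1.0], init_state(1)
    for _ in range(steps):
        grad = [scale * w[0]]  # gradient of the toy quadratic
        w, state = amsgrad_step(w, grad, state)
    return w, state

# Clients with differently scaled gradients end up with different step scales.
w_a, state_a = run_client(scale=1.0)
w_b, state_b = run_client(scale=10.0)
print(effective_lr(state_a), effective_lr(state_b))
```

In this toy setting the client with larger gradients accumulates a larger `v_hat` and therefore takes a smaller effective step, which is the kind of client-specific behaviour that a shared learning rate cannot express.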

Differentiable Boustrophedon Path Plans

  • Authors: Thomas Manzini, Robin Murphy
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2309.09882
  • Pdf link: https://arxiv.org/pdf/2309.09882
  • Abstract This paper introduces a differentiable representation for optimization of boustrophedon path plans in convex polygons, explores an additional parameter of these path plans that can be optimized, discusses the properties of this representation that can be leveraged during the optimization process, and shows that the previously published attempt at optimization of these path plans was too coarse to be practically useful. Experiments were conducted to show that this differentiable representation can reproduce the same scores from transitional discrete representations of boustrophedon path plans with high fidelity. Finally, optimization via gradient descent was attempted, but found to fail because the search space is far more non-convex than was previously considered in the literature. The wide range of applications for boustrophedon path plans means that this work has the potential to improve path planning efficiency in numerous areas of robotics including mapping and search tasks using uncrewed aerial systems, environmental sampling tasks using uncrewed marine vehicles, and agricultural tasks using ground vehicles, among numerous other applications.
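
For readers unfamiliar with the term, a boustrophedon plan sweeps an area in back-and-forth rows, like mowing a lawn. The sketch below builds such a plan in its ordinary discrete form for an axis-aligned rectangle only. It is not the differentiable representation from the paper and does not handle general convex polygons; the function names `boustrophedon_rect` and `path_length` and the `spacing` parameter are assumptions made for this example.

```python
import math

def boustrophedon_rect(width, height, spacing):
    """Waypoints sweeping a width x height rectangle in alternating rows."""
    if width <= 0 or height < 0 or spacing <= 0:
        raise ValueError("width and spacing must be positive, height non-negative")
    # Number of rows that fit, counted with an integer to avoid float drift.
    rows = int(math.floor(height / spacing + 1e-9)) + 1
    waypoints = []
    for i in range(rows):
        y = i * spacing
        # Even rows run left to right, odd rows run right to left.
        start, end = (0.0, float(width)) if i % 2 == 0 else (float(width), 0.0)
        waypoints.append((start, y))
        waypoints.append((end, y))
    return waypoints

def path_length(waypoints):
    """Total Euclidean length of the polyline through the waypoints."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))

plan = boustrophedon_rect(width=4, height=2, spacing=1)
print(plan, path_length(plan))
```

Parameters such as the row spacing (and, in more general settings, the sweep direction) are the kind of quantities one might try to optimize, which is where a differentiable formulation like the one the abstract describes becomes relevant.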

Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees

  • Authors: Alexia Jolicoeur-Martineau, Kilian Fatras, Tal Kachman
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2309.09968
  • Pdf link: https://arxiv.org/pdf/2309.09968
  • Abstract Tabular data is hard to acquire and is subject to missing values. This paper proposes a novel approach to generate and impute mixed-type (continuous and categorical) tabular data using score-based diffusion and conditional flow matching. Contrary to previous work that relies on neural networks as function approximators, we instead utilize XGBoost, a popular Gradient-Boosted Tree (GBT) method. In addition to being elegant, we empirically show on various datasets that our method i) generates highly realistic synthetic data when the training dataset is either clean or tainted by missing data and ii) generates diverse plausible data imputations. Our method often outperforms deep-learning generation methods and can be trained in parallel using CPUs without the need for a GPU. To make it easily accessible, we release our code through a Python library on PyPI and an R package on CRAN.

Keyword: super-resolution

Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution

  • Authors: Wenyu Zhang, Xin Deng, Baojun Jia, Xingtong Yu, Yifan Chen, jin Ma, Qing Ding, Xinming Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2309.08919
  • Pdf link: https://arxiv.org/pdf/2309.08919
  • Abstract Current scene text image super-resolution approaches primarily focus on extracting robust features, acquiring text information, and complex training strategies to generate super-resolution images. However, the upsampling module, which is crucial in the process of converting low-resolution images to high-resolution ones, has received little attention in existing works. To address this issue, we propose the Pixel Adapter Module (PAM) based on graph attention to address pixel distortion caused by upsampling. The PAM effectively captures local structural information by allowing each pixel to interact with its neighbors and update features. Unlike previous graph attention mechanisms, our approach achieves 2-3 orders of magnitude improvement in efficiency and memory utilization by eliminating the dependency on sparse adjacency matrices and introducing a sliding window approach for efficient parallel computation. Additionally, we introduce the MLP-based Sequential Residual Block (MSRB) for robust feature extraction from text images, and a Local Contour Awareness loss ($\mathcal{L}_{lca}$) to enhance the model's perception of details. Comprehensive experiments on TextZoom demonstrate that our proposed method generates high-quality super-resolution images, surpassing existing methods in recognition accuracy. For single-stage and multi-stage strategies, we achieved improvements of 0.7% and 2.6%, respectively, increasing the performance from 52.6% and 53.7% to 53.3% and 56.3%. The code is available at https://github.com/wenyu1009/RTSRN.

zoq, Sep 19 '23 06:09