arxiv-updates
New submissions for Tue, 26 Sep 23
Keyword: sgd
Machine Learning Technique Based Fake News Detection
- Authors: Biplob Kumar Sutradhar, Md. Zonaid, Nushrat Jahan Ria, Sheak Rashed Haider Noori
- Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13069
- Pdf link: https://arxiv.org/pdf/2309.13069
- Abstract False news has received attention from both the general public and the scholarly world. Such false information can affect public perception, giving nefarious groups the chance to influence the results of public events such as elections. Anyone can share fake news or facts about anyone or anything for personal gain or to cause someone trouble. Information also varies depending on the part of the world in which it is shared. In this paper, we therefore train a model to classify fake and true news using 1876 news items from our collected dataset. We preprocess the data with Natural Language Processing techniques to obtain clean, filtered text. We evaluate 3 popular Machine Learning algorithms (Stochastic Gradient Descent, Naïve Bayes, Logistic Regression) and 2 Deep Learning algorithms (Long Short-Term Memory, and ASGD Weight-Dropped LSTM, or AWD-LSTM). Our best classifier, Naïve Bayes, achieves 56% accuracy and an average F1-macro score of 32%.
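The abstract reports accuracy and macro-averaged F1 side by side. The sketch below shows how the two metrics are computed for a two-class fake/true task. The labels and predictions are hypothetical. They only illustrate how a classifier that favours one class can score well above its macro-F1 on accuracy. This is not a claim about how the authors' Naïve Bayes model behaves.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical example: a model that labels everything "fake".
y_true = ["fake", "fake", "true", "true"]
y_pred = ["fake", "fake", "fake", "fake"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))  # 0.5 and about 0.333
```

Macro averaging gives the never-predicted "true" class an F1 of 0. That zero pulls the mean down even though half of the predictions are correct.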
Grounding Description-Driven Dialogue State Trackers with Knowledge-Seeking Turns
- Authors: Alexandru Coca, Bo-Hsiang Tseng, Jinghong Chen, Weizhe Lin, Weixuan Zhang, Tisha Anders, Bill Byrne
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2309.13448
- Pdf link: https://arxiv.org/pdf/2309.13448
- Abstract Schema-guided dialogue state trackers can generalise to new domains without further training, yet they are sensitive to the writing style of the schemata. Augmenting the training set with human or synthetic schema paraphrases improves the model robustness to these variations but can be either costly or difficult to control. We propose to circumvent these issues by grounding the state tracking model in knowledge-seeking turns collected from the dialogue corpus as well as the schema. Including these turns in prompts during finetuning and inference leads to marked improvements in model robustness, as demonstrated by large average joint goal accuracy and schema sensitivity improvements on SGD and SGD-X.
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
- Authors: Guo-qing Jiang, Jinlong Liu, Zixiang Ding, Lin Guo, Wei Lin
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13681
- Pdf link: https://arxiv.org/pdf/2309.13681
- Abstract As models for natural language processing (NLP), computer vision (CV) and recommendation systems (RS) require surging computation, a large number of GPUs/TPUs are run in parallel with a large batch (LB) to improve training throughput. However, such LB training often suffers from a large generalization gap and degraded final precision, which limits how far the batch size can be increased. In this work, we develop the variance reduced gradient descent technique (VRGD) based on the gradient signal to noise ratio (GSNR) and apply it to popular optimizers such as SGD/Adam/LARS/LAMB. We carry out a theoretical analysis of the convergence rate to explain its fast training dynamics, and a generalization analysis to demonstrate its smaller generalization gap in LB training. Comprehensive experiments demonstrate that VRGD can accelerate training ($1\sim 2 \times$), narrow the generalization gap and improve final accuracy. We push the batch size limit of BERT pretraining up to 128k/64k and of DLRM to 512k without noticeable accuracy loss. We improve ImageNet Top-1 accuracy at batch size 96k by $0.52$ pp over LARS. The generalization gap of BERT and ImageNet training is significantly reduced, by over $65\%$.
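VRGD is built on the gradient signal-to-noise ratio. A common definition of a parameter's GSNR is the squared mean of its per-sample gradients divided by their variance. The sketch below computes that quantity from a list of per-sample gradient vectors. It assumes that definition and does not reproduce how VRGD turns GSNR into an update rule, which the abstract does not spell out. The `eps` guard is an added assumption to avoid division by zero.

```python
from statistics import fmean, pvariance


def gsnr(per_sample_grads, eps=1e-12):
    """Per-parameter GSNR: mean(g_j)^2 / var(g_j) across samples.

    per_sample_grads: list of gradient vectors, one per training sample.
    A high ratio means samples agree on the direction (strong signal);
    a low ratio means the coordinate is dominated by noise.
    """
    n_params = len(per_sample_grads[0])
    ratios = []
    for j in range(n_params):
        column = [g[j] for g in per_sample_grads]
        mean = fmean(column)
        var = pvariance(column, mu=mean)
        ratios.append(mean * mean / (var + eps))
    return ratios


# Two samples: coordinate 0 agrees in sign, coordinate 1 cancels out.
print(gsnr([[1.0, 2.0], [3.0, -2.0]]))  # roughly [4.0, 0.0]
```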
Keyword: optimization
A Differentiable Framework for End-to-End Learning of Hybrid Structured Compression
- Authors: Moonjung Eo, Suhyun Kang, Wonjong Rhee
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
- Arxiv link: https://arxiv.org/abs/2309.13077
- Pdf link: https://arxiv.org/pdf/2309.13077
- Abstract Filter pruning and low-rank decomposition are two of the foundational techniques for structured compression. Although recent efforts have explored hybrid approaches aiming to integrate the advantages of both techniques, their performance gains have been modest at best. In this study, we develop a Differentiable Framework (DF) that can express filter selection, rank selection, and the budget constraint in a single analytical formulation. Within the framework, we introduce DML-S for filter selection, integrating scheduling into existing mask learning techniques. Additionally, we present DTL-S for rank selection, utilizing a singular value thresholding operator. The framework with DML-S and DTL-S offers a hybrid structured compression methodology that facilitates end-to-end learning through gradient-based optimization. Experimental results demonstrate the efficacy of DF, surpassing state-of-the-art structured compression methods. Our work establishes a robust and versatile avenue for advancing structured compression techniques.
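DTL-S is said to use a singular value thresholding operator. For reference, the textbook form of that operator soft-thresholds the singular values of a weight matrix $W$ with threshold $\tau$. The resulting rank is the number of singular values that exceed $\tau$. How DTL-S learns or schedules $\tau$ is not stated in the abstract, so the block below is only the standard definition.

```latex
W = U \operatorname{diag}(\sigma_1, \dots, \sigma_r) V^\top,
\qquad
\mathcal{D}_\tau(W) = U \operatorname{diag}\big(\max(\sigma_1 - \tau, 0), \dots, \max(\sigma_r - \tau, 0)\big) V^\top,
\qquad
\operatorname{rank}\big(\mathcal{D}_\tau(W)\big) = \big|\{\, i : \sigma_i > \tau \,\}\big|
```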
Multiple Independent DE Optimizations to Tackle Uncertainty and Variability in Demand in Inventory Management
- Authors: Sarit Maitra, Sukanya Kundu, Vivek Mishra
- Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2309.13095
- Pdf link: https://arxiv.org/pdf/2309.13095
- Abstract To determine the effectiveness of metaheuristic Differential Evolution optimization strategy for inventory management (IM) in the context of stochastic demand, this empirical study undertakes a thorough investigation. The primary objective is to discern the most effective strategy for minimizing inventory costs within the context of uncertain demand patterns. Inventory costs refer to the expenses associated with holding and managing inventory within a business. The approach combines a continuous review of IM policies with a Monte Carlo Simulation (MCS). To find the optimal solution, the study focuses on meta-heuristic approaches and compares multiple algorithms. The outcomes reveal that the Differential Evolution (DE) algorithm outperforms its counterparts in optimizing IM. To fine-tune the parameters, the study employs the Latin Hypercube Sampling (LHS) statistical method. To determine the final solution, a method is employed in this study which combines the outcomes of multiple independent DE optimizations, each initiated with different random initial conditions. This approach introduces a novel and promising dimension to the field of inventory management, offering potential enhancements in performance and cost efficiency, especially in the presence of stochastic demand patterns.
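The final step combines several independent Differential Evolution runs, each started from different random initial conditions. The sketch below uses the classic DE/rand/1/bin scheme and runs it once per seed. The abstract does not say how the outcomes are combined, so keeping the lowest-cost run is an assumption. The quadratic `toy_cost` is a hypothetical stand-in for the paper's Monte Carlo inventory-cost simulation.

```python
import random


def differential_evolution(objective, bounds, seed, pop_size=20, generations=100, f=0.8, cr=0.9):
    """Classic DE/rand/1/bin minimiser; returns (best_x, best_value)."""
    rng = random.Random(seed)  # independent random initial conditions per run
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == forced or rng.random() < cr:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])  # mutation
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][j]  # inherit from the target vector
                trial.append(v)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def multi_start_de(objective, bounds, seeds):
    """Run one DE instance per seed and keep the lowest-cost result (assumed rule)."""
    runs = [differential_evolution(objective, bounds, seed) for seed in seeds]
    return min(runs, key=lambda run: run[1])


def toy_cost(x):
    """Hypothetical smooth cost with its minimum at (3, -1)."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


best_x, best_value = multi_start_de(toy_cost, [(-10.0, 10.0), (-10.0, 10.0)], seeds=[0, 1, 2])
print(best_x, best_value)
```

Averaging the per-run solutions would be another plausible way to combine the outcomes. That choice matters more when the cost surface is noisy, as it is under simulated stochastic demand.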
Lamarck's Revenge: Inheritance of Learned Traits Can Make Robot Evolution Better
- Authors: Jie Luo, Karine Miras, Jakub Tomczak, Agoston E. Eiben
- Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.13099
- Pdf link: https://arxiv.org/pdf/2309.13099
- Abstract Evolutionary robot systems offer two principal advantages: an advanced way of developing robots through evolutionary optimization and a special research platform to conduct what-if experiments regarding questions about evolution. Our study sits at the intersection of these. We investigate the question "What if the 18th-century biologist Lamarck was not completely wrong and individual traits learned during a lifetime could be passed on to offspring through inheritance?" We research this issue through simulations with an evolutionary robot framework where morphologies (bodies) and controllers (brains) of robots are evolvable and robots also can improve their controllers through learning during their lifetime. Within this framework, we compare a Lamarckian system, where learned bits of the brain are inheritable, with a Darwinian system, where they are not. Analyzing simulations based on these systems, we obtain new insights about Lamarckian evolution dynamics and the interaction between evolution and learning. Specifically, we show that Lamarckism amplifies the emergence of "morphological intelligence", the ability of a given robot body to acquire a good brain by learning, and identify the source of this success: "newborn" robots have a higher fitness because their inherited brains match their bodies better than those in a Darwinian system.
Multi-Agent Reach-Avoid Games: Two Attackers Versus One Defender and Mixed Integer Programming
- Authors: Hanyang Hu, Minh Bui, Mo Chen
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2309.13155
- Pdf link: https://arxiv.org/pdf/2309.13155
- Abstract We propose a hybrid approach that combines Hamilton-Jacobi (HJ) reachability and mixed-integer optimization for solving a reach-avoid game with multiple attackers and defenders. The reach-avoid game is an important problem with potential applications in air traffic control and multi-agent motion planning; however, solving this game for many attackers and defenders is intractable due to the adversarial nature of the agents and the high problem dimensionality. In this paper, we first propose an HJ reachability-based method for solving the reach-avoid game in which 2 attackers are playing against 1 defender; we derive the numerically convergent optimal winning sets for the two sides in environments with obstacles. Utilizing this result and previous results for the 1 vs. 1 game, we further propose solving the general multi-agent reach-avoid game by determining the defender assignments that can maximize the number of attackers captured via a Mixed Integer Program (MIP). Our method generalizes previous state-of-the-art results and is especially useful when there are fewer defenders than attackers. We validate our theoretical results in numerical simulations.
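The final step assigns each defender to one attacker or a pair of attackers so that as many attackers as possible are captured. The sketch below substitutes exhaustive search for the paper's Mixed Integer Program. It is feasible only for a handful of agents, but it shows the same constraint: each attacker is covered by at most one defender. The win tables are hypothetical inputs. In the paper they would come from the HJ-reachability winning sets for the 1 vs. 1 and 2 vs. 1 games.

```python
from itertools import combinations


def max_captures(defenders, attackers, wins_1v1, wins_2v1):
    """Maximum number of attackers capturable by a valid defender assignment.

    wins_1v1: set of (defender, attacker) pairs the defender is guaranteed to win.
    wins_2v1: set of (defender, frozenset({a1, a2})) the defender is guaranteed to win.
    Exhaustive search: a stand-in for a MIP, suitable only for tiny instances.
    """

    def options(d):
        yield frozenset()  # defender left unassigned
        for a in attackers:
            if (d, a) in wins_1v1:
                yield frozenset({a})
        for pair in combinations(attackers, 2):
            if (d, frozenset(pair)) in wins_2v1:
                yield frozenset(pair)

    def search(i, taken):
        if i == len(defenders):
            return 0
        best = 0
        for group in options(defenders[i]):
            if group & taken:
                continue  # an attacker may be covered by only one defender
            best = max(best, len(group) + search(i + 1, taken | group))
        return best

    return search(0, frozenset())


# Hypothetical scenario: D1 can beat A1 and A2 together, D2 can beat A3 alone.
print(max_captures(
    ["D1", "D2"], ["A1", "A2", "A3"],
    wins_1v1={("D1", "A1"), ("D2", "A3")},
    wins_2v1={("D1", frozenset({"A1", "A2"}))},
))  # 3
```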
Cloudy Forecast: How Predictable is Communication Latency in the Cloud?
- Authors: Owen Hilyard, Bocheng Cui, Marielle Webster, Abishek Bangalore Muralikrishna, Aleksey Charapko
- Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
- Arxiv link: https://arxiv.org/abs/2309.13169
- Pdf link: https://arxiv.org/pdf/2309.13169
- Abstract Many systems and services rely on timing assumptions for performance and availability to perform critical aspects of their operation, such as various timeouts for failure detectors or optimizations to concurrency control mechanisms. Many such assumptions rely on the ability of different components to communicate on time -- a delay in communication may trigger the failure detector or cause the system to enter a less-optimized execution mode. Unfortunately, these timing assumptions are often set with little regard to actual communication guarantees of the underlying infrastructure -- in particular, the variability of communication delays between processes in different nodes/servers. The higher communication variability holds especially true for systems deployed in the public cloud since the cloud is a utility shared by many users and organizations, making it prone to higher performance variance due to noisy neighbor syndrome. In this work, we present Cloud Latency Tester (CLT), a simple tool that can help measure the variability of communication delays between nodes to help engineers set proper values for their timing assumptions. We also provide our observational analysis of running CLT in three major cloud providers and share the lessons we learned.
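CLT is meant to help engineers set timing assumptions from measured delay variability. The sketch below shows one hypothetical way to use such measurements: take a high nearest-rank percentile of the observed round-trip latencies and multiply it by a safety margin. Neither the percentile nor the margin comes from the paper; both are assumptions, and this is not CLT's interface.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]."""
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_timeout(latencies_ms, q=99.9, margin=2.0):
    """HYPOTHETICAL rule: timeout = margin x high-percentile latency."""
    return margin * percentile(latencies_ms, q)


samples = list(range(1, 101))  # hypothetical latencies of 1..100 ms
print(percentile(samples, 50), suggest_timeout(samples, q=99))  # 50 198.0
```

Basing the timeout on a tail percentile rather than the mean reflects the abstract's point: in a shared cloud, it is the variability (the tail), not the average delay, that triggers spurious timeouts.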
PAC-NMPC with Learned Perception-Informed Value Function
- Authors: Adam Polevoy, Mark Gonzales, Marin Kobilarov, Joseph Moore
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.13171
- Pdf link: https://arxiv.org/pdf/2309.13171
- Abstract Nonlinear model predictive control (NMPC) is typically restricted to short, finite horizons to limit the computational burden of online optimization. This makes a global planner necessary to avoid local minima when using NMPC for navigation in complex environments. For this reason, the performance of NMPC approaches is often limited by that of the global planner. While control policies trained with reinforcement learning (RL) can theoretically learn to avoid such local minima, they are usually unable to guarantee enforcement of general state constraints. In this paper, we augment a sampling-based stochastic NMPC (SNMPC) approach with an RL-trained perception-informed value function. This allows the system to avoid observable local minima in the environment by reasoning about perception information beyond the finite planning horizon. By using Probably Approximately Correct NMPC (PAC-NMPC) as our base controller, we are also able to generate statistical guarantees of performance and safety. We demonstrate our approach in simulation and on hardware using a 1/10th scale rally car with lidar.
Walking-by-Logic: Signal Temporal Logic-Guided Model Predictive Control for Bipedal Locomotion Resilient to External Perturbations
- Authors: Zhaoyuan Gu, Rongming Guo, William Yates, Yipu Chen, Ye Zhao
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.13172
- Pdf link: https://arxiv.org/pdf/2309.13172
- Abstract This study proposes a novel planning framework based on a model predictive control formulation that incorporates signal temporal logic (STL) specifications for task completion guarantees and robustness quantification. This marks the first-ever study to apply STL-guided trajectory optimization for bipedal locomotion push recovery, where the robot experiences unexpected disturbances. Existing recovery strategies often struggle with complex task logic reasoning and locomotion robustness evaluation, making them susceptible to failures caused by inappropriate recovery strategies or insufficient robustness. To address this issue, the STL-guided framework generates optimal and safe recovery trajectories that simultaneously satisfy the task specification and maximize the locomotion robustness. Our framework outperforms a state-of-the-art locomotion controller in a high-fidelity dynamic simulation, especially in scenarios involving crossed-leg maneuvers. Furthermore, it demonstrates versatility in tasks such as locomotion on stepping stones, where the robot must select from a set of disjointed footholds to maneuver successfully.
Enhancing Multi-Objective Optimization through Machine Learning-Supported Multiphysics Simulation
- Authors: Diego Botache, Jens Decke, Winfried Ripken, Abhinay Dornipati, Franz Götz-Hahn, Mohamed Ayeb, Bernhard Sick
- Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2309.13179
- Pdf link: https://arxiv.org/pdf/2309.13179
- Abstract Multiphysics simulations that involve multiple coupled physical phenomena quickly become computationally expensive. This imposes challenges for practitioners aiming to find optimal configurations for these problems satisfying multiple objectives, as optimization algorithms often require querying the simulation many times. This paper presents a methodological framework for training, self-optimizing, and self-organizing surrogate models to approximate and speed up Multiphysics simulations. We generate two real-world tabular datasets, which we make publicly available, and show that surrogate models can be trained on relatively small amounts of data to approximate the underlying simulations accurately. We conduct extensive experiments combining four machine learning and deep learning algorithms with two optimization algorithms and a comprehensive evaluation strategy. Finally, we evaluate the performance of our combined training and optimization pipeline by verifying the generated Pareto-optimal results using the ground truth simulations. We also employ explainable AI techniques to analyse our surrogates and conduct a preselection strategy to determine the most relevant features in our real-world examples. This approach lets us understand the underlying problem and identify critical partial dependencies.
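The pipeline produces Pareto-optimal candidates and then checks them against the ground-truth simulations. The sketch below shows the non-dominance filter behind "Pareto-optimal" when every objective is minimised. The candidate objective vectors are hypothetical. In the paper they would be surrogate predictions for different design configurations.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) pairs predicted by a surrogate.
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 1.0), (4.0, 4.0)]
print(pareto_front(candidates))  # (4.0, 4.0) is dominated by (2.0, 2.0)
```

A point never dominates itself, because `dominates` requires a strict improvement in at least one objective. So comparing each point against the whole list, including itself, is safe.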
A Practical Survey on Zero-shot Prompt Design for In-context Learning
- Authors: Yinheng Li
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13205
- Pdf link: https://arxiv.org/pdf/2309.13205
- Abstract The remarkable advancements in large language models (LLMs) have brought about significant improvements in Natural Language Processing (NLP) tasks. This paper presents a comprehensive review of in-context learning techniques, focusing on different types of prompts, including discrete, continuous, few-shot, and zero-shot, and their impact on LLM performance. We explore various approaches to prompt design, such as manual design, optimization algorithms, and evaluation methods, to optimize LLM performance across diverse tasks. Our review covers key research studies in prompt engineering, discussing their methodologies and contributions to the field. We also delve into the challenges faced in evaluating prompt performance, given the absence of a single "best" prompt and the importance of considering multiple metrics. In conclusion, the paper highlights the critical role of prompt design in harnessing the full potential of LLMs and provides insights into the combination of manual design, optimization techniques, and rigorous evaluation for more effective and efficient use of LLMs in various NLP tasks.
Poster: Self-Supervised Quantization-Aware Knowledge Distillation
- Authors: Kaiqi Zhao, Ming Zhao
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.13220
- Pdf link: https://arxiv.org/pdf/2309.13220
- Abstract Quantization-aware training (QAT) starts with a pre-trained full-precision model and performs quantization during retraining. However, existing QAT works require supervision from the labels and they suffer from accuracy loss due to reduced precision. To address these limitations, this paper proposes a novel Self-Supervised Quantization-Aware Knowledge Distillation framework (SQAKD). SQAKD first unifies the forward and backward dynamics of various quantization functions and then reframes QAT as a co-optimization problem that simultaneously minimizes the KL-Loss and the discretization error, in a self-supervised manner. The evaluation shows that SQAKD significantly improves the performance of various state-of-the-art QAT works. SQAKD establishes stronger baselines and does not require extensive labeled training data, potentially making state-of-the-art QAT research more accessible.
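SQAKD is described as minimising a KL loss together with the discretization error, without labels. The sketch below combines those two terms for a single example. The first term is the KL divergence from temperature-softened teacher outputs to student outputs. The second is the mean squared error between full-precision weights and their quantized values. The symmetric uniform quantizer, the temperature, and the weighting `alpha` are assumptions for illustration. The abstract does not give the paper's exact loss or quantization functions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def uniform_quantize(weights, num_bits=4, w_max=1.0):
    """ASSUMED symmetric uniform quantizer on [-w_max, w_max]."""
    levels = 2 ** (num_bits - 1) - 1
    step = w_max / levels
    return [max(-w_max, min(w_max, round(w / step) * step)) for w in weights]


def distillation_quant_loss(teacher_logits, student_logits, weights,
                            num_bits=4, alpha=1.0, temperature=4.0):
    """Label-free loss: KL(teacher || student) + alpha * discretization MSE."""
    kl = kl_divergence(softmax(teacher_logits, temperature),
                       softmax(student_logits, temperature))
    quantized = uniform_quantize(weights, num_bits)
    disc_err = sum((w - wq) ** 2 for w, wq in zip(weights, quantized)) / len(weights)
    return kl + alpha * disc_err


print(distillation_quant_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8], [0.31, -0.52, 0.9]))
```

The teacher logits stand in for the pre-trained full-precision model. No ground-truth label appears anywhere in the loss, which is what makes the setup self-supervised.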
Automatic Reverse Engineering: Creating computer-aided design (CAD) models from multi-view images
- Authors: Henrik Jobczyk, Hanno Homann
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.13281
- Pdf link: https://arxiv.org/pdf/2309.13281
- Abstract Generation of computer-aided design (CAD) models from multi-view images may be useful in many practical applications. To date, this problem is usually solved with an intermediate point-cloud reconstruction and involves manual work to create the final CAD models. In this contribution, we present a novel network for an automated reverse engineering task. Our network architecture combines three distinct stages: a convolutional neural network as the encoder stage, a multi-view pooling stage and a transformer-based CAD sequence generator. The model is trained and evaluated on a large number of simulated input images, and extensive optimization of model architectures and hyper-parameters is performed. A proof-of-concept is demonstrated by successfully reconstructing a number of valid CAD models from simulated test image data. Various accuracy metrics are calculated and compared to a state-of-the-art point-based network. Finally, a real-world test is conducted by supplying the network with actual photographs of two three-dimensional test objects. It is shown that some of the capabilities of our network can be transferred to this domain, even though training uses purely synthetic data. However, to date, the feasible model complexity is still limited to basic shapes.
Sewage Discharging in a Line: Global Optimization and Grand Cooperation
- Authors: Xucheng Liu, Lindong Liu, Yifu Li, Anran Li
- Subjects: Computer Science and Game Theory (cs.GT)
- Arxiv link: https://arxiv.org/abs/2309.13300
- Pdf link: https://arxiv.org/pdf/2309.13300
- Abstract Players cooperating in a line is a special yet essential phenomenon in real-life collaborative activities such as assembly-line production, pipeline supply chain management and other streamlined operational settings. In this paper, we study cooperative sewage discharge with multiple participants positioned in a line along a river, such that the optimization decisions and cooperation strategies are affected by both upstream and downstream players. We make three main contributions. First, we formalize the sewage discharge problem (SDP) for different groups of players, and use a greedy strategy and dynamic programming to design optimal algorithms that solve the SDP in polynomial time. Second, we show that the cooperative game defined on the sewage discharge problem, referred to as SDG, has a non-empty core due to its special line-positioning structure; therefore, stable grand cooperation is guaranteed. Third, inspired by the fact that the SDG has a non-empty core while being non-convex, we identify a relaxed notion of convexity, directional convexity, which can also serve as a sufficient condition for a cooperative game to have a non-empty core.
C$^2$VAE: Gaussian Copula-based VAE Differing Disentangled from Coupled Representations with Contrastive Posterior
- Authors: Zhangkai Wu, Longbing Cao
- Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2309.13303
- Pdf link: https://arxiv.org/pdf/2309.13303
- Abstract We present a self-supervised variational autoencoder (VAE) to jointly learn disentangled and dependent hidden factors and then enhance disentangled representation learning by a self-supervised classifier to eliminate coupled representations in a contrastive manner. To this end, a Contrastive Copula VAE (C$^2$VAE) is introduced without relying on prior knowledge about data in the probabilistic principle and involving strong modeling assumptions on the posterior in the neural architecture. C$^2$VAE simultaneously factorizes the posterior (evidence lower bound, ELBO) with total correlation (TC)-driven decomposition for learning factorized disentangled representations and extracts the dependencies between hidden features by a neural Gaussian copula for copula coupled representations. Then, a self-supervised contrastive classifier differentiates the disentangled representations from the coupled representations, where a contrastive loss regularizes this contrastive classification together with the TC loss for eliminating entangled factors and strengthening disentangled representations. C$^2$VAE demonstrates a strong effect in enhancing disentangled representation learning. C$^2$VAE further contributes to improved optimization addressing the TC-based VAE instability and the trade-off between reconstruction and representation.
CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity
- Authors: Pengyun Yue, Hanzhen Zhao, Cong Fang, Di He, Liwei Wang, Zhouchen Lin, Song-chun Zhu
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13307
- Pdf link: https://arxiv.org/pdf/2309.13307
- Abstract With distributed machine learning becoming a prominent technique for large-scale machine learning tasks, communication complexity has become a major bottleneck for speeding up training and scaling up the number of machines. In this paper, we propose a new technique named Common randOm REconstruction (CORE), which compresses the information transmitted between machines to reduce communication complexity without other strict conditions. Specifically, CORE projects vector-valued information to a low-dimensional representation through common random vectors and reconstructs the information with the same random noise after communication. We apply CORE to two distributed tasks, convex optimization on linear models and generic non-convex optimization, and design new distributed algorithms that achieve provably lower communication complexities. For example, we show that for linear models a CORE-based algorithm can encode the gradient vector into $\mathcal{O}(1)$ bits (against $\mathcal{O}(d)$) with a convergence rate that is no worse, improving on existing results.
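The core mechanism is a shared source of randomness. Sender and receiver regenerate the same random vectors from a common seed, so only the projections onto those vectors need to be transmitted. The sketch below assumes standard Gaussian vectors. With such vectors, $\frac{1}{m}\sum_i \langle \xi_i, g\rangle \xi_i$ is an unbiased estimate of $g$, because $\mathbb{E}[\xi\xi^\top] = I$. It illustrates only the project-and-reconstruct idea. It does not reproduce the paper's $\mathcal{O}(1)$-bit encoding or its algorithms.

```python
import random


def common_random_vectors(seed, m, d):
    """Sender and receiver regenerate identical Gaussian vectors from a shared seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def encode(g, seed, m):
    """Sender: transmit m scalars <xi_i, g> instead of the d-dimensional vector g."""
    xis = common_random_vectors(seed, m, len(g))
    return [sum(x * gj for x, gj in zip(xi, g)) for xi in xis]


def decode(projections, seed, d):
    """Receiver: unbiased estimate (1/m) * sum_i <xi_i, g> * xi_i."""
    m = len(projections)
    xis = common_random_vectors(seed, m, d)
    estimate = [0.0] * d
    for p, xi in zip(projections, xis):
        for j in range(d):
            estimate[j] += p * xi[j] / m
    return estimate


g = [1.0, 2.0, 3.0]
projections = encode(g, seed=42, m=4000)
print(decode(projections, seed=42, d=len(g)))  # close to [1.0, 2.0, 3.0]
```

The variance of the estimate shrinks as $1/m$. In this toy the number of transmitted scalars is larger than $d$ just to make the reconstruction visibly accurate. The compression benefit arises when $m \ll d$ and the downstream algorithm tolerates the added noise.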
Joint Explainability and Sensitivity-Aware Federated Deep Learning for Transparent 6G RAN Slicing
- Authors: Swastika Roy, Farhad Rezazadeh, Hatim Chergui, Christos Verikoukis
- Subjects: Networking and Internet Architecture (cs.NI)
- Arxiv link: https://arxiv.org/abs/2309.13325
- Pdf link: https://arxiv.org/pdf/2309.13325
- Abstract In recent years, wireless networks have become increasingly complex, which has spurred the use of zero-touch artificial intelligence (AI)-driven network automation within the telecommunication industry. In particular, network slicing, the most promising technology beyond 5G, is expected to embrace AI models to manage the complex communication network. It is also essential to build trust in AI black boxes in actual deployments where AI makes complex resource management and anomaly detection decisions. Inspired by closed-loop automation and Explainable Artificial Intelligence (XAI), we design an Explainable Federated Deep Learning (FDL) model to predict per-slice RAN dropped traffic probability while jointly considering sensitivity- and explainability-aware metrics as constraints in such a non-IID setup. Specifically, we quantitatively validate the faithfulness of the explanations via the so-called attribution-based log-odds metric, which is included as a constraint in the run-time FL optimization task. Simulation results confirm its superiority over an unconstrained integrated-gradient (IG) post-hoc FDL baseline.
Speeding-up Evolutionary Algorithms to solve Black-Box Optimization Problems
- Authors: Judith Echevarrieta, Etor Arza, Aritz Pérez
- Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2309.13349
- Pdf link: https://arxiv.org/pdf/2309.13349
- Abstract Population-based evolutionary algorithms are often considered when approaching computationally expensive black-box optimization problems. They employ a selection mechanism to choose the best solutions from a given population after comparing their objective values, which are then used to generate the next population. This iterative process explores the solution space efficiently, leading to improved solutions over time. However, these algorithms require a large number of evaluations to provide a quality solution, which might be computationally expensive when the evaluation cost is high. In some cases, it is possible to replace the original objective function with a less accurate approximation of lower cost. This introduces a trade-off between the evaluation cost and its accuracy. In this paper, we propose a technique capable of choosing an appropriate approximate function cost during the execution of the optimization algorithm. The proposal finds the minimum evaluation cost at which the solutions are still properly ranked, and consequently, more evaluations can be computed in the same amount of time with minimal accuracy loss. An experimental section on four very different problems reveals that the proposed approach can reach the same objective value in less than half of the time in certain cases.
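The proposal looks for the cheapest approximation at which solutions are still properly ranked. The sketch below is a simplified version of that check. It scores a sample population with the exact and the approximate objective, measures rank agreement with Kendall's tau, and returns the lowest cost whose tau meets a threshold. Several pieces are assumptions, not the paper's procedure: the exact objective is treated as affordable on a small sample, and the rank-correlation measure, the threshold, and the `toy_exact`/`toy_approx` functions are all hypothetical choices.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall rank correlation (tau-a) between two equally long score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / n_pairs


def cheapest_faithful_cost(population, exact, approx, costs, threshold=0.9):
    """Lowest evaluation cost whose ranking of the population agrees well enough with the exact one."""
    reference = [exact(x) for x in population]
    for cost in sorted(costs):
        tau = kendall_tau(reference, [approx(x, cost) for x in population])
        if tau >= threshold:
            return cost
    return max(costs)  # fall back to the most accurate approximation


def toy_exact(x):
    """Hypothetical expensive objective."""
    return x * x


def toy_approx(x, cost):
    """Hypothetical cheap objective: coarser rounding (more ties) at lower cost."""
    return round(x * x * cost) / cost


population = [0.1 * i for i in range(1, 11)]
print(cheapest_faithful_cost(population, toy_exact, toy_approx, [1, 10, 100, 1000], threshold=1.0))  # 100
```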
MLPST: MLP is All You Need for Spatio-Temporal Prediction
- Authors: Zijian Zhang, Ze Huang, Zhiwei Hu, Xiangyu Zhao, Wanyu Wang, Zitao Liu, Junbo Zhang, S. Joe Qin, Hongwei Zhao
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.13363
- Pdf link: https://arxiv.org/pdf/2309.13363
- Abstract Traffic prediction is a typical spatio-temporal data mining task and has great significance to the public transportation system. Considering the demand for its grand application, we recognize key factors for an ideal spatio-temporal prediction method: efficient, lightweight, and effective. However, current deep model-based spatio-temporal prediction solutions generally have intricate architectures with cumbersome optimization, which can hardly meet these expectations. To accomplish the above goals, we propose an intuitive and novel framework, MLPST, a pure multi-layer perceptron architecture for traffic prediction. Specifically, we first capture spatial relationships from both local and global receptive fields. Then, temporal dependencies in different intervals are comprehensively considered. Through compact and swift MLP processing, MLPST can well capture the spatial and temporal dependencies while requiring only linear computational complexity, as well as model parameters that are more than an order of magnitude fewer than those of the baselines. Extensive experiments validated the superior effectiveness and efficiency of MLPST against advanced baselines, and among models with optimal accuracy, MLPST achieves the best time and space efficiency.
Moving Target Defense based Secured Network Slicing System in the O-RAN Architecture
- Authors: Mojdeh Karbalaee Motalleb, Chafika Benzaïd, Tarik Taleb, Vahid Shah-Mansouri
- Subjects: Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2309.13444
- Pdf link: https://arxiv.org/pdf/2309.13444
- Abstract The open radio access network (O-RAN) architecture's native virtualization and embedded intelligence facilitate RAN slicing and enable comprehensive end-to-end services in post-5G networks. However, any vulnerabilities could harm security. Therefore, artificial intelligence (AI) and machine learning (ML) security threats can even threaten O-RAN benefits. This paper proposes a novel approach to estimating the optimal number of predefined VNFs for each slice while addressing secure AI/ML methods for dynamic service admission control and power minimization in the O-RAN architecture. We solve this problem on two time scales: mathematical methods determine the predefined number of VNFs on a large time scale, and proximal policy optimization (PPO), a Deep Reinforcement Learning algorithm, solves dynamic service admission control and power minimization for different slices on a small time scale. To secure the ML system for O-RAN, we implement a moving target defense (MTD) strategy that prevents poisoning attacks by adding uncertainty to the system. Our experimental results show that the proposed PPO-based service admission control approach achieves an admission rate above 80% and that the MTD strategy effectively strengthens the robustness of the PPO method against adversarial attacks.
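The small-time-scale controller is trained with PPO. The sketch below computes the standard PPO clipped surrogate objective, the quantity PPO maximises. It takes probability ratios between the new and old policy and advantage estimates as inputs. It shows only the generic objective. The paper's state, action, and reward design for admission control and power minimization is not described in the abstract, and the sample values are hypothetical.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A), to be maximised.

    ratios: pi_new(a|s) / pi_old(a|s) for each sampled action.
    advantages: advantage estimates A for the same samples.
    Clipping removes the incentive to move the ratio outside [1 - eps, 1 + eps].
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


# Positive advantage: gain capped at 1.2 * A. Negative advantage: the pessimistic term is kept.
print(ppo_clipped_objective([1.5], [1.0]), ppo_clipped_objective([0.5], [-1.0]))
```

Taking the minimum of the clipped and unclipped terms makes the objective a pessimistic bound. This keeps each policy update small, which is why PPO is a common default for online control problems like this one.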
AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming
- Authors: Siva Satyendra Sahoo, Salim Ullah, Akash Kumar
- Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2309.13445
- Pdf link: https://arxiv.org/pdf/2309.13445
- Abstract With the increasing application of machine learning (ML) algorithms in embedded systems, there is a rising need to design low-cost computer arithmetic for these resource-constrained systems. As a result, emerging models of computation, such as approximate and stochastic computing, that leverage the inherent error resilience of such algorithms are being actively explored for implementing ML inference on resource-constrained systems. Approximate computing (AxC) aims to provide disproportionate gains in the power, performance, and area (PPA) of an application by allowing some reduction in its behavioral accuracy (BEHAV). Using approximate operators (AxOs) for computer arithmetic is one of the more prevalent methods of implementing AxC. AxOs allow finer-grained optimization than precision scaling of computer arithmetic alone. Designing platform-specific and cost-efficient approximate operators is therefore an important research goal. Recently, multiple works have reported using AI/ML-based approaches for synthesizing novel FPGA-based AxOs. However, most such works limit the use of AI/ML to designing ML-based surrogate functions used during iterative optimization processes. To this end, we propose a novel data-analysis-driven, mathematical-programming-based approach to synthesizing approximate operators for FPGAs. Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data and use the solutions to enable a more directed search for evolutionary optimization algorithms. Compared to traditional evolutionary-algorithm-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV, in the design of signed 8-bit multipliers.
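The reported gain is measured in hypervolume: the region of objective space that a Pareto front dominates, bounded by a reference point. Below is a minimal two-objective sketch, assuming both objectives (for example a PPA cost and a BEHAV error) are minimized. The paper may use more objectives and its own reference point. The function name and example values are hypothetical.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both objectives minimized)."""
    # Keep only points that improve on the reference point in both objectives,
    # then sweep them in order of increasing first objective.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    volume = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y < best_y:  # skip points dominated by ones already swept
            # Add the horizontal strip between this point and the previous best y.
            volume += (ref[0] - x) * (best_y - y)
            best_y = y
    return volume
```

For the front `[(1, 3), (2, 2), (3, 1)]` with reference `(4, 4)`, the dominated area is 3 + 2 + 1 = 6. Adding a dominated design such as `(3, 3)` leaves it unchanged.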
Communication-Aware Map Compression for Online Path-Planning
- Authors: Authors: Evangelos Psomiadis, Dipankar Maity, Panagiotis Tsiotras
- Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
- Arxiv link: https://arxiv.org/abs/2309.13451
- Pdf link: https://arxiv.org/pdf/2309.13451
- Abstract This paper addresses the problem of communicating optimally compressed information for mobile robot path-planning. In this context, mobile robots compress their current local maps to assist another robot in reaching a target in an unknown environment. We propose a framework that sequentially selects the optimal compression, guided by the robot's path, by balancing the map resolution and communication cost. Our approach is tractable in close-to-real scenarios and does not require prior environment knowledge. We design a novel decoder that leverages compressed information to estimate the unknown environment via convex optimization with linear constraints, and an encoder that utilizes the decoder to select the optimal compression. Numerical simulations are conducted in a large close-to-real map and a maze map and compared with two alternative approaches. The results confirm the effectiveness of our framework in assisting the robot to reach its target by reducing transmitted information, on average, by approximately 50% while maintaining satisfactory performance.
An Optimal Control Framework for Influencing Human Driving Behavior in Mixed-Autonomy Traffic
- Authors: Authors: Anirudh Chari, Rui Chen, Jaskaran Grover, Changliu Liu
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2309.13456
- Pdf link: https://arxiv.org/pdf/2309.13456
- Abstract As autonomous vehicles (AVs) become increasingly prevalent, their interaction with human drivers presents a critical challenge. Current AVs lack social awareness, causing behavior that is often awkward or unsafe. To combat this, social AVs, which are proactive rather than reactive in their behavior, have been explored in recent years. With knowledge of robot-human interaction dynamics, a social AV can influence a human driver to exhibit desired behaviors by strategically altering its own behaviors. In this paper, we present a novel framework for achieving human influence. The foundation of our framework lies in an innovative use of control barrier functions to formulate the desired objectives of influence as constraints in an optimal control problem. The computed controls gradually push the system state toward satisfaction of the objectives, e.g. slowing the human down to some desired speed. We demonstrate the proposed framework's feasibility in a variety of scenarios related to car-following and lane changes, including multi-robot and multi-human configurations. In two case studies, we validate the framework's effectiveness when applied to the problems of traffic flow optimization and aggressive behavior mitigation. Given these results, the main contribution of our framework is its versatility in a wide spectrum of influence objectives and mixed-autonomy configurations.
A Unified Scheme of ResNet and Softmax
- Authors: Authors: Zhao Song, Weixin Wang, Junze Yin
- Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2309.13482
- Pdf link: https://arxiv.org/pdf/2309.13482
- Abstract Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical components supporting the functionality of LLMs but are also related to many other machine learning and theoretical computer science fields, including but not limited to image classification, object detection, semantic segmentation, and tensors. Previous research studied these two concepts separately. In this paper, we provide a theoretical analysis of the regression problem $\| \langle \exp(Ax) + Ax, {\bf 1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$, where $A$ is a matrix in $\mathbb{R}^{n \times d}$, $x \in \mathbb{R}^d$, $b$ is a vector in $\mathbb{R}^n$, and ${\bf 1}_n$ is the $n$-dimensional vector whose entries are all $1$. This regression problem is a unified scheme that combines softmax regression and ResNet, which has never been done before. We derive the gradient, Hessian, and Lipschitz properties of the loss function. The Hessian is shown to be positive semidefinite, and its structure is characterized as the sum of a low-rank matrix and a diagonal matrix. This enables an efficient approximate Newton method. As a result, this unified scheme helps to connect two fields previously thought to be unrelated and provides novel insight into the loss landscape and optimization of emerging over-parameterized neural networks, which is meaningful for future research in deep learning models.
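To make the loss concrete, here is a minimal NumPy sketch. It evaluates L(x) = ‖f(x) - b‖², with u = exp(Ax) + Ax and f = u / ⟨u, 1_n⟩, and computes a gradient by the chain rule. The gradient is derived here for illustration and is not the paper's derivation. It assumes ⟨u, 1_n⟩ ≠ 0, since exp(t) + t can be negative for t below about -0.57.

```python
import numpy as np

def loss(A, x, b):
    """|| <u, 1_n>^{-1} u - b ||_2^2 with u = exp(Ax) + Ax."""
    z = A @ x
    u = np.exp(z) + z
    f = u / u.sum()
    return float(np.sum((f - b) ** 2))

def grad(A, x, b):
    """Gradient of `loss` with respect to x (assumes u.sum() != 0)."""
    z = A @ x
    u = np.exp(z) + z
    alpha = u.sum()
    f = u / alpha
    c = f - b
    # Quotient rule: d f_i / d u_j = (delta_ij - f_i) / alpha,
    # so dL/du = (2 / alpha) * (c - <f, c> 1_n).
    g_u = 2.0 * (c - f @ c) / alpha
    # Chain through u = exp(z) + z (elementwise) and z = Ax.
    return A.T @ ((np.exp(z) + 1.0) * g_u)
```

A central finite-difference check is a cheap way to confirm such a derivation before relying on it, for example in the approximate Newton method the abstract mentions.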
Iterative Reachability Estimation for Safe Reinforcement Learning
- Authors: Authors: Milan Ganai, Zheng Gong, Chenning Yu, Sylvia Herbert, Sicun Gao
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.13528
- Pdf link: https://arxiv.org/pdf/2309.13528
- Abstract Ensuring safety is important for the practical deployment of reinforcement learning (RL). Various challenges must be addressed, such as handling stochasticity in the environments, providing rigorous guarantees of persistent state-wise safety satisfaction, and avoiding overly conservative behaviors that sacrifice performance. We propose a new framework, Reachability Estimation for Safe Policy Optimization (RESPO), for safety-constrained RL in general stochastic settings. In the feasible set where there exist violation-free policies, we optimize for rewards while maintaining persistent safety. Outside this feasible set, our optimization produces the safest behavior by guaranteeing entrance into the feasible set whenever possible with the least cumulative discounted violations. We introduce a class of algorithms using our novel reachability estimation function to optimize in our proposed framework and in similar frameworks such as those concurrently handling multiple hard and soft constraints. We theoretically establish that our algorithms almost surely converge to locally optimal policies of our safe optimization framework. We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo, and show the benefits in improving both reward performance and safety compared with state-of-the-art baselines.
Task-Oriented Dexterous Grasp Synthesis via Differentiable Grasp Wrench Boundary Estimator
- Authors: Authors: Jiayi Chen, Yuxing Chen, Jialiang Zhang, He Wang
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.13586
- Pdf link: https://arxiv.org/pdf/2309.13586
- Abstract Analytical dexterous grasping synthesis is often driven by grasp quality metrics. However, existing metrics possess many problems, such as being computationally expensive, physically inaccurate, and non-differentiable. Moreover, none of them can facilitate the synthesis of non-force-closure grasps, which account for a significant portion of task-oriented grasping such as lid screwing and button pushing. The main challenge behind all the above drawbacks is the difficulty in modeling the complex Grasp Wrench Space (GWS). In this work, we overcome this challenge by proposing a novel GWS estimator, thus enabling gradient-based task-oriented dexterous grasp synthesis for the first time. Our key contribution is a fast, accurate, and differentiable technique to estimate the GWS boundary with good physical interpretability by parallel sampling and mapping, which does not require iterative optimization. Second, based on our differentiable GWS estimator, we derive a task-oriented energy function to enable gradient-based grasp synthesis and a metric to evaluate non-force-closure grasps. Finally, we improve the previous dexterous grasp synthesis pipeline mainly by a novel technique to make nearest-point calculation differentiable, even on mesh edges and vertices. Extensive experiments are performed to verify the efficiency and effectiveness of our methods. Our GWS estimator can run in several milliseconds on GPUs with minimal memory cost, more than three orders of magnitude faster than the classic discretization-based method. Using this GWS estimator, we synthesize 0.1 million dexterous grasps to show that our pipeline can significantly outperform the SOTA method, even in task-unaware force-closure-grasp synthesis. For task-oriented grasp synthesis, we provide some qualitative results.
Shape Optimization by Constrained First-Order Least Mean Approximation
- Authors: Authors: Gerhard Starke
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2309.13595
- Pdf link: https://arxiv.org/pdf/2309.13595
- Abstract In this work, the problem of shape optimization, subject to PDE constraints, is reformulated as an $L^p$ best approximation problem under divergence constraints to the shape tensor introduced in Laurain and Sturm: ESAIM Math. Model. Numer. Anal. 50 (2016). More precisely, the main result of this paper states that the $L^p$ distance of the above approximation problem is equal to the dual norm of the shape derivative considered as a functional on $W^{1,p^\ast}$ (where $1/p + 1/p^\ast = 1$). This implies that for any given shape, one can evaluate its distance from being a stationary one with respect to the shape derivative by simply solving the associated $L^p$-type least mean approximation problem. Moreover, the Lagrange multiplier for the divergence constraint turns out to be the shape deformation of steepest descent. This provides a way, as an alternative to the approach by Deckelnick, Herbert and Hinze: ESAIM Control Optim. Calc. Var. 28 (2022), for computing shape gradients in $W^{1,p^\ast}$ for $p^\ast \in ( 2 , \infty )$. The discretization of the least mean approximation problem is done with (lowest-order) matrix-valued Raviart-Thomas finite element spaces leading to piecewise constant approximations of the shape deformation acting as Lagrange multiplier. Admissible deformations in $W^{1,p^\ast}$ to be used in a shape gradient iteration are reconstructed locally. Our computational results confirm that the $L^p$ distance of the best approximation does indeed measure the distance of the considered shape to optimality. Also confirmed by our computational tests are the observations that choosing $p^\ast$ (much) larger than 2 (which means that $p$ must be close to 1 in our best approximation problem) decreases the chance of encountering mesh degeneracy during the shape gradient iteration.
From Cluster Assumption to Graph Convolution: Graph-based Semi-Supervised Learning Revisited
- Authors: Authors: Zheng Wang, Hongming Ding, Li Pan, Jianhua Li, Zhiguo Gong, Philip S. Yu
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.13599
- Pdf link: https://arxiv.org/pdf/2309.13599
- Abstract Graph-based semi-supervised learning (GSSL) has long been a hot research topic. Traditional methods are generally shallow learners, based on the cluster assumption. Recently, graph convolutional networks (GCNs) have become the predominant techniques for their promising performance. In this paper, we theoretically discuss the relationship between these two types of methods in a unified optimization framework. One of the most intriguing findings is that, unlike traditional ones, typical GCNs may not jointly consider the graph structure and label information at each layer. Motivated by this, we further propose three simple but powerful graph convolution methods. The first is a supervised method OGC which guides the graph convolution process with labels. The others are two unsupervised methods: GGC and its multi-scale version GGCM, both aiming to preserve the graph structure information during the convolution process. Finally, we conduct extensive experiments to show the effectiveness of our methods.
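For context on the GCN side of the comparison, here is a minimal sketch of the standard renormalized graph-convolution propagation, H ← D̃^{-1/2}(A + I)D̃^{-1/2} H. For simplicity it omits the weights and nonlinearity. This is the common GCN propagation rule, not the paper's OGC, GGC, or GGCM methods, and the function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_tilde = adj + np.eye(adj.shape[0])
    deg = a_tilde.sum(axis=1)  # every degree is >= 1 thanks to the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]

def graph_convolution(adj, features, k=1):
    """Apply k rounds of neighborhood feature smoothing (no weights, no nonlinearity)."""
    s = normalized_adjacency(adj)
    h = features
    for _ in range(k):
        h = s @ h
    return h
```

Each round averages a node's features with its neighbors' features. Labels play no part in this step, which relates to the abstract's observation that typical GCNs may not jointly consider structure and label information at each layer.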
Reinforcement-Enhanced Autoregressive Feature Transformation: Gradient-steered Search in Continuous Space for Postfix Expressions
- Authors: Authors: Dongjie Wang, Meng Xiao, Min Wu, Pengfei Wang, Yuanchun Zhou, Yanjie Fu
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13618
- Pdf link: https://arxiv.org/pdf/2309.13618
- Abstract Feature transformation aims to generate a new, pattern-discriminative feature space from the original features to improve the performance of downstream machine learning (ML) tasks. However, the discrete search space for the optimal features grows explosively with the combinations of features and operations, from low-order to high-order forms. Existing methods, such as exhaustive search, expansion reduction, evolutionary algorithms, reinforcement learning, and iterative greedy search, suffer from this large search space. Overly emphasizing efficiency in algorithm design usually sacrifices stability or robustness. To fundamentally fill this gap, we reformulate discrete feature transformation as a continuous space optimization task and develop an embedding-optimization-reconstruction framework. This framework includes four steps: 1) reinforcement-enhanced data preparation, aiming to prepare high-quality transformation-accuracy training data; 2) feature transformation operation sequence embedding, intended to encapsulate the knowledge of the prepared training data within a continuous space; 3) gradient-steered optimal embedding search, dedicated to uncovering potentially superior embeddings within the learned space; 4) transformation operation sequence reconstruction, striving to reproduce the feature transformation solution to pinpoint the optimal feature space.
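The title refers to postfix expressions. A postfix (reverse Polish) token sequence encodes a feature transformation without parentheses, so it can be handled as a flat sequence by an encoder and decoder. Below is a minimal sketch of evaluating such a sequence into a new feature column with a stack. The operator set and the token names (`f0`, `f1`, ...) are hypothetical; the abstract does not list the paper's operations.

```python
import numpy as np

# Hypothetical operator vocabulary for illustration only.
BINARY_OPS = {"+": np.add, "-": np.subtract, "*": np.multiply, "/": np.divide}
UNARY_OPS = {"sqrt": np.sqrt, "log": np.log, "square": np.square}

def eval_postfix(tokens, features):
    """Evaluate a postfix token sequence into a new feature column.

    tokens: e.g. ["f0", "f1", "+", "sqrt"], meaning sqrt(f0 + f1)
    features: dict mapping feature names to 1-D arrays of equal length
    """
    stack = []
    for tok in tokens:
        if tok in BINARY_OPS:
            right = stack.pop()  # operands are popped in reverse order
            left = stack.pop()
            stack.append(BINARY_OPS[tok](left, right))
        elif tok in UNARY_OPS:
            stack.append(UNARY_OPS[tok](stack.pop()))
        else:
            stack.append(np.asarray(features[tok], dtype=float))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]
```

With `f0 = [1, 7]` and `f1 = [3, 9]`, the sequence `["f0", "f1", "+", "sqrt"]` yields `[2, 4]`.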
Neural Network-PSO-based Velocity Control Algorithm for Landing UAVs on a Boat
- Authors: Authors: Li-Fan Wu, Zihan Wang, Mo Rastgaar, Nina Mahmoudian
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.13679
- Pdf link: https://arxiv.org/pdf/2309.13679
- Abstract Precise landing of Unmanned Aerial Vehicles (UAVs) onto moving platforms like Autonomous Surface Vehicles (ASVs) is both important and challenging for the collaborative navigation of heterogeneous vehicles, especially in GPS-denied environments. UAVs need to land within a confined space onboard the ASV for energy replenishment, while the ASV is subject to translational and rotational disturbances due to wind and water flow. Current solutions either rely on high-level waypoint navigation, which struggles to land robustly on targets moving at varying speeds, or require laborious manual tuning of controller parameters and expensive sensors for target localization. Therefore, we propose an adaptive velocity control algorithm that leverages Particle Swarm Optimization (PSO) and a Neural Network (NN) to optimize PID parameters across varying flight altitudes and distinct speeds of a moving boat. The cost function of PSO includes the status change rates of the UAV and its proximity to the target. The NN further interpolates the PID parameters found by PSO. The proposed method, implemented on a water strider hexacopter design, not only ensures accuracy but also increases robustness. Moreover, this NN-PSO can be readily adapted to suit various mission requirements. Its ability to achieve precise landings extends its applicability to scenarios including, but not limited to, rescue missions, package deliveries, and workspace inspections.
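Here is a minimal sketch of global-best particle swarm optimization over a box of candidate parameters, such as PID gains. Each particle's velocity combines inertia, a pull toward its own best position, and a pull toward the swarm's best. In the paper, the cost would come from the UAV's status change rates and target proximity. Here a toy quadratic stands in for that cost. The coefficients w = 0.7 and c1 = c2 = 1.5 are common defaults, not values from the paper.

```python
import numpy as np

def pso_minimize(cost, lower, upper, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization within box bounds."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([cost(p) for p in pos])
    gbest = pbest[np.argmin(pbest_cost)].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + attraction to personal best + attraction to swarm best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep gains inside the allowed box
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        gbest = pbest[np.argmin(pbest_cost)].copy()
    return gbest, float(pbest_cost.min())
```

In the paper's setting, each cost evaluation would be a landing simulation at one altitude and boat speed. The NN then interpolates between the gains PSO finds at those operating points.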
GHN-QAT: Training Graph Hypernetworks to Predict Quantization-Robust Parameters of Unseen Limited Precision Neural Networks
- Authors: Authors: Stone Yun, Alexander Wong
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.13773
- Pdf link: https://arxiv.org/pdf/2309.13773
- Abstract Graph Hypernetworks (GHN) can predict the parameters of varying unseen CNN architectures with surprisingly good accuracy at a fraction of the cost of iterative optimization. Following these successes, preliminary research has explored the use of GHNs to predict quantization-robust parameters for 8-bit and 4-bit quantized CNNs. However, this early work leveraged full-precision float32 training and only quantized for testing. We explore the impact of quantization-aware training and/or other quantization-based training strategies on quantized robustness and performance of GHN predicted parameters for low-precision CNNs. We show that quantization-aware training can significantly improve quantized accuracy for GHN predicted parameters of 4-bit quantized CNNs and even lead to greater-than-random accuracy for 2-bit quantized CNNs. These promising results open the door for future explorations such as investigating the use of GHN predicted parameters as initialization for further quantized training of individual CNNs, further exploration of "extreme bitwidth" quantization, and mixed precision quantization schemes.
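Quantization-aware training typically inserts a quantize-dequantize ("fake quantization") step in the forward pass, so that training sees the rounding error of the target bitwidth. Below is a minimal sketch, assuming a symmetric, per-tensor, uniform scheme. The abstract does not specify which scheme the paper uses. The backward pass, usually a straight-through estimator, is not shown.

```python
import numpy as np

def fake_quantize(w, bits):
    """Symmetric per-tensor uniform quantization, returned as floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    w = np.asarray(w, dtype=float)
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the integer grid, clamp to the representable range, map back.
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale
```

Under this symmetric scheme, a 2-bit tensor collapses to just three levels, {-s, 0, s}. That helps explain why 2-bit accuracy is much harder to preserve than 4-bit accuracy.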
On the Energy Efficiency of THz-NOMA enhanced UAV Cooperative Network with SWIPT
- Authors: Authors: Jalal Jalali, Ata Khalili, Hina Tabassum, Rafael Berkvens, Jeroen Famaey, Walid Saad
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2309.13836
- Pdf link: https://arxiv.org/pdf/2309.13836
- Abstract This paper considers the energy efficiency (EE) maximization of a simultaneous wireless information and power transfer (SWIPT)-assisted unmanned aerial vehicle (UAV) cooperative network operating at terahertz (THz) frequencies. The source performs SWIPT, enabling the UAV to receive both power and information, while also transmitting the information to a designated destination node. The UAV then uses the harvested energy to relay the data to the intended destination node. Specifically, we maximize EE by optimizing the non-orthogonal multiple access (NOMA) power allocation coefficients, the SWIPT power splitting (PS) ratio, and the UAV trajectory. The main problem is broken down into a two-stage optimization problem and solved using an alternating optimization approach. In the first stage, the PS ratio and trajectory are optimized by employing successive convex approximation with a lower bound on the exponential factor in the THz channel model. In the second stage, the NOMA power coefficients are optimized using a quadratic transform approach. Numerical results demonstrate the effectiveness of our proposed resource allocation algorithm compared to baselines without trajectory optimization or without NOMA power or PS optimization.
Backorder Prediction in Inventory Management: Classification Techniques and Cost Considerations
- Authors: Authors: Sarit Maitra, Sukanya Kundu
- Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
- Arxiv link: https://arxiv.org/abs/2309.13837
- Pdf link: https://arxiv.org/pdf/2309.13837
- Abstract This article introduces an advanced analytical approach for predicting backorders in inventory management. A backorder is an order that cannot be fulfilled immediately due to stock depletion. Multiple classification techniques, including Balanced Bagging Classifiers, Fuzzy Logic, Variational Autoencoder - Generative Adversarial Networks, and Multi-layer Perceptron classifiers, are assessed in this work using performance evaluation metrics such as ROC-AUC and PR-AUC. Moreover, this work incorporates a profit function and misclassification costs, accounting for the financial implications and costs associated with inventory management and backorder handling. The results demonstrate the effectiveness of the predictive model in enhancing inventory system service levels, which contributes to customer satisfaction and overall organizational performance. Because interpretability is a significant aspect of using AI in commercial applications, permutation importance is applied to the selected model to determine the importance of features. This research contributes to the advancement of predictive analytics and offers valuable insights for future investigations in backorder forecasting and inventory control optimization for decision-making.
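One common way to fold misclassification costs into a classifier is to choose the decision threshold that minimizes total cost. Missing a real backorder (a false negative) usually costs more than a false alarm. Below is a minimal sketch of that idea. The cost values and function names are hypothetical, and this is not the paper's profit function.

```python
import numpy as np

def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total misclassification cost when predicting 'backorder' for scores >= threshold."""
    y_true = np.asarray(y_true)
    pred = np.asarray(scores, dtype=float) >= threshold
    fp = np.sum(pred & (y_true == 0))   # flagged, but no backorder occurred
    fn = np.sum(~pred & (y_true == 1))  # backorder missed
    return float(fp * cost_fp + fn * cost_fn)

def best_threshold(y_true, scores, cost_fp, cost_fn):
    """Pick the score threshold with the lowest total misclassification cost."""
    scores = np.asarray(scores, dtype=float)
    # Every observed score is a candidate; +inf means "never predict backorder".
    candidates = np.unique(np.concatenate([scores, [np.inf]]))
    costs = [total_cost(y_true, scores, t, cost_fp, cost_fn) for t in candidates]
    return float(candidates[int(np.argmin(costs))])
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, a 10:1 penalty on missed backorders selects the lower threshold 0.35. Reversing the penalty selects 0.8.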
A Ferroelectric Compute-in-Memory Annealer for Combinatorial Optimization Problems
- Authors: Authors: Xunzhao Yin, Yu Qian, Alptekin Vardar, Marcel Gunther, Franz Muller, Nellie Laleni, Zijian Zhao, Zhouhang Jiang, Zhiguo Shi, Yiyu Shi, Xiao Gong, Cheng Zhuo, Thomas Kampfe, Kai Ni
- Subjects: Emerging Technologies (cs.ET)
- Arxiv link: https://arxiv.org/abs/2309.13853
- Pdf link: https://arxiv.org/pdf/2309.13853
- Abstract Computationally hard combinatorial optimization problems (COPs) are ubiquitous in many applications, including logistical planning, resource allocation, chip design, drug exploration, and more. Due to their critical significance and the inability of conventional hardware to efficiently handle COPs at scale, there is growing interest in developing computing hardware tailored specifically for COPs, including digital annealers, dynamical Ising machines, and quantum/photonic systems. However, significant hurdles remain for these approaches: the memory access issue, limited system scalability and applicability to only certain types of COPs, and VLSI incompatibility, respectively. Here, a ferroelectric field-effect transistor (FeFET) based compute-in-memory (CiM) annealer is proposed. After converting COPs into quadratic unconstrained binary optimization (QUBO) formulations, a hardware-algorithm co-design is conducted, yielding energy-efficient, versatile, and scalable hardware for COPs. To accelerate the core vector-matrix-vector (VMV) multiplication of QUBO formulations, a FeFET-based CiM array is exploited, which can perform the intended operation in situ thanks to its unique three-terminal structure. In particular, a lossless compression technique is proposed to prune the typically sparse QUBO matrix to reduce hardware cost. Furthermore, a multi-epoch simulated annealing (MESA) algorithm is proposed to replace conventional simulated annealing, owing to its faster convergence and better solution quality. The effectiveness of the proposed techniques is validated by using the developed chip prototypes to successfully solve the graph coloring problem, indicating the great promise of the FeFET CiM annealer for solving general COPs.
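The QUBO objective is E(x) = xᵀQx over binary x, and this vector-matrix-vector product is what the FeFET CiM array accelerates. Below is a minimal sketch of the conventional single-bit-flip simulated annealing that MESA is said to replace; MESA itself is not described in enough detail to reproduce. The geometric cooling schedule, temperatures, and function names are assumptions.

```python
import numpy as np

def qubo_energy(Q, x):
    """QUBO objective x^T Q x for a binary vector x."""
    return float(x @ Q @ x)

def simulated_annealing(Q, sweeps=2000, t_start=5.0, t_end=0.01, seed=0):
    """Conventional single-bit-flip simulated annealing with geometric cooling."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    x = rng.integers(0, 2, size=n)
    sym = Q + Q.T
    energy = qubo_energy(Q, x)
    best_x, best_e = x.copy(), energy
    for t in np.geomspace(t_start, t_end, sweeps):
        for i in rng.permutation(n):
            d = 1 - 2 * x[i]  # +1 for a 0->1 flip, -1 for a 1->0 flip
            # Only terms of x^T Q x that involve x_i change:
            # delta = d * (Q_ii + sum_{j != i} (Q_ij + Q_ji) x_j)
            delta = d * (Q[i, i] + sym[i] @ x - sym[i, i] * x[i])
            # Metropolis rule: always accept downhill, sometimes accept uphill.
            if delta <= 0 or rng.random() < np.exp(-delta / t):
                x[i] += d
                energy += delta
                if energy < best_e:
                    best_x, best_e = x.copy(), energy
    return best_x, best_e
```

Updating the energy incrementally avoids recomputing the full product for every proposed flip. Hardware annealers like the one in the abstract instead compute the VMV product directly in memory.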
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
- Authors: Authors: Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- Arxiv link: https://arxiv.org/abs/2309.13860
- Pdf link: https://arxiv.org/pdf/2309.13860
- Abstract Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks. Various speech-based SSL models have been developed and present promising performance on a range of downstream tasks including speech recognition. However, existing speech-based SSL models face a common dilemma in terms of computational cost, which might hinder their potential application and in-depth academic research. To address this issue, we first analyze the computational cost of different modules during HuBERT pre-training and then introduce a stack of efficiency optimizations, which is named Fast-HuBERT in this paper. The proposed Fast-HuBERT can be trained in 1.1 days with 8 V100 GPUs on the Librispeech 960h benchmark, without performance degradation, resulting in a 5.2x speedup, compared to the original implementation. Moreover, we explore two well-studied techniques in the Fast-HuBERT and demonstrate consistent improvements as reported in previous work.
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
- Authors: Authors: Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
- Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2309.13915
- Pdf link: https://arxiv.org/pdf/2309.13915
- Abstract Policy-based algorithms equipped with deep neural networks have achieved great success in solving high-dimensional policy optimization problems in reinforcement learning. However, current analyses cannot explain why they are resistant to the curse of dimensionality. In this work, we study the sample complexity of the neural policy mirror descent (NPMD) algorithm with convolutional neural networks (CNN) as function approximators. Motivated by the empirical observation that many high-dimensional environments have state spaces possessing low-dimensional structures, such as those taking images as states, we consider the state space to be a $d$-dimensional manifold embedded in the $D$-dimensional Euclidean space with intrinsic dimension $d\ll D$. We show that in each iteration of NPMD, both the value function and the policy can be well approximated by CNNs. The approximation errors are controlled by the size of the networks, and the smoothness of the previous networks can be inherited. As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0,1]$ indicates the smoothness of environment. Compared to previous work, our result exhibits that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality, providing an explanation for the efficacy of deep policy-based algorithms.
Newton Method-based Subspace Support Vector Data Description
- Authors: Authors: Fahad Sohrab, Firas Laakom, Moncef Gabbouj
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13960
- Pdf link: https://arxiv.org/pdf/2309.13960
- Abstract In this paper, we present an adaptation of Newton's method for the optimization of Subspace Support Vector Data Description (S-SVDD). The objective of S-SVDD is to map the original data to a subspace optimized for one-class classification, and the iterative optimization process of data mapping and description in S-SVDD relies on gradient descent. However, gradient descent only utilizes first-order information, which may lead to suboptimal results. To address this limitation, we leverage Newton's method to enhance data mapping and data description for an improved optimization of subspace learning-based one-class classification. By incorporating this auxiliary information, Newton's method offers a more efficient strategy for subspace learning in one-class classification as compared to gradient-based optimization. The paper discusses the limitations of gradient descent and the advantages of using Newton's method in subspace learning for one-class classification tasks. We provide both linear and nonlinear formulations of Newton's method-based optimization for S-SVDD. In our experiments, we explored both the minimization and maximization strategies of the objective. The results demonstrate that the proposed optimization strategy outperforms the gradient-based S-SVDD in most cases.
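The abstract contrasts first-order gradient descent with Newton's method, which also uses curvature (second-order) information. Here is a minimal sketch on a hypothetical ill-conditioned quadratic, not the S-SVDD objective. It shows why the second-order step can be more efficient.

```python
import numpy as np

def gradient_descent(grad, x0, lr, steps):
    """First-order updates: x <- x - lr * grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def newton(grad, hess, x0, steps):
    """Second-order updates: x <- x - H(x)^{-1} grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        # Solve H d = g instead of forming the inverse Hessian explicitly.
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# Hypothetical ill-conditioned quadratic f(x) = 0.5 x^T H x - c^T x.
H = np.array([[100.0, 0.0], [0.0, 1.0]])
c = np.array([1.0, 1.0])

def grad_f(x):
    return H @ x - c

def hess_f(x):
    return H

x_star = np.linalg.solve(H, c)  # exact minimizer [0.01, 1.0]
```

The step size for gradient descent must stay below 2/100 to remain stable along the stiff direction. With lr = 0.01, the error along the flat direction shrinks only by a factor of 0.99 per step, and after 10 steps it is still roughly 0.9. One Newton step lands exactly on `x_star` for a quadratic.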
Physics-Driven ML-Based Modelling for Correcting Inverse Estimation
- Authors: Authors: Ruiyuan Kang, Tingting Mu, Panos Liatsis, Dimitrios C. Kyritsis
- Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2309.13985
- Pdf link: https://arxiv.org/pdf/2309.13985
- Abstract When deploying machine learning estimators in science and engineering (SAE) domains, it is critical to avoid failed estimations that can have disastrous consequences, e.g., in aero engine design. This work focuses on detecting and correcting failed state estimations before adopting them in SAE inverse problems, by utilizing simulations and performance metrics guided by physical laws. We suggest flagging a machine learning estimation when its physical model error exceeds a feasible threshold, and propose a novel approach, GEESE, to correct it through optimization, aiming to deliver both low error and high efficiency. The key designs of GEESE include (1) a hybrid surrogate error model that provides fast error estimations, reducing simulation cost and enabling gradient-based backpropagation of error feedback, and (2) two generative models that approximate the probability distributions of the candidate states, simulating exploitation and exploration behaviours. All three models are constructed as neural networks. GEESE is tested on three real-world SAE inverse problems and compared to a number of state-of-the-art optimization/search approaches. Results show that it fails the least number of times in terms of finding a feasible state correction, and generally requires physical evaluations less frequently.
DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization
- Authors: Authors: Haoran Ye, Jiarui Wang, Zhiguang Cao, Helan Liang, Yong Li
- Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.14032
- Pdf link: https://arxiv.org/pdf/2309.14032
- Abstract Ant Colony Optimization (ACO) is a meta-heuristic algorithm that has been successfully applied to various Combinatorial Optimization Problems (COPs). Traditionally, customizing ACO for a specific problem requires the expert design of knowledge-driven heuristics. In this paper, we propose DeepACO, a generic framework that leverages deep reinforcement learning to automate heuristic designs. DeepACO serves to strengthen the heuristic measures of existing ACO algorithms and dispense with laborious manual design in future ACO applications. As a neural-enhanced meta-heuristic, DeepACO consistently outperforms its ACO counterparts on eight COPs using a single neural model and a single set of hyperparameters. As a Neural Combinatorial Optimization method, DeepACO performs better than or on par with problem-specific methods on canonical routing problems. Our code is publicly available at https://github.com/henry-yeh/DeepACO.
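In classic ACO for the traveling salesman problem, an ant at city i picks the next city j with probability proportional to τ_ij^α · η_ij^β. Here τ is the pheromone and η is a hand-designed heuristic such as 1/distance, which is the kind of knowledge-driven heuristic DeepACO proposes to learn. Below is a minimal sketch of tour construction and pheromone update with that hand-designed η. The parameter values (α = 1, β = 2, ρ = 0.1) and function names are illustrative assumptions.

```python
import numpy as np

def construct_tour(tau, eta, alpha=1.0, beta=2.0, rng=None):
    """Build one TSP tour; next city j is drawn with probability ~ tau_ij^alpha * eta_ij^beta."""
    if rng is None:
        rng = np.random.default_rng()
    n = tau.shape[0]
    tour = [int(rng.integers(n))]
    unvisited = set(range(n)) - {tour[0]}
    while unvisited:
        i = tour[-1]
        cand = np.array(sorted(unvisited))
        weights = tau[i, cand] ** alpha * eta[i, cand] ** beta
        j = int(rng.choice(cand, p=weights / weights.sum()))
        tour.append(j)
        unvisited.remove(j)
    return tour

def tour_length(dist, tour):
    """Length of the closed tour, including the edge back to the start."""
    return float(sum(dist[tour[k], tour[(k + 1) % len(tour)]] for k in range(len(tour))))

def update_pheromone(tau, tours, dist, rho=0.1):
    """Evaporate all trails, then each ant deposits 1/length on the edges it used."""
    tau = (1.0 - rho) * tau  # returns a new array; the input is left unchanged
    for tour in tours:
        deposit = 1.0 / tour_length(dist, tour)
        for k in range(len(tour)):
            a, b = tour[k], tour[(k + 1) % len(tour)]
            tau[a, b] += deposit
            tau[b, a] += deposit
    return tau
```

DeepACO would replace the fixed `eta` matrix with a heuristic predicted by a neural model. The construction and pheromone loop above stays the same.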
Tracking Control for a Spherical Pendulum via Curriculum Reinforcement Learning
- Authors: Pascal Klink, Florian Wolf, Kai Ploeger, Jan Peters, Joni Pajarinen
- Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.14096
- Pdf link: https://arxiv.org/pdf/2309.14096
- Abstract Reinforcement Learning (RL) allows learning non-trivial robot control laws purely from data. However, many successful applications of RL have relied on ad-hoc regularizations, such as hand-crafted curricula, to shape the learning process. In this paper, we pair a recent algorithm for automatically building curricula with RL on massively parallelized simulations to learn a tracking controller for a spherical pendulum on a robotic arm. Through an improved optimization scheme that better respects the non-Euclidean task structure, the method reliably generates curricula of trajectories to be tracked, resulting in faster and more robust learning than an RL baseline that does not exploit this form of structured learning. The learned policy matches the performance of an optimal control baseline on the real system, demonstrating the potential of curriculum RL to jointly learn state estimation and control for non-linear tracking tasks.
An optimized quantum minimum searching algorithm with sure-success probability and its experiment simulation with Cirq
- Authors: Wenjie Liu, Qingshan Wu, Jiahao Shen, Jiaojiao Zhao, Mohammed Zidan, Lian Tong
- Subjects: Emerging Technologies (cs.ET); Data Structures and Algorithms (cs.DS); Quantum Physics (quant-ph)
- Arxiv link: https://arxiv.org/abs/2309.14153
- Pdf link: https://arxiv.org/pdf/2309.14153
- Abstract Finding a minimum is an essential part of mathematical models and plays an important role in some optimization problems. Dürr and Høyer proposed a quantum minimum searching algorithm (DHA) whose success is probabilistic and which achieves a quadratic speedup over classical algorithms. In this paper, we propose an optimized quantum minimum searching algorithm with sure-success probability, which uses Grover-Long search to implement optimal exact searching and a dynamic strategy to reduce the number of iterations. In addition, we optimize the oracle circuit to reduce the number of gates using simplification rules. A performance evaluation covering the theoretical success rate and computational complexity shows that our algorithm is more accurate and efficient than DHA. Finally, a simulation experiment based on Cirq is performed to verify its feasibility.
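Why "sure-success" matters can be seen from the standard Grover formula: with N items, M marked, and sin θ = √(M/N), k iterations succeed with probability sin²((2k+1)θ), and an integer k generally cannot make this exactly 1. The sketch below only evaluates the standard (unmodified) Grover case; the Grover-Long phase-matching modification that closes the gap is not implemented here, and the function names are illustrative.

```python
import math


def grover_success_probability(n_items: int, n_marked: int, iterations: int) -> float:
    """Probability of measuring a marked item after `iterations` standard Grover steps."""
    theta = math.asin(math.sqrt(n_marked / n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2


def standard_iteration_count(n_items: int, n_marked: int) -> int:
    """Common choice floor(pi / (4 * theta)) for the number of Grover iterations."""
    theta = math.asin(math.sqrt(n_marked / n_items))
    return math.floor(math.pi / (4 * theta))


for n_items in (4, 8, 64):
    k = standard_iteration_count(n_items, 1)
    print(n_items, k, round(grover_success_probability(n_items, 1, k), 4))
```

For N = 4 the standard count happens to be exact, but for N = 8 two iterations leave roughly a 5.5% failure probability, which is the residual error an exact (Grover-Long) search removes.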
Continual Driving Policy Optimization with Closed-Loop Individualized Curricula
- Authors: Haoyi Niu, Yizhou Xu, Xingjian Jiang, Jianming Hu
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.14209
- Pdf link: https://arxiv.org/pdf/2309.14209
- Abstract The safety of autonomous vehicles (AVs) has long been a top concern, because rare, safety-critical scenarios are largely absent from the long-tailed distribution of naturalistic driving. To tackle this challenge, a surge of research in scenario-based autonomous driving has emerged, focusing on generating high-risk driving scenarios and using them for safety-critical testing of AV models. However, little work has explored reusing these extensive scenarios to iteratively improve AV models. Moreover, it remains challenging to sift through gigantic scenario libraries collected from other AV models with distinct behaviors and extract information that transfers to improving the current AV. We therefore develop a continual driving policy optimization framework featuring Closed-Loop Individualized Curricula (CLIC), which we factorize into a set of standardized sub-modules that allow flexible implementation choices: AV Evaluation, Scenario Selection, and AV Training. CLIC frames AV Evaluation as a collision prediction task that estimates the chance of AV failure in each scenario at every iteration. By re-sampling historical scenarios based on these failure probabilities, CLIC then tailors individualized curricula for downstream training that align with the evaluated capability of the AV. CLIC thus not only maximizes the use of the vast pre-collected scenario library for closed-loop driving policy optimization but also facilitates AV improvement by individualizing training with the more challenging cases drawn from those poorly organized scenarios. Experimental results indicate that CLIC surpasses other curriculum-based training strategies, substantially improving the handling of risky scenarios while maintaining proficiency in simpler cases.
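The Scenario Selection step can be pictured as weighted re-sampling of the scenario library by predicted failure probability. The sketch below is a minimal illustration under stated assumptions: the `temperature` and `floor` knobs and the example probabilities are hypothetical, and CLIC's actual sampling distribution may differ.

```python
import random
from collections import Counter


def build_curriculum(failure_probs, batch_size, rng, temperature=1.0, floor=0.01):
    """Re-sample scenario indices, favouring scenarios with high predicted failure probability.

    `temperature` and `floor` are hypothetical knobs, not from the paper: a lower
    temperature sharpens the focus on hard cases, and `floor` keeps easy scenarios
    from disappearing from the curriculum entirely.
    """
    weights = [(p + floor) ** (1.0 / temperature) for p in failure_probs]
    return rng.choices(range(len(failure_probs)), weights=weights, k=batch_size)


# Hypothetical collision probabilities produced by the AV Evaluation step.
predicted_failure = [0.02, 0.10, 0.75, 0.40]
batch = build_curriculum(predicted_failure, batch_size=1000, rng=random.Random(42))
print(Counter(batch).most_common())
```

Re-running the evaluation after each training round and re-sampling with the updated probabilities is what makes such a curriculum closed-loop.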
Daily Assistive Modular Robot Design Based on Multi-Objective Black-Box Optimization
- Authors: Kento Kawaharazuka, Tasuku Makabe, Kei Okada, Masayuki Inaba
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.14226
- Pdf link: https://arxiv.org/pdf/2309.14226
- Abstract The range of robot activities is expanding from industries with fixed environments to diverse and changing environments, such as nursing care support and daily life support. In particular, autonomous construction of robots that are personalized for each user and task is required. Therefore, we develop an actuator module that can be reconfigured to various link configurations, can carry heavy objects using a locking mechanism, and can be easily operated by human teaching using a releasing mechanism. Given multiple target coordinates, a modular robot configuration that satisfies these coordinates and minimizes the required torque is automatically generated by Tree-structured Parzen Estimator (TPE), a type of black-box optimization. Based on the obtained results, we show that the robot can be reconfigured to perform various functions such as moving monitors and lights, serving food, and so on.
Learning Risk-Aware Quadrupedal Locomotion using Distributional Reinforcement Learning
- Authors: Lukas Schneider, Jonas Frey, Takahiro Miki, Marco Hutter
- Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.14246
- Pdf link: https://arxiv.org/pdf/2309.14246
- Abstract Deployment in hazardous environments requires robots to understand the risks associated with their actions and movements to prevent accidents. Despite their importance, these risks are not explicitly modeled by currently deployed locomotion controllers for legged robots. In this work, we propose a risk-sensitive locomotion training method that employs distributional reinforcement learning to consider safety explicitly. Instead of relying on a value expectation, we estimate the complete value distribution to account for uncertainty in the robot's interaction with the environment. The value distribution is consumed by a risk metric to extract risk-sensitive value estimates. These are integrated into Proximal Policy Optimization (PPO) to derive our method, Distributional Proximal Policy Optimization (DPPO). The risk preference, ranging from risk-averse to risk-seeking, can be controlled by a single parameter, which allows the robot's behavior to be adjusted dynamically. Importantly, our approach removes the need for additional reward function tuning to achieve risk sensitivity. We show emergent risk-sensitive locomotion behavior in simulation and on the quadrupedal robot ANYmal.
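One common way a single parameter can move a distributional critic's output from risk-averse to risk-seeking is a tail average (CVaR-style) over quantile estimates. The abstract does not name DPPO's risk metric, so the sketch below is an assumption for illustration: `risk` < 0 averages the lowest quantiles, `risk` > 0 the highest, and `risk` = 0 recovers the ordinary mean.

```python
import math


def risk_sensitive_value(quantiles, risk):
    """Tail-average (CVaR-style) value of a quantile-based return distribution.

    risk in (-1, 1): negative -> mean of the lowest quantiles (risk-averse),
    positive -> mean of the highest quantiles (risk-seeking), 0 -> plain mean.
    """
    if not -1.0 < risk < 1.0:
        raise ValueError("risk must lie in (-1, 1)")
    qs = sorted(quantiles)
    k = max(1, math.ceil((1.0 - abs(risk)) * len(qs)))  # number of quantiles kept
    tail = qs[:k] if risk < 0 else qs[-k:]
    return sum(tail) / len(tail)


# Hypothetical quantile estimates from a distributional critic with one rare bad outcome.
value_quantiles = [-10.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
for r in (-0.5, 0.0, 0.5):
    print(r, risk_sensitive_value(value_quantiles, r))
```

The same distribution yields a pessimistic value (-0.8), the mean (2.6), or an optimistic value (6.0), so the policy-gradient advantage, and hence the learned behavior, shifts with one scalar.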
Unwieldy Object Delivery with Nonholonomic Mobile Base: A Stable Pushing Approach
- Authors: Yujie Tang, Hai Zhu, Susan Potters, Martijn Wisse, Wei Pan
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.14295
- Pdf link: https://arxiv.org/pdf/2309.14295
- Abstract This paper addresses the problem of pushing manipulation with nonholonomic mobile robots. Pushing is a fundamental skill that enables robots to move unwieldy objects that cannot be grasped. We propose a stable pushing method that maintains stiff contact between the robot and the object to avoid time-consuming repositioning actions. We prove that a line contact, rather than a single point contact, is necessary for nonholonomic robots to achieve stable pushing. We also show that the stable pushing constraint and the nonholonomic constraint of the robot can be simplified into a concise linear motion constraint. The pushing planning problem can then be formulated as a constrained optimization problem using nonlinear model predictive control (NMPC). In experiments, our NMPC-based planner outperforms a reactive pushing strategy in terms of efficiency, reducing the robot's traveled distance by 23.8% and time by 77.4%. Furthermore, our method requires four fewer hyperparameters and decision variables than the Linear Time-Varying (LTV) MPC approach, making it easier to implement. Real-world experiments with two differential-drive robots, Husky and Boxer, under different friction conditions validate the proposed method.
LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference
- Authors: Hongwu Peng, Ran Ran, Yukui Luo, Jiahui Zhao, Shaoyi Huang, Kiran Thorat, Tong Geng, Chenghong Wang, Xiaolin Xu, Wujie Wen, Caiwen Ding
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2309.14331
- Pdf link: https://arxiv.org/pdf/2309.14331
- Abstract The growth of Graph Convolutional Network (GCN) model sizes has revolutionized numerous applications, surpassing human performance in areas such as personal healthcare and financial systems. Deploying GCNs in the cloud raises privacy concerns due to potential adversarial attacks on client data. To address these security concerns, Privacy-Preserving Machine Learning (PPML) using Homomorphic Encryption (HE) secures sensitive client data, but it introduces substantial computational overhead in practical applications. To tackle these challenges, we present LinGCN, a framework designed to reduce multiplication depth and optimize the performance of HE-based GCN inference. LinGCN is structured around three key elements: (1) A differentiable structural linearization algorithm, complemented by a parameterized discrete indicator function, co-trained with model weights to meet the optimization goal. This strategy promotes fine-grained, node-level selection of non-linear locations, yielding a model with minimized multiplication depth. (2) A compact node-wise polynomial replacement policy with a second-order trainable activation function, steered toward superior convergence by a two-level distillation approach from an all-ReLU teacher model. (3) An enhanced HE solution that enables finer-grained operator fusion for node-wise activation functions, further reducing multiplication level consumption in HE-based inference. Our experiments on the NTU-XVIEW skeleton joint dataset show that LinGCN excels in latency, accuracy, and scalability for homomorphically encrypted inference, outperforming solutions such as CryptoGCN. Notably, LinGCN achieves a 14.2x latency speedup relative to CryptoGCN while preserving an inference accuracy of 75% and considerably reducing multiplication depth.
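ReLU has no exact polynomial form, so HE-friendly models replace it with low-degree polynomials; a quadratic a·x² + b·x + c needs only one ciphertext-ciphertext multiplication (the squaring). LinGCN trains such coefficients node-wise with distillation; the sketch below instead makes a simplifying assumption and fits one quadratic to ReLU on [-1, 1] by ordinary least squares, just to show what a second-order replacement looks like. The helper names are illustrative.

```python
def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares coefficients (a, b, c) of a*x**2 + b*x + c via the normal equations."""
    rows = [(x * x, x, 1.0) for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(ata, aty)


# Assumed input range [-1, 1]; real activations would need normalization to stay in range.
xs = [-1.0 + 2.0 * i / 2000 for i in range(2001)]
relu = [max(x, 0.0) for x in xs]
a, b, c = fit_quadratic(xs, relu)
print(a, b, c)  # close to 15/32, 1/2, 3/32 (the continuous L2-optimal quadratic)
```

A trainable version would keep (a, b, c) as learnable parameters per node and let the loss, rather than a fixed fit, decide them.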
Keyword: adam
A Further Study of Vectorial Dual-Bent Functions
- Authors: Jiaxin Wang, Fang-Wei Fu, Yadi Wei, Jing Yang
- Subjects: Information Theory (cs.IT)
- Arxiv link: https://arxiv.org/abs/2309.13395
- Pdf link: https://arxiv.org/pdf/2309.13395
- Abstract Vectorial dual-bent functions have recently attracted some researchers' interest as they play a significant role in constructing partial difference sets, association schemes, bent partitions and linear codes. In this paper, we further study vectorial dual-bent functions $F: V_{n}^{(p)}\rightarrow V_{m}^{(p)}$, where $2\leq m \leq \frac{n}{2}$ and $V_{n}^{(p)}$ denotes an $n$-dimensional vector space over the prime field $\mathbb{F}_{p}$. We give new characterizations of certain vectorial dual-bent functions (called vectorial dual-bent functions with Condition A) in terms of amorphic association schemes, linear codes and generalized Hadamard matrices, respectively. When $p=2$, we characterize vectorial dual-bent functions with Condition A in terms of bent partitions. Furthermore, we characterize certain bent partitions in terms of amorphic association schemes, linear codes and generalized Hadamard matrices, respectively. For general vectorial dual-bent functions $F: V_{n}^{(p)}\rightarrow V_{m}^{(p)}$ with $F(0)=0, F(x)=F(-x)$ and $2\leq m \leq \frac{n}{2}$, we give a necessary and sufficient condition on constructing association schemes. Based on such a result, more association schemes are constructed from vectorial dual-bent functions.
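As background, the simplest case is p = 2, m = 1: a Boolean function f on F_2^n is bent when every Walsh-Hadamard coefficient W_f(a) = Σ_x (-1)^(f(x) ⊕ a·x) has absolute value 2^(n/2), which forces n to be even. For p = 2, a vectorial F is bent when every nonzero component function v·F is bent. The sketch below checks only this scalar Boolean definition; dual-bentness and the paper's Condition A are not covered, and the function names are illustrative.

```python
from itertools import product


def walsh_spectrum(f, n):
    """Walsh-Hadamard coefficients W_f(a) = sum_x (-1)^(f(x) XOR a.x) over F_2^n."""
    points = list(product((0, 1), repeat=n))
    spectrum = []
    for a in points:
        total = 0
        for x in points:
            dot = sum(ai * xi for ai, xi in zip(a, x)) % 2
            total += (-1) ** (f(x) ^ dot)
        spectrum.append(total)
    return spectrum


def is_bent(f, n):
    """f is bent iff n is even and |W_f(a)| = 2^(n/2) for every a."""
    return n % 2 == 0 and all(abs(w) == 2 ** (n // 2) for w in walsh_spectrum(f, n))


def inner_product(x):
    """x1*x2 + x3*x4 over F_2, a classic bent function on F_2^4."""
    return (x[0] & x[1]) ^ (x[2] & x[3])


def linear(x):
    """A linear function: its spectrum is concentrated at a single point, so not bent."""
    return x[0]


print(is_bent(inner_product, 4), is_bent(linear, 4))  # True False
```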
AdaMap: High-Scalable Real-Time Cooperative Perception at the Edge
- Authors: Qiang Liu, Yongjie Xue, Yuru Zhang, Dawei Chen, Kyungtae Han
- Subjects: Robotics (cs.RO); Emerging Technologies (cs.ET)
- Arxiv link: https://arxiv.org/abs/2309.13526
- Pdf link: https://arxiv.org/pdf/2309.13526
- Abstract Cooperative perception is the key approach to augmenting the perception of connected and automated vehicles (CAVs) toward safe autonomous driving. However, achieving real-time perception sharing for hundreds of CAVs in large-scale deployment scenarios is challenging. In this paper, we propose AdaMap, a new highly scalable real-time cooperative perception system that achieves assured percentile end-to-end latency under time-varying network dynamics. To achieve this, we design a tightly coupled data plane and control plane. In the data plane, we design a new hybrid localization module that dynamically switches between object detection and tracking, and a novel point cloud representation module that adaptively compresses and reconstructs the point clouds of detected objects. In the control plane, we design a new graph-based object selection method that deselects excessive multi-viewed point clouds of objects, and a novel approximated gradient descent algorithm that optimizes the representation of point clouds. We implement AdaMap on an emulation platform, including realistic vehicle and server computation and a simulated 5G network, under a 150-CAV trace collected from the CARLA simulator. The evaluation results show that AdaMap reduces the average transmitted data size by up to 49x at the cost of 0.37 reconstruction loss compared with state-of-the-art solutions, which verifies its high scalability, adaptability, and computation efficiency.
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
- Authors: Guo-qing Jiang, Jinlong Liu, Zixiang Ding, Lin Guo, Wei Lin
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13681
- Pdf link: https://arxiv.org/pdf/2309.13681
- Abstract As models for natural language processing (NLP), computer vision (CV), and recommendation systems (RS) require surging computation, a large number of GPUs/TPUs are used in parallel with a large batch (LB) size to improve training throughput. However, such LB training often suffers from a large generalization gap and degraded final precision, which limits how far the batch size can be enlarged. In this work, we develop a variance-reduced gradient descent technique (VRGD) based on the gradient signal-to-noise ratio (GSNR) and apply it to popular optimizers such as SGD/Adam/LARS/LAMB. We carry out a theoretical analysis of the convergence rate to explain its fast training dynamics, and a generalization analysis to demonstrate its smaller generalization gap in LB training. Comprehensive experiments demonstrate that VRGD can accelerate training ($1\sim 2\times$), narrow the generalization gap, and improve final accuracy. We push the batch size limit of BERT pretraining up to 128k/64k and DLRM to 512k without noticeable accuracy loss. At a batch size of 96k, we improve ImageNet Top-1 accuracy by 0.52 pp over LARS. The generalization gap of BERT and ImageNet training is significantly reduced, by over 65%.
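The quantity VRGD builds on can be computed directly: for each parameter, the GSNR is the squared mean of its per-sample gradients divided by their variance. The sketch below computes only this ratio; it does not reproduce VRGD's update rule, and the use of population variance and the `eps` guard are assumptions.

```python
from statistics import fmean, pvariance


def gsnr(per_sample_grads, eps=1e-12):
    """Per-parameter GSNR: squared mean of per-sample gradients over their variance.

    per_sample_grads: list of gradient vectors, one per training sample.
    eps guards against zero variance (an assumed choice).
    """
    ratios = []
    for j in range(len(per_sample_grads[0])):
        column = [g[j] for g in per_sample_grads]
        ratios.append(fmean(column) ** 2 / (pvariance(column) + eps))
    return ratios


# Hypothetical per-sample gradients for two parameters: parameter 0 has a
# consistent sign (high GSNR), parameter 1 averages to zero (GSNR 0).
grads = [[1.0, 1.0], [3.0, -1.0]]
print(gsnr(grads))
```

A high GSNR means the samples in a batch agree on the update direction for that parameter, whereas a low GSNR means the mini-batch gradient for that parameter is dominated by noise.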
Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time Complexity
- Authors: Spencer L. Gordon, Erik Jahn, Bijan Mazaheri, Yuval Rabani, Leonard J. Schulman
- Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Signal Processing (eess.SP); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2309.13993
- Pdf link: https://arxiv.org/pdf/2309.13993
- Abstract We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_1,\ldots,X_n$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/\zeta)^{O(k^2 \log k)}$ (under a mild separation assumption parameterized by $\zeta$). The best known lower bound was $\exp(\Omega(k))$. It is known that $n\geq 2k-1$ is necessary and sufficient for identification. We show, for any $n\geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$. We also extend the known lower bound of $e^{\Omega(k)}$ to match our upper bound across a broad range of $\zeta$. Our results are obtained by combining (a) a classic method for robust tensor decomposition, (b) a novel way of bounding the condition number of key matrices called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
Keyword: gradient
Machine Learning Technique Based Fake News Detection
- Authors: Biplob Kumar Sutradhar, Md. Zonaid, Nushrat Jahan Ria, Sheak Rashed Haider Noori
- Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13069
- Pdf link: https://arxiv.org/pdf/2309.13069
- Abstract False news has received attention from both the general public and the scholarly world. Such false information can affect public perception, giving nefarious groups the chance to influence the outcomes of public events such as elections. Anyone can share fake news or facts about anyone or anything for personal gain or to cause someone trouble. Moreover, information varies depending on the part of the world in which it is shared. In this paper, we therefore train a model to classify fake and true news using 1,876 news items from our collected dataset. We preprocess the data with Natural Language Processing techniques to obtain clean, filtered texts. Our research evaluates three popular Machine Learning algorithms (Stochastic Gradient Descent, Naive Bayes, and Logistic Regression) and two Deep Learning algorithms: Long Short-Term Memory (LSTM) and ASGD Weight-Dropped LSTM (AWD-LSTM). Our best model is a Naive Bayes classifier, which achieves 56% accuracy and an average macro F1 score of 32%.
A Differentiable Framework for End-to-End Learning of Hybrid Structured Compression
- Authors: Moonjung Eo, Suhyun Kang, Wonjong Rhee
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
- Arxiv link: https://arxiv.org/abs/2309.13077
- Pdf link: https://arxiv.org/pdf/2309.13077
- Abstract Filter pruning and low-rank decomposition are two of the foundational techniques for structured compression. Although recent efforts have explored hybrid approaches that aim to integrate the advantages of both techniques, their performance gains have been modest at best. In this study, we develop a \textit{Differentiable Framework~(DF)} that can express filter selection, rank selection, and the budget constraint in a single analytical formulation. Within the framework, we introduce DML-S for filter selection, integrating scheduling into existing mask learning techniques. Additionally, we present DTL-S for rank selection, utilizing a singular value thresholding operator. The framework with DML-S and DTL-S offers a hybrid structured compression methodology that facilitates end-to-end learning through gradient-based optimization. Experimental results demonstrate the efficacy of DF, surpassing state-of-the-art structured compression methods. Our work establishes a robust and versatile avenue for advancing structured compression techniques.
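The standard singular value thresholding operator maps W = U diag(σ) Vᵀ to U diag(max(σᵢ − τ, 0)) Vᵀ, so every singular value below τ is zeroed and the rank falls accordingly. Computing U and V needs a linear-algebra library, so the sketch below shows only the step on the singular values themselves; how DTL-S parameterizes or schedules τ is not described in the abstract, and the example values are hypothetical.

```python
def singular_value_threshold(singular_values, tau):
    """Soft-threshold singular values: sigma_i -> max(sigma_i - tau, 0).

    Returns the shrunk values and the resulting rank (count of values still > 0).
    """
    shrunk = [max(s - tau, 0.0) for s in singular_values]
    rank = sum(1 for s in shrunk if s > 0.0)
    return shrunk, rank


# Hypothetical singular values of one layer's weight matrix.
shrunk, rank = singular_value_threshold([5.0, 3.0, 1.0, 0.5], tau=1.0)
print(shrunk, rank)  # [4.0, 2.0, 0.0, 0.0] 2
```

Because the rank is an implicit consequence of τ, rank selection can be steered by a continuous threshold instead of a discrete search over ranks.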
Flow Factorized Representation Learning
- Authors: Yue Song, T. Anderson Keller, Nicu Sebe, Max Welling
- Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.13167
- Pdf link: https://arxiv.org/pdf/2309.13167
- Abstract A prominent goal of representation learning research is to achieve representations that are factorized in a useful manner with respect to the ground-truth factors of variation. The fields of disentangled and equivariant representation learning have approached this ideal from a range of complementary perspectives; however, to date, most approaches have proven either ill-specified or insufficiently flexible to effectively separate all realistic factors of interest in a learned latent space. In this work, we propose an alternative viewpoint on such structured representation learning, which we call Flow Factorized Representation Learning, and show that it learns both more efficient and more usefully structured representations than existing frameworks. Specifically, we introduce a generative model that specifies a distinct set of latent probability paths that define different input transformations. Each latent flow is generated by the gradient field of a learned potential following dynamic optimal transport. Our novel setup brings new understanding to both \textit{disentanglement} and \textit{equivariance}. We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models. Furthermore, we demonstrate that the transformations learned by our model are flexibly composable and can also extrapolate to new data, implying a degree of robustness and generalizability approaching the ultimate goal of usefully factorized representation learning.
Can I Trust the Explanations? Investigating Explainable Machine Learning Methods for Monotonic Models
- Authors: Dangxing Chen
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Finance (q-fin.CP)
- Arxiv link: https://arxiv.org/abs/2309.13246
- Pdf link: https://arxiv.org/pdf/2309.13246
- Abstract In recent years, explainable machine learning methods have been very successful. Despite their success, most explainable machine learning methods are applied to black-box models without any domain knowledge. By incorporating domain knowledge, science-informed machine learning models have demonstrated better generalization and interpretation. But do we obtain consistent scientific explanations if we apply explainable machine learning methods to science-informed machine learning models? This question is addressed in the context of monotonic models that exhibit three different types of monotonicity. To demonstrate monotonicity, we propose three axioms. Accordingly, this study shows that when only individual monotonicity is involved, the baseline Shapley value provides good explanations; however, when strong pairwise monotonicity is involved, the Integrated Gradients method provides reasonable explanations on average.
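Integrated Gradients attributes f(x) − f(x′) to each input as IGᵢ = (xᵢ − x′ᵢ) ∫₀¹ ∂f/∂xᵢ(x′ + α(x − x′)) dα, where x′ is a baseline. The sketch below approximates the integral with a midpoint Riemann sum on a toy model f(x) = x₀·x₁, which is increasing in each input for positive inputs; the model, baseline, and step count are illustrative assumptions, not the paper's setup.

```python
def integrated_gradients(grad_f, x, baseline, steps=100):
    """Midpoint Riemann approximation of Integrated Gradients.

    IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + alpha * (x - x')) d alpha
    """
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def grad_product(p):
    """Gradient of the toy model f(x) = x0 * x1."""
    return [p[1], p[0]]


attributions = integrated_gradients(grad_product, x=[2.0, 3.0], baseline=[0.0, 0.0])
print(attributions)  # approximately [3.0, 3.0]; they sum to f(x) - f(baseline) = 6
```

The attributions satisfy completeness (they sum to the change in output), which is one of the properties that makes IG a natural candidate when checking explanations against monotonicity axioms.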
Zen: Near-Optimal Sparse Tensor Synchronization for Distributed DNN Training
- Authors: Zhuang Wang, Zhaozhuo Xu, Anshumali Shrivastava, T. S. Eugene Ng
- Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
- Arxiv link: https://arxiv.org/abs/2309.13254
- Pdf link: https://arxiv.org/pdf/2309.13254
- Abstract Distributed training is the de facto standard for scaling up the training of Deep Neural Networks (DNNs) with multiple GPUs. The performance bottleneck of distributed training lies in the communication for gradient synchronization. Recently, practitioners have observed sparsity in gradient tensors, suggesting the potential to reduce communication traffic volume and improve end-to-end training efficiency. Yet the optimal communication scheme to fully leverage sparsity is still missing. This paper aims to address this gap. We first analyze the characteristics of sparse tensors in popular DNN models to understand the fundamentals of sparsity. We then systematically explore the design space of communication schemes for sparse tensors and find the optimal one. We also develop a gradient synchronization system called Zen that approximately realizes this optimal scheme for sparse tensors. We demonstrate that Zen can achieve up to 5.09x speedup in communication time and up to 2.48x speedup in training throughput compared to state-of-the-art methods.
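To see what "synchronizing sparse tensors" means at its simplest, the sketch below encodes each worker's gradient as index/value pairs (a COO-style format) and sums them. This is a naive baseline for illustration only, not Zen's communication scheme; the example gradients are hypothetical. It exposes two issues a real design must handle: each non-zero costs an index plus a value, and the union of indices grows as more workers contribute.

```python
def to_sparse(dense):
    """Keep only non-zero entries as {index: value} (a COO-style encoding)."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def sparse_allreduce(worker_grads):
    """Sum sparse gradients from all workers; the union of indices may grow."""
    total = {}
    for grad in worker_grads:
        for i, v in grad.items():
            total[i] = total.get(i, 0.0) + v
    return total


def traffic(sparse_grad):
    """Numbers sent in COO form: one index plus one value per non-zero."""
    return 2 * len(sparse_grad)


d = 10  # dense gradient length
workers = [
    to_sparse([0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]),
    to_sparse([0.0, 0.25, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
]
summed = sparse_allreduce(workers)
print(summed, [traffic(g) for g in workers], "dense:", d)
```

Since COO doubles the per-element cost, it beats sending the dense tensor only while density stays below 50%, which is one reason the choice of scheme depends on the measured sparsity characteristics.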
CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity
- Authors: Pengyun Yue, Hanzhen Zhao, Cong Fang, Di He, Liwei Wang, Zhouchen Lin, Song-chun Zhu
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13307
- Pdf link: https://arxiv.org/pdf/2309.13307
- Abstract With distributed machine learning being a prominent technique for large-scale machine learning tasks, communication complexity has become a major bottleneck for speeding up training and scaling up the number of machines. In this paper, we propose a new technique named Common randOm REconstruction (CORE), which compresses the information transmitted between machines to reduce communication complexity without requiring other strict conditions. Specifically, CORE projects vector-valued information to a low-dimensional one through common random vectors and, after communication, reconstructs the information using the same random vectors. We apply CORE to two distributed tasks, convex optimization on linear models and generic non-convex optimization, and design new distributed algorithms that achieve provably lower communication complexities. For example, we show that for linear models, a CORE-based algorithm can encode the gradient vector into $\mathcal{O}(1)$ bits (versus $\mathcal{O}(d)$) without worsening the convergence rate, improving on existing results.
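The idea of common random reconstruction can be sketched with a shared seed. Both machines regenerate the same random vectors ξ₁…ξ_m. The sender transmits the m scalars ⟨ξₖ, g⟩, and the receiver forms (1/m) Σₖ ⟨ξₖ, g⟩ ξₖ, which is an unbiased estimate of g because E[ξξᵀ] = I for standard Gaussian ξ. The Gaussian choice, the 1/m averaging, and the function names are assumptions; CORE's quantization down to O(1) bits is not shown. Here m is deliberately larger than d so the average visibly approaches g; in an actual compression setting m would be much smaller than d, trading exactness for fewer transmitted values.

```python
import random


def common_random_vectors(seed, m, d):
    """Both machines regenerate the same m Gaussian vectors from a shared seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def encode(g, vectors):
    """Sender transmits m scalars <xi_k, g> instead of the d-dimensional vector g."""
    return [sum(v_j * g_j for v_j, g_j in zip(v, g)) for v in vectors]


def decode(coeffs, vectors):
    """Receiver forms (1/m) * sum_k <xi_k, g> xi_k, an unbiased estimate of g."""
    m, d = len(vectors), len(vectors[0])
    return [sum(c * v[j] for c, v in zip(coeffs, vectors)) / m for j in range(d)]


g = [1.0, 2.0, 3.0, 4.0]
shared = common_random_vectors(seed=7, m=20000, d=len(g))  # same seed on both ends
estimate = decode(encode(g, shared), shared)
print([round(e, 2) for e in estimate])
```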
Calibrating LLM-Based Evaluator
- Authors: Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2309.13308
- Pdf link: https://arxiv.org/pdf/2309.13308
- Abstract Recent advancements of large language models (LLMs) in language modeling and emergent capabilities make them a promising reference-free evaluator of natural language generation quality and a competent alternative to human evaluation. However, because such models are often closed-source or computationally demanding to host and tune, there is little practice of further calibrating an off-the-shelf LLM-based evaluator toward better human alignment. In this work, we propose AutoCalibrate, a multi-stage, gradient-free approach that automatically calibrates and aligns an LLM-based evaluator toward human preference. Instead of explicitly modeling human preferences, we first capture them implicitly in a set of human labels. Then, the language model itself drafts an initial set of scoring criteria, leveraging in-context learning on different few-shot examples. To further calibrate this set of criteria, we select the best performers and re-draft them through self-refinement. Our experiments on multiple text quality evaluation datasets show a significant improvement in correlation with expert evaluation through calibration. Our comprehensive qualitative analysis offers insights into the essence of effective scoring criteria.
An Interpretable Systematic Review of Machine Learning Models for Predictive Maintenance of Aircraft Engine
- Authors: Abdullah Al Hasib, Ashikur Rahman, Mahpara Khabir, Md. Tanvir Rouf Shawon
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13310
- Pdf link: https://arxiv.org/pdf/2309.13310
- Abstract This paper presents an interpretable review of various machine learning and deep learning models that predict the maintenance needs of aircraft engines in order to avoid disasters. One advantage of the strategy is that it can work with modest datasets. In this study, sensor data are used to predict aircraft engine failure within a predetermined number of cycles using LSTM, Bi-LSTM, RNN, Bi-RNN, GRU, Random Forest, KNN, Naive Bayes, and Gradient Boosting. We explain how deep learning and machine learning can be used to generate predictions in predictive maintenance through a straightforward scenario with just one data source. We applied LIME to the models to help understand why the machine learning models did not perform as well as the deep learning models. An extensive analysis of the models' behavior on several test data is presented to shed light on the black-box nature of the models. GRU, Bi-LSTM, and LSTM achieve accuracies of 97.8%, 97.14%, and 96.42%, respectively, which indicates the capability of these models to predict maintenance needs at an early stage.
Joint Explainability and Sensitivity-Aware Federated Deep Learning for Transparent 6G RAN Slicing
- Authors: Swastika Roy, Farhad Rezazadeh, Hatim Chergui, Christos Verikoukis
- Subjects: Networking and Internet Architecture (cs.NI)
- Arxiv link: https://arxiv.org/abs/2309.13325
- Pdf link: https://arxiv.org/pdf/2309.13325
- Abstract In recent years, wireless networks have become increasingly complex, which has driven the use of zero-touch, artificial intelligence (AI)-driven network automation within the telecommunication industry. In particular, network slicing, the most promising technology beyond 5G, would embrace AI models to manage the complex communication network. It is also essential to establish the trustworthiness of AI black boxes in actual deployment, where AI performs complex resource management and anomaly detection. Inspired by closed-loop automation and Explainable Artificial Intelligence (XAI), we design an Explainable Federated Deep Learning (FDL) model to predict per-slice RAN dropped-traffic probability while jointly considering sensitivity- and explainability-aware metrics as constraints in such a non-IID setup. Specifically, we quantitatively validate the faithfulness of the explanations via the so-called attribution-based \emph{log-odds metric}, which is included as a constraint in the run-time FL optimization task. Simulation results confirm its superiority over an unconstrained integrated-gradient (IG) \emph{post-hoc} FDL baseline.
Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection
- Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia
- Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- Arxiv link: https://arxiv.org/abs/2309.13476
- Pdf link: https://arxiv.org/pdf/2309.13476
- Abstract Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, aid early screening for depression. This paper addresses two limitations that may hinder the clinical implementation of such tools: noise resulting from segment-level labelling and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-level labelling and introduce a hierarchical interpretation approach to provide both speech-level and sentence-level interpretations, based on gradient-weighted attention maps derived from all attention layers to track interactions between input features. We show that the proposed model outperforms a model that learns at a segment level ($p$=0.854, $r$=0.947, $F1$=0.947 compared to $p$=0.732, $r$=0.808, $F1$=0.768). For model interpretation, using one true positive sample, we show which sentences within a given speech are most relevant to depression detection, and which text tokens and Mel-spectrogram regions within these sentences are most relevant to depression detection. These interpretations allow clinicians to verify the validity of predictions made by depression detection tools, promoting their clinical implementation.
A Unified Scheme of ResNet and Softmax
- Authors: Zhao Song, Weixin Wang, Junze Yin
- Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2309.13482
- Pdf link: https://arxiv.org/pdf/2309.13482
- Abstract Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical components supporting the functionality of LLMs but are also related to many other machine learning and theoretical computer science fields, including but not limited to image classification, object detection, semantic segmentation, and tensors. Previous work studied these two concepts separately. In this paper, we provide a theoretical analysis of the regression problem $\| \langle \exp(Ax) + A x , {\bf 1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$, where $A$ is a matrix in $\mathbb{R}^{n \times d}$, $b$ is a vector in $\mathbb{R}^n$, and ${\bf 1}_n$ is the $n$-dimensional vector whose entries are all $1$. This regression problem is a unified scheme that combines softmax regression and ResNet, which has never been done before. We derive the gradient, Hessian, and Lipschitz properties of the loss function. The Hessian is shown to be positive semidefinite, and its structure is characterized as the sum of a low-rank matrix and a diagonal matrix. This enables an efficient approximate Newton method. As a result, this unified scheme helps to connect two fields previously thought to be unrelated and provides novel insight into the loss landscape and optimization of emerging over-parameterized neural networks, which is meaningful for future research on deep learning models.
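To make the objective concrete, here is a small pure-Python sketch (stdlib only, with dense lists instead of a matrix library). It evaluates the loss $\|\alpha^{-1} u - b\|_2^2$, where $u = \exp(Ax) + Ax$ and $\alpha = \langle u, {\bf 1}_n\rangle$.

It also evaluates the gradient obtained by the chain rule: $\nabla_x L = A^\top D\,(I - {\bf 1}_n f^\top)\,2(f - b)/\alpha$, with $f = u/\alpha$ and $D = \mathrm{diag}(\exp(Ax) + {\bf 1}_n)$. This derivation is our own, not taken from the paper. It assumes $\alpha \neq 0$ and is meant for checking against finite differences, not as an efficient solver.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def forward(A, x):
    # u = exp(Ax) + Ax, alpha = <u, 1_n>, f = u / alpha
    z = matvec(A, x)
    u = [math.exp(zi) + zi for zi in z]
    alpha = sum(u)
    return z, alpha, [ui / alpha for ui in u]

def loss(A, b, x):
    _, _, f = forward(A, x)
    return sum((fi - bi) ** 2 for fi, bi in zip(f, b))

def grad(A, b, x):
    # dL/dx = A^T D (I - 1 f^T) c with c = 2 (f - b) / alpha, D = diag(exp(Ax) + 1)
    z, alpha, f = forward(A, x)
    c = [2.0 * (fi - bi) / alpha for fi, bi in zip(f, b)]
    fc = sum(fi * ci for fi, ci in zip(f, c))
    v = [ci - fc for ci in c]                                  # (I - 1 f^T) c
    w = [(math.exp(zi) + 1.0) * vi for zi, vi in zip(z, v)]    # D v
    return [sum(A[i][j] * w[i] for i in range(len(A))) for j in range(len(x))]
```

Comparing `grad(A, b, x)` against central differences of `loss` on small inputs is a quick way to confirm the derivation.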
Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
- Authors: Haoran Wang, Yaoru Sun, Fang Wang, Yeming Chen
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.13508
- Pdf link: https://arxiv.org/pdf/2309.13508
- Abstract Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex long-horizon reinforcement learning (RL) tasks via temporal abstraction. Yet most goal-conditioned HRL algorithms have focused on subgoal discovery while neglecting inter-level coupling. In essence, for hierarchical systems, increased inter-level communication and coordination can induce more stable and robust policy improvement. Here, we present a goal-conditioned HRL framework with Guided Cooperation via Model-based Rollout (GCMR), which estimates forward dynamics to promote inter-level cooperation. GCMR alleviates the state-transition error within off-policy correction through a model-based rollout, further improving sample efficiency. Meanwhile, to avoid being disrupted by these corrected but possibly unseen or faraway goals, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy. In addition, we propose one-step rollout-based planning to further facilitate inter-level cooperation. Here the higher-level Q-function guides the lower-level policy by estimating the value of future states, so that global task information is transmitted downwards to avoid local pitfalls. Experimental results demonstrate that incorporating the proposed GCMR framework with ACLG, a disentangled variant of HIGL, yields more stable and robust policy improvement than baselines and substantially outperforms previous state-of-the-art (SOTA) HRL algorithms in both hard-exploration problems and robotic control.
AdaMap: High-Scalable Real-Time Cooperative Perception at the Edge
- Authors: Qiang Liu, Yongjie Xue, Yuru Zhang, Dawei Chen, Kyungtae Han
- Subjects: Robotics (cs.RO); Emerging Technologies (cs.ET)
- Arxiv link: https://arxiv.org/abs/2309.13526
- Pdf link: https://arxiv.org/pdf/2309.13526
- Abstract Cooperative perception is the key approach to augmenting the perception of connected and automated vehicles (CAVs) toward safe autonomous driving. However, it is challenging to achieve real-time perception sharing for hundreds of CAVs in large-scale deployment scenarios. In this paper, we propose AdaMap, a new highly scalable real-time cooperative perception system, which achieves assured percentile end-to-end latency under time-varying network dynamics. To realize AdaMap, we design a tightly coupled data plane and control plane. In the data plane, we design a new hybrid localization module to dynamically switch between object detection and tracking, and a novel point cloud representation module to adaptively compress and reconstruct the point clouds of detected objects. In the control plane, we design a new graph-based object selection method to deselect excessive multi-viewed point clouds of objects, and a novel approximated gradient descent algorithm to optimize the representation of point clouds. We implement AdaMap on an emulation platform, including realistic vehicle and server computation and a simulated 5G network, under a 150-CAV trace collected from the CARLA simulator. The evaluation results show that AdaMap reduces the average transmission data size by up to 49x at the cost of 0.37 reconstruction loss, as compared to state-of-the-art solutions, which verifies its high scalability, adaptability, and computation efficiency.
Tackling the Unlimited Staleness in Federated Learning with Intertwined Data and Device Heterogeneities
- Authors: Haoming Wang, Wei Gao
- Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
- Arxiv link: https://arxiv.org/abs/2309.13536
- Pdf link: https://arxiv.org/pdf/2309.13536
- Abstract The efficiency of Federated Learning (FL) is often affected by both data and device heterogeneities. Data heterogeneity is defined as the heterogeneity of data distributions on different clients. Device heterogeneity is defined as the clients' varying latencies in uploading their local model updates due to heterogeneous conditions of local hardware resources, and it causes the problem of staleness when addressed by asynchronous FL. Traditional schemes for tackling the impact of staleness consider data and device heterogeneities as two separate and independent aspects of FL, but this assumption is unrealistic in many practical FL scenarios where data and device heterogeneities are intertwined. In these cases, traditional schemes of weighted aggregation in FL have been shown to be ineffective, and a better approach is to convert a stale model update into a non-stale one. In this paper, we present a new FL framework that leverages the gradient inversion technique for such conversion, hence efficiently tackling unlimited staleness in clients' model updates. Our basic idea is to use gradient inversion to estimate clients' local training data from their uploaded stale model updates, and to use these estimates to compute non-stale client model updates. In this way, we address the possible drop in data quality when using gradient inversion, while still preserving the clients' local data privacy. We compared our approach with existing FL strategies on mainstream datasets and models, and the experimental results demonstrate that, when tackling unlimited staleness, our approach can significantly improve the trained model accuracy by up to 20% and speed up FL training by up to 35%.
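Gradient inversion is easiest to see in the one case where it is exact: a single-sample linear model with a bias term under squared loss. There $\partial\ell/\partial w = r\,x$ and $\partial\ell/\partial b = r$ for residual $r$, so $x = (\partial\ell/\partial w)/(\partial\ell/\partial b)$. The label then follows from the stale model's prediction.

The sketch below recovers the sample from a stale update and recomputes a fresh update at the current model. It is a toy illustration under these assumptions, and all function names are hypothetical. For deep networks, gradient inversion is an iterative optimization that only approximates the data, and the paper's procedure is not reproduced here.

```python
def linear_grad(w, b, x, y):
    # One-sample squared loss 0.5 * (w.x + b - y)^2 -> (grad_w, grad_b)
    r = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    return [r * xi for xi in x], r

def invert(w_stale, b_stale, g_w, g_b):
    # grad_w = r * x and grad_b = r, so x = grad_w / grad_b (requires g_b != 0);
    # the label is the stale prediction minus the residual.
    x = [gi / g_b for gi in g_w]
    y = sum(wi * xi for wi, xi in zip(w_stale, x)) + b_stale - g_b
    return x, y

def refresh_update(w_stale, b_stale, g_w, g_b, w_now, b_now):
    # Turn a stale update into a non-stale one: invert, then re-differentiate
    # at the current global model.
    x, y = invert(w_stale, b_stale, g_w, g_b)
    return linear_grad(w_now, b_now, x, y)
```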
A Novel Stochastic Interacting Particle-Field Algorithm for 3D Parabolic-Parabolic Keller-Segel Chemotaxis System
- Authors: Zhongjian Wang, Jack Xin, Zhiwen Zhang
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2309.13554
- Pdf link: https://arxiv.org/pdf/2309.13554
- Abstract We introduce an efficient stochastic interacting particle-field (SIPF) algorithm with no history dependence for computing aggregation patterns and near-singular solutions of the parabolic-parabolic Keller-Segel (KS) chemotaxis system in three space dimensions (3D). The KS solutions are approximated as empirical measures of particles coupled with a smoother field (concentration of chemo-attractant) variable computed by the spectral method. Instead of using heat kernels, which cause history dependence and high memory cost, we leverage the implicit Euler discretization to derive a one-step recursion in time for stochastic particle positions and the field variable. This recursion is based on the explicit Green's function of an elliptic operator of the form Laplacian minus a positive constant. In numerical experiments, we observe that the resulting SIPF algorithm is convergent and self-adaptive to the high-gradient parts of solutions. Despite the lack of analytical knowledge (e.g. a self-similar ansatz) of the blowup, the SIPF algorithm provides a low-cost approach to studying the emergence of finite-time blowup in 3D with only dozens of Fourier modes, by varying the amount of initial mass and tracking the evolution of the field variable. Notably, the algorithm can easily handle multi-modal initial data and the subsequent complex evolution involving the merging of particle clusters and the formation of a finite-time singularity.
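A rough sketch of the one-step recursion follows. It assumes the system $\rho_t = \Delta\rho - \chi\nabla\cdot(\rho\nabla c)$, $\epsilon c_t = \Delta c - k^2 c + \rho$ in $\mathbb{R}^3$; this form and its coefficients are our assumption, since the abstract does not state them. Apply implicit Euler with step $\delta t$ to the field equation and evaluate $\rho$ at the old time level. This yields an elliptic problem of the "Laplacian minus a positive constant" type, solved by convolution with its explicit Green's function:

$$
\epsilon\,\frac{c^{n+1}-c^{n}}{\delta t} = \Delta c^{n+1} - k^{2} c^{n+1} + \rho^{n}
\quad\Longrightarrow\quad
\left(\Delta - \lambda\right) c^{n+1} = -\frac{\epsilon}{\delta t}\,c^{n} - \rho^{n},
\qquad \lambda = k^{2} + \frac{\epsilon}{\delta t} > 0,
$$

$$
c^{n+1} = G_{\lambda} * \Big(\frac{\epsilon}{\delta t}\,c^{n} + \rho^{n}\Big),
\qquad
G_{\lambda}(x) = \frac{e^{-\sqrt{\lambda}\,|x|}}{4\pi |x|},
\qquad (-\Delta + \lambda)\,G_{\lambda} = \delta_0 .
$$

Because the update depends only on $(c^n, \rho^n)$, no earlier time levels need to be stored. This contrasts with heat-kernel formulations that integrate over the whole history. The paper's exact discretization may differ from this sketch.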
Task-Oriented Dexterous Grasp Synthesis via Differentiable Grasp Wrench Boundary Estimator
- Authors: Jiayi Chen, Yuxing Chen, Jialiang Zhang, He Wang
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.13586
- Pdf link: https://arxiv.org/pdf/2309.13586
- Abstract Analytical dexterous grasping synthesis is often driven by grasp quality metrics. However, existing metrics possess many problems, such as being computationally expensive, physically inaccurate, and non-differentiable. Moreover, none of them can facilitate the synthesis of non-force-closure grasps, which account for a significant portion of task-oriented grasping such as lid screwing and button pushing. The main challenge behind all the above drawbacks is the difficulty in modeling the complex Grasp Wrench Space (GWS). In this work, we overcome this challenge by proposing a novel GWS estimator, thus enabling gradient-based task-oriented dexterous grasp synthesis for the first time. Our key contribution is a fast, accurate, and differentiable technique to estimate the GWS boundary with good physical interpretability by parallel sampling and mapping, which does not require iterative optimization. Second, based on our differentiable GWS estimator, we derive a task-oriented energy function to enable gradient-based grasp synthesis and a metric to evaluate non-force-closure grasps. Finally, we improve the previous dexterous grasp synthesis pipeline mainly by a novel technique to make nearest-point calculation differentiable, even on mesh edges and vertices. Extensive experiments are performed to verify the efficiency and effectiveness of our methods. Our GWS estimator can run in several milliseconds on GPUs with minimal memory cost, more than three orders of magnitude faster than the classic discretization-based method. Using this GWS estimator, we synthesize 0.1 million dexterous grasps to show that our pipeline can significantly outperform the SOTA method, even in task-unaware force-closure-grasp synthesis. For task-oriented grasp synthesis, we provide some qualitative results.
Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity
- Authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Rafaël Pinot, Geovani Rizk
- Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2309.13591
- Pdf link: https://arxiv.org/pdf/2309.13591
- Abstract The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity, however, which is the norm in practical scenarios, established lower bounds on the learning error are essentially vacuous and greatly mismatch empirical observations. This is because the heterogeneity model considered is too restrictive and does not cover basic learning tasks such as least-squares regression. We consider in this paper a more realistic heterogeneity model, namely (G,B)-gradient dissimilarity, and show that it covers a larger class of learning problems than existing theory. Notably, we show that the breakdown point under heterogeneity is lower than the classical fraction 1/2. We also prove a new lower bound on the learning error of any distributed learning algorithm. We derive a matching upper bound for a robust variant of distributed gradient descent, and empirically show that our analysis reduces the gap between theory and practice.
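The abstract does not restate the (G,B)-gradient dissimilarity condition. In the form we understand it, for the honest workers $\mathcal{H}$ with average loss $\mathcal{L}_\mathcal{H}$, it requires $\frac{1}{|\mathcal{H}|}\sum_{i\in\mathcal{H}}\|\nabla\mathcal{L}_i(\theta)-\nabla\mathcal{L}_\mathcal{H}(\theta)\|^2 \le G^2 + B^2\|\nabla\mathcal{L}_\mathcal{H}(\theta)\|^2$ for all $\theta$.

The sketch shows one robust variant of distributed gradient descent, using the coordinate-wise trimmed mean as the aggregation rule. That choice is ours, since the abstract does not name the aggregator. Here `f` is the assumed maximum number of adversarial workers.

```python
def trimmed_mean(vectors, f):
    """Coordinate-wise trimmed mean: for each coordinate, discard the f largest
    and f smallest values and average the rest. Needs more than 2f inputs."""
    n = len(vectors)
    if n <= 2 * f:
        raise ValueError("need more than 2f worker gradients")
    out = []
    for coord in zip(*vectors):
        kept = sorted(coord)[f:n - f]
        out.append(sum(kept) / len(kept))
    return out

def robust_gd_step(theta, worker_grads, f, lr):
    # One step of distributed gradient descent with a robust aggregator in
    # place of the plain average.
    agg = trimmed_mean(worker_grads, f)
    return [t - lr * g for t, g in zip(theta, agg)]
```

A single worker sending an extreme vector cannot move any coordinate beyond the range spanned by the honest values. How close the result is to the honest average depends on how dissimilar the honest gradients are, which is what the $(G, B)$ condition quantifies.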
Shape Optimization by Constrained First-Order Least Mean Approximation
- Authors: Gerhard Starke
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2309.13595
- Pdf link: https://arxiv.org/pdf/2309.13595
- Abstract In this work, the problem of shape optimization, subject to PDE constraints, is reformulated as an $L^p$ best approximation problem under divergence constraints to the shape tensor introduced in Laurain and Sturm: ESAIM Math. Model. Numer. Anal. 50 (2016). More precisely, the main result of this paper states that the $L^p$ distance of the above approximation problem is equal to the dual norm of the shape derivative considered as a functional on $W^{1,p^\ast}$ (where $1/p + 1/p^\ast = 1$). This implies that for any given shape, one can evaluate its distance from being a stationary one with respect to the shape derivative by simply solving the associated $L^p$-type least mean approximation problem. Moreover, the Lagrange multiplier for the divergence constraint turns out to be the shape deformation of steepest descent. This provides a way, as an alternative to the approach by Deckelnick, Herbert and Hinze: ESAIM Control Optim. Calc. Var. 28 (2022), for computing shape gradients in $W^{1,p^\ast}$ for $p^\ast \in ( 2 , \infty )$. The discretization of the least mean approximation problem is done with (lowest-order) matrix-valued Raviart-Thomas finite element spaces leading to piecewise constant approximations of the shape deformation acting as Lagrange multiplier. Admissible deformations in $W^{1,p^\ast}$ to be used in a shape gradient iteration are reconstructed locally. Our computational results confirm that the $L^p$ distance of the best approximation does indeed measure the distance of the considered shape to optimality. Also confirmed by our computational tests are the observations that choosing $p^\ast$ (much) larger than 2 (which means that $p$ must be close to 1 in our best approximation problem) decreases the chance of encountering mesh degeneracy during the shape gradient iteration.
Reinforcement-Enhanced Autoregressive Feature Transformation: Gradient-steered Search in Continuous Space for Postfix Expressions
- Authors: Dongjie Wang, Meng Xiao, Min Wu, Pengfei Wang, Yuanchun Zhou, Yanjie Fu
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13618
- Pdf link: https://arxiv.org/pdf/2309.13618
- Abstract Feature transformation aims to generate a new, pattern-discriminative feature space from the original features to improve the performance of downstream machine learning (ML) tasks. However, the discrete search space for the optimal feature space grows explosively with the combinations of features and operations, from low-order to high-order forms. Existing methods, such as exhaustive search, expansion reduction, evolutionary algorithms, reinforcement learning, and iterative greedy search, suffer from the large search space. Overly emphasizing efficiency in algorithm design usually sacrifices stability or robustness. To fundamentally fill this gap, we reformulate discrete feature transformation as a continuous space optimization task and develop an embedding-optimization-reconstruction framework. This framework includes four steps: 1) reinforcement-enhanced data preparation, aiming to prepare high-quality transformation-accuracy training data; 2) feature transformation operation sequence embedding, intended to encapsulate the knowledge of the prepared training data within a continuous space; 3) gradient-steered optimal embedding search, dedicated to uncovering potentially superior embeddings within the learned space; 4) transformation operation sequence reconstruction, striving to reproduce the feature transformation solution that pinpoints the optimal feature space.
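The title frames each feature transformation as a postfix expression. The sketch evaluates such a token sequence on one sample to produce a generated feature. The operator vocabulary (`log`, `sqrt`, `square`, `+`, `-`, `*`, `/`) and the protected versions of `log`, `sqrt`, and division are hypothetical choices for illustration, not the paper's operation set.

```python
import math

# Hypothetical operator set; the paper's actual operation vocabulary may differ.
UNARY = {
    "log": lambda v: math.log(abs(v) + 1.0),   # protected log
    "sqrt": lambda v: math.sqrt(abs(v)),       # protected square root
    "square": lambda v: v * v,
}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,  # protected division
}

def eval_postfix(tokens, row):
    """Evaluate a postfix transformation sequence on one sample, where `row`
    maps original feature names to values; returns the generated value."""
    stack = []
    for tok in tokens:
        if tok in UNARY:
            stack.append(UNARY[tok](stack.pop()))
        elif tok in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[tok](left, right))
        else:
            stack.append(row[tok])  # an original feature name
    if len(stack) != 1:
        raise ValueError("malformed postfix sequence")
    return stack[0]
```

Postfix order needs no parentheses, so every valid sequence an autoregressive decoder emits parses to exactly one expression. That is one plausible reason for choosing this representation.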
PRIS: Practical robust invertible network for image steganography
- Authors: Hang Yang, Yitian Xu, Xuhua Liu, Xiaodong Ma
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2309.13620
- Pdf link: https://arxiv.org/pdf/2309.13620
- Abstract Image steganography is a technique for hiding secret information inside another image so that the secret is invisible to human eyes and can be recovered when needed. Most existing image steganography methods have low hiding robustness when the container images are affected by distortions such as Gaussian noise and lossy compression. This paper proposes PRIS to improve the robustness of image steganography. PRIS is based on invertible neural networks and places two enhancement modules before and after the extraction process, trained with a 3-step training strategy. Moreover, PRIS accounts for rounding error, which existing methods ignore but which is unavoidable in practice. A gradient approximation function (GAF) is also proposed to overcome the non-differentiability of rounding distortion. Experimental results show that PRIS outperforms the state-of-the-art robust image steganography method in both robustness and practicability. Code is available at https://github.com/yanghangAI/PRIS, and a demonstration of the model in practical use is available at this http URL
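The abstract does not give the form of the GAF. Two common ways to pass gradients through rounding are sketched here as stand-ins:

- A cubic approximation $r(x) = \mathrm{round}(x) + (x - \mathrm{round}(x))^3$, used in prior work on differentiable JPEG approximation.
- A straight-through estimator that rounds exactly in the forward pass and uses an identity derivative in the backward pass.

Each function returns a (value, derivative) pair. Whether PRIS uses either form is an assumption.

```python
import math

def hard_round(x):
    # Round half up; the true quantizer has zero derivative almost everywhere.
    return float(math.floor(x + 0.5))

def cubic_round(x):
    """Differentiable stand-in r(x) = round(x) + (x - round(x))**3.
    Returns (value, derivative); the derivative 3 * (x - round(x))**2 is
    nonzero away from integers, so gradients can flow through rounding."""
    r = hard_round(x)
    d = x - r
    return r + d ** 3, 3.0 * d ** 2

def straight_through_round(x):
    """Exact rounding in the forward pass, identity derivative backward."""
    return hard_round(x), 1.0
```

The cubic form stays close to true rounding in value while providing a usable gradient. The straight-through form keeps the forward pass exact but uses a biased gradient.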
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
- Authors: Guo-qing Jiang, Jinlong Liu, Zixiang Ding, Lin Guo, Wei Lin
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13681
- Pdf link: https://arxiv.org/pdf/2309.13681
- Abstract As models for natural language processing (NLP), computer vision (CV) and recommendation systems (RS) require surging computation, large numbers of GPUs/TPUs are run in parallel with a large batch (LB) to improve training throughput. However, such LB training often suffers from a large generalization gap and degraded final precision, which limits how far the batch size can be enlarged. In this work, we develop the variance-reduced gradient descent technique (VRGD) based on the gradient signal-to-noise ratio (GSNR) and apply it to popular optimizers such as SGD/Adam/LARS/LAMB. We carry out a theoretical analysis of the convergence rate to explain its fast training dynamics, and a generalization analysis to demonstrate its smaller generalization gap in LB training. Comprehensive experiments demonstrate that VRGD can accelerate training ($1\sim 2 \times$), narrow the generalization gap and improve final accuracy. We push the batch size limit of BERT pretraining up to 128k/64k and DLRM to 512k without noticeable accuracy loss. At a batch size of 96k, we improve ImageNet Top-1 accuracy by $0.52$ percentage points over LARS. The generalization gap of BERT and ImageNet training is significantly reduced, by over $65\%$.
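GSNR is commonly defined per parameter as the squared mean of per-sample gradients divided by their variance. The sketch computes it from a list of per-sample (or per-device) gradient vectors.

`gsnr_scaled_lr` turns the ratios into per-parameter step sizes. It is a hypothetical illustration of "GSNR-based" scaling, not the VRGD update rule, which the abstract does not spell out.

```python
def gsnr(per_sample_grads, eps=1e-12):
    """Per-parameter gradient signal-to-noise ratio: mean(g)^2 / var(g),
    computed over the rows of per_sample_grads (population variance)."""
    m = len(per_sample_grads)
    out = []
    for column in zip(*per_sample_grads):
        mean = sum(column) / m
        var = sum((g - mean) ** 2 for g in column) / m
        out.append(mean * mean / (var + eps))
    return out

def gsnr_scaled_lr(base_lr, ratios, eps=1e-12):
    """Hypothetical per-parameter learning rates proportional to GSNR,
    normalised so that their mean equals base_lr (not the paper's rule)."""
    mean_r = sum(ratios) / len(ratios)
    return [base_lr * r / (mean_r + eps) for r in ratios]
```

Parameters whose gradients agree across samples get a high ratio, while parameters dominated by noise get a ratio near zero. Under this hypothetical scaling, they would receive correspondingly smaller steps.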
Video Adverse-Weather-Component Suppression Network via Weather Messenger and Adversarial Backpropagation
- Authors: Yijun Yang, Angelica I. Aviles-Rivero, Huazhu Fu, Ye Liu, Weiming Wang, Lei Zhu
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.13700
- Pdf link: https://arxiv.org/pdf/2309.13700
- Abstract Although convolutional neural networks (CNNs) have been proposed to remove adverse weather conditions from single images using a single set of pre-trained weights, they fail to restore weather videos due to the absence of temporal information. Furthermore, existing methods for removing adverse weather conditions (e.g., rain, fog, and snow) from videos can only handle one type of adverse weather. In this work, we propose the first framework for restoring videos from all adverse weather conditions by developing a video adverse-weather-component suppression network (ViWS-Net). To achieve this, we first devise a weather-agnostic video transformer encoder with multiple transformer stages. Moreover, we design a long short-term temporal modeling mechanism for the weather messenger to fuse adjacent input video frames early and learn weather-specific information. We further introduce a weather discriminator with gradient reversion to maintain the weather-invariant common information and suppress the weather-specific information in pixel features, by adversarially predicting weather types. Finally, we develop a messenger-driven video transformer decoder to retrieve the residual weather-specific feature, which is spatiotemporally aggregated with hierarchical pixel features and refined to predict the clean target frame of input videos. Experimental results on benchmark datasets and real-world weather videos demonstrate that ViWS-Net outperforms current state-of-the-art methods in restoring videos degraded by any weather condition.
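The "gradient reversion" in the weather discriminator presumably refers to a gradient reversal layer, as used in domain-adversarial training. The abstract gives no details, so this is a sketch under that assumption. The layer is the identity in the forward pass and multiplies gradients by $-\lambda$ in the backward pass. It is written here with explicit forward/backward methods instead of an autodiff framework, and `lam` and `encoder_step` are hypothetical names.

```python
class GradientReversal:
    """Identity in the forward pass; scales incoming gradients by -lam in the
    backward pass, so layers before it are pushed to *increase* the loss that
    the discriminator after it is minimising."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        return list(features)

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]


def encoder_step(w, x, d_loss_d_feature, grl, lr):
    """Toy encoder h = w * x feeding a discriminator through the GRL.
    A gradient-descent step on w then moves *up* the discriminator loss."""
    g_feature = grl.backward([d_loss_d_feature])[0]
    return w - lr * g_feature * x
```

In a full model, the discriminator's own weights still receive the ordinary gradient. Only the path back into the encoder is reversed, which is what drives the encoder toward weather-invariant features.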
Adversarial Attacks on Video Object Segmentation with Hard Region Discovery
- Authors: Ping Li, Yu Zhang, Li Yuan, Jian Zhao, Xianghua Xu, Xiaoqin Zhang
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.13857
- Pdf link: https://arxiv.org/pdf/2309.13857
- Abstract Video object segmentation (VOS) has been applied to various computer vision tasks, such as video editing, autonomous driving, and human-robot interaction. However, methods based on deep neural networks are vulnerable to adversarial examples. These are inputs altered by nearly human-imperceptible perturbations, with which an adversary (i.e., attacker) fools the segmentation model into making incorrect pixel-level predictions. This raises security concerns in highly demanding tasks, because small perturbations to the input video can create potential attack risks. Although adversarial examples have been extensively explored for classification, they are rarely studied in video object segmentation. Existing related methods in computer vision either require prior knowledge of categories or cannot be directly applied because they are specially designed for certain tasks, and they fail to consider pixel-wise region attacks. Hence, this work develops an object-agnostic adversary that has adversarial impacts on VOS by attacking the first frame via hard region discovery. In particular, the gradients from the segmentation model are exploited to discover easily confused regions, in which it is difficult to distinguish pixel-wise objects from the background in a frame. This yields a hardness map that helps to generate perturbations with stronger adversarial power for attacking the first frame. Empirical studies on three benchmarks indicate that our attacker significantly degrades the performance of several state-of-the-art video object segmentation models.
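The abstract derives a hardness map from segmentation-model gradients but does not define it. The sketch makes two assumptions:

- The map is the per-pixel gradient magnitude of the segmentation loss with respect to the first frame, min-max normalised.
- The map scales a single signed-gradient (FGSM-style) step.

Both choices, and all function names, are illustrative assumptions rather than the paper's attack.

```python
def sign(v):
    return (v > 0) - (v < 0)

def hardness_map(grad):
    """Hypothetical hardness score: per-pixel |d loss / d pixel|,
    min-max normalised to [0, 1]."""
    mags = [[abs(g) for g in row] for row in grad]
    lo = min(min(row) for row in mags)
    hi = max(max(row) for row in mags)
    span = hi - lo if hi > lo else 1.0
    return [[(m - lo) / span for m in row] for row in mags]

def first_frame_attack(frame, grad, eps):
    """One signed-gradient step that increases the segmentation loss, with the
    per-pixel step size scaled by the hardness map and clipped to [0, 1]."""
    h = hardness_map(grad)
    return [[min(1.0, max(0.0, p + eps * hv * sign(g)))
             for p, g, hv in zip(prow, grow, hrow)]
            for prow, grow, hrow in zip(frame, grad, h)]
```

Pixels where the model is most sensitive receive the largest share of the perturbation budget. This mirrors the intuition of concentrating the attack on hard regions.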
PA-iMFL: Communication-Efficient Privacy Amplification Method against Data Reconstruction Attack in Improved Multi-Layer Federated Learning
- Authors: Jianhua Wang, Xiaolin Chang, Jelena Mišić, Vojislav B. Mišić, Zhi Chen, Junchao Fan
- Subjects: Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2309.13864
- Pdf link: https://arxiv.org/pdf/2309.13864
- Abstract Recently, big data has seen explosive growth in the Internet of Things (IoT). Multi-layer federated learning (MFL) based on a cloud-edge-end architecture can promote model training efficiency and model accuracy while preserving IoT data privacy. This paper considers an improved MFL (iMFL), where edge-layer devices own private data and can join the training process. iMFL can improve edge resource utilization and also relax the strict requirements on end devices, but it suffers from Data Reconstruction Attacks (DRA) and unacceptable communication overhead. This paper aims to address these issues with iMFL. We propose a Privacy Amplification scheme on iMFL (PA-iMFL). Differing from standard MFL, we design privacy operations in end and edge devices after local training, including three sequential components: local differential privacy with the Laplace mechanism, privacy amplification subsampling, and gradient sign reset. Benefiting from these privacy operations, PA-iMFL reduces communication overhead and preserves privacy. Extensive results demonstrate that, against State-Of-The-Art (SOTA) DRAs, PA-iMFL can effectively mitigate private data leakage and reach the same level of protection as the SOTA defense model. Moreover, owing to the privacy operations in edge devices, PA-iMFL improves communication efficiency by up to 2.8 times compared with the SOTA compression method without compromising model accuracy.
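Two of the three privacy operations can be sketched with standard building blocks, assuming per-coordinate clipping to $[-C, C]$:

- A Laplace mechanism with scale $2C/\varepsilon$, the per-coordinate sensitivity after clipping. This gives $\varepsilon$-LDP for each coordinate, with the budget composing across coordinates.
- Independent subsampling that transmits each coordinate with probability $q$, which also shrinks the uploaded message.

The clipping bound, the sparse dictionary format, and the function names are assumptions. Gradient sign reset is not described in enough detail in the abstract to sketch, so it is omitted.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def ldp_laplace(update, clip, epsilon, rng):
    """Clip every coordinate to [-clip, clip], then add Laplace noise with
    scale = sensitivity / epsilon, where the per-coordinate sensitivity of a
    clipped value is 2 * clip."""
    scale = 2.0 * clip / epsilon
    return [max(-clip, min(clip, u)) + laplace_noise(scale, rng) for u in update]

def subsample(update, q, rng):
    """Keep each coordinate independently with probability q and return only
    the kept ones as a sparse {index: value} message."""
    return {i: u for i, u in enumerate(update) if rng.random() < q}

# Usage: privatise, then subsample before uploading (sign reset not modelled).
rng = random.Random(0)
message = subsample(ldp_laplace([0.3, -2.0, 0.7], clip=1.0, epsilon=1.0, rng=rng), q=0.5, rng=rng)
```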
Newton Method-based Subspace Support Vector Data Description
- Authors: Fahad Sohrab, Firas Laakom, Moncef Gabbouj
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.13960
- Pdf link: https://arxiv.org/pdf/2309.13960
- Abstract In this paper, we present an adaptation of Newton's method for the optimization of Subspace Support Vector Data Description (S-SVDD). The objective of S-SVDD is to map the original data to a subspace optimized for one-class classification, and the iterative optimization process of data mapping and description in S-SVDD relies on gradient descent. However, gradient descent only utilizes first-order information, which may lead to suboptimal results. To address this limitation, we leverage Newton's method to enhance data mapping and data description for an improved optimization of subspace learning-based one-class classification. By incorporating second-order (curvature) information, Newton's method offers a more efficient strategy for subspace learning in one-class classification than gradient-based optimization. The paper discusses the limitations of gradient descent and the advantages of using Newton's method in subspace learning for one-class classification tasks. We provide both linear and nonlinear formulations of Newton's method-based optimization for S-SVDD. In our experiments, we explore both the minimization and maximization strategies of the objective. The results demonstrate that the proposed optimization strategy outperforms the gradient-based S-SVDD in most cases.
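The advantage of second-order information is easy to see on a quadratic $f(x) = \tfrac12 x^\top Q x - c^\top x$. One Newton step $x - Q^{-1}\nabla f(x)$ lands exactly on the minimiser, while gradient descent needs many steps when $Q$ is ill-conditioned. The sketch compares the two in two dimensions. It illustrates the general point only and is not the S-SVDD objective or the paper's update.

```python
def quad_grad(Q, c, x):
    # f(x) = 0.5 x^T Q x - c^T x  =>  grad f = Q x - c, Hessian = Q
    return [Q[0][0] * x[0] + Q[0][1] * x[1] - c[0],
            Q[1][0] * x[0] + Q[1][1] * x[1] - c[1]]

def solve2(H, g):
    # Solve the 2x2 system H d = g by Cramer's rule.
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(H[1][1] * g[0] - H[0][1] * g[1]) / det,
            (-H[1][0] * g[0] + H[0][0] * g[1]) / det]

def newton_step(Q, c, x):
    d = solve2(Q, quad_grad(Q, c, x))   # Newton direction H^{-1} grad
    return [x[0] - d[0], x[1] - d[1]]

def gd_step(Q, c, x, lr):
    g = quad_grad(Q, c, x)
    return [x[0] - lr * g[0], x[1] - lr * g[1]]
```

With $Q = \mathrm{diag}(10, 1)$, the largest stable gradient step is limited by the stiff direction. Progress along the flat direction therefore shrinks the error only by a factor of 0.9 per step at `lr=0.1`, whereas Newton rescales each direction by its curvature.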
Physics-Driven ML-Based Modelling for Correcting Inverse Estimation
- Authors: Ruiyuan Kang, Tingting Mu, Panos Liatsis, Dimitrios C. Kyritsis
- Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2309.13985
- Pdf link: https://arxiv.org/pdf/2309.13985
- Abstract When deploying machine learning estimators in science and engineering (SAE) domains, it is critical to avoid failed estimations that can have disastrous consequences, e.g., in aero engine design. This work focuses on detecting and correcting failed state estimations before adopting them in SAE inverse problems, by utilizing simulations and performance metrics guided by physical laws. We suggest flagging a machine learning estimation when its physical model error exceeds a feasible threshold, and propose a novel approach, GEESE, to correct it through optimization, aiming to deliver both low error and high efficiency. The key designs of GEESE include (1) a hybrid surrogate error model that provides fast error estimations to reduce simulation cost and to enable gradient-based backpropagation of error feedback, and (2) two generative models that approximate the probability distributions of the candidate states for simulating the exploitation and exploration behaviours. All three models are constructed as neural networks. GEESE is tested on three real-world SAE inverse problems and compared to a number of state-of-the-art optimization/search approaches. Results show that it fails the least number of times in terms of finding a feasible state correction, and requires physical evaluations less frequently in general.
Diffeomorphic Transformations for Time Series Analysis: An Efficient Approach to Nonlinear Warping
- Authors: Iñigo Martinez
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.14029
- Pdf link: https://arxiv.org/pdf/2309.14029
- Abstract The proliferation and ubiquity of temporal data across many disciplines has sparked interest in similarity, classification and clustering methods specifically designed to handle time series data. A core issue when dealing with time series is determining their pairwise similarity, i.e., the degree to which a given time series resembles another. Traditional distance measures such as the Euclidean distance are not well-suited due to the time-dependent nature of the data. Elastic metrics such as dynamic time warping (DTW) offer a promising approach, but are limited by their computational complexity, non-differentiability and sensitivity to noise and outliers. This thesis proposes novel elastic alignment methods that use parametric & diffeomorphic warping transformations as a means of overcoming the shortcomings of DTW-based metrics. The proposed method is differentiable & invertible, well-suited for deep learning architectures, robust to noise and outliers, computationally efficient, and expressive and flexible enough to capture complex patterns. Furthermore, a closed-form solution was developed for the gradient of these diffeomorphic transformations, which allows an efficient search in the parameter space, leading to better solutions at convergence. Leveraging the benefits of these closed-form diffeomorphic transformations, this thesis proposes a suite of advancements that include: (a) an enhanced temporal transformer network for time series alignment and averaging, (b) a deep-learning based time series classification model to simultaneously align and classify signals with high accuracy, (c) an incremental time series clustering algorithm that is warping-invariant, scalable and can operate under limited computational and time resources, and finally, (d) a normalizing flow model that enhances the flexibility of affine transformations in coupling and autoregressive layers.
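As a toy example of a parametric diffeomorphic warp with a closed-form gradient, consider $\phi_a(t) = t + a\sin(\pi t)/\pi$ on $[0, 1]$. It fixes both endpoints, and $\phi_a'(t) = 1 + a\cos(\pi t) > 0$ whenever $|a| < 1$, so it is smooth and invertible. Its parameter gradient is $\partial\phi_a/\partial a = \sin(\pi t)/\pi$. This one-parameter family is our illustration, not the transformation family used in the thesis.

```python
import math

def warp(t, a):
    """Hypothetical warp phi_a(t) = t + a*sin(pi*t)/pi of [0, 1]; a smooth,
    strictly increasing bijection (a diffeomorphism) whenever |a| < 1."""
    if abs(a) >= 1:
        raise ValueError("need |a| < 1 for a diffeomorphism")
    return t + a * math.sin(math.pi * t) / math.pi

def dwarp_da(t):
    # Closed-form derivative of phi_a(t) with respect to the parameter a.
    return math.sin(math.pi * t) / math.pi

def resample(series, a):
    """Warp a uniformly sampled series: out[k] = y(phi_a(t_k)), using linear
    interpolation on the original grid."""
    n = len(series)
    out = []
    for k in range(n):
        s = warp(k / (n - 1), a) * (n - 1)   # warped position in index units
        i = min(int(s), n - 2)
        frac = s - i
        out.append((1 - frac) * series[i] + frac * series[i + 1])
    return out
```

Because `dwarp_da` is exact, an alignment loss between a warped series and a template can be differentiated with respect to `a` without numerical approximation. This is the property the abstract highlights for its own, richer transformations.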
An automatic selection of optimal recurrent neural network architecture for processes dynamics modelling purposes
- Authors: Krzysztof Laddach, Rafał Łangowski, Tomasz A. Rutkowski, Bartosz Puchalski
- Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.14037
- Pdf link: https://arxiv.org/pdf/2309.14037
- Abstract This paper addresses the problem of developing algorithms that find the structure of an artificial neural network used for behavioural (black-box) modelling of selected dynamic processes. The research includes four original algorithms dedicated to neural network architecture search, based on well-known optimisation techniques such as evolutionary algorithms and gradient descent methods. In the presented research, a recurrent artificial neural network has been used, whose architecture has been selected in an optimised way based on the above-mentioned algorithms. Optimality is understood as achieving a trade-off between the size of the neural network and its accuracy in capturing the response of the mathematical model on which it has been trained. During the optimisation, original specialised evolutionary operators have been proposed. The research involved an extended validation study based on data generated from a mathematical model of the fast processes occurring in a pressurised water nuclear reactor.
Exploring the Impact of Serverless Computing on Peer To Peer Training Machine Learning
- Authors: Amine Barrak, Ranim Trabelsi, Fehmi Jaafar, Fabio Petrillo
- Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.14139
- Pdf link: https://arxiv.org/pdf/2309.14139
- Abstract The increasing demand for computational power in big data and machine learning has driven the development of distributed training methodologies. Among these, peer-to-peer (P2P) networks provide advantages such as enhanced scalability and fault tolerance. However, they also encounter challenges related to resource consumption, costs, and communication overhead as the number of participating peers grows. In this paper, we introduce a novel architecture that combines serverless computing with P2P networks for distributed training and present a method for efficient parallel gradient computation under resource constraints. Our findings show a significant reduction in gradient computation time, with up to a 97.34% improvement compared to conventional P2P distributed training methods. As for costs, our examination confirmed that the serverless architecture could incur higher expenses, reaching up to 5.4 times more than instance-based architectures. It is essential to consider that these higher costs are associated with marked improvements in computation time, particularly under resource-constrained scenarios. Despite the cost-time trade-off, the serverless approach still holds promise due to its pay-as-you-go model. By using dynamic resource allocation, it enables faster training and optimized resource utilization, making it a promising candidate for a wide range of machine learning applications.
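The abstract does not spell out how gradients are computed in parallel. As a hedged sketch of the general data-parallel idea only, assuming a linear least-squares model and using threads as a hypothetical stand-in for serverless functions or peers, the batch can be sharded, per-shard gradients computed concurrently, and the results averaged with shard-size weights so that they equal the full-batch gradient. The names `mse_grad` and `parallel_gradient` are illustrative, not the authors' API.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)^2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def parallel_gradient(w, X, y, n_workers=4):
    """Shard the batch, compute shard gradients concurrently, then average.

    Threads stand in for independent workers (hypothetical); each shard
    gradient is a mean over its rows, so weighting by shard size recovers
    the full-batch gradient exactly.
    """
    X_shards = np.array_split(X, n_workers)
    y_shards = np.array_split(y, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        grads = list(pool.map(lambda s: mse_grad(w, *s),
                              zip(X_shards, y_shards)))
    sizes = np.array([len(ys) for ys in y_shards], dtype=float)
    return np.average(np.stack(grads), axis=0, weights=sizes)
```

The size weighting matters when the batch does not divide evenly among workers; with a plain unweighted mean the result would be biased toward the smaller shards.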
SPIRT: A Fault-Tolerant and Reliable Peer-to-Peer Serverless ML Training Architecture
- Authors: Amine Barrak, Mayssa Jaziri, Ranim Trabelsi, Fehmi Jaafar, Fabio Petrillo
- Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.14148
- Pdf link: https://arxiv.org/pdf/2309.14148
- Abstract The advent of serverless computing has ushered in notable advancements in distributed machine learning, particularly within parameter server-based architectures. Yet, the integration of serverless features within peer-to-peer (P2P) distributed networks remains largely uncharted. In this paper, we introduce SPIRT, a fault-tolerant, reliable, and secure serverless P2P ML training architecture designed to bridge this gap. Capitalizing on the inherent robustness and reliability of P2P systems, SPIRT employs RedisAI for in-database operations, leading to an 82% reduction in the time required for model updates and gradient averaging across a variety of models and batch sizes. The architecture is resilient to peer failures and handles the integration of new peers, highlighting its fault tolerance and scalability. Furthermore, SPIRT ensures secure communication between peers, enhancing the reliability of distributed machine learning tasks. Even in the face of Byzantine attacks, the system's robust aggregation algorithms maintain high levels of accuracy. These findings illuminate the promising potential of serverless architectures in P2P distributed machine learning, offering a significant stride towards the development of more efficient, scalable, and resilient applications.
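The abstract mentions robust aggregation against Byzantine peers without naming the rules used. Coordinate-wise median and trimmed mean are two standard robust aggregators; the sketch below assumes plain NumPy gradient vectors and is an illustration of those generic rules, not SPIRT's RedisAI-based implementation.

```python
import numpy as np

def coordinate_median(updates):
    """Aggregate peer gradient vectors by taking the median of each coordinate."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean(updates, trim):
    """Per coordinate, drop the `trim` largest and `trim` smallest values, then average."""
    if 2 * trim >= len(updates):
        raise ValueError("trim too large for the number of updates")
    U = np.sort(np.stack(updates), axis=0)  # sorts each coordinate independently
    return U[trim:len(updates) - trim].mean(axis=0)
```

With four honest peers sending `[1, 1]` and one Byzantine peer sending `[1000, -1000]`, both aggregators return `[1, 1]`, whereas the plain mean is pulled far away from the honest value.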
Sparse grid based Chebyshev HOPGD for parameterized linear systems
- Authors: Siobhán Correnty, Melina A. Freitag, Kirk M. Soodhalter
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2309.14178
- Pdf link: https://arxiv.org/pdf/2309.14178
- Abstract We consider approximating solutions to parameterized linear systems of the form $A(\mu_1,\mu_2) x(\mu_1,\mu_2) = b$, where $(\mu_1, \mu_2) \in \mathbb{R}^2$. Here the matrix $A(\mu_1,\mu_2) \in \mathbb{R}^{n \times n}$ is nonsingular, large, and sparse and depends nonlinearly on the parameters $\mu_1$ and $\mu_2$. Specifically, the system arises from a discretization of a partial differential equation and $x(\mu_1,\mu_2) \in \mathbb{R}^n$, $b \in \mathbb{R}^n$. This work combines companion linearization with the Krylov subspace method preconditioned bi-conjugate gradient (BiCG) and a decomposition of a tensor matrix of precomputed solutions, called snapshots. As a result, a reduced order model of $x(\mu_1,\mu_2)$ is constructed, and this model can be evaluated cheaply for many values of the parameters. The decomposition is performed efficiently using the sparse grid based higher-order proper generalized decomposition (HOPGD), and the snapshots are generated as one-variable functions of $\mu_1$ or of $\mu_2$. Tensor decompositions performed on a set of snapshots can fail to reach a desired level of accuracy, and it is not possible to know a priori whether the decomposition will be successful. This method offers a way to generate a new set of solutions on the same parameter space at little additional cost. An interpolation of the model is used to produce approximations on the entire parameter space, and this method can be used to solve a parameter estimation problem. Numerical examples of a parameterized Helmholtz equation show the competitiveness of our approach. The simulations are reproducible, and the software is available online.
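The paper's solver combines companion linearization, preconditioned BiCG, and HOPGD. As a much smaller illustration of the Krylov component only, here is unpreconditioned BiCG for a single real nonsymmetric system, assuming NumPy and a dense matrix; `bicg` is a hypothetical name and no breakdown handling is included. For practical use, SciPy ships `scipy.sparse.linalg.bicg`.

```python
import numpy as np

def bicg(A, b, x0=None, tol=1e-10, maxiter=500):
    """Unpreconditioned bi-conjugate gradient for real nonsymmetric A x = b.

    Maintains a residual r for A and a shadow residual r_hat for A^T.
    No safeguards against breakdown (rho or p_hat @ A @ p near zero).
    """
    x = np.zeros_like(b, dtype=float) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    r_hat = r.copy()              # common choice: shadow residual r_hat_0 = r_0
    p, p_hat = r.copy(), r_hat.copy()
    rho = r_hat @ r
    b_norm = np.linalg.norm(b)
    for _ in range(maxiter):
        if np.linalg.norm(r) <= tol * b_norm:
            break
        q = A @ p                 # Krylov direction for A
        q_hat = A.T @ p_hat       # shadow direction for A^T
        alpha = rho / (p_hat @ q)
        x += alpha * p
        r -= alpha * q
        r_hat -= alpha * q_hat
        rho_new = r_hat @ r
        beta = rho_new / rho
        p = r + beta * p
        p_hat = r_hat + beta * p_hat
        rho = rho_new
    return x
```

In a parameterized setting, a solve like this would be repeated for each snapshot parameter value; the paper's companion linearization avoids treating each parameter value as an independent system, which this sketch does not attempt.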
Spring-IMU Fusion Based Proprioception for Feedback Control of Soft Manipulators
- Authors: Yinan Meng, Guoxin Fang, Jiong Yang, Yuhu Guo, Charlie C.L. Wang
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.14279
- Pdf link: https://arxiv.org/pdf/2309.14279
- Abstract This paper presents a novel framework to realize proprioception and closed-loop control for soft manipulators. Deformations with large elongation and large bending can be precisely predicted using geometry-based sensor signals obtained from inductive springs and inertial measurement units (IMUs) with the help of machine learning techniques. Multiple geometric signals are fused into robust pose estimations, and a data-efficient training process is achieved by applying a sim-to-real transfer strategy. As a result, we achieve proprioception that is robust to variations in external loading and has an average error of 0.7% across the workspace on a pneumatic-driven soft manipulator. The proprioception realized on the soft manipulator is then used to build a sensor-space based algorithm for closed-loop control. A gradient descent solver is developed to drive the end-effector to the required poses by iteratively computing a sequence of reference sensor signals. A conventional controller is employed in the inner loop of our algorithm to update the actuators (i.e., the pressures in the chambers) so as to approach a reference signal in the sensor space. The functionality of the closed-loop control has been demonstrated in tasks such as path following and pick-and-place under different external loads.
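The abstract describes a gradient descent solver that produces a sequence of reference sensor signals. A minimal sketch of that outer loop follows, assuming a hypothetical two-signal forward model `pose = W @ tanh(s)` in place of the learned sensor-to-pose mapping; each iterate would be handed to the inner-loop (pressure) controller, which is not modelled here. The matrix `W`, the step size, and the function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical forward model from sensor signals s to end-effector pose:
# pose = W @ tanh(s). In the paper this role is played by a learned model.
W = np.array([[1.0, 0.5],
              [0.2, 1.0]])

def forward(s):
    return W @ np.tanh(s)

def jacobian(s):
    # d(pose)/d(s) = W @ diag(1 - tanh(s)^2), via column-wise broadcasting
    return W * (1.0 - np.tanh(s) ** 2)

def reference_sequence(target, s0, lr=0.5, iters=500, tol=1e-8):
    """Gradient descent on 0.5 * ||forward(s) - target||^2.

    Returns every iterate; each one is a reference sensor signal that an
    inner-loop controller would track before the next update.
    """
    s = np.asarray(s0, dtype=float)
    refs = [s.copy()]
    for _ in range(iters):
        err = forward(s) - target
        if np.linalg.norm(err) < tol:
            break
        s = s - lr * jacobian(s).T @ err
        refs.append(s.copy())
    return refs
```

Returning the whole trajectory rather than only the final solution reflects the abstract's description of computing a sequence of references, so the manipulator moves gradually instead of jumping to the final sensor state.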
Adaptive least-squares space-time finite element methods
- Authors: Christian Köthe, Richard Löscher, Olaf Steinbach
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2309.14300
- Pdf link: https://arxiv.org/pdf/2309.14300
- Abstract We consider the numerical solution of an abstract operator equation $Bu=f$ by using a least-squares approach. We assume that $B: X \to Y^*$ is an isomorphism, and that $A: Y \to Y^*$ implies a norm in $Y$, where $X$ and $Y$ are Hilbert spaces. The minimizer of the least-squares functional $\frac{1}{2} \, \| Bu-f \|_{A^{-1}}^2$, i.e., the solution of the operator equation, is then characterized by the gradient equation $Su=B^* A^{-1}f$ with an elliptic and self-adjoint operator $S:=B^* A^{-1} B : X \to X^*$. When introducing the adjoint $p = A^{-1}(f-Bu)$ we end up with a saddle point formulation to be solved numerically by using a mixed finite element method. Based on a discrete inf-sup stability condition, we derive related a priori error estimates. While the adjoint $p$ is zero by construction, its approximation $p_h$ serves as an a posteriori error indicator to drive an adaptive scheme when discretized appropriately. While this approach can be applied to rather general equations, here we consider second order linear partial differential equations, including the Poisson equation, the heat equation, and the wave equation, in order to demonstrate its potential, which allows the use of almost arbitrary space-time finite element methods for the adaptive solution of time-dependent partial differential equations.
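In finite dimensions, with real matrices so that $B^*$ becomes $B^\top$ and $A$ symmetric positive definite, the two formulations in the abstract can be checked directly. The definition $p = A^{-1}(f - Bu)$ gives $Ap + Bu = f$, and the gradient equation $B^\top A^{-1}(f - Bu) = 0$ becomes $B^\top p = 0$; together these form the saddle point system solved below. The dense `numpy.linalg.solve` call and the name `least_squares_saddle` are illustrative assumptions, not the mixed finite element discretization used in the paper.

```python
import numpy as np

def least_squares_saddle(B, A, f):
    """Solve  [A   B] [p]   [f]
              [B^T 0] [u] = [0],
    i.e. p = A^{-1}(f - B u) and B^T p = 0, equivalently S u = B^T A^{-1} f
    with S = B^T A^{-1} B. Returns (u, p).
    """
    m, n = B.shape
    K = np.block([[A, B],
                  [B.T, np.zeros((n, n))]])
    rhs = np.concatenate([f, np.zeros(n)])
    sol = np.linalg.solve(K, rhs)
    return sol[m:], sol[:m]
```

When $B$ is square and invertible (the finite-dimensional analogue of $B$ being an isomorphism), $Bu = f$ holds exactly and the computed $p$ vanishes up to rounding, matching the remark that the adjoint is zero by construction.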
Small-scale proxies for large-scale Transformer training instabilities
- Authors: Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.14322
- Pdf link: https://arxiv.org/pdf/2309.14322
- Abstract Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study training stability and instability at smaller scales. First, we focus on two sources of training instability described in previous work: the growth of logits in attention layers (Dehghani et al., 2023) and divergence of the output logits from the log probabilities (Chowdhery et al., 2022). By measuring the relationship between learning rate and loss across scales, we show that these instabilities also appear in small models when training at high learning rates, and that mitigations previously employed at large scales are equally effective in this regime. This prompts us to investigate the extent to which other known optimizer and model interventions influence the sensitivity of the final loss to changes in the learning rate. To this end, we study methods such as warm-up, weight decay, and the $\mu$Param (Yang et al., 2022), and combine techniques to train small models that achieve similar losses across orders of magnitude of learning rate variation. Finally, to conclude our exploration we study two cases where instabilities can be predicted before they emerge by examining the scaling behavior of model activation and gradient norms.
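The abstract does not name the mitigations it reuses. One widely cited mitigation for output logits drifting away from log-probabilities is the auxiliary z-loss introduced with PaLM (Chowdhery et al., 2022), which adds a penalty on $(\log Z)^2$ with $Z = \sum_i e^{z_i}$. The NumPy sketch below assumes a coefficient of 1e-4 and integer class labels; the function name is hypothetical and this is not the authors' training code.

```python
import numpy as np

def softmax_xent_with_z_loss(logits, labels, z_coeff=1e-4):
    """Mean cross-entropy plus the auxiliary z-loss z_coeff * (log Z)^2.

    log Z = logsumexp(logits) is computed stably; penalizing it pushes the
    softmax normalizer toward 1, so the raw logits stay close to
    normalized log-probabilities.
    """
    m = logits.max(axis=-1, keepdims=True)
    log_z = (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True)))[..., 0]
    log_probs = logits - log_z[..., None]
    nll = -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    return float(np.mean(nll + z_coeff * log_z ** 2))
```

Adding a constant to every logit leaves the cross-entropy term unchanged but increases the z-loss term, which is exactly the drift the penalty is meant to discourage.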
Keyword: super-resolution
Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data
- Authors: Wai Tong Chung, Bassem Akoush, Pushan Sharma, Alex Tamkin, Ki Sung Jung, Jacqueline Chen, Jack Guo, Davy Brouzet, Mohsen Talei, Bruno Savard, Alexei Y Poludnenko, Matthias Ihme
- Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)
- Arxiv link: https://arxiv.org/abs/2309.13457
- Pdf link: https://arxiv.org/pdf/2309.13457
- Abstract Analysis of compressible turbulent flows is essential for applications related to propulsion, energy generation, and the environment. Here, we present BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples from 34 high-fidelity direct numerical simulations, which addresses the current limited availability of 3D high-fidelity reacting and non-reacting compressible turbulent flow simulation data. With this data, we benchmark a total of 49 variations of five deep learning approaches for 3D super-resolution, which can be applied to improve scientific imaging, simulations, and turbulence models, as well as in computer vision applications. We perform neural scaling analysis on these models to examine the performance of different machine learning (ML) approaches, including two scientific ML techniques. We demonstrate that (i) predictive performance can scale with model size and cost, (ii) architecture matters significantly, especially for smaller models, and (iii) the benefits of physics-based losses can persist with increasing model size. The outcomes of this benchmark study are anticipated to offer insights that can aid the design of 3D super-resolution models, especially for turbulence models, while this data is expected to foster ML methods for a broad range of flow physics applications. This data is publicly available with download links and browsing tools consolidated at https://blastnet.github.io.
Adaptation of the super resolution SOTA for Art Restoration in camera capture images
- Authors: Sandeep Nagar
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- Arxiv link: https://arxiv.org/abs/2309.13655
- Pdf link: https://arxiv.org/pdf/2309.13655
- Abstract Preserving cultural heritage is of paramount importance. In the domain of art restoration, developing a computer vision model capable of effectively restoring deteriorated images of art pieces has been difficult, but current state-of-the-art computer vision models now make this feasible. Traditional restoration methods are often time-consuming and require extensive expertise. The aim of this work is to design an automated solution based on computer vision models that can enhance and reconstruct degraded artworks, improving their visual quality while preserving their original characteristics and artifacts. The model should handle a diverse range of deterioration types, including but not limited to noise, blur, scratches, fading, and other common forms of degradation. We adapt the current state of the art for image super-resolution, based on the Diffusion Model (DM), and fine-tune it for image art restoration. Our results show that, instead of fine-tuning multiple different models for different kinds of degradation, a single super-resolution model can be fine-tuned; we train it on multiple datasets to make it robust. Code: https://github.com/Naagar/art_restoration_DM
A Lightweight Recurrent Grouping Attention Network for Video Super-Resolution
- Authors: Yonggui Zhu, Guofang Li
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.13940
- Pdf link: https://arxiv.org/pdf/2309.13940
- Abstract Effective aggregation of temporal information from consecutive frames is the core of video super-resolution (VSR). Many researchers have used structures such as sliding windows and recurrent networks to gather spatio-temporal information from frames. However, although the performance of the resulting VSR models keeps improving, so does their size, which increases the demands on hardware. Thus, to reduce the load on the device, we propose a novel lightweight recurrent grouping attention network. The model has only 0.878M parameters, far fewer than current mainstream video super-resolution models. We design a forward feature extraction module and a backward feature extraction module to collect temporal information between consecutive frames in both directions. Moreover, a new grouping mechanism is proposed to efficiently collect spatio-temporal information from the reference frame and its neighboring frames. An attention supplementation module is introduced to further enlarge the model's information-gathering range. The feature reconstruction module aggregates information from different directions to reconstruct high-resolution features. Experiments demonstrate that our model achieves state-of-the-art performance on multiple datasets.
Data Upcycling Knowledge Distillation for Image Super-Resolution
- Authors: Yun Zhang, Wei Li, Simiao Li, Jie Hu, Hanting Chen, Hailing Wang, Zhijun Tu, Wenjia Wang, Bingyi Jing, Yunhe Wang
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.14162
- Pdf link: https://arxiv.org/pdf/2309.14162
- Abstract Knowledge distillation (KD) has emerged as a challenging yet promising technique for compressing deep learning models; it transfers extensive learned representations from proficient, computationally intensive teacher models to compact student models. However, only a handful of studies have attempted to compress single image super-resolution (SISR) models through KD, and their gains for the student model have remained marginal. In this paper, we propose an approach from the perspective of efficient data utilization, namely Data Upcycling Knowledge Distillation (DUKD), which helps the student model by passing on the teacher's prior knowledge through upcycled in-domain data derived from the inputs. This upcycling is realized through two efficient image zooming operations and invertible data augmentations, which introduce label consistency regularization to KD for SISR and substantially boost the student model's generalization. Owing to its versatility, DUKD can be applied across a broad spectrum of teacher-student architectures. Comprehensive experiments across diverse benchmarks demonstrate that DUKD significantly outperforms prior art, with gains of up to 0.5 dB in PSNR over baseline methods, and an RCAN student with 67% fewer parameters performs on par with the RCAN teacher model.
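To make label consistency under invertible augmentations concrete, the sketch below assumes flips and 90° rotations as the augmentations and an L1 penalty; the exact DUKD loss may differ. The student sees an augmented low-resolution input, its output is mapped back with the inverse transform, and the result is compared with the teacher's output on the original input. Nearest-neighbour upsampling (`upsample_nn`) is a hypothetical stand-in for both networks, and all names here are illustrative.

```python
import numpy as np

# Invertible augmentations on (H, W) images, as (forward, inverse) pairs.
AUGS = [
    (lambda x: x, lambda x: x),
    (np.fliplr, np.fliplr),
    (np.flipud, np.flipud),
    (lambda x: np.rot90(x, 1), lambda x: np.rot90(x, -1)),
]

def consistency_loss(student, teacher, lr_image):
    """Mean L1 distance between the teacher's SR output on the original input
    and the student's SR output on each augmented input, mapped back by the
    inverse augmentation."""
    target = teacher(lr_image)
    losses = []
    for fwd, inv in AUGS:
        pred = inv(student(fwd(lr_image)))
        losses.append(np.abs(pred - target).mean())
    return float(np.mean(losses))

def upsample_nn(x, scale=2):
    # Nearest-neighbour upsampling: a stand-in for a super-resolution network
    return np.kron(x, np.ones((scale, scale)))
```

Because nearest-neighbour upsampling commutes with flips and rotations, `consistency_loss(upsample_nn, upsample_nn, x)` is zero for any 2-D array `x`; a student that is not equivariant to these transforms is penalized.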