arxiv-updates
New submissions for Thu, 28 Sep 23
Keyword: sgd
There is no result
Keyword: optimization
A Review on AI Algorithms for Energy Management in E-Mobility Services
- Authors: Sen Yan, Maqsood Hussain Shah, Ji Li, Noel O'Connor, Mingming Liu
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2309.15140
- Pdf link: https://arxiv.org/pdf/2309.15140
- Abstract E-mobility, or electric mobility, has emerged as a pivotal solution to address pressing environmental and sustainability concerns in the transportation sector. The depletion of fossil fuels, escalating greenhouse gas emissions, and the imperative to combat climate change underscore the significance of transitioning to electric vehicles (EVs). This paper seeks to explore the potential of artificial intelligence (AI) in addressing various challenges related to effective energy management in e-mobility systems (EMS). These challenges encompass critical factors such as range anxiety, charge rate optimization, and the longevity of energy storage in EVs. By analyzing existing literature, we delve into the role that AI can play in tackling these challenges and enabling efficient energy management in EMS. Our objectives are twofold: to provide an overview of the current state-of-the-art in this research domain and propose effective avenues for future investigations. Through this analysis, we aim to contribute to the advancement of sustainable and efficient e-mobility solutions, shaping a greener and more sustainable future for transportation.
Learning Optimal Trajectories for Quadrotors
- Authors: Yuwei Wu, Xiatao Sun, Igor Spasojevic, Vijay Kumar
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.15191
- Pdf link: https://arxiv.org/pdf/2309.15191
- Abstract This paper presents a novel learning-based trajectory planning framework for quadrotors that combines model-based optimization techniques with deep learning. Specifically, we formulate the trajectory optimization problem as a quadratic programming (QP) problem with dynamic and collision-free constraints using piecewise trajectory segments through safe flight corridors [1]. We train neural networks to directly learn the time allocation for each segment to generate optimal smooth and fast trajectories. Furthermore, the constrained optimization problem is applied as a separate implicit layer for back-propagating in the network, for which the differential loss function can be obtained. We introduce an additional penalty function to penalize time allocations which result in solutions that violate the constraints to accelerate the training process and increase the success rate of the original optimization problem. To this end, we enable a flexible number of sequences of piece-wise trajectories by adding an extra end-of-sentence token during training. We illustrate the performance of the proposed method via extensive simulation and experimentation and show that it works in real time in diverse, cluttered environments.
Finding Biomechanically Safe Trajectories for Robot Manipulation of the Human Body in a Search and Rescue Scenario
- Authors: Elizabeth Peiros, Zih-Yun Chiu, Yuheng Zhi, Nikhil Shinde, Michael C. Yip
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.15265
- Pdf link: https://arxiv.org/pdf/2309.15265
- Abstract There has been increasing awareness of the difficulties in reaching and extracting people from mass casualty scenarios, such as those arising from natural disasters. While platforms have been designed to consider reaching casualties and even carrying them out of harm's way, the challenge of repositioning a casualty from its found configuration to one suitable for extraction has not been explicitly explored. Furthermore, this planning problem needs to incorporate biomechanical safety considerations for the casualty. Thus, we present a first solution to biomechanically safe trajectory generation for repositioning limbs of unconscious human casualties. We describe biomechanical safety as mathematical constraints, mechanical descriptions of the dynamics for the robot-human coupled system, and the planning and trajectory optimization process that considers this coupled and constrained system. We finally evaluate our approach over several variations of the problem and demonstrate it on a real robot and human subject. This work provides a crucial part of search and rescue that can be used in conjunction with past and present works involving robots and vision systems designed for search and rescue.
Joint Computing, Pushing, and Caching Optimization for Mobile Edge Computing Networks via Soft Actor-Critic Learning
- Authors: Xiangyu Gao, Yaping Sun, Hao Chen, Xiaodong Xu, Shuguang Cui
- Subjects: Information Theory (cs.IT)
- Arxiv link: https://arxiv.org/abs/2309.15369
- Pdf link: https://arxiv.org/pdf/2309.15369
- Abstract Mobile edge computing (MEC) networks bring computing and storage capabilities closer to edge devices, which reduces latency and improves network performance. However, to further reduce transmission and computation costs while satisfying user-perceived quality of experience, a joint optimization in computing, pushing, and caching is needed. In this paper, we formulate the joint-design problem in MEC networks as an infinite-horizon discounted-cost Markov decision process and solve it using a deep reinforcement learning (DRL)-based framework that enables the dynamic orchestration of computing, pushing, and caching. Through the deep networks embedded in the DRL structure, our framework can implicitly predict user future requests and push or cache the appropriate content to effectively enhance system performance. One issue we encountered when considering three functions collectively is the curse of dimensionality for the action space. To address it, we relaxed the discrete action space into a continuous space and then adopted soft actor-critic learning to solve the optimization problem, followed by utilizing a vector quantization method to obtain the desired discrete action. Additionally, an action correction method was proposed to compress the action space further and accelerate the convergence. Our simulations under the setting of a general single-user, single-server MEC network with dynamic transmission link quality demonstrate that the proposed framework effectively decreases transmission bandwidth and computing cost by proactively pushing data on future demand to users and jointly optimizing the three functions. We also conduct extensive parameter tuning analysis, which shows that our approach outperforms the baselines under various parameter settings.
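The relax-then-quantize step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the vector quantization method, so the nearest-neighbor rule, the function name `quantize_action`, and the example codebook below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence, Tuple


def quantize_action(
    continuous: Sequence[float], codebook: Sequence[Tuple[int, ...]]
) -> Tuple[int, ...]:
    """Map a continuous action vector to the nearest discrete action.

    Assumption: "nearest" means smallest squared Euclidean distance.
    """
    def sq_dist(candidate: Tuple[int, ...]) -> float:
        return sum((c - a) ** 2 for c, a in zip(candidate, continuous))

    return min(codebook, key=sq_dist)


# Hypothetical codebook: every on/off combination of two binary decisions.
codebook = [(0, 0), (0, 1), (1, 0), (1, 1)]
discrete = quantize_action([0.8, 0.2], codebook)
```

In this sketch a continuous policy output such as `[0.8, 0.2]` is snapped to `(1, 0)`, the closest valid discrete action in the assumed codebook.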
Intelligent trading strategy based on improved directional change and regime change detection
- Authors: Bing Wu, Xiangzu Han
- Subjects: Computational Engineering, Finance, and Science (cs.CE)
- Arxiv link: https://arxiv.org/abs/2309.15383
- Pdf link: https://arxiv.org/pdf/2309.15383
- Abstract Previous research primarily characterized price movements according to time intervals, resulting in temporal discontinuity and overlooking crucial activities in financial markets. Directional Change (DC) is an alternative approach to sampling price data, highlighting significant points while blurring out noise details in price movements. However, traditional DC treated the thresholds of upward and downward trends with distinct intrinsic patterns as equivalent and preset them as fixed values, which are dependent on the subjective judgment of traders. To enhance the generalization performance of this methodology, we improved DC by introducing a modified threshold selection technique. Specifically, we addressed upward and downward trends distinctly by incorporating a decay coefficient. Further, we simultaneously optimized the threshold and decay coefficient using the Bayesian Optimization Algorithm (BOA). Additionally, we recognized the abnormal market state by regime change detection based on the Hidden Markov Model (RCD-HMM) to reduce the risk. Our Intelligent Trading Algorithm (ITA) was constructed based on the above methods, and the experiments were carried out on tick data from diverse currency pairs in the forex market. The experimental results showed a significant increase in profit and reduction in risk of DC-based trading strategies, which demonstrated the effectiveness of our proposed methods.
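For readers unfamiliar with Directional Change sampling, the basic event detection can be sketched as below. This is a generic DC detector with separate upward and downward thresholds, written under stated assumptions: the function name, the assumed initial uptrend, and the sample prices are hypothetical, and the sketch does not reproduce the paper's decay coefficient or its Bayesian optimization of the thresholds.

```python
from typing import List, Tuple


def directional_change_events(
    prices: List[float], up_threshold: float, down_threshold: float
) -> List[Tuple[int, str]]:
    """Return (index, direction) pairs where a directional change is confirmed.

    Thresholds are relative moves, e.g. 0.01 for a 1% reversal.
    """
    if not prices:
        return []
    events: List[Tuple[int, str]] = []
    trend = "up"         # assumed initial trend (hypothetical choice)
    extreme = prices[0]  # running high in an uptrend, running low in a downtrend
    for i, price in enumerate(prices[1:], start=1):
        if trend == "up":
            if price > extreme:
                extreme = price  # new high extends the uptrend
            elif price <= extreme * (1.0 - down_threshold):
                events.append((i, "down"))  # reversal from the high confirmed
                trend, extreme = "down", price
        else:
            if price < extreme:
                extreme = price  # new low extends the downtrend
            elif price >= extreme * (1.0 + up_threshold):
                events.append((i, "up"))  # reversal from the low confirmed
                trend, extreme = "up", price
    return events


events = directional_change_events(
    [100.0, 101.0, 102.0, 99.9, 99.0, 100.5, 103.0],
    up_threshold=0.01,
    down_threshold=0.02,
)
```

Using two thresholds instead of one is the simplest way to treat upward and downward trends distinctly; how the thresholds are chosen is left open here.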
The Triad of Failure Modes and a Possible Way Out
- Authors: Emanuele Sansone
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.15420
- Pdf link: https://arxiv.org/pdf/2309.15420
- Abstract We present a novel objective function for cluster-based self-supervised learning (SSL) that is designed to circumvent the triad of failure modes, namely representation collapse, cluster collapse, and the problem of invariance to permutations of cluster assignments. This objective consists of three key components: (i) a generative term that penalizes representation collapse, (ii) a term that promotes invariance to data augmentations, thereby addressing the issue of label permutations, and (iii) a uniformity term that penalizes cluster collapse. Additionally, our proposed objective possesses two notable advantages. Firstly, it can be interpreted from a Bayesian perspective as a lower bound on the data log-likelihood. Secondly, it enables the training of a standard backbone architecture without the need for asymmetric elements like stop gradients, momentum encoders, or specialized clustering layers. Due to its simplicity and theoretical foundation, our proposed objective is well-suited for optimization. Experiments on both toy and real-world data demonstrate its effectiveness.
Evaluation of Constrained Reinforcement Learning Algorithms for Legged Locomotion
- Authors: Joonho Lee, Lukas Schroth, Victor Klemm, Marko Bjelonic, Alexander Reske, Marco Hutter
- Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2309.15430
- Pdf link: https://arxiv.org/pdf/2309.15430
- Abstract Shifting from traditional control strategies to Deep Reinforcement Learning (RL) for legged robots poses inherent challenges, especially when addressing real-world physical constraints during training. While high-fidelity simulations provide significant benefits, they often bypass these essential physical limitations. In this paper, we experiment with the Constrained Markov Decision Process (CMDP) framework instead of the conventional unconstrained RL for robotic applications. We perform a comparative study of different constrained policy optimization algorithms to identify suitable methods for practical implementation. Our robot experiments demonstrate the critical role of incorporating physical constraints, yielding successful sim-to-real transfers, and reducing operational errors on physical systems. The CMDP formulation streamlines the training process by separately handling constraints from rewards. Our findings underscore the potential of constrained RL for the effective development and deployment of learned controllers in robotics.
In-Hand Re-grasp Manipulation with Passive Dynamic Actions via Imitation Learning
- Authors: Dehao Wei, Guokang Sun, Zeyu Ren, Shuang Li, Zhufeng Shao, Xiang Li, Nikos Tsagarakis, Shaohua Ma
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.15455
- Pdf link: https://arxiv.org/pdf/2309.15455
- Abstract Re-grasp manipulation leverages ergonomic tools to assist humans in accomplishing diverse tasks. In certain scenarios, humans often employ external forces to effortlessly and precisely re-grasp tools like a hammer. Previous development on controllers for in-grasp sliding motion using passive dynamic actions (e.g., gravity) relies on apprehension of finger-object contact information, and requires customized design for individual objects with varied geometry and weight distribution. This limits their adaptability to diverse objects. In this paper, we propose an end-to-end sliding motion controller based on imitation learning (IL) that necessitates minimal prior knowledge of object mechanics, relying solely on object position information. To expedite training convergence, we utilize a data glove to collect expert data trajectories and train the policy through Generative Adversarial Imitation Learning (GAIL). Simulation results demonstrate the controller's versatility in performing in-hand sliding tasks with objects of varying friction coefficients, geometric shapes, and masses. By migrating to a physical system using visual position estimation, the controller demonstrated an average success rate of 86%, surpassing the baseline success rates of 35% for Behavior Cloning (BC) and 20% for Proximal Policy Optimization (PPO).
DTC: Deep Tracking Control -- A Unifying Approach to Model-Based Planning and Reinforcement-Learning for Versatile and Robust Locomotion
- Authors: Fabian Jenelten, Junzhe He, Farbod Farshidian, Marco Hutter
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.15462
- Pdf link: https://arxiv.org/pdf/2309.15462
- Abstract Legged locomotion is a complex control problem that requires both accuracy and robustness to cope with real-world challenges. Legged systems have traditionally been controlled using trajectory optimization with inverse dynamics. Such hierarchical model-based methods are appealing due to intuitive cost function tuning, accurate planning, and most importantly, the insightful understanding gained from more than one decade of extensive research. However, model mismatch and violation of assumptions are common sources of faulty operation and may hinder successful sim-to-real transfer. Simulation-based reinforcement learning, on the other hand, results in locomotion policies with unprecedented robustness and recovery skills. Yet, all learning algorithms struggle with sparse rewards emerging from environments where valid footholds are rare, such as gaps or stepping stones. In this work, we propose a hybrid control architecture that combines the advantages of both worlds to simultaneously achieve greater robustness, foot-placement accuracy, and terrain generalization. Our approach utilizes a model-based planner to roll out a reference motion during training. A deep neural network policy is trained in simulation, aiming to track the optimized footholds. We evaluate the accuracy of our locomotion pipeline on sparse terrains, where pure data-driven methods are prone to fail. Furthermore, we demonstrate superior robustness in the presence of slippery or deformable ground when compared to model-based counterparts. Finally, we show that our proposed tracking controller generalizes across different trajectory optimization methods not seen during training. In conclusion, our work unites the predictive capabilities and optimality guarantees of online planning with the inherent robustness attributed to offline learning.
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey
- Authors: Sicong Liu, Bin Guo, Cheng Fang, Ziqi Wang, Shiyan Luo, Zimu Zhou, Zhiwen Yu
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
- Arxiv link: https://arxiv.org/abs/2309.15467
- Pdf link: https://arxiv.org/pdf/2309.15467
- Abstract The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the widespread use of intelligent infrastructures and the impressive success of deep learning (DL). With the deployment of DL on various intelligent infrastructures featuring rich sensors and weak DL computing capabilities, a diverse range of AIoT applications has become possible. However, DL models are notoriously resource-intensive. Existing research strives to realize near-/real-time inference of AIoT live data and low-cost training using AIoT datasets on resource-scarce infrastructures. Accordingly, the accuracy and responsiveness of DL models are bounded by resource availability. To this end, the algorithm-system co-design that jointly optimizes the resource-friendly DL models and model-adaptive system scheduling improves the runtime resource availability and thus pushes the performance boundary set by the standalone level. Unlike previous surveys on resource-friendly DL models or hand-crafted DL compilers/frameworks with partially fine-tuned components, this survey aims to provide a broader optimization space for more free resource-performance tradeoffs. The cross-level optimization landscape involves various granularities, including the DL model, computation graph, operator, memory schedule, and hardware instructor in both on-device and distributed paradigms. Furthermore, due to the dynamic nature of AIoT context, which includes heterogeneous hardware, agnostic sensing data, varying user-specified performance demands, and resource constraints, this survey explores the context-aware inter-/intra-device controllers for automatic cross-level adaptation. Additionally, we identify some potential directions for resource-efficient AIoT systems. By consolidating problems and techniques scattered over diverse levels, we aim to help readers understand their connections and stimulate further discussions.
Towards Human-Like RL: Taming Non-Naturalistic Behavior in Deep RL via Adaptive Behavioral Costs in 3D Games
- Authors: Kuo-Hao Ho, Ping-Chun Hsieh, Chiu-Chou Lin, You-Ren Luo, Feng-Jian Wang, I-Chen Wu
- Subjects: Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.15484
- Pdf link: https://arxiv.org/pdf/2309.15484
- Abstract In this paper, we propose a new approach called Adaptive Behavioral Costs in Reinforcement Learning (ABC-RL) for training a human-like agent with competitive strength. While deep reinforcement learning agents have recently achieved superhuman performance in various video games, some of these unconstrained agents may exhibit actions, such as shaking and spinning, that are not typically observed in human behavior, resulting in peculiar gameplay experiences. To behave like humans and retain similar performance, ABC-RL augments behavioral limitations as cost signals in reinforcement learning with dynamically adjusted weights. Unlike traditional constrained policy optimization, we propose a new formulation that minimizes the behavioral costs subject to a constraint of the value function. By leveraging the augmented Lagrangian, our approach is an approximation of the Lagrangian adjustment, which handles the trade-off between the performance and the human-like behavior. Through experiments conducted on 3D games in DMLab-30 and Unity ML-Agents Toolkit, we demonstrate that ABC-RL achieves the same performance level while significantly reducing instances of shaking and spinning. These findings underscore the effectiveness of our proposed approach in promoting more natural and human-like behavior during gameplay.
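The contrast drawn in the abstract above, between traditional constrained policy optimization and minimizing behavioral cost under a value constraint, can be written out as follows. The symbols $J_R$ (expected return), $J_C$ (expected behavioral cost), $d$ (cost budget), $V_0$ (required performance level), and $\lambda$ (multiplier) are notation assumed here for illustration, not taken from the paper.

```latex
% Traditional constrained policy optimization: maximize return under a cost budget.
\max_{\pi} \; J_R(\pi) \quad \text{s.t.} \quad J_C(\pi) \le d

% Formulation described in the abstract: minimize behavioral cost under a value constraint.
\min_{\pi} \; J_C(\pi) \quad \text{s.t.} \quad J_R(\pi) \ge V_0

% Plain Lagrangian of the second problem, with multiplier \lambda \ge 0.
\mathcal{L}(\pi, \lambda) = J_C(\pi) + \lambda \bigl( V_0 - J_R(\pi) \bigr)
```

The abstract states that the method builds on the augmented Lagrangian; the plain Lagrangian is shown only to make the trade-off weight $\lambda$ between performance and human-like behavior explicit.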
Residual Scheduling: A New Reinforcement Learning Approach to Solving Job Shop Scheduling Problem
- Authors: Kuo-Hao Ho, Ruei-Yu Jheng, Ji-Han Wu, Fan Chiang, Yen-Chi Chen, Yuan-Yu Wu, I-Chen Wu
- Subjects: Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.15517
- Pdf link: https://arxiv.org/pdf/2309.15517
- Abstract The job-shop scheduling problem (JSP) is a mathematical optimization problem widely used in industries like manufacturing, and flexible JSP (FJSP) is also a common variant. Since they are NP-hard, it is intractable to find the optimal solution for all cases within reasonable times. Thus, it becomes important to develop efficient heuristics to solve JSP/FJSP. One class of methods for solving scheduling problems is construction heuristics, which construct scheduling solutions via heuristics. Recently, many methods for construction heuristics leverage deep reinforcement learning (DRL) with graph neural networks (GNN). In this paper, we propose a new approach, named residual scheduling, to solving JSP/FJSP. In this new approach, we remove irrelevant machines and jobs such as those finished, such that the states include the remaining (or relevant) machines and jobs only. Our experiments show that our approach reaches state-of-the-art (SOTA) among all known construction heuristics on most well-known open JSP and FJSP benchmarks. In addition, we also observe that even though our model is trained for scheduling problems of smaller sizes, our method still performs well for scheduling problems of large sizes. Interestingly in our experiments, our approach even reaches zero gap for 49 among 50 JSP instances whose job numbers are more than 150 on 20 machines.
Raijū: Reinforcement Learning-Guided Post-Exploitation for Automating Security Assessment of Network Systems
- Authors: Van-Hau Pham, Hien Do Hoang, Phan Thanh Trung, Van Dinh Quoc, Trong-Nghia To, Phan The Duy
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.15518
- Pdf link: https://arxiv.org/pdf/2309.15518
- Abstract In order to assess the risks of a network system, it is important to investigate the behaviors of attackers after successful exploitation, which is called post-exploitation. Although there are various efficient tools supporting post-exploitation implementation, no application can automate this process. Most of the steps of this process are completed by experts who have profound knowledge of security, known as penetration testers or pen-testers. To this end, our study proposes the Raijū framework, a Reinforcement Learning (RL)-driven automation approach that assists pen-testers in quickly implementing the process of post-exploitation for security-level evaluation in network systems. We implement two RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), to train specialized agents capable of making intelligent actions, which are Metasploit modules to automatically launch attacks of privilege escalation, gathering hashdump, and lateral movement. By leveraging RL, we aim to empower these agents with the ability to autonomously select and execute actions that can exploit vulnerabilities in target systems. This approach allows us to automate certain aspects of the penetration testing workflow, making it more efficient and responsive to emerging threats and vulnerabilities. The experiments are performed in four real environments with agents trained in thousands of episodes. The agents automatically select actions and launch attacks on the environments and achieve over 84% of successful attacks with under 55 attack steps given. Moreover, the A2C algorithm has proved extremely effective in the selection of proper actions for automation of post-exploitation.
Learning Spatial-Temporal Regularized Tensor Sparse RPCA for Background Subtraction
- Authors: Basit Alawode, Sajid Javed
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.15576
- Pdf link: https://arxiv.org/pdf/2309.15576
- Abstract Video background subtraction is one of the fundamental problems in computer vision that aims to segment all moving objects. Robust principal component analysis has been identified as a promising unsupervised paradigm for background subtraction tasks in the last decade thanks to its competitive performance in a number of benchmark datasets. Tensor robust principal component analysis variations have improved background subtraction performance further. However, because moving object pixels in the sparse component are treated independently and do not have to adhere to spatial-temporal structured-sparsity constraints, performance is reduced for sequences with dynamic background, camouflage, and camera jitter problems. In this work, we present a spatial-temporal regularized tensor sparse RPCA algorithm for precise background subtraction. Within the sparse component, we impose spatial-temporal regularizations in the form of normalized graph-Laplacian matrices. To do this, we build two graphs, one across the input tensor spatial locations and the other across its frontal slices in the time domain. While maximizing the objective function, we compel the tensor sparse component to serve as the spatiotemporal eigenvectors of the graph-Laplacian matrices. The disconnected moving object pixels in the sparse component are preserved by the proposed graph-based regularizations since they both comprise spatiotemporal subspace-based structure. Additionally, we propose a unique objective function that employs batch and online-based optimization methods to jointly maximize the background-foreground and spatial-temporal regularization components. Experiments are performed on six publicly available background subtraction datasets that demonstrate the superior performance of the proposed algorithm compared to several existing methods. Our source code will be available very soon.
Perception for Humanoid Robots
- Authors: Arindam Roychoudhury, Shahram Khorshidi, Subham Agrawal, Maren Bennewitz
- Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2309.15616
- Pdf link: https://arxiv.org/pdf/2309.15616
- Abstract Purpose of Review: In the field of humanoid robotics, perception plays a fundamental role in enabling robots to interact seamlessly with humans and their surroundings, leading to improved safety, efficiency, and user experience. This scientific study investigates various perception modalities and techniques employed in humanoid robots, including visual, auditory, and tactile sensing by exploring recent state-of-the-art approaches for perceiving and understanding the internal state, the environment, objects, and human activities. Recent Findings: Internal state estimation makes extensive use of Bayesian filtering methods and optimization techniques based on maximum a-posteriori formulation by utilizing proprioceptive sensing. In the area of external environment understanding, with an emphasis on robustness and adaptability to dynamic, unforeseen environmental changes, the new slew of research discussed in this study has focused largely on multi-sensor fusion and machine learning in contrast to the use of hand-crafted, rule-based systems. Human-robot interaction methods have established the importance of contextual information representation and memory for understanding human intentions. Summary: This review summarizes the recent developments and trends in the field of perception in humanoid robots. Three main areas of application are identified, namely, internal state estimation, external environment estimation, and human-robot interaction. The applications of diverse sensor modalities in each of these areas are considered and recent significant works are discussed.
A City-centric Approach to Estimate and Evaluate Global Urban Air Mobility Demand
- Authors: Lukas Asmer, Roman Jaksche, Henry Pak, Petra Kokus
- Subjects: Systems and Control (eess.SY); Physics and Society (physics.soc-ph)
- Arxiv link: https://arxiv.org/abs/2309.15621
- Pdf link: https://arxiv.org/pdf/2309.15621
- Abstract Urban Air Mobility (UAM) is expected to effectively complement the existing transportation system by providing fast and safe travel options, contributing to decarbonization, and providing benefits to citizens and communities. A preliminary estimate of the potential global demand for UAM, the associated aircraft movements, and the required vehicles is essential for the UAM industry for their long-term planning, but also of interest to other stakeholders such as governments and transportation planners to develop appropriate strategies and actions to implement UAM. This paper proposes a city-centric forecasting methodology that provides preliminary estimates of the potential global UAM demand for intra-city air taxi services for 990 cities worldwide. By summing all city-specific results, an estimate of the global UAM demand is obtained. By varying the parameters of the UAM system, sensitivity studies and different market scenarios are developed and analyzed. Sensitivity analyses show how strongly demand decreases when air taxi ticket prices increase. Considering low ticket prices and high vertiport densities, possible market development scenarios show that there is a market potential for UAM in over 200 cities worldwide by 2050. The study highlights the significant impact of low ticket prices and the need for high vertiport densities to drive UAM demand. This highlights the need for careful optimization of system components to minimize costs and increase the quality of UAM services.
Design and Optimization of Residual Neural Network Accelerators for Low-Power FPGAs Using High-Level Synthesis
- Authors: Filippo Minnella, Teodoro Urso, Mihai T. Lazarescu, Luciano Lavagno
- Subjects: Hardware Architecture (cs.AR); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2309.15631
- Pdf link: https://arxiv.org/pdf/2309.15631
- Abstract Residual neural networks are widely used in computer vision tasks. They enable the construction of deeper and more accurate models by mitigating the vanishing gradient problem. Their main innovation is the residual block which allows the output of one layer to bypass one or more intermediate layers and be added to the output of a later layer. Their complex structure and the buffering required by the residual block make them difficult to implement on resource-constrained platforms. We present a novel design flow for implementing deep learning models for field programmable gate arrays optimized for ResNets, using a strategy to reduce their buffering overhead to obtain a resource-efficient implementation of the residual layer. Our high-level synthesis (HLS)-based flow encompasses a thorough set of design principles and optimization strategies, exploiting in novel ways standard techniques such as temporal reuse and loop merging to efficiently map ResNet models, and potentially other skip connection-based NN architectures, into FPGA. The models are quantized to 8-bit integers for both weights and activations, 16-bit for biases, and 32-bit for accumulations. The experimental results are obtained on the CIFAR-10 dataset using ResNet8 and ResNet20 implemented with Xilinx FPGAs using HLS on the Ultra96-V2 and Kria KV260 boards. Compared to the state-of-the-art on the Kria KV260 board, our ResNet20 implementation achieves 2.88X speedup with 0.5% higher accuracy of 91.3%, while ResNet8 accuracy improves by 2.8% to 88.7%. The throughputs of ResNet8 and ResNet20 are 12971 FPS and 3254 FPS on the Ultra96 board, and 30153 FPS and 7601 FPS on the Kria KV260, respectively. They Pareto-dominate state-of-the-art solutions concerning accuracy, throughput, and energy.
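The bit widths quoted in the abstract above (8-bit weights and activations, wider accumulators) can be illustrated with a small software sketch. This is a generic symmetric quantization example under stated assumptions: the function names, the scale value, and the sample numbers are hypothetical, bias and product scales are assumed to be pre-aligned, and none of it reflects the authors' HLS implementation.

```python
from typing import List, Sequence


def quantize_int8(values: Sequence[float], scale: float) -> List[int]:
    """Symmetric quantization: round(value / scale), clamped to the int8 range."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int_dot(weights_q: Sequence[int], activations_q: Sequence[int], bias_q: int = 0) -> int:
    """Integer dot product plus bias, checked against a signed 32-bit accumulator.

    Python integers never overflow, so the 32-bit range is asserted explicitly.
    """
    acc = bias_q + sum(w * a for w, a in zip(weights_q, activations_q))
    assert -(2 ** 31) <= acc < 2 ** 31, "accumulator would overflow 32 bits"
    return acc


weights_q = quantize_int8([1.0, -2.0, 100.0], scale=0.5)  # 100.0 saturates at 127
acc = int_dot(weights_q, [1, 1, 1], bias_q=10)
```

A product of two int8 values fits in 16 bits, so a 32-bit accumulator leaves headroom for summing many such products before overflow becomes a concern.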
Federated Deep Equilibrium Learning: A Compact Shared Representation for Edge Communication Efficiency
- Authors: Long Tan Le, Tuan Dung Nguyen, Tung-Anh Nguyen, Choong Seon Hong, Nguyen H. Tran
- Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
- Arxiv link: https://arxiv.org/abs/2309.15659
- Pdf link: https://arxiv.org/pdf/2309.15659
- Abstract Federated Learning (FL) is a prominent distributed learning paradigm facilitating collaboration among nodes within an edge network to co-train a global model without centralizing data. By shifting computation to the network edge, FL offers robust and responsive edge-AI solutions and enhances privacy preservation. However, deploying deep FL models within edge environments is often hindered by communication bottlenecks, data heterogeneity, and memory limitations. To address these challenges jointly, we introduce FeDEQ, a pioneering FL framework that effectively employs deep equilibrium learning and consensus optimization to exploit a compact shared data representation across edge nodes, allowing the derivation of personalized models specific to each node. We delve into a unique model structure composed of an equilibrium layer followed by traditional neural network layers. Here, the equilibrium layer functions as a global feature representation that edge nodes can adapt to personalize their local layers. Capitalizing on FeDEQ's compactness and representation power, we present a novel distributed algorithm rooted in the alternating direction method of multipliers (ADMM) consensus optimization and theoretically establish its convergence for smooth objectives. Experiments across various benchmarks demonstrate that FeDEQ achieves performance comparable to state-of-the-art personalized methods while employing models up to 4 times smaller in communication size and with a 1.5 times lower memory footprint during training.
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning
- Authors: Wujun Wen, Jinrong Zhang, Shenglan Liu, Yunheng Li, Qifeng Li, Lin Feng
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.15683
- Pdf link: https://arxiv.org/pdf/2309.15683
- Abstract Temporal Action Segmentation (TAS) from video is a kind of frame recognition task for long video with multiple action classes. As a video understanding task for long videos, current methods typically combine multi-modality action recognition models with temporal models to convert feature sequences to label sequences. This approach can only be applied to offline scenarios, which severely limits the TAS application. Therefore, this paper proposes an end-to-end Streaming Video Temporal Action Segmentation with Reinforce Learning (SVTAS-RL). The end-to-end SVTAS, which regards TAS as an action segment clustering task, can expand the application scenarios of TAS, and RL is used to alleviate the problem of inconsistent optimization objective and direction. Through extensive experiments, the SVTAS-RL model achieves performance competitive with the state-of-the-art TAS model on multiple datasets, and shows greater advantages on the ultra-long video dataset EGTEA. This indicates that our method can replace all current TAS models end-to-end and that SVTAS-RL is more suitable for long video TAS. Code is available at https://github.com/Thinksky5124/SVTAS.
Physics Inspired Hybrid Attention for SAR Target Recognition
- Authors: Zhongling Huang, Chong Wu, Xiwen Yao, Zhicheng Zhao, Xiankai Huang, Junwei Han
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- Arxiv link: https://arxiv.org/abs/2309.15697
- Pdf link: https://arxiv.org/pdf/2309.15697
- Abstract There has been a recent emphasis on integrating physical models and deep neural networks (DNNs) for SAR target recognition, to improve performance and achieve a higher level of physical interpretability. The attributed scattering center (ASC) parameters garnered the most interest, being considered as additional input data or features for fusion in most methods. However, the performance greatly depends on the ASC optimization result, and the fusion strategy is not adaptable to different types of physical information. Meanwhile, the current evaluation scheme is inadequate to assess the model's robustness and generalizability. Thus, we propose a physics inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol to address the above issues. PIHA leverages the high-level semantics of physical information to activate and guide the feature group aware of the local semantics of the target, so as to re-weight the feature importance based on prior knowledge. It is flexible and generally applicable to various physical models, and can be integrated into arbitrary DNNs without modifying the original architecture. The experiments involve a rigorous assessment using the proposed OFA, which entails training and validating a model on either sufficient or limited data and evaluating on multiple test sets with different data distributions. Our method outperforms other state-of-the-art approaches in 12 test scenarios with the same ASC parameters. Moreover, we analyze the working mechanism of PIHA and evaluate various PIHA-enabled DNNs. The experiments also show that PIHA is effective for different physical information. The source code together with the adopted physical information is available at https://github.com/XAI4SAR.
Maximum Weight Entropy
- Authors: Antoine de Mathelin, François Deheeger, Mathilde Mougeot, Nicolas Vayatis
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.15704
- Pdf link: https://arxiv.org/pdf/2309.15704
- Abstract This paper deals with uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods. It proposes a practical solution to the lack of prediction diversity observed recently for standard approaches when used out-of-distribution (Ovadia et al., 2019; Liu et al., 2021). Considering that this issue is mainly related to a lack of weight diversity, we claim that standard methods sample in "over-restricted" regions of the weight space due to the use of "over-regularization" processes, such as weight decay and zero-mean centered Gaussian priors. We propose to solve the problem by adopting the maximum entropy principle for the weight distribution, with the underlying idea to maximize the weight diversity. Under this paradigm, the epistemic uncertainty is described by the weight distribution of maximal entropy that produces neural networks "consistent" with the training observations. Considering stochastic neural networks, a practical optimization is derived to build such a distribution, defined as a trade-off between the average empirical risk and the weight distribution entropy. We develop a novel weight parameterization for the stochastic model, based on the singular value decomposition of the neural network's hidden representations, which enables a large increase of the weight entropy for a small empirical risk penalization. We provide both theoretical and numerical results to assess the efficiency of the approach. In particular, the proposed algorithm appears in the top three best methods in all configurations of an extensive out-of-distribution detection benchmark including more than thirty competitors.
Temporal graph models fail to capture global temporal dynamics
- Authors: Michał Daniluk, Jacek Dąbrowski
- Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.15730
- Pdf link: https://arxiv.org/pdf/2309.15730
- Abstract A recently released Temporal Graph Benchmark is analyzed in the context of Dynamic Link Property Prediction. We outline our observations and propose a trivial optimization-free baseline of "recently popular nodes" outperforming other methods on all medium and large-size datasets in the Temporal Graph Benchmark. We propose two measures based on Wasserstein distance which can quantify the strength of short-term and long-term global dynamics of datasets. By analyzing our unexpectedly strong baseline, we show how standard negative sampling evaluation can be unsuitable for datasets with strong temporal dynamics. We also show how simple negative-sampling can lead to model degeneration during training, resulting in impossible to rank, fully saturated predictions of temporal graph networks. We propose improved negative sampling schemes for both training and evaluation and prove their usefulness. We conduct a comparison with a model trained non-contrastively without negative sampling. Our results provide a challenging baseline and indicate that temporal graph network architectures need deep rethinking for usage in problems with significant global dynamics, such as social media, cryptocurrency markets or e-commerce. We open-source the code for baselines, measures and proposed negative sampling schemes.
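To make the idea of an optimization-free "recently popular nodes" baseline more concrete, here is a minimal sketch. It is not the authors' implementation: the class name `RecentPopularity`, the exponential `decay` factor, and the batch-wise update are all assumptions chosen for illustration. It only shows how a destination node can be scored from global recent popularity without looking at the source node at all.

```python
from collections import defaultdict


class RecentPopularity:
    """Scores destination nodes by exponentially decayed interaction counts.

    Hypothetical illustration; the decay scheme is an assumption, not taken
    from the paper.
    """

    def __init__(self, decay=0.9):
        self.decay = decay  # assumed smoothing factor in (0, 1]
        self.counts = defaultdict(float)

    def update(self, destinations):
        # Fade all previously observed popularity, then credit the
        # destination nodes seen in the newest batch of edges.
        for node in self.counts:
            self.counts[node] *= self.decay
        for node in destinations:
            self.counts[node] += 1.0

    def score(self, node):
        # The score ignores the source node: it is a purely global signal.
        return self.counts.get(node, 0.0)

    def rank(self, candidates):
        # Most recently popular candidates first.
        return sorted(candidates, key=self.score, reverse=True)
```

A caller would feed each chronological batch of edge destinations to `update` and then use `rank` on the candidate destinations of a query. No parameters are trained, which is what makes such a baseline "optimization-free".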
Development of a Whole-body Work Imitation Learning System by a Biped and Bi-armed Humanoid
- Authors: Yutaro Matsuura, Kento Kawaharazuka, Naoki Hiraoka, Kunio Kojima, Kei Okada, Masayuki Inaba
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.15756
- Pdf link: https://arxiv.org/pdf/2309.15756
- Abstract Imitation learning has been actively studied in recent years. In particular, skill acquisition by a robot with a fixed body, whose root link position and posture and camera angle of view do not change, has been realized in many cases. On the other hand, imitation of the behavior of robots with floating links, such as humanoid robots, is still a difficult task. In this study, we develop an imitation learning system using a biped robot with a floating link. There are two main problems in developing such a system. The first is a teleoperation device for humanoids, and the second is a control system that can withstand heavy workloads and long-term data collection. For the first point, we use the whole body control device TABLIS. It can control not only the arms but also the legs and can perform bilateral control with the robot. By connecting this TABLIS with the high-power humanoid robot JAXON, we construct a control system for imitation learning. For the second point, we will build a system that can collect long-term data based on posture optimization, and can simultaneously move the robot's limbs. We combine high-cycle posture generation with posture optimization methods, including whole-body joint torque minimization and contact force optimization. We designed an integrated system with the above two features to achieve various tasks through imitation learning. Finally, we demonstrate the effectiveness of this system by experiments of manipulating flexible fabrics such that not only the hands but also the head and waist move simultaneously, manipulating objects using legs characteristic of humanoids, and lifting heavy objects that require large forces.
Generating Transferable Adversarial Simulation Scenarios for Self-Driving via Neural Rendering
- Authors: Yasasa Abeysirigoonawardena, Kevin Xie, Chuhan Chen, Salar Hosseini, Ruiting Chen, Ruiqi Wang, Florian Shkurti
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2309.15770
- Pdf link: https://arxiv.org/pdf/2309.15770
- Abstract Self-driving software pipelines include components that are learned from a significant number of training examples, yet it remains challenging to evaluate the overall system's safety and generalization performance. Together with scaling up the real-world deployment of autonomous vehicles, it is of critical importance to automatically find simulation scenarios where the driving policies will fail. We propose a method that efficiently generates adversarial simulation scenarios for autonomous driving by solving an optimal control problem that aims to maximally perturb the policy from its nominal trajectory. Given an image-based driving policy, we show that we can inject new objects in a neural rendering representation of the deployment scene, and optimize their texture in order to generate adversarial sensor inputs to the policy. We demonstrate that adversarial scenarios discovered purely in the neural renderer (surrogate scene) can often be successfully transferred to the deployment scene, without further optimization. We demonstrate this transfer occurs both in simulated and real environments, provided the learned surrogate scene is sufficiently close to the deployment scene.
Importance-Weighted Offline Learning Done Right
- Authors: Germano Gabbianelli, Gergely Neu, Matteo Papini
- Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2309.15771
- Pdf link: https://arxiv.org/pdf/2309.15771
- Abstract We study the problem of offline policy optimization in stochastic contextual bandit problems, where the goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavior policy. Rather than making any structural assumptions on the reward function, we assume access to a given policy class and aim to compete with the best comparator policy within this class. In this setting, a standard approach is to compute importance-weighted estimators of the value of each policy, and select a policy that minimizes the estimated value up to a "pessimistic" adjustment subtracted from the estimates to reduce their random fluctuations. In this paper, we show that a simple alternative approach based on the "implicit exploration" estimator of Neu (2015) yields performance guarantees that are superior in nearly all possible terms to all previous results. Most notably, we remove an extremely restrictive "uniform coverage" assumption made in all previous works. These improvements are made possible by the observation that the upper and lower tails of importance-weighted estimators behave very differently from each other, and their careful control can massively improve on previous results that were all based on symmetric two-sided concentration inequalities. We also extend our results to infinite policy classes in a PAC-Bayesian fashion, and showcase the robustness of our algorithm to the choice of hyper-parameters by means of numerical simulations.
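As a rough illustration of how an "implicit exploration" style estimator differs from a plain importance-weighted one, the sketch below adds a smoothing constant to the denominator of each importance weight. The function name `ix_value_estimate`, the tuple layout of the logged data, and the parameter name `gamma` are assumptions made for this example, and the paper's exact estimator and tuning may differ.

```python
def ix_value_estimate(logged, target_policy, gamma):
    """Implicit-exploration (IX) style estimate of a policy's value.

    logged: list of (context, action, reward, behavior_prob) tuples, where
        behavior_prob is the probability the behavior policy gave the action.
    target_policy: function (context, action) -> probability under the
        policy being evaluated.
    gamma: non-negative smoothing added to the denominator; gamma = 0
        recovers the plain importance-weighted estimator.
    """
    total = 0.0
    for context, action, reward, behavior_prob in logged:
        # Smoothing the denominator caps the weight at 1 / gamma, which
        # tames the heavy upper tail of plain importance weights.
        weight = target_policy(context, action) / (behavior_prob + gamma)
        total += weight * reward
    return total / len(logged)
```

With non-negative rewards, any `gamma > 0` can only shrink the estimate relative to the plain importance-weighted one, so the smoothing acts as a built-in form of pessimism rather than a separately subtracted adjustment.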
Learning the Efficient Frontier
- Authors: Philippe Chatigny, Ivan Sergienko, Ryan Ferguson, Jordan Weir, Maxime Bergeron
- Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
- Arxiv link: https://arxiv.org/abs/2309.15775
- Pdf link: https://arxiv.org/pdf/2309.15775
- Abstract The efficient frontier (EF) is a fundamental resource allocation problem where one has to find an optimal portfolio maximizing a reward at a given level of risk. This optimal solution is traditionally found by solving a convex optimization problem. In this paper, we introduce NeuralEF: a fast neural approximation framework that robustly forecasts the result of the EF convex optimization problem with respect to heterogeneous linear constraints and variable number of optimization inputs. By reformulating an optimization problem as a sequence to sequence problem, we show that NeuralEF is a viable solution to accelerate large-scale simulation while handling discontinuous behavior.
Time-Domain Channel Measurements and Small-Scale Fading Characterization for RIS-Assisted Wireless Communication Systems
- Authors: Yanqing Ren, Mingyong Zhou, Xiaokun Teng, Shengguo Meng, Wankai Tang, Xiao Li, Shi Jin, Michail Matthaiou
- Subjects: Information Theory (cs.IT)
- Arxiv link: https://arxiv.org/abs/2309.15776
- Pdf link: https://arxiv.org/pdf/2309.15776
- Abstract As a potentially revolutionary enabling technology for the sixth generation (6G) mobile communication system, reconfigurable intelligent surfaces (RISs) have attracted extensive attention from industry and academia. In RIS-assisted wireless communication systems, practical channel measurements and modeling serve as the foundation for system design, network optimization, and performance evaluation. In this paper, a RIS time-domain channel measurement system, based on a software defined radio (SDR) platform, is developed for the first time to investigate the small-scale fading characteristics of RIS-assisted channels. We present RIS channel measurements in corridor and laboratory scenarios and compare the power delay profile (PDP) of the channel without RIS, with RIS specular reflection, and with RIS intelligent reflection. The multipath component parameters and cluster parameters based on the Saleh-Valenzuela model are extracted. We find that the PDPs of the RIS-assisted channel fit the power-law decay model and approximate the law of square decay. Through intelligent reflection, the RIS can decrease the delay and concentrate the energy of the virtual line-of-sight (VLOS) path, thereby reducing delay spread and mitigating multipath fading. Furthermore, the cluster characteristics of RIS-assisted channels are highly related to the measurement environment. In the laboratory scenario, a single cluster dominated by the VLOS path with smooth envelope is observed. On the other hand, in the corridor scenario, some additional clusters introduced by the RIS reflection are created.
Model-based design of temporal analysis for products (TAP) reactors: A simulated case study in oxidative propane dehydrogenation
- Authors: Adam C. Yonge, Gabriel S. Gusmão, Rebecca Fushimi, A.J. Medford
- Subjects: Computational Engineering, Finance, and Science (cs.CE)
- Arxiv link: https://arxiv.org/abs/2309.15786
- Pdf link: https://arxiv.org/pdf/2309.15786
- Abstract Temporal analysis of products (TAP) reactors enable experiments that probe numerous kinetic processes within a single set of experimental data through variations in pulse intensity, delay, or temperature. Selecting additional TAP experiments often involves arbitrary selection of reaction conditions or the use of chemical intuition. To make experiment selection in TAP more robust, we explore the efficacy of model-based design of experiments (MBDoE) for precision in TAP reactor kinetic modeling. We successfully applied this approach to a case study of synthetic oxidative propane dehydrogenation (OPDH) that involves pulses of propane and oxygen. We found that experiments identified as optimal through the MBDoE for precision generally reduce parameter uncertainties to a higher degree than alternative experiments. The performance of MBDoE for model divergence was also explored for OPDH, with the relevant active sites (catalyst structure) being unknown. An experiment that maximized the divergence between the three proposed mechanisms was identified and led to clear mechanism discrimination. However, re-optimization of kinetic parameters eliminated the ability to discriminate. The findings yield insight into the prospects and limitations of MBDoE for TAP and transient kinetic experiments.
Convolutional Networks with Oriented 1D Kernels
- Authors: Alexandre Kirchmeyer, Jia Deng
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.15812
- Pdf link: https://arxiv.org/pdf/2309.15812
- Abstract In computer vision, 2D convolution is arguably the most important operation performed by a ConvNet. Unsurprisingly, it has been the focus of intense software and hardware optimization and enjoys highly efficient implementations. In this work, we ask an intriguing question: can we make a ConvNet work without 2D convolutions? Surprisingly, we find that the answer is yes -- we show that a ConvNet consisting entirely of 1D convolutions can do just as well as 2D on ImageNet classification. Specifically, we find that one key ingredient to a high-performing 1D ConvNet is oriented 1D kernels: 1D kernels that are oriented not just horizontally or vertically, but also at other angles. Our experiments show that oriented 1D convolutions can not only replace 2D convolutions but also augment existing architectures with large kernels, leading to improved accuracy with minimal FLOPs increase. A key contribution of this work is a highly-optimized custom CUDA implementation of oriented 1D kernels, specialized to the depthwise convolution setting. Our benchmarks demonstrate that our custom CUDA implementation almost perfectly realizes the theoretical advantage of 1D convolution: it is faster than a native horizontal convolution for any arbitrary angle. Code is available at https://github.com/princeton-vl/Oriented1D.
Keyword: adam
Efficient Low-rank Backpropagation for Vision Transformer Adaptation
- Authors: Yuedong Yang, Hung-Yueh Chiang, Guihong Li, Diana Marculescu, Radu Marculescu
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.15275
- Pdf link: https://arxiv.org/pdf/2309.15275
- Abstract The increasing scale of vision transformers (ViT) has made the efficient fine-tuning of these large models for specific needs a significant challenge in various applications. This issue originates from the computationally demanding matrix multiplications required during the backpropagation process through linear layers in ViT. In this paper, we tackle this problem by proposing a new Low-rank BackPropagation via Walsh-Hadamard Transformation (LBP-WHT) method. Intuitively, LBP-WHT projects the gradient into a low-rank space and carries out backpropagation. This approach substantially reduces the computation needed for adapting ViT, as matrix multiplication in the low-rank space is far less resource-intensive. We conduct extensive experiments with different models (ViT, hybrid convolution-ViT model) on multiple datasets to demonstrate the effectiveness of our method. For instance, when adapting an EfficientFormer-L1 model on CIFAR100, our LBP-WHT achieves 10.4% higher accuracy than the state-of-the-art baseline, while requiring 9 MFLOPs less computation. As the first work to accelerate ViT adaptation with low-rank backpropagation, our LBP-WHT method is complementary to many prior efforts and can be combined with them for better performance.
Joint Sampling and Optimisation for Inverse Rendering
- Authors: Martin Balint, Karol Myszkowski, Hans-Peter Seidel, Gurprit Singh
- Subjects: Graphics (cs.GR); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.15676
- Pdf link: https://arxiv.org/pdf/2309.15676
- Abstract When dealing with difficult inverse problems such as inverse rendering, using Monte Carlo estimated gradients to optimise parameters can slow down convergence due to variance. Averaging many gradient samples in each iteration reduces this variance trivially. However, for problems that require thousands of optimisation iterations, the computational cost of this approach rises quickly. We derive a theoretical framework for interleaving sampling and optimisation. We update and reuse past samples with low-variance finite-difference estimators that describe the change in the estimated gradients between each iteration. By combining proportional and finite-difference samples, we continuously reduce the variance of our novel gradient meta-estimators throughout the optimisation process. We investigate how our estimator interlinks with Adam and derive a stable combination. We implement our method for inverse path tracing and demonstrate how our estimator speeds up convergence on difficult optimisation tasks.
Keyword: gradient
Efficient Low-rank Backpropagation for Vision Transformer Adaptation
- Authors: Yuedong Yang, Hung-Yueh Chiang, Guihong Li, Diana Marculescu, Radu Marculescu
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.15275
- Pdf link: https://arxiv.org/pdf/2309.15275
- Abstract The increasing scale of vision transformers (ViT) has made the efficient fine-tuning of these large models for specific needs a significant challenge in various applications. This issue originates from the computationally demanding matrix multiplications required during the backpropagation process through linear layers in ViT. In this paper, we tackle this problem by proposing a new Low-rank BackPropagation via Walsh-Hadamard Transformation (LBP-WHT) method. Intuitively, LBP-WHT projects the gradient into a low-rank space and carries out backpropagation. This approach substantially reduces the computation needed for adapting ViT, as matrix multiplication in the low-rank space is far less resource-intensive. We conduct extensive experiments with different models (ViT, hybrid convolution-ViT model) on multiple datasets to demonstrate the effectiveness of our method. For instance, when adapting an EfficientFormer-L1 model on CIFAR100, our LBP-WHT achieves 10.4% higher accuracy than the state-of-the-art baseline, while requiring 9 MFLOPs less computation. As the first work to accelerate ViT adaptation with low-rank backpropagation, our LBP-WHT method is complementary to many prior efforts and can be combined with them for better performance.
Hypergraph $p$-Laplacians, Scale Spaces, and Information Flow in Networks
- Authors: Ariane Fazeny, Daniel Tenbrinck, Martin Burger
- Subjects: Social and Information Networks (cs.SI); Combinatorics (math.CO)
- Arxiv link: https://arxiv.org/abs/2309.15419
- Pdf link: https://arxiv.org/pdf/2309.15419
- Abstract This paper models opinion formation in social networks using oriented hypergraphs and it defines gradient flows in the form of diffusion equations on oriented hypergraphs. Therefore, this paper uses the gradient, adjoint and $p$-Laplacian definitions for oriented hypergraphs, introduced in arXiv:2304.06468v1, and applies them to modelling group dynamics and information flow in social networks.
The Triad of Failure Modes and a Possible Way Out
- Authors: Emanuele Sansone
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2309.15420
- Pdf link: https://arxiv.org/pdf/2309.15420
- Abstract We present a novel objective function for cluster-based self-supervised learning (SSL) that is designed to circumvent the triad of failure modes, namely representation collapse, cluster collapse, and the problem of invariance to permutations of cluster assignments. This objective consists of three key components: (i) a generative term that penalizes representation collapse, (ii) a term that promotes invariance to data augmentations, thereby addressing the issue of label permutations, and (iii) a uniformity term that penalizes cluster collapse. Additionally, our proposed objective possesses two notable advantages. Firstly, it can be interpreted from a Bayesian perspective as a lower bound on the data log-likelihood. Secondly, it enables the training of a standard backbone architecture without the need for asymmetric elements like stop gradients, momentum encoders, or specialized clustering layers. Due to its simplicity and theoretical foundation, our proposed objective is well-suited for optimization. Experiments on both toy and real world data demonstrate its effectiveness.
Design and Optimization of Residual Neural Network Accelerators for Low-Power FPGAs Using High-Level Synthesis
- Authors: Filippo Minnella, Teodoro Urso, Mihai T. Lazarescu, Luciano Lavagno
- Subjects: Hardware Architecture (cs.AR); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2309.15631
- Pdf link: https://arxiv.org/pdf/2309.15631
- Abstract Residual neural networks are widely used in computer vision tasks. They enable the construction of deeper and more accurate models by mitigating the vanishing gradient problem. Their main innovation is the residual block which allows the output of one layer to bypass one or more intermediate layers and be added to the output of a later layer. Their complex structure and the buffering required by the residual block make them difficult to implement on resource-constrained platforms. We present a novel design flow for implementing deep learning models for field programmable gate arrays optimized for ResNets, using a strategy to reduce their buffering overhead to obtain a resource-efficient implementation of the residual layer. Our high-level synthesis (HLS)-based flow encompasses a thorough set of design principles and optimization strategies, exploiting in novel ways standard techniques such as temporal reuse and loop merging to efficiently map ResNet models, and potentially other skip connection-based NN architectures, into FPGA. The models are quantized to 8-bit integers for both weights and activations, 16-bit for biases, and 32-bit for accumulations. The experimental results are obtained on the CIFAR-10 dataset using ResNet8 and ResNet20 implemented with Xilinx FPGAs using HLS on the Ultra96-V2 and Kria KV260 boards. Compared to the state-of-the-art on the Kria KV260 board, our ResNet20 implementation achieves 2.88X speedup with 0.5% higher accuracy of 91.3%, while ResNet8 accuracy improves by 2.8% to 88.7%. The throughputs of ResNet8 and ResNet20 are 12971 FPS and 3254 FPS on the Ultra96 board, and 30153 FPS and 7601 FPS on the Kria KV260, respectively. They Pareto-dominate state-of-the-art solutions concerning accuracy, throughput, and energy.
Uniform Poincaré inequalities for the Discrete de Rham complex on general domains
- Authors: Daniele A. Di Pietro, Marien-Lorenzo Hanot
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2309.15667
- Pdf link: https://arxiv.org/pdf/2309.15667
- Abstract In this paper we prove Poincaré inequalities for the Discrete de Rham (DDR) sequence on a general connected polyhedral domain $\Omega$ of $\mathbb{R}^3$. We unify the ideas behind the inequalities for all three operators in the sequence, deriving new proofs for the Poincaré inequalities for the gradient and the divergence, and extending the available Poincaré inequality for the curl to domains with arbitrary second Betti numbers. A key preliminary step consists in deriving "mimetic" Poincaré inequalities giving the existence and stability of the solutions to topological balance problems useful in general discrete geometric settings. As an example of application, we study the stability of a novel DDR scheme for the magnetostatics problem on domains with general topology.
Joint Sampling and Optimisation for Inverse Rendering
- Authors: Martin Balint, Karol Myszkowski, Hans-Peter Seidel, Gurprit Singh
- Subjects: Graphics (cs.GR); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2309.15676
- Pdf link: https://arxiv.org/pdf/2309.15676
- Abstract When dealing with difficult inverse problems such as inverse rendering, using Monte Carlo estimated gradients to optimise parameters can slow down convergence due to variance. Averaging many gradient samples in each iteration reduces this variance trivially. However, for problems that require thousands of optimisation iterations, the computational cost of this approach rises quickly. We derive a theoretical framework for interleaving sampling and optimisation. We update and reuse past samples with low-variance finite-difference estimators that describe the change in the estimated gradients between each iteration. By combining proportional and finite-difference samples, we continuously reduce the variance of our novel gradient meta-estimators throughout the optimisation process. We investigate how our estimator interlinks with Adam and derive a stable combination. We implement our method for inverse path tracing and demonstrate how our estimator speeds up convergence on difficult optimisation tasks.
Recycling MMGKS for large-scale dynamic and streaming data
- Authors: Mirjeta Pasha, Eric de Sturler, Misha E. Kilmer
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2309.15759
- Pdf link: https://arxiv.org/pdf/2309.15759
- Abstract Reconstructing high-quality images with sharp edges requires the use of edge-preserving constraints in the regularized form of the inverse problem. The use of the $\ell_q$-norm on the gradient of the image is a common such constraint. For implementation purposes, the $\ell_q$-norm term is typically replaced with a sequence of $\ell_2$-norm weighted gradient terms with the weights determined from the current solution estimate. While (hybrid) Krylov subspace methods can be employed on this sequence, it would require generating a new Krylov subspace for every new two-norm regularized problem. The majorization-minimization Krylov subspace method (MM-GKS) addresses this disadvantage by combining norm reweighting with generalized Krylov subspaces (GKS). After projecting the problem using a small dimensional subspace - one that expands each iteration - the regularization parameter is selected. Basis expansion repeats until a sufficiently accurate solution is found. Unfortunately, for large-scale problems that require many expansion steps to converge, storage and the cost of repeated orthogonalizations present overwhelming memory and computational requirements. In this paper we present a new method, recycled MM-GKS (RMM-GKS), that keeps the memory requirements bounded through recycling the solution subspace. Specifically, our method alternates between enlarging and compressing the GKS subspace, recycling directions that are deemed most important via one of our tailored compression routines. We further generalize the RMM-GKS approach to handle experiments where the data is either not all available simultaneously, or needs to be treated as such because of the extreme memory requirements. Numerical examples from dynamic photoacoustic tomography and streaming X-ray computerized tomography (CT) imaging are used to illustrate the effectiveness of the described methods.
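The replacement of an $\ell_q$-norm term by weighted $\ell_2$-norm terms, with weights taken from the current solution estimate, can be sketched as follows. This is a generic smoothed reweighting commonly used in iteratively reweighted schemes, not necessarily the exact weighting used in MM-GKS or RMM-GKS; the function names and the smoothing parameter `eps` are assumptions for illustration.

```python
def reweighting(gradient_values, q=1.0, eps=1e-3):
    """Weights w_i = (g_i^2 + eps^2)^((q - 2) / 2) from the current estimate.

    gradient_values: entries of the image gradient at the current solution.
    eps: assumed smoothing constant that keeps weights finite where g_i = 0.
    """
    exponent = (q - 2.0) / 2.0
    return [(g * g + eps * eps) ** exponent for g in gradient_values]


def weighted_l2_term(gradient_values, weights):
    """Weighted squared 2-norm, sum_i w_i * g_i^2."""
    return sum(w * g * g for w, g in zip(weights, gradient_values))
```

When the weights are computed from the same gradient values they are applied to, each term `w_i * g_i^2` is close to `|g_i|^q` for small `eps`, so the weighted quadratic matches the $\ell_q$ term at the current estimate while staying quadratic, and therefore easier to minimize, in the next iterate.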
Automated Detection of Persistent Inflammatory Biomarkers in Post-COVID-19 Patients Using Machine Learning Techniques
- Authors: Ghizal Fatima, Fadhil G. Al-Amran, Maitham G. Yousif
- Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
- Arxiv link: https://arxiv.org/abs/2309.15838
- Pdf link: https://arxiv.org/pdf/2309.15838
- Abstract The COVID-19 pandemic has left a lasting impact on individuals, with many experiencing persistent symptoms, including inflammation, in the post-acute phase of the disease. Detecting and monitoring these inflammatory biomarkers is critical for timely intervention and improved patient outcomes. This study employs machine learning techniques to automate the identification of persistent inflammatory biomarkers in 290 post-COVID-19 patients, based on medical data collected from hospitals in Iraq. The data encompassed a wide array of clinical parameters, such as C-reactive protein and interleukin-6 levels, patient demographics, comorbidities, and treatment histories. Rigorous data preprocessing and feature selection processes were implemented to optimize the dataset for machine learning analysis. Various machine learning algorithms, including logistic regression, random forests, support vector machines, and gradient boosting, were deployed to construct predictive models. These models exhibited promising results, showcasing high accuracy and precision in the identification of patients with persistent inflammation. The findings of this study underscore the potential of machine learning in automating the detection of persistent inflammatory biomarkers in post-COVID-19 patients. These models can serve as valuable tools for healthcare providers, facilitating early diagnosis and personalized treatment strategies for individuals at risk of persistent inflammation, ultimately contributing to improved post-acute COVID-19 care and patient well-being. Keywords: COVID-19, post-COVID-19, inflammation, biomarkers, machine learning, early detection.
Keyword: super-resolution
Neural Operators for Accelerating Scientific Simulations and Design
- Authors: Kamyar Azizzadenesheli, Nikola Kovachki, Zongyi Li, Miguel Liu-Schiaffini, Jean Kossaifi, Anima Anandkumar
- Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
- Arxiv link: https://arxiv.org/abs/2309.15325
- Pdf link: https://arxiv.org/pdf/2309.15325
- Abstract Scientific discovery and engineering design are currently limited by the time and cost of physical experiments, selected mostly through trial-and-error and intuition that require deep domain expertise. Numerical simulations present an alternative to physical experiments, but are usually infeasible for complex real-world domains due to the computational requirements of existing numerical methods. Artificial intelligence (AI) presents a potential paradigm shift through the development of fast data-driven surrogate models. In particular, neural operators provide a principled AI framework for learning mappings between functions defined on continuous domains, e.g., spatiotemporal processes and partial differential equations (PDEs). They can extrapolate and predict solutions at new locations unseen during training, i.e., perform zero-shot super-resolution. Neural operators can augment or even replace existing simulators in many applications, such as computational fluid dynamics, weather forecasting, and material modeling, while being 4-5 orders of magnitude faster. Further, neural operators can be integrated with physics and other domain constraints enforced at finer resolutions to obtain high-fidelity solutions and good generalization. Since neural operators are differentiable, they can directly optimize parameters for inverse design and other inverse problems. We believe that neural operators present a transformative approach to simulation and design, enabling rapid research and development.
Uncertainty Quantification via Neural Posterior Principal Components
- Authors: Elias Nehme, Omer Yair, Tomer Michaeli
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2309.15533
- Pdf link: https://arxiv.org/pdf/2309.15533
- Abstract Uncertainty quantification is crucial for the deployment of image restoration models in safety-critical domains, like autonomous driving and biological imaging. To date, methods for uncertainty visualization have mainly focused on per-pixel estimates. However, a heatmap of per-pixel variances is typically of little practical use, as it does not capture the strong correlations between pixels. A more natural measure of uncertainty corresponds to the variances along the principal components (PCs) of the posterior distribution. Theoretically, the PCs can be computed by applying PCA on samples generated from a conditional generative model for the input image. However, this requires generating a very large number of samples at test time, which is painfully slow with the current state-of-the-art (diffusion) models. In this work, we present a method for predicting the PCs of the posterior distribution for any input image, in a single forward pass of a neural network. Our method can either wrap around a pre-trained model that was trained to minimize the mean square error (MSE), or can be trained from scratch to output both a predicted image and the posterior PCs. We showcase our method on multiple inverse problems in imaging, including denoising, inpainting, super-resolution, and biological image-to-image translation. Our method reliably conveys instance-adaptive uncertainty directions, achieving uncertainty quantification comparable with posterior samplers while being orders of magnitude faster. Examples are available at https://eliasnehme.github.io/NPPC/
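The sample-based baseline that the abstract above describes as slow (running PCA on many posterior samples for one input) can be sketched as follows. This is not the paper's single-forward-pass method; it only shows what "variances along the principal components of the posterior" means numerically. The function name `posterior_pcs` and the assumption that samples arrive as a flattened `(num_samples, num_pixels)` array are illustrative choices.

```python
import numpy as np

def posterior_pcs(samples, k=2):
    """Top-k principal directions and variances of a set of posterior samples.

    samples: array of shape (num_samples, num_pixels), assumed to be draws
    from some conditional generative model for a single input image.
    Returns (mean, pcs, variances), where pcs has shape (k, num_pixels).
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are orthonormal principal directions, ordered by
    # decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s**2 / (samples.shape[0] - 1)  # sample variance along each PC
    return mean, vt[:k], variances[:k]
```

The cost of this baseline is dominated by generating the samples themselves, which is the step the abstract identifies as the bottleneck with diffusion models.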