arxiv-updates
New submissions for Tue, 19 Dec 23
Keyword: sgd
There is no result
Keyword: optimization
Towards Goal-oriented Intelligent Tutoring Systems in Online Education
- Authors: Yang Deng, Zifeng Ren, An Zhang, Wenqiang Lei, Tat-Seng Chua
- Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
- Arxiv link: https://arxiv.org/abs/2312.10053
- Pdf link: https://arxiv.org/pdf/2312.10053
- Abstract Interactive Intelligent Tutoring Systems (ITSs) enhance traditional ITSs by promoting effective learning through interactions and problem resolution in online education. Yet, proactive engagement, prioritizing resource optimization with planning and assessment capabilities, is often overlooked in current ITS designs. In this work, we investigate a new task, named Goal-oriented Intelligent Tutoring Systems (GITS), which aims to enable the student's mastery of a designated concept by strategically planning a customized sequence of exercises and assessment. To address the problem of goal-oriented policy learning in GITS, we propose a novel graph-based reinforcement learning framework, named Planning-Assessment-Interaction (PAI). Specifically, we first leverage cognitive structure information to improve state representation learning and action selection for planning the next action, which can be either to tutor an exercise or to assess the target concept. Further, we use a dynamically updated cognitive diagnosis model to simulate student responses to exercises and concepts. Three benchmark datasets across different subjects are constructed for enabling offline academic research on GITS. Experimental results demonstrate the effectiveness and efficiency of PAI and extensive analyses of various types of students are conducted to showcase the challenges in this task.
Advancements in Content-Based Image Retrieval: A Comprehensive Survey of Relevance Feedback Techniques
- Authors: Hamed Qazanfari, Mohammad M. AlyanNezhadi, Zohreh Nozari Khoshdaregi
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10089
- Pdf link: https://arxiv.org/pdf/2312.10089
- Abstract Content-based image retrieval (CBIR) systems have emerged as crucial tools in the field of computer vision, allowing for image search based on visual content rather than relying solely on metadata. This survey paper presents a comprehensive overview of CBIR, emphasizing its role in object detection and its potential to identify and retrieve visually similar images based on content features. Challenges faced by CBIR systems, including the semantic gap and scalability, are discussed, along with potential solutions. It elaborates on the semantic gap, which arises from the disparity between low-level features and high-level semantic concepts, and explores approaches to bridge this gap. One notable solution is the integration of relevance feedback (RF), empowering users to provide feedback on retrieved images and refine search results iteratively. The survey encompasses long-term and short-term learning approaches that leverage RF for enhanced CBIR accuracy and relevance. These methods focus on weight optimization and the utilization of active learning algorithms to select samples for training classifiers. Furthermore, the paper investigates machine learning techniques and the utilization of deep learning and convolutional neural networks to enhance CBIR performance. This survey paper plays a significant role in advancing the understanding of CBIR and RF techniques. It guides researchers and practitioners in comprehending existing methodologies, challenges, and potential solutions while fostering knowledge dissemination and identifying research gaps. By addressing future research directions, it sets the stage for advancements in CBIR that will enhance retrieval accuracy, usability, and effectiveness in various application domains.
Plasticine3D: Non-rigid 3D editting with text guidance
- Authors: Yige Chen, Ang Chen, Siyuan Chen, Ran Yi
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.10111
- Pdf link: https://arxiv.org/pdf/2312.10111
- Abstract With the help of Score Distillation Sampling (SDS) and the rapid development of various trainable 3D representations, Text-to-Image (T2I) diffusion models have been applied to 3D generation tasks and achieved considerable results. There are also some attempts toward the task of editing 3D objects leveraging this Text-to-3D pipeline. However, most methods currently focus on adding additional geometries, overwriting textures, or both, and few of them can perform non-rigid transformation of 3D objects. Those that can perform non-rigid editing, on the other hand, suffer from low resolution, lack of fidelity, and poor flexibility. To address these issues, we present Plasticine3D, a general, high-fidelity, photo-realistic, and controllable non-rigid editing pipeline. First, our work divides the editing process into a geometry editing stage and a texture editing stage to achieve more detailed and photo-realistic results. Second, to perform non-rigid transformation with controllable results while maintaining fidelity to the original 3D models, we propose a multi-view-embedding (MVE) optimization strategy to ensure that the diffusion model learns the overall features of the original object, and an embedding-fusion (EF) to control the degree of editing by adjusting the value of the fusing rate. We also design a geometry processing step before optimizing on the base geometry to cope with the different needs of various editing tasks. Furthermore, to fully leverage the geometric prior from the original 3D object, we provide an optional replacement of score distillation sampling named score projection sampling (SPS), which enables us to directly perform optimization from the original 3D mesh in most common median non-rigid editing scenarios. We demonstrate the effectiveness of our method on both the non-rigid 3D editing task and general 3D editing task.
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
- Authors: Qin Guo, Tianwei Lin
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.10113
- Pdf link: https://arxiv.org/pdf/2312.10113
- Abstract Recently, diffusion-based methods, like InstructPix2Pix (IP2P), have achieved effective instruction-based image editing, requiring only natural language instructions from the user. However, these methods often inadvertently alter unintended areas and struggle with multi-instruction editing, resulting in compromised outcomes. To address these issues, we introduce the Focus on Your Instruction (FoI), a method designed to ensure precise and harmonious editing across multiple instructions without extra training or test-time optimization. In the FoI, we primarily emphasize two aspects: (1) precisely extracting regions of interest for each instruction and (2) guiding the denoising process to concentrate within these regions of interest. For the first objective, we identify the implicit grounding capability of IP2P from the cross-attention between instruction and image, then develop an effective mask extraction method. For the second objective, we introduce a cross attention modulation module for rough isolation of target editing regions and unrelated regions. Additionally, we introduce a mask-guided disentangle sampling strategy to further ensure clear region isolation. Experimental results demonstrate that FoI surpasses existing methods in both quantitative and qualitative evaluations, especially excelling in multi-instruction editing task.
Test-Time Domain Adaptation by Learning Domain-Aware Batch Normalization
- Authors: Yanan Wu, Zhixiang Chi, Yang Wang, Konstantinos N. Plataniotis, Songhe Feng
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.10165
- Pdf link: https://arxiv.org/pdf/2312.10165
- Abstract Test-time domain adaptation aims to adapt the model trained on source domains to unseen target domains using a few unlabeled images. Emerging research has shown that the label and domain information is separately embedded in the weight matrix and batch normalization (BN) layer. Previous works normally update the whole network naively without explicitly decoupling the knowledge between label and domain. As a result, it leads to knowledge interference and defective distribution adaptation. In this work, we propose to reduce such learning interference and elevate the domain knowledge learning by only manipulating the BN layer. However, the normalization step in BN is intrinsically unstable when the statistics are re-estimated from a few samples. We find that ambiguities can be greatly reduced when only updating the two affine parameters in BN while keeping the source domain statistics. To further enhance the domain knowledge extraction from unlabeled data, we construct an auxiliary branch with label-independent self-supervised learning (SSL) to provide supervision. Moreover, we propose a bi-level optimization based on meta-learning to enforce the alignment of two learning objectives of auxiliary and main branches. The goal is to use the auxiliary branch to adapt the domain and benefit main task for subsequent inference. Our method keeps the same computational cost at inference as the auxiliary branch can be thoroughly discarded after adaptation. Extensive experiments show that our method outperforms the prior works on five WILDS real-world domain shift datasets. Our method can also be integrated with methods with label-dependent optimization to further push the performance boundary. Our code is available at https://github.com/ynanwu/MABN.
Coupling Fairness and Pruning in a Single Run: a Bi-level Optimization Perspective
- Authors: Yucong Dai, Gen Li, Feng Luo, Xiaolong Ma, Yongkai Wu
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
- Arxiv link: https://arxiv.org/abs/2312.10181
- Pdf link: https://arxiv.org/pdf/2312.10181
- Abstract Deep neural networks have demonstrated remarkable performance in various tasks. With a growing need for sparse deep learning, model compression techniques, especially pruning, have gained significant attention. However, conventional pruning techniques can inadvertently exacerbate algorithmic bias, resulting in unequal predictions. To address this, we define a fair pruning task where a sparse model is derived subject to fairness requirements. In particular, we propose a framework to jointly optimize the pruning mask and weight update processes with fairness constraints. This framework is engineered to compress models that maintain performance while ensuring fairness in a single execution. To this end, we formulate the fair pruning problem as a novel constrained bi-level optimization task and derive efficient and effective solving strategies. We design experiments spanning various datasets and settings to validate our proposed method. Our empirical analysis contrasts our framework with several mainstream pruning strategies, emphasizing our method's superiority in maintaining model fairness, performance, and efficiency.
Pareto Envelope Augmented with Reinforcement Learning: Multi-objective reinforcement learning-based approach for Large-Scale Constrained Pressurized Water Reactor optimization
- Authors: Paul Seurin, Koroush Seurin
- Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
- Arxiv link: https://arxiv.org/abs/2312.10194
- Pdf link: https://arxiv.org/pdf/2312.10194
- Abstract A novel method, the Pareto Envelope Augmented with Reinforcement Learning (PEARL), has been developed to address the challenges posed by multi-objective problems, particularly in the field of engineering where the evaluation of candidate solutions can be time-consuming. PEARL distinguishes itself from traditional policy-based multi-objective Reinforcement Learning methods by learning a single policy, eliminating the need for multiple neural networks to independently solve simpler sub-problems. Several versions inspired from deep learning and evolutionary techniques have been crafted, catering to both unconstrained and constrained problem domains. Curriculum Learning is harnessed to effectively manage constraints in these versions. PEARL's performance is first evaluated on classical multi-objective benchmarks. Additionally, it is tested on two practical PWR core Loading Pattern optimization problems to showcase its real-world applicability. The first problem involves optimizing the Cycle length and the rod-integrated peaking factor as the primary objectives, while the second problem incorporates the mean average enrichment as an additional objective. Furthermore, PEARL addresses three types of constraints related to boron concentration, peak pin burnup, and peak pin power. The results are systematically compared against a conventional approach, the Non-dominated Sorting Genetic Algorithm. Notably, PEARL, specifically the PEARL-NdS variant, efficiently uncovers a Pareto front without necessitating additional efforts from the algorithm designer, as opposed to a single optimization with scaled objectives. It also outperforms the classical approach across multiple performance metrics, including the Hyper-volume.
VK-G2T: Vision and Context Knowledge enhanced Gloss2Text
- Authors: Liqiang Jing, Xuemeng Song, Xinxing Zu, Na Zheng, Zhongzhou Zhao, Liqiang Nie
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2312.10210
- Pdf link: https://arxiv.org/pdf/2312.10210
- Abstract Existing sign language translation methods follow a two-stage pipeline: first converting the sign language video to a gloss sequence (i.e. Sign2Gloss) and then translating the generated gloss sequence into a spoken language sentence (i.e. Gloss2Text). While previous studies have focused on boosting the performance of the Sign2Gloss stage, we emphasize the optimization of the Gloss2Text stage. However, this task is non-trivial due to two distinct features of Gloss2Text: (1) isolated gloss input and (2) low-capacity gloss vocabulary. To address these issues, we propose a vision and context knowledge enhanced Gloss2Text model, named VK-G2T, which leverages the visual content of the sign language video to learn the properties of the target sentence and exploit the context knowledge to facilitate the adaptive translation of gloss words. Extensive experiments conducted on a Chinese benchmark validate the superiority of our model.
The Complexity of Optimizing Atomic Congestion
- Authors: Cornelius Brand, Robert Ganian, Subrahmanyam Kalyanasundaram, Fionn Mc Inerney
- Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)
- Arxiv link: https://arxiv.org/abs/2312.10219
- Pdf link: https://arxiv.org/pdf/2312.10219
- Abstract Atomic congestion games are a classic topic in network design, routing, and algorithmic game theory, and are capable of modeling congestion and flow optimization tasks in various application areas. While both the price of anarchy for such games as well as the computational complexity of computing their Nash equilibria are by now well-understood, the computational complexity of computing a system-optimal set of strategies -- that is, a centrally planned routing that minimizes the average cost of agents -- is severely understudied in the literature. We close this gap by identifying the exact boundaries of tractability for the problem through the lens of the parameterized complexity paradigm. After showing that the problem remains highly intractable even on extremely simple networks, we obtain a set of results which demonstrate that the structural parameters which control the computational (in)tractability of the problem are not vertex-separator based in nature (such as, e.g., treewidth), but rather based on edge separators. We conclude by extending our analysis towards the (even more challenging) min-max variant of the problem.
Active Reinforcement Learning for Robust Building Control
- Authors: Doseok Jang, Larry Yan, Lucas Spangher, Costas Spanos
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2312.10289
- Pdf link: https://arxiv.org/pdf/2312.10289
- Abstract Reinforcement learning (RL) is a powerful tool for optimal control that has found great success in Atari games, the game of Go, robotic control, and building optimization. RL is also very brittle; agents often overfit to their training environment and fail to generalize to new settings. Unsupervised environment design (UED) has been proposed as a solution to this problem, in which the agent trains in environments that have been specially selected to help it learn. Previous UED algorithms focus on trying to train an RL agent that generalizes across a large distribution of environments. This is not necessarily desirable when we wish to prioritize performance in one environment over others. In this work, we will be examining the setting of robust RL building control, where we wish to train an RL agent that prioritizes performing well in normal weather while still being robust to extreme weather conditions. We demonstrate a novel UED algorithm, ActivePLR, that uses uncertainty-aware neural network architectures to generate new training environments at the limit of the RL agent's ability while being able to prioritize performance in a desired base environment. We show that ActivePLR is able to outperform state-of-the-art UED algorithms in minimizing energy usage while maximizing occupant comfort in the setting of building control.
Runtime Analysis of the SMS-EMOA for Many-Objective Optimization
- Authors: Weijie Zheng, Benjamin Doerr
- Subjects: Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2312.10290
- Pdf link: https://arxiv.org/pdf/2312.10290
- Abstract The widely used multiobjective optimizer NSGA-II was recently proven to have considerable difficulties in many-objective optimization. In contrast, experimental results in the literature show a good performance of the SMS-EMOA, which can be seen as a steady-state NSGA-II that uses the hypervolume contribution instead of the crowding distance as the second selection criterion. This paper conducts the first rigorous runtime analysis of the SMS-EMOA for many-objective optimization. To this end, we first propose a many-objective counterpart, the $m$-objective mOJZJ problem, of the bi-objective OJZJ benchmark, which is the first many-objective multimodal benchmark used in a mathematical runtime analysis. We prove that SMS-EMOA computes the full Pareto front of this benchmark in an expected number of $O(M^2 n^k)$ iterations, where $n$ denotes the problem size (length of the bit-string representation), $k$ the gap size (a difficulty parameter of the problem), and $M=(2n/m-2k+3)^{m/2}$ the size of the Pareto front. This result together with the existing negative result on the original NSGA-II shows that in principle, the general approach of the NSGA-II is suitable for many-objective optimization, but the crowding distance as tie-breaker has deficiencies. We obtain three additional insights on the SMS-EMOA. Different from a recent result for the bi-objective OJZJ benchmark, the stochastic population update often does not help for mOJZJ. It results in a $1/\Theta(\min\{Mk^{1/2}/2^{k/2},1\})$ speed-up, which is $\Theta(1)$ for large $m$ such as $m>k$. On the positive side, we prove that heavy-tailed mutation still results in a speed-up of order $k^{0.5+k-\beta}$. Finally, we conduct the first runtime analyses of the SMS-EMOA on the bi-objective OneMinMax and LOTZ benchmarks and show that it has a performance comparable to the GSEMO and the NSGA-II.
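
To make the "hypervolume contribution as the second selection criterion" idea concrete, here is a minimal sketch of the removal step for the bi-objective maximization case. It is an illustration rather than the authors' implementation: the function names, the reference point, and the simplification of skipping non-dominated sorting (the population is treated as a single front) are all assumptions.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` relative to `ref`, maximizing both objectives."""
    # Sweep from the largest first objective down; each point adds the strip
    # of second-objective values that no earlier point already covers.
    area = 0.0
    best_y = ref[1]
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if x > ref[0] and y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def hv_contributions(points, ref):
    """Hypervolume lost when each point is removed on its own."""
    total = hypervolume_2d(points, ref)
    return [
        total - hypervolume_2d(points[:i] + points[i + 1:], ref)
        for i in range(len(points))
    ]


def remove_worst(points, ref):
    """Steady-state removal: drop the point with the smallest contribution."""
    contrib = hv_contributions(points, ref)
    worst = min(range(len(points)), key=contrib.__getitem__)
    return points[:worst] + points[worst + 1:]


population = [(1, 5), (3, 4), (5, 1)]  # hypothetical objective vectors
print(remove_worst(population, ref=(0, 0)))  # -> [(3, 4), (5, 1)]
```

In a full steady-state loop, one offspring would be added to the population before this removal step, so the population size stays constant.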
Spatial-Temporal DAG Convolutional Networks for End-to-End Joint Effective Connectivity Learning and Resting-State fMRI Classification
- Authors: Rui Yang, Wenrui Dai, Huajun She, Yiping P. Du, Dapeng Wu, Hongkai Xiong
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
- Arxiv link: https://arxiv.org/abs/2312.10317
- Pdf link: https://arxiv.org/pdf/2312.10317
- Abstract Building comprehensive brain connectomes has proved of fundamental importance in resting-state fMRI (rs-fMRI) analysis. Based on the foundation of brain network, spatial-temporal-based graph convolutional networks have dramatically improved the performance of deep learning methods in rs-fMRI time series classification. However, existing works either pre-define the brain network as the correlation matrix derived from the raw time series or jointly learn the connectome and model parameters without any topology constraint. These methods could suffer from degraded classification performance caused by the deviation from the intrinsic brain connectivity and lack biological interpretability of demonstrating the causal structure (i.e., effective connectivity) among brain regions. Moreover, most existing methods for effective connectivity learning are unaware of the downstream classification task and cannot sufficiently exploit useful rs-fMRI label information. To address these issues in an end-to-end manner, we model the brain network as a directed acyclic graph (DAG) to discover direct causal connections between brain regions and propose Spatial-Temporal DAG Convolutional Network (ST-DAGCN) to jointly infer effective connectivity and classify rs-fMRI time series by learning brain representations based on nonlinear structural equation model. The optimization problem is formulated into a continuous program and solved with score-based learning method via gradient descent. We evaluate ST-DAGCN on two public rs-fMRI databases. Experiments show that ST-DAGCN outperforms existing models by evident margins in rs-fMRI classification and simultaneously learns meaningful edges of effective connectivity that help understand brain activity patterns and pathological mechanisms in brain disease.
Deriving Rewards for Reinforcement Learning from Symbolic Behaviour Descriptions of Bipedal Walking
- Authors: Daniel Harnack, Christoph Lüth, Lukas Gross, Shivesh Kumar, Frank Kirchner
- Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
- Arxiv link: https://arxiv.org/abs/2312.10328
- Pdf link: https://arxiv.org/pdf/2312.10328
- Abstract Generating physical movement behaviours from their symbolic description is a long-standing challenge in artificial intelligence (AI) and robotics, requiring insights into numerical optimization methods as well as into formalizations from symbolic AI and reasoning. In this paper, a novel approach to finding a reward function from a symbolic description is proposed. The intended system behaviour is modelled as a hybrid automaton, which reduces the system state space to allow more efficient reinforcement learning. The approach is applied to bipedal walking, by modelling the walking robot as a hybrid automaton over state space orthants, and used with the compass walker to derive a reward that incentivizes following the hybrid automaton cycle. As a result, training times of reinforcement learning controllers are reduced while final walking speed is increased. The approach can serve as a blueprint how to generate reward functions from symbolic AI and reasoning.
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
- Authors: Mingbin Xu, Alex Jin, Sicheng Wang, Mu Su, Tim Ng, Henry Mason, Shiyi Han, Yaqiao Deng, Zhen Huang, Mahesh Krishnamoorthy
- Subjects: Machine Learning (cs.LG); Performance (cs.PF)
- Arxiv link: https://arxiv.org/abs/2312.10359
- Pdf link: https://arxiv.org/pdf/2312.10359
- Abstract With increasingly more powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still challenging to implement on-device ASR on resource-constrained devices, such as smartphones, smart wearables, and other small home automation devices. In this paper, we propose a series of model architecture adaptions, neural network graph transformations, and numerical optimizations to fit an advanced Conformer based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. We achieve over 5.26 times faster than realtime (0.19 RTF) speech recognition on small wearables while minimizing energy consumption and achieving state-of-the-art accuracy. The proposed methods are widely applicable to other transformer-based server-free AI applications. In addition, we provide a complete theory on optimal pre-normalizers that numerically stabilize layer normalization in any Lp-norm using any floating point precision.
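
The two figures quoted in this abstract (0.19 RTF and "5.26 times faster than realtime") are the same quantity expressed two ways, assuming the usual definition of the real-time factor as processing time divided by audio duration. A small sketch of the arithmetic, with hypothetical helper names:

```python
def speedup_from_rtf(rtf):
    """Speed-up over real time is the reciprocal of the real-time factor."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


def processing_seconds(audio_seconds, rtf):
    """Compute time needed to transcribe `audio_seconds` of audio at a given RTF."""
    return audio_seconds * rtf


print(round(speedup_from_rtf(0.19), 2))  # -> 5.26
```

Under this definition, ten seconds of audio at 0.19 RTF would take roughly 1.9 seconds of compute.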
Bayesian experimental design for head imaging by electrical impedance tomography
- Authors: N. Hyvönen, A. Jääskeläinen, R. Maity, A. Vavilov
- Subjects: Numerical Analysis (math.NA); Statistics Theory (math.ST)
- Arxiv link: https://arxiv.org/abs/2312.10383
- Pdf link: https://arxiv.org/pdf/2312.10383
- Abstract This work considers the optimization of electrode positions in head imaging by electrical impedance tomography. The study is motivated by maximizing the sensitivity of electrode measurements to conductivity changes when monitoring the condition of a stroke patient, which justifies adopting a linearized version of the complete electrode model as the forward model. The algorithm is based on finding a (locally) A-optimal measurement configuration via gradient descent with respect to the electrode positions. The efficient computation of the needed derivatives of the complete electrode model is one of the focal points. Two algorithms are introduced and numerically tested on a three-layer head model. The first one assumes a region of interest and a Gaussian prior for the conductivity in the brain, and it can be run offline, i.e., prior to taking any measurements. The second algorithm first computes a reconstruction of the conductivity anomaly caused by the stroke with an initial electrode configuration by combining lagged diffusivity iteration with sequential linearizations, which can be interpreted to produce an approximate Gaussian probability density for the conductivity perturbation. It then resorts to the first algorithm to find new, more informative positions for the available electrodes with the constructed density as the prior.
RedCore: Relative Advantage Aware Cross-modal Representation Learning for Missing Modalities with Imbalanced Missing Rates
- Authors: Jun Sun, Xinxin Zhang, Shoukang Han, Yu-ping Ruan, Taihao Li
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10386
- Pdf link: https://arxiv.org/pdf/2312.10386
- Abstract Multimodal learning is susceptible to modality missing, which poses a major obstacle for its practical applications and, thus, invigorates increasing research interest. In this paper, we investigate two challenging problems: 1) when modality missing exists in the training data, how to exploit the incomplete samples while guaranteeing that they are properly supervised? 2) when the missing rates of different modalities vary, causing or exacerbating the imbalance among modalities, how to address the imbalance and ensure all modalities are well-trained? To tackle these two challenges, we first introduce the variational information bottleneck (VIB) method for the cross-modal representation learning of missing modalities, which capitalizes on the available modalities and the labels as supervision. Then, accounting for the imbalanced missing rates, we define relative advantage to quantify the advantage of each modality over others. Accordingly, a bi-level optimization problem is formulated to adaptively regulate the supervision of all modalities during training. As a whole, the proposed approach features Relative advantage aware Cross-modal representation learning (abbreviated as RedCore) for missing modalities with imbalanced missing rates. Extensive empirical results demonstrate that RedCore outperforms competing models in that it exhibits superior robustness against either large or imbalanced missing rates.
Fractional Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing
- Authors: Ming Tang, Lyudong Jin, Meng Zhang, Hao Wang
- Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2312.10418
- Pdf link: https://arxiv.org/pdf/2312.10418
- Abstract Mobile edge computing (MEC) is a promising paradigm for real-time applications with intensive computational needs (e.g., autonomous driving), as it can reduce the processing delay. In this work, we focus on the timeliness of computation-intensive updates, measured by Age-of-Information (AoI), and study how to jointly optimize the task updating and offloading policies for AoI with fractional form. Specifically, we consider edge load dynamics and formulate a task scheduling problem to minimize the expected time-average AoI. The uncertain edge load dynamics, the nature of the fractional objective, and the hybrid continuous-discrete action space (due to the joint optimization) make this problem challenging and existing approaches not directly applicable. To this end, we propose a fractional reinforcement learning (RL) framework and prove its convergence. We further design a model-free fractional deep RL (DRL) algorithm, where each device makes scheduling decisions with the hybrid action space without knowing the system dynamics and decisions of other devices. Experimental results show that our proposed algorithms reduce the average AoI by up to 57.6% compared with several non-fractional benchmarks.
Stochastic Bayesian Optimization with Unknown Continuous Context Distribution via Kernel Density Estimation
- Authors: Xiaobin Huang, Lei Song, Ke Xue, Chao Qian
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10423
- Pdf link: https://arxiv.org/pdf/2312.10423
- Abstract Bayesian optimization (BO) is a sample-efficient method and has been widely used for optimizing expensive black-box functions. Recently, there has been considerable interest in the BO literature in optimizing functions that are affected by a context variable in the environment, which is uncontrollable by decision makers. In this paper, we focus on the optimization of functions' expectations over a continuous context variable, subject to an unknown distribution. To address this problem, we propose two algorithms that employ kernel density estimation to learn the probability density function (PDF) of the continuous context variable online. The first algorithm is simpler and directly optimizes the expectation under the estimated PDF. Considering that the estimated PDF may have high estimation error when the true distribution is complicated, we further propose a second algorithm that optimizes the distributionally robust objective. Theoretical results demonstrate that both algorithms have sub-linear Bayesian cumulative regret on the expectation objective. Furthermore, we conduct numerical experiments to empirically demonstrate the effectiveness of our algorithms.
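
As a rough illustration of the first idea in this abstract (estimate the context PDF by kernel density estimation, then take the expectation of the objective under that estimate), the sketch below uses a Gaussian kernel with a fixed bandwidth and a midpoint-rule integral. The kernel, the bandwidth, the integration bounds, and all function names are assumptions made for the example; the abstract does not specify them, and this is not the paper's algorithm.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a PDF estimate built from observed context samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(c):
        # Sum of Gaussian bumps centred on each observed context.
        return norm * sum(
            math.exp(-0.5 * ((c - s) / bandwidth) ** 2) for s in samples
        )

    return pdf


def expected_value(f, x, pdf, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of f(x, c) * pdf(c) over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        c = lo + (i + 0.5) * width
        total += f(x, c) * pdf(c)
    return total * width


# Hypothetical observed contexts and objective, for illustration only.
contexts = [0.0, 1.0, 2.0]
pdf = gaussian_kde(contexts, bandwidth=0.5)
objective = lambda x, c: -(x - c) ** 2
print(expected_value(objective, 1.0, pdf, -10.0, 10.0))
```

In an online setting, the estimate would be rebuilt as new context observations arrive, and the decision variable `x` would be chosen to maximize this estimated expectation.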
Decomposing Hard SAT Instances with Metaheuristic Optimization
- Authors: Daniil Chivilikhin, Artem Pavlenko, Alexander Semenov
- Subjects: Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
- Arxiv link: https://arxiv.org/abs/2312.10436
- Pdf link: https://arxiv.org/pdf/2312.10436
- Abstract In the article, within the framework of the Boolean Satisfiability problem (SAT), the problem of estimating the hardness of specific Boolean formulas w.r.t. a specific complete SAT solving algorithm is considered. Based on the well-known Strong Backdoor Set (SBS) concept, we introduce the notion of decomposition hardness (d-hardness). If $B$ is an arbitrary subset of the set of variables occurring in a SAT formula $C$, and $A$ is an arbitrary complete SAT solver, then the d-hardness expresses an estimate of the hardness of $C$ w.r.t. $A$ and $B$. We show that the d-hardness of $C$ w.r.t. a particular $B$ can be expressed in terms of the expected value of a special random variable associated with $A$, $B$, and $C$. For its computational evaluation, algorithms based on the Monte Carlo method can be used. The problem of finding $B$ with the minimum value of d-hardness is formulated as an optimization problem for a pseudo-Boolean function whose values are calculated as a result of a probabilistic experiment. To minimize this function, we use evolutionary algorithms. In the experimental part, we demonstrate the applicability of the concept of d-hardness and the methods of its estimation to solving hard unsatisfiable SAT instances.
Weight-Entanglement Meets Gradient-Based Neural Architecture Search
- Authors: Authors: Rhea Sanjay Sukthanker, Arjun Krishnakumar, Mahmoud Safari, Frank Hutter
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10440
- Pdf link: https://arxiv.org/pdf/2312.10440
- Abstract Weight sharing is a fundamental concept in neural architecture search (NAS), enabling gradient-based methods to explore cell-based architecture spaces significantly faster than traditional blackbox approaches. In parallel, weight entanglement has emerged as a technique for intricate parameter sharing among architectures within macro-level search spaces. Since weight-entanglement poses compatibility challenges for gradient-based NAS methods, these two paradigms have largely developed independently in parallel sub-communities. This paper aims to bridge the gap between these sub-communities by proposing a novel scheme to adapt gradient-based methods for weight-entangled spaces. This enables us to conduct an in-depth comparative assessment and analysis of the performance of gradient-based NAS in weight-entangled search spaces. Our findings reveal that this integration of weight-entanglement and gradient-based NAS brings forth the various benefits of gradient-based methods (enhanced performance, improved supernet training properties and superior any-time performance), while preserving the memory efficiency of weight-entangled spaces. The code for our work is openly accessible at https://anonymous.4open.science/r/TangleNAS-527C.
Generalization Analysis of Policy Networks: An Example of Double-Integrator
- Authors: Authors: Ruining Zhang, Haoran Han, Maolong Lv, Qisong Yang, Jian Cheng
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2312.10472
- Pdf link: https://arxiv.org/pdf/2312.10472
- Abstract Extensive utilization of deep reinforcement learning (DRL) policy networks in diverse continuous control tasks has raised questions regarding performance degradation in expansive state spaces where the input state norm is larger than that in the training environment. This paper aims to uncover the underlying factors contributing to such performance deterioration when dealing with expanded state spaces, using a novel analysis technique known as state division. In contrast to prior approaches that employ state division merely as a post-hoc explanatory tool, our methodology delves into the intrinsic characteristics of DRL policy networks. Specifically, we demonstrate that the expansion of state space induces the activation function $\tanh$ to exhibit saturability, resulting in the transformation of the state division boundary from nonlinear to linear. Our analysis centers on the paradigm of the double-integrator system, revealing that this gradual shift towards linearity imparts a control behavior reminiscent of bang-bang control. However, the inherent linearity of the division boundary prevents the attainment of an ideal bang-bang control, thereby introducing unavoidable overshooting. Our experimental investigations, employing diverse RL algorithms, establish that this performance phenomenon stems from inherent attributes of the DRL policy network, remaining consistent across various optimization algorithms.
IRS-Aided Sectorized Base Station Design and 3D Coverage Performance Analysis
- Authors: Authors: Xintong Chen, Jiangbin Lyu, Liqun Fu
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2312.10475
- Pdf link: https://arxiv.org/pdf/2312.10475
- Abstract Intelligent reflecting surface (IRS) is regarded as a revolutionary paradigm that can reconfigure the wireless propagation environment for enhancing the desired signal and/or weakening the interference, and thus improving the quality of service (QoS) for communication systems. In this paper, we propose an IRS-aided sectorized BS design where the IRS is mounted in front of a transmitter (TX) and reflects/reconfigures signal towards the desired user equipment (UE). Unlike prior works that address link-level analysis/optimization of IRS-aided systems, we focus on the system-level three-dimensional (3D) coverage performance in both single-/multiple-cell scenarios. To this end, a distance/angle-dependent 3D channel model is considered for UEs in the 3D space, as well as the non-isotropic TX beam pattern and IRS element radiation pattern (ERP), both of which affect the average channel power as well as the multi-path fading statistics. Based on the above, a general formula of received signal power in our design is obtained, along with derived power scaling laws and upper/lower bounds on the mean signal/interference power under IRS passive beamforming or random scattering. Numerical results validate our analysis and demonstrate that our proposed design outperforms the benchmark schemes with fixed BS antenna patterns or active 3D beamforming. In particular, for aerial UEs that suffer from strong inter-cell interference, the IRS-aided BS design provides much better QoS in terms of the ergodic throughput performance compared with benchmarks, thanks to the IRS-inherent double pathloss effect that helps weaken the interference.
Spatial Deep Learning for Site-Specific Movement Optimization of Aerial Base Stations
- Authors: Authors: Jiangbin Lyu, Xu Chen, Jiefeng Zhang, Liqun Fu
- Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10490
- Pdf link: https://arxiv.org/pdf/2312.10490
- Abstract Unmanned aerial vehicles (UAVs) can be utilized as aerial base stations (ABSs) to provide wireless connectivity for ground users (GUs) in various emergency scenarios. However, maximizing the coverage rate of $M$ GUs by jointly placing $N$ ABSs with limited coverage range is an NP-hard problem with exponential complexity in $M$ and $N$. The problem is further complicated when the coverage range becomes irregular due to site-specific blockages (e.g., buildings) on the air-ground channel, and/or when the GUs are moving. To address the above challenges, we study a multi-ABS movement optimization problem to maximize the average coverage rate of mobile GUs in a site-specific environment. The Spatial Deep Learning with Multi-dimensional Archive of Phenotypic Elites (SDL-ME) algorithm is proposed to tackle this challenging problem by 1) partitioning the complicated ABS movement problem into ABS placement sub-problems each spanning a finite time horizon; 2) using an encoder-decoder deep neural network (DNN) as the emulator to capture the spatial correlation of ABSs/GUs and thereby reducing the cost of interaction with the actual environment; 3) employing the emulator to speed up a quality-diversity search for the optimal placement solution; and 4) proposing a planning-exploration-serving scheme for multi-ABS movement coordination. Numerical results demonstrate that the proposed approach significantly outperforms the benchmark Deep Reinforcement Learning (DRL)-based method and two other baselines in terms of average coverage rate, training time and/or sample efficiency. Moreover, with one-time training, our proposed method can be applied in scenarios where the number of ABSs/GUs dynamically changes on site and/or with different/varying GU speeds, which is thus more robust and flexible compared with conventional DRL-based methods.
How to Train Neural Field Representations: A Comprehensive Study and Benchmark
- Authors: Authors: Samuele Papa, Riccardo Valperga, David Knigge, Miltiadis Kofinas, Phillip Lippe, Jan-Jakob Sonke, Efstratios Gavves
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.10531
- Pdf link: https://arxiv.org/pdf/2312.10531
- Abstract Neural fields (NeFs) have recently emerged as a versatile method for modeling signals of various modalities, including images, shapes, and scenes. Subsequently, a number of works have explored the use of NeFs as representations for downstream tasks, e.g. classifying an image based on the parameters of a NeF that has been fit to it. However, the impact of the NeF hyperparameters on their quality as downstream representation is scarcely understood and remains largely unexplored. This is in part caused by the large amount of time required to fit datasets of neural fields. In this work, we propose fit-a-nef, a JAX-based library that leverages parallelization to enable fast optimization of large-scale NeF datasets, resulting in a significant speed-up. With this library, we perform a comprehensive study that investigates the effects of different hyperparameters -- including initialization, network architecture, and optimization strategies -- on fitting NeFs for downstream tasks. Our study provides valuable insights on how to train NeFs and offers guidance for optimizing their effectiveness in downstream applications. Finally, based on the proposed library and our analysis, we propose Neural Field Arena, a benchmark consisting of neural field variants of popular vision datasets, including MNIST, CIFAR, variants of ImageNet, and ShapeNetv2. Our library and the Neural Field Arena will be open-sourced to introduce standardized benchmarking and promote further research on neural fields.
Advancing RAN Slicing with Offline Reinforcement Learning
- Authors: Authors: Kun Yang, Shu-ping Yeh, Menglei Zhang, Jerry Sydir, Jing Yang, Cong Shen
- Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2312.10547
- Pdf link: https://arxiv.org/pdf/2312.10547
- Abstract Dynamic radio resource management (RRM) in wireless networks presents significant challenges, particularly in the context of Radio Access Network (RAN) slicing. This technology, crucial for catering to varying user requirements, often grapples with complex optimization scenarios. Existing Reinforcement Learning (RL) approaches, while achieving good performance in RAN slicing, typically rely on online algorithms or behavior cloning. These methods necessitate either continuous environmental interactions or access to high-quality datasets, hindering their practical deployment. Towards addressing these limitations, this paper introduces offline RL to solving the RAN slicing problem, marking a significant shift towards more feasible and adaptive RRM methods. We demonstrate how offline RL can effectively learn near-optimal policies from sub-optimal datasets, a notable advancement over existing practices. Our research highlights the inherent flexibility of offline RL, showcasing its ability to adjust policy criteria without the need for additional environmental interactions. Furthermore, we present empirical evidence of the efficacy of offline RL in adapting to various service-level requirements, illustrating its potential in diverse RAN slicing scenarios.
Improving Environment Robustness of Deep Reinforcement Learning Approaches for Autonomous Racing Using Bayesian Optimization-based Curriculum Learning
- Authors: Authors: Rohan Banerjee, Prishita Ray, Mark Campbell
- Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10557
- Pdf link: https://arxiv.org/pdf/2312.10557
- Abstract Deep reinforcement learning (RL) approaches have been broadly applied to a large number of robotics tasks, such as robot manipulation and autonomous driving. However, an open problem in deep RL is learning policies that are robust to variations in the environment, which is an important condition for such systems to be deployed into real-world, unstructured settings. Curriculum learning is one approach that has been applied to improve generalization performance in both supervised and reinforcement learning domains, but selecting the appropriate curriculum to achieve robustness can be a user-intensive process. In our work, we show that performing probabilistic inference of the underlying curriculum-reward function using Bayesian Optimization can be a promising technique for finding a robust curriculum. We demonstrate that a curriculum found with Bayesian optimization can outperform a vanilla deep RL agent and a hand-engineered curriculum in the domain of autonomous racing with obstacle avoidance. Our code is available at https://github.com/PRISHIta123/Curriculum_RL_for_Driving.
Enabling Accelerators for Graph Computing
- Authors: Authors: Kaustubh Shivdikar
- Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10561
- Pdf link: https://arxiv.org/pdf/2312.10561
- Abstract The advent of Graph Neural Networks (GNNs) has revolutionized the field of machine learning, offering a novel paradigm for learning on graph-structured data. Unlike traditional neural networks, GNNs are capable of capturing complex relationships and dependencies inherent in graph data, making them particularly suited for a wide range of applications including social network analysis, molecular chemistry, and network security. The impact of GNNs in these domains is profound, enabling more accurate models and predictions, and thereby contributing significantly to advancements in these fields. GNNs, with their unique structure and operation, present new computational challenges compared to conventional neural networks. This requires comprehensive benchmarking and a thorough characterization of GNNs to obtain insight into their computational requirements and to identify potential performance bottlenecks. In this thesis, we aim to develop a better understanding of how GNNs interact with the underlying hardware and will leverage this knowledge as we design specialized accelerators and develop new optimizations, leading to more efficient and faster GNN computations. Synthesizing these insights and optimizations, we design a state-of-the-art hardware accelerator capable of efficiently handling various GNN workloads. Our accelerator architecture is built on our characterization of GNN computational demands, providing clear motivation for our approach. Furthermore, we extend our exploration to emerging GNN workloads in the domain of graph neural networks. This exploration into novel models underlines our comprehensive approach, as we strive to enable accelerators that are not just performant, but also versatile, able to adapt to the evolving landscape of graph computing.
Multi-level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection
- Authors: Authors: Xinghao Zhu, Devesh K. Jha, Diego Romeres, Lingfeng Sun, Masayoshi Tomizuka, Anoop Cherian
- Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10571
- Pdf link: https://arxiv.org/pdf/2312.10571
- Abstract Automating the assembly of objects from their parts is a complex problem with innumerable applications in manufacturing, maintenance, and recycling. Unlike existing research, which is limited to target segmentation, pose regression, or using fixed target blueprints, our work presents a holistic multi-level framework for part assembly planning consisting of part assembly sequence inference, part motion planning, and robot contact optimization. We present the Part Assembly Sequence Transformer (PAST) -- a sequence-to-sequence neural network -- to infer assembly sequences recursively from a target blueprint. We then use a motion planner and optimization to generate part movements and contacts. To train PAST, we introduce D4PAS: a large-scale Dataset for Part Assembly Sequences (D4PAS) consisting of physically valid sequences for industrial objects. Experimental results show that our approach generalizes better than prior methods while needing significantly less computational time for inference.
DER-GCN: Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialogue Emotion Recognition
- Authors: Authors: Wei Ai, Yuntao Shou, Tao Meng, Keqin Li
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10579
- Pdf link: https://arxiv.org/pdf/2312.10579
- Abstract With the continuous development of deep learning (DL), the task of multimodal dialogue emotion recognition (MDER) has recently received extensive research attention, which is also an essential branch of DL. The MDER aims to identify the emotional information contained in different modalities, e.g., text, video, and audio, in different dialogue scenes. However, existing research has focused on modeling contextual semantic information and dialogue relations between speakers while ignoring the impact of event relations on emotion. To tackle the above issues, we propose a novel Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition (DER-GCN) method. It models dialogue relations between speakers and captures latent event relations information. Specifically, we construct a weighted multi-relationship graph to simultaneously capture the dependencies between speakers and event relations in a dialogue. Moreover, we also introduce a Self-Supervised Masked Graph Autoencoder (SMGAE) to improve the fusion representation ability of features and structures. Next, we design a new Multiple Information Transformer (MIT) to capture the correlation between different relations, which can provide a better fuse of the multivariate information between relations. Finally, we propose a loss optimization strategy based on contrastive learning to enhance the representation learning ability of minority class features. We conduct extensive experiments on the IEMOCAP and MELD benchmark datasets, which verify the effectiveness of the DER-GCN model. The results demonstrate that our model significantly improves both the average accuracy and the f1 value of emotion recognition.
Policy Optimization in RLHF: The Impact of Out-of-preference Data
- Authors: Authors: Ziniu Li, Tian Xu, Yang Yu
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10584
- Pdf link: https://arxiv.org/pdf/2312.10584
- Abstract Aligning intelligent agents with human preferences and values is important. This paper examines two popular alignment methods: Direct Preference Optimization (DPO) and Reward-Model-Based Policy Optimization (RMB-PO). A variant of RMB-PO, referred to as RMB-PO+, is also considered. These methods, either explicitly or implicitly, learn a reward model from preference data and differ in the data used for policy optimization to unlock the generalization ability of the reward model. In particular, compared with DPO, RMB-PO additionally uses policy-generated data, and RMB-PO+ further leverages new, preference-free data. We examine the impact of such out-of-preference data. Our study, conducted through controlled and synthetic experiments, demonstrates that DPO performs poorly, whereas RMB-PO+ performs the best. In particular, even when providing the policy model with a good feature representation, we find that policy optimization with adequate out-of-preference data significantly improves performance by harnessing the reward model's generalization capabilities.
E2E-AT: A Unified Framework for Tackling Uncertainty in Task-aware End-to-end Learning
- Authors: Authors: Wangkun Xu, Jianhong Wang, Fei Teng
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10587
- Pdf link: https://arxiv.org/pdf/2312.10587
- Abstract Successful machine learning involves a complete pipeline of data, model, and downstream applications. Instead of treating them separately, there has been a prominent increase of attention within the constrained optimization (CO) and machine learning (ML) communities towards combining prediction and optimization models. The so-called end-to-end (E2E) learning captures the task-based objective for which they will be used for decision making. Although a large variety of E2E algorithms have been presented, it has not been fully investigated how to systematically address uncertainties involved in such models. Most of the existing work considers the uncertainties of ML in the input space and improves robustness through adversarial training. We apply the same idea to E2E learning and prove that there is a robustness certification procedure by solving augmented integer programming. Furthermore, we show that neglecting the uncertainty of COs during training causes a new trigger for generalization errors. To include all these components, we propose a unified framework that covers the uncertainties emerging in both the input feature space of the ML models and the COs. The framework is described as a robust optimization problem and is practically solved via end-to-end adversarial training (E2E-AT). Finally, the performance of E2E-AT is evaluated by a real-world end-to-end power system operation problem, including load forecasting and sequential scheduling tasks.
NN-Steiner: A Mixed Neural-algorithmic Approach for the Rectilinear Steiner Minimum Tree Problem
- Authors: Authors: Andrew B. Kahng, Robert R. Nerem, Yusu Wang, Chien-Yi Yang
- Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10589
- Pdf link: https://arxiv.org/pdf/2312.10589
- Abstract Recent years have witnessed rapid advances in the use of neural networks to solve combinatorial optimization problems. Nevertheless, designing the "right" neural model that can effectively handle a given optimization problem can be challenging, and often there is no theoretical understanding or justification of the resulting neural model. In this paper, we focus on the rectilinear Steiner minimum tree (RSMT) problem, which is of critical importance in IC layout design and as a result has attracted numerous heuristic approaches in the VLSI literature. Our contributions are two-fold. On the methodology front, we propose NN-Steiner, which is a novel mixed neural-algorithmic framework for computing RSMTs that leverages the celebrated PTAS algorithmic framework of Arora to solve this problem (and other geometric optimization problems). Our NN-Steiner replaces key algorithmic components within Arora's PTAS by suitable neural components. In particular, NN-Steiner only needs four neural network (NN) components that are called repeatedly within an algorithmic framework. Crucially, each of the four NN components is only of bounded size independent of input size, and thus easy to train. Furthermore, as the NN component is learning a generic algorithmic step, once learned, the resulting mixed neural-algorithmic framework generalizes to much larger instances not seen in training. Our NN-Steiner, to our best knowledge, is the first neural architecture of bounded size that has capacity to approximately solve RSMT (and variants). On the empirical front, we show how NN-Steiner can be implemented and demonstrate the effectiveness of our resulting approach, especially in terms of generalization, by comparing with state-of-the-art methods (both neural or non-neural based).
Theoretical Aspects of Generating Instances with Unique Solutions: Pre-assignment Models for Unique Vertex Cover
- Authors: Authors: Takashi Horiyama, Yasuaki Kobayashi, Hirotaka Ono, Kazuhisa Seto, Ryu Suzuki
- Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
- Arxiv link: https://arxiv.org/abs/2312.10599
- Pdf link: https://arxiv.org/pdf/2312.10599
- Abstract The uniqueness of an optimal solution to a combinatorial optimization problem attracts many fields of researchers' attention because it has a wide range of applications, it is related to important classes in computational complexity, and an instance with only one solution is often critical for algorithm designs in theory. However, as the authors know, there is no major benchmark set consisting of only instances with unique solutions, and no algorithm generating instances with unique solutions is known; a systematic approach to getting a problem instance guaranteed having a unique solution would be helpful. A possible approach is as follows: Given a problem instance, we specify a small part of a solution in advance so that only one optimal solution meets the specification. This paper formulates such a "pre-assignment" approach for the vertex cover problem as a typical combinatorial optimization problem and discusses its computational complexity. First, we show that the problem is $\Sigma^P_2$-complete in general, while the problem becomes NP-complete when an input graph is bipartite. We then present an $O(2.1996^n)$-time algorithm for general graphs and an $O(1.9181^n)$-time algorithm for bipartite graphs, where $n$ is the number of vertices. The latter is based on an FPT algorithm with $O^*(3.6791^{\tau})$ time for vertex cover number $\tau$. Furthermore, we show that the problem for trees can be solved in $O(1.4143^n)$ time.
Human AI Collaboration in Software Engineering: Lessons Learned from a Hands On Workshop
- Authors: Authors: Muhammad Hamza, Dominik Siemon, Muhammad Azeem Akbar, Tahsinur Rahman
- Subjects: Software Engineering (cs.SE)
- Arxiv link: https://arxiv.org/abs/2312.10620
- Pdf link: https://arxiv.org/pdf/2312.10620
- Abstract This paper investigates the dynamics of human AI collaboration in software engineering, focusing on the use of ChatGPT. Through a thematic analysis of a hands on workshop in which 22 professional software engineers collaborated for three hours with ChatGPT, we explore the transition of AI from a mere tool to a collaborative partner. The study identifies key themes such as the evolving nature of human AI interaction, the capabilities of AI in software engineering tasks, and the challenges and limitations of integrating AI in this domain. The findings show that while AI, particularly ChatGPT, improves the efficiency of code generation and optimization, human oversight remains crucial, especially in areas requiring complex problem solving and security considerations. This research contributes to the theoretical understanding of human AI collaboration in software engineering and provides practical insights for effectively integrating AI tools into development processes. It highlights the need for clear role allocation, effective communication, and balanced AI human collaboration to realize the full potential of AI in software engineering.
Beamforming Design for Integrated Sensing and Communication with Extended Target
- Authors: Authors: Yiqiu Wang, Meixia Tao, Shu Sun
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2312.10641
- Pdf link: https://arxiv.org/pdf/2312.10641
- Abstract This paper studies transmit beamforming design in an integrated sensing and communication (ISAC) system, where a base station sends symbols to perform downlink multi-user communication and sense an extended target simultaneously. We first model the extended target contour with truncated Fourier series. By considering echo signals as reflections from the valid elements on the target contour, a novel Cramér-Rao bound (CRB) on the direction estimation of extended target is derived. We then formulate the transmit beamforming design as an optimization problem by minimizing the CRB of radar sensing, and satisfying a minimum signal-to-interference-plus-noise ratio requirement for each communication user, as well as a 3-dB beam coverage requirement tailored for the extended sensing target under a total transmit power constraint. In view of the non-convexity of the above problem, we employ semidefinite relaxation (SDR) technique for convex relaxation, followed by a rank-one extraction scheme for non-tight relaxation circumstances. Numerical results show that the proposed SDR beamforming scheme outperforms benchmark beampattern design methods with lower CRBs for the circumstances considered.
Single-Stage Optimization of Open-loop Stable Limit Cycles with Smooth, Symbolic Derivatives
- Authors: Authors: Muhammad Saud Ul Hassan, Christian Hubicki
- Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2312.10647
- Pdf link: https://arxiv.org/pdf/2312.10647
- Abstract Open-loop stable limit cycles are foundational to the dynamics of legged robots. They impart a self-stabilizing character to the robot's gait, thus alleviating the need for compute-heavy feedback-based gait correction. This paper proposes a general approach to rapidly generate limit cycles with explicit stability constraints for a given dynamical system. In particular, we pose the problem of open-loop limit cycle stability as a single-stage constrained-optimization problem (COP), and use Direct Collocation to transcribe it into a nonlinear program (NLP) with closed-form expressions for constraints, objectives, and their gradients. The COP formulations of stability are developed based (1) on the spectral radius of a discrete return map, and (2) on the spectral radius of the system's monodromy matrix, where the spectral radius is bounded using different constraint-satisfaction formulations of the eigenvalue problem. We compare the performance and solution qualities of each approach, but specifically highlight the Schur decomposition of the monodromy matrix as a formulation which boasts wider applicability through weaker assumptions and attractive numerical convergence properties. Moreover, we present results from our experiments on a spring-loaded inverted pendulum model of a robot, where our method generated actuation trajectories for open-loop stable hopping in under 2 seconds (on the Intel Core i7-6700K), and produced energy-minimizing actuation trajectories even under tight stability constraints.
PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields
- Authors: Authors: Boming Zhao, Luwei Yang, Mao Mao, Hujun Bao, Zhaopeng Cui
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.10649
- Pdf link: https://arxiv.org/pdf/2312.10649
- Abstract Due to the ability to synthesize high-quality novel views, Neural Radiance Fields (NeRF) have been recently exploited to improve visual localization in a known environment. However, the existing methods mostly utilize NeRFs for data augmentation to improve the regression model training, and the performance on novel viewpoints and appearances is still limited due to the lack of geometric constraints. In this paper, we propose a novel visual localization framework, i.e., PNeRFLoc, based on a unified point-based representation. On the one hand, PNeRFLoc supports the initial pose estimation by matching 2D and 3D feature points as traditional structure-based methods; on the other hand, it also enables pose refinement with novel view synthesis using rendering-based optimization. Specifically, we propose a novel feature adaption module to close the gaps between the features for visual localization and neural rendering. To improve the efficacy and efficiency of neural rendering-based optimization, we also develop an efficient rendering-based framework with a warping loss function. Furthermore, several robustness techniques are developed to handle illumination changes and dynamic objects for outdoor scenarios. Experiments demonstrate that PNeRFLoc performs the best on synthetic data when the NeRF model can be well learned and performs on par with the SOTA method on the visual localization benchmark datasets.
Explorers at #SMM4H 2023: Enhancing BERT for Health Applications through Knowledge and Model Fusion
- Authors: Xutong Yue, Xilai Wang, Yuxin He, Zhenkun Zhou
- Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
- Arxiv link: https://arxiv.org/abs/2312.10652
- Pdf link: https://arxiv.org/pdf/2312.10652
- Abstract An increasing number of individuals are willing to post states and opinions on social media, which has become a valuable data resource for studying human health. Social media has also become a crucial research focus in healthcare. This paper outlines the methods in our participation in the #SMM4H 2023 Shared Tasks, including data preprocessing, continual pre-training and fine-tuned optimization strategies. Especially for the Named Entity Recognition (NER) task, we utilize the model architecture named W2NER that effectively enhances the model generalization ability. Our method achieved first place in Task 3. This paper has been peer-reviewed and accepted for presentation at the #SMM4H 2023 Workshop.
Heuristics and Metaheuristics for Dynamic Management of Computing and Cooling Energy in Cloud Data Centers
- Authors: Patricia Arroba, José L. Risco-Martín, José M. Moya, José L. Ayala
- Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10663
- Pdf link: https://arxiv.org/pdf/2312.10663
- Abstract Data centers consume impressively large amounts of energy, and the growing popularity of Cloud applications is intensifying their computational demand. Moreover, the cooling needed to keep the servers within reliable thermal operating conditions also has an impact on the thermal distribution of the data room, thus affecting the servers' power leakage. Optimizing the energy consumption of these infrastructures is a major challenge in making data centers more scalable. Thus, understanding the relationship between power, temperature, consolidation and performance is crucial to enable energy-efficient management at the data center level. In this research, we propose novel power and thermal-aware strategies and models to provide joint cooling and computing optimizations from a local perspective based on the global energy consumption of metaheuristic-based optimizations. Our results show that the combined awareness from both metaheuristic and best fit decreasing algorithms allows us to describe the global energy into faster and lighter optimization strategies that may be used during runtime. This approach allows us to improve the energy efficiency of the data center, considering both computing and cooling infrastructures, by up to 21.74% while maintaining quality of service.
Silkie: Preference Distillation for Large Visual Language Models
- Authors: Lei Li, Zhihui Xie, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen, Yazheng Yang, Benyou Wang, Lingpeng Kong
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2312.10665
- Pdf link: https://arxiv.org/pdf/2312.10665
- Abstract This paper explores preference distillation for large vision language models (LVLMs), improving their ability to generate helpful and faithful responses anchoring the visual context. We first build a vision-language feedback (VLFeedback) dataset utilizing AI annotation. Specifically, responses are generated by models sampled from 12 LVLMs, conditioned on multi-modal instructions sourced from various datasets. We adopt GPT-4V to assess the generated outputs regarding helpfulness, visual faithfulness, and ethical considerations. Furthermore, the preference supervision is distilled into Qwen-VL-Chat through the direct preference optimization (DPO) method. The resulting model, Silkie, achieves 6.9% and 9.5% relative improvement on the MME benchmark regarding the perception and cognition capabilities, respectively. Silkie also demonstrates reduced hallucination by setting a new state-of-the-art score of 3.02 on the MMHal-Bench benchmark. Further analysis shows that DPO with our VLFeedback dataset mainly boosts the fine-grained perception and complex cognition abilities of LVLMs, leading to more comprehensive improvements compared to human-annotated preference datasets.
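- Illustration (not from the paper): the abstract names direct preference optimization (DPO) as the distillation method. A minimal sketch of the commonly cited per-pair DPO loss follows, where the loss is the negative log-sigmoid of a scaled difference between the policy's and the reference model's log-probability margins for a preferred and a rejected response. The function name, argument names, and the `beta=0.1` default are assumptions for illustration, and nothing here reflects Silkie's actual training configuration.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    All arguments are sequence log-probabilities; beta is a hypothetical
    temperature chosen only for this example.
    """
    # How much more the policy prefers the chosen response than the
    # reference model does, relative to the rejected response.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # Numerically stable evaluation of -log(sigmoid(z)) = log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)
# Raising the chosen response's log-probability lowers the loss.
improved = dpo_loss(-0.5, -2.0, -1.0, -2.0)
print(baseline, improved)
```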
CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with Trajectory Optimization
- Authors: Elisa Alboni, Gianluigi Grandesso, Gastone Pietro Rosati Papini, Justin Carpentier, Andrea Del Prete
- Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2312.10666
- Pdf link: https://arxiv.org/pdf/2312.10666
- Abstract Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and complementary tools to solve optimal control problems. On the one hand, TO can efficiently compute locally-optimal solutions, but it tends to get stuck in local minima if the problem is not convex. On the other hand, RL is typically less sensitive to non-convexity, but it requires a much higher computational effort. Recently, we have proposed CACTO (Continuous Actor-Critic with Trajectory Optimization), an algorithm that uses TO to guide the exploration of an actor-critic RL algorithm. In turn, the policy encoded by the actor is used to warm-start TO, closing the loop between TO and RL. In this work, we present an extension of CACTO exploiting the idea of Sobolev learning. To make the training of the critic network faster and more data efficient, we enrich it with the gradient of the Value function, computed via a backward pass of the differential dynamic programming algorithm. Our results show that the new algorithm is more efficient than the original CACTO, reducing the number of TO episodes by a factor ranging from 3 to 10, and consequently the computation time. Moreover, we show that CACTO-SL helps TO to find better minima and to produce more consistent results.
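- Illustration (not from the paper): Sobolev learning, as the abstract describes it, supervises the critic with both the Value function and its gradient. A minimal sketch of such a loss for one state is given below. The function name, the `grad_weight` parameter, and all numbers are hypothetical. In the paper the target gradient comes from a differential dynamic programming backward pass, which is not reproduced here.

```python
def sobolev_loss(v_pred, grad_pred, v_target, grad_target, grad_weight=1.0):
    """Squared value error plus weighted squared gradient error for one state."""
    value_term = (v_pred - v_target) ** 2
    grad_term = sum((gp - gt) ** 2 for gp, gt in zip(grad_pred, grad_target))
    return value_term + grad_weight * grad_term


# Hypothetical critic V(x) = x1^2 + x2^2 evaluated at x = (1, -2).
x = [1.0, -2.0]
v_pred = x[0] ** 2 + x[1] ** 2          # 5.0
grad_pred = [2.0 * x[0], 2.0 * x[1]]    # [2.0, -4.0]

# Made-up targets standing in for values a trajectory optimizer would supply.
loss = sobolev_loss(v_pred, grad_pred, v_target=5.5, grad_target=[2.0, -3.0])
print(loss)
```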
Addressing Sample Inefficiency in Multi-View Representation Learning
- Authors: Kumar Krishna Agrawal, Arna Ghosh, Adam Oberman, Blake Richards
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.10725
- Pdf link: https://arxiv.org/pdf/2312.10725
- Abstract Non-contrastive self-supervised learning (NC-SSL) methods like BarlowTwins and VICReg have shown great promise for label-free representation learning in computer vision. Despite the apparent simplicity of these techniques, researchers must rely on several empirical heuristics to achieve competitive performance, most notably using high-dimensional projector heads and two augmentations of the same image. In this work, we provide theoretical insights on the implicit bias of the BarlowTwins and VICReg loss that can explain these heuristics and guide the development of more principled recommendations. Our first insight is that the orthogonality of the features is more critical than projector dimensionality for learning good representations. Based on this, we empirically demonstrate that low-dimensional projector heads are sufficient with appropriate regularization, contrary to the existing heuristic. Our second theoretical insight suggests that using multiple data augmentations better represents the desiderata of the SSL objective. Based on this, we demonstrate that leveraging more augmentations per sample improves representation quality and trainability. In particular, it improves optimization convergence, leading to better features emerging earlier in the training. Remarkably, we demonstrate that we can reduce the pretraining dataset size by up to 4x while maintaining accuracy and improving convergence simply by using more data augmentations. Combining these insights, we present practical pretraining recommendations that improve wall-clock time by 2x and improve performance on CIFAR-10/STL-10 datasets using a ResNet-50 backbone. Thus, this work provides a theoretical insight into NC-SSL and produces practical recommendations for enhancing its sample and compute efficiency.
Federated learning with differential privacy and an untrusted aggregator
- Authors: Kunlong Liu, Trinabh Gupta
- Subjects: Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2312.10789
- Pdf link: https://arxiv.org/pdf/2312.10789
- Abstract Federated learning for training models over mobile devices is gaining popularity. Current systems for this task exhibit significant trade-offs between model accuracy, privacy guarantee, and device efficiency. For instance, Oort (OSDI 2021) provides excellent accuracy and efficiency but requires a trusted central server. On the other hand, Orchard (OSDI 2020) provides good accuracy and the rigorous guarantee of differential privacy over an untrusted server, but creates huge overhead for the devices. This paper describes Aero, a new federated learning system that significantly improves this trade-off. Aero guarantees good accuracy, differential privacy over an untrusted server, and keeps the device overhead low. The key idea of Aero is to tune system architecture and design to a specific set of popular, federated learning algorithms. This tuning requires novel optimizations and techniques, e.g., a new protocol to securely aggregate updates from devices. An evaluation of Aero demonstrates that it provides comparable accuracy to plain federated learning (without differential privacy), and it improves efficiency (CPU and network) over Orchard by up to $10^5\times$.
Deep-Dispatch: A Deep Reinforcement Learning-Based Vehicle Dispatch Algorithm for Advanced Air Mobility
- Authors: Elaheh Sabziyan Varnousfaderani, Syed A. M. Shihab, Esrat F. Dulia
- Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10809
- Pdf link: https://arxiv.org/pdf/2312.10809
- Abstract Near future air taxi operations with electric vertical take-off and landing (eVTOL) aircraft will be constrained by the need for frequent recharging of eVTOLs, limited takeoff and landing pads in vertiports, and subject to time-varying demand and electricity prices, making the eVTOL dispatch problem unique and particularly challenging to solve. Previously, we have developed optimization models to address this problem. Such optimization models however suffer from prohibitively high computational run times when the scale of the problem increases, making them less practical for real world implementation. To overcome this issue, we have developed two deep reinforcement learning-based eVTOL dispatch algorithms, namely single-agent and multi-agent deep Q-learning eVTOL dispatch algorithms, where the objective is to maximize operating profit. An eVTOL-based passenger transportation simulation environment was built to assess the performance of our algorithms across 36 numerical cases with varying numbers of eVTOLs, vertiports, and demand. The results indicate that the multi-agent eVTOL dispatch algorithm can closely approximate the optimal dispatch policy with significantly lower computational expense compared to the benchmark optimization model. The multi-agent algorithm was found to outperform the single-agent counterpart with respect to both profits generated and training time.
Contextual Reinforcement Learning for Offshore Wind Farm Bidding
- Authors: David Cole, Himanshu Sharma, Wei Wang
- Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2312.10884
- Pdf link: https://arxiv.org/pdf/2312.10884
- Abstract We propose a framework for applying reinforcement learning to contextual two-stage stochastic optimization and apply this framework to the problem of energy market bidding of an off-shore wind farm. Reinforcement learning could potentially be used to learn close to optimal solutions for first stage variables of a two-stage stochastic program under different contexts. Under the proposed framework, these solutions would be learned without having to solve the full two-stage stochastic program. We present initial results of training using the DDPG algorithm and present intended future steps to improve performance.
GINN-LP: A Growing Interpretable Neural Network for Discovering Multivariate Laurent Polynomial Equations
- Authors: Nisal Ranasinghe, Damith Senanayake, Sachith Seneviratne, Malin Premaratne, Saman Halgamuge
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10913
- Pdf link: https://arxiv.org/pdf/2312.10913
- Abstract Traditional machine learning is generally treated as a black-box optimization problem and does not typically produce interpretable functions that connect inputs and outputs. However, the ability to discover such interpretable functions is desirable. In this work, we propose GINN-LP, an interpretable neural network to discover the form and coefficients of the underlying equation of a dataset, when the equation is assumed to take the form of a multivariate Laurent Polynomial. This is facilitated by a new type of interpretable neural network block, named the "power-term approximator block", consisting of logarithmic and exponential activation functions. GINN-LP is end-to-end differentiable, making it possible to use backpropagation for training. We propose a neural network growth strategy that will enable finding the suitable number of terms in the Laurent polynomial that represents the data, along with sparsity regularization to promote the discovery of concise equations. To the best of our knowledge, this is the first model that can discover arbitrary multivariate Laurent polynomial terms without any prior information on the order. Our approach is first evaluated on a subset of data used in SRBench, a benchmark for symbolic regression. We first show that GINN-LP outperforms the state-of-the-art symbolic regression methods on datasets generated using 48 real-world equations in the form of multivariate Laurent polynomials. Next, we propose an ensemble method that combines our method with a high-performing symbolic regression method, enabling us to discover non-Laurent polynomial equations. We achieve state-of-the-art results in equation discovery, showing an absolute improvement of 7.1% over the best contender, by applying this ensemble method to 113 datasets within SRBench with known ground-truth equations.
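- Illustration (not from the paper): a block built from logarithmic and exponential activations can represent a single Laurent monomial, because exp(sum_i w_i * log(x_i)) equals the product of x_i ** w_i. The sketch below shows that identity and sums such terms into a Laurent polynomial. The function names and the example equation are assumptions, inputs are assumed strictly positive, and the paper's growth strategy and regularization are not shown.

```python
import math


def power_term(xs, exponents):
    """Compute prod(x_i ** w_i) as exp(sum(w_i * log(x_i))); requires x_i > 0."""
    return math.exp(sum(w * math.log(x) for x, w in zip(xs, exponents)))


def laurent_polynomial(xs, terms):
    """Sum of coefficient * power_term; terms is a list of (coefficient, exponents)."""
    return sum(coef * power_term(xs, exps) for coef, exps in terms)


# Hypothetical target equation: 3 * x^2 * y^-1 + 0.5 * x^-1, evaluated at (2, 4).
terms = [(3.0, [2, -1]), (0.5, [-1, 0])]
value = laurent_polynomial([2.0, 4.0], terms)
print(value)
```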
Semi-Supervised Clustering via Structural Entropy with Different Constraints
- Authors: Guangjie Zeng, Hao Peng, Angsheng Li, Zhiwei Liu, Runze Yang, Chunyang Liu, Lifang He
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10917
- Pdf link: https://arxiv.org/pdf/2312.10917
- Abstract Semi-supervised clustering techniques have emerged as valuable tools for leveraging prior information in the form of constraints to improve the quality of clustering outcomes. Despite the proliferation of such methods, the ability to seamlessly integrate various types of constraints remains limited. While structural entropy has proven to be a powerful clustering approach with wide-ranging applications, it has lacked a variant capable of accommodating these constraints. In this work, we present Semi-supervised clustering via Structural Entropy (SSE), a novel method that can incorporate different types of constraints from diverse sources to perform both partitioning and hierarchical clustering. Specifically, we formulate a uniform view for the commonly used pairwise and label constraints for both types of clustering. Then, we design objectives that incorporate these constraints into structural entropy and develop tailored algorithms for their optimization. We evaluate SSE on nine clustering datasets and compare it with eleven semi-supervised partitioning and hierarchical clustering methods. Experimental results demonstrate the superiority of SSE on clustering accuracy with different types of constraints. Additionally, the functionality of SSE for biological data analysis is demonstrated by cell clustering experiments conducted on four single-cell RNAseq datasets.
LabelCraft: Empowering Short Video Recommendations with Automated Label Crafting
- Authors: Yimeng Bai, Yang Zhang, Jing Lu, Jianxin Chang, Xiaoxue Zang, Yanan Niu, Yang Song, Fuli Feng
- Subjects: Information Retrieval (cs.IR)
- Arxiv link: https://arxiv.org/abs/2312.10947
- Pdf link: https://arxiv.org/pdf/2312.10947
- Abstract Short video recommendations often face limitations due to the quality of user feedback, which may not accurately depict user interests. To tackle this challenge, a new task has emerged: generating more dependable labels from original feedback. Existing label generation methods rely on manual rules, demanding substantial human effort and potentially misaligning with the desired objectives of the platform. To transcend these constraints, we introduce LabelCraft, a novel automated label generation method explicitly optimizing pivotal operational metrics for platform success. By formulating label generation as a higher-level optimization problem above recommender model optimization, LabelCraft introduces a trainable labeling model for automatic label mechanism modeling. Through meta-learning techniques, LabelCraft effectively addresses the bi-level optimization hurdle posed by the recommender and labeling models, enabling the automatic acquisition of intricate label generation mechanisms. Extensive experiments on real-world datasets corroborate LabelCraft's excellence across varied operational metrics, encompassing usage time, user engagement, and retention. Codes are available at https://github.com/baiyimeng/LabelCraft.
Generative linguistic representation for spoken language identification
- Authors: Peng Shen, Xuguang Lu, Hisashi Kawai
- Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- Arxiv link: https://arxiv.org/abs/2312.10964
- Pdf link: https://arxiv.org/pdf/2312.10964
- Abstract Effective extraction and application of linguistic features are central to the enhancement of spoken Language IDentification (LID) performance. With the success of recent large models, such as GPT and Whisper, the potential to leverage such pre-trained models for extracting linguistic features for LID tasks has become a promising area of research. In this paper, we explore the utilization of the decoder-based network from the Whisper model to extract linguistic features through its generative mechanism for improving the classification accuracy in LID tasks. We devised two strategies - one based on the language embedding method and the other focusing on direct optimization of LID outputs while simultaneously enhancing the speech recognition tasks. We conducted experiments on the large-scale multilingual datasets MLS, VoxLingua107, and CommonVoice to test our approach. The experimental results demonstrated the effectiveness of the proposed method on both in-domain and out-of-domain datasets for LID tasks.
A Hybrid Intelligent Framework for Maximising SAG Mill Throughput: An Integration of Expert Knowledge, Machine Learning and Evolutionary Algorithms for Parameter Optimisation
- Authors: Zahra Ghasemi, Mehdi Neshat, Chris Aldrich, John Karageorgos, Max Zanin, Frank Neumann, Lei Chen
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2312.10992
- Pdf link: https://arxiv.org/pdf/2312.10992
- Abstract In mineral processing plants, grinding is a crucial step, accounting for approximately 50 percent of the total mineral processing costs. Semi-autogenous grinding mills are extensively employed in the grinding circuit of mineral processing plants. Maximizing SAG mill throughput is of significant importance considering its profound financial outcomes. However, the optimum process parameter setting aimed at achieving maximum mill throughput remains an uninvestigated domain in prior research. This study introduces a hybrid intelligent framework leveraging expert knowledge, machine learning techniques, and evolutionary algorithms to address this research need. In this study, we utilize an extensive industrial dataset comprising 36743 records and select relevant features based on the insights of industry experts. Following the removal of erroneous data, a comprehensive evaluation of 17 diverse machine learning models is undertaken to identify the most accurate predictive model. To improve the model performance, feature selection and outlier detection are executed. The resultant optimal model, trained with refined features, serves as the objective function within three distinct evolutionary algorithms. These algorithms are employed to identify parameter configurations that maximize SAG mill throughput while adhering to the working limits of input parameters as constraints. Notably, our analysis revealed that CatBoost, as an ensemble model, stands out as the most accurate predictor. Furthermore, differential evolution emerges as the preferred optimization algorithm, exhibiting superior performance in both achieving the highest mill throughput predictions and ensuring robustness in predictions, surpassing alternative methods.
Retrieval-Augmented Generation for Large Language Models: A Survey
- Authors: Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Haofen Wang
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10997
- Pdf link: https://arxiv.org/pdf/2312.10997
- Abstract Large language models (LLMs) demonstrate powerful capabilities, but they still face challenges in practical applications, such as hallucinations, slow knowledge updates, and lack of transparency in answers. Retrieval-Augmented Generation (RAG) refers to the retrieval of relevant information from external knowledge bases before answering questions with LLMs. RAG has been demonstrated to significantly enhance answer accuracy and reduce model hallucination, particularly for knowledge-intensive tasks. Because RAG cites sources, users can verify the accuracy of answers and increase trust in model outputs. It also facilitates knowledge updates and the introduction of domain-specific knowledge. RAG effectively combines the parameterized knowledge of LLMs with non-parameterized external knowledge bases, making it one of the most important methods for implementing large language models. This paper outlines the development paradigms of RAG in the era of LLMs, summarizing three paradigms: Naive RAG, Advanced RAG, and Modular RAG. It then provides a summary and organization of the three main components of RAG: retriever, generator, and augmentation methods, along with key technologies in each component. Furthermore, it discusses how to evaluate the effectiveness of RAG models, introducing two evaluation methods for RAG, emphasizing key metrics and abilities for evaluation, and presenting the latest automatic evaluation framework. Finally, potential future research directions are introduced from three aspects: vertical optimization, horizontal scalability, and the technical stack and ecosystem of RAG.
On the Benefits of Rate-Adaptive Transceivers: A Network Planning Study
- Authors: Jasper Müller, Gabriele Di Rosa, Tobias Fehenberger, Mario Wenning, Sai Kireet Patri, Jörg-Peter Elbers, Carmen Mas-Machuca
- Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2312.11005
- Pdf link: https://arxiv.org/pdf/2312.11005
- Abstract Flexible-grid Elastic Optical Networks (EONs) have been widely deployed in recent years to support the growing demand for bandwidth-intensive applications. To address this cost-efficiently, optimized utilization of EONs is required. Next-generation bandwidth-variable transceivers (BVTs) will offer increased adaptivity in symbol rate as well as modulation through probabilistic constellation shaping. In this work, we therefore investigate the impact of increased configuration granularity on various aspects of optical networks. We account for practical implementation considerations of BVT configurations for the estimation of the required signal-to-noise ratio. Additionally, an optimization algorithm is presented that selects the most efficient configuration for each considered data rate and bandwidth combination. Based on the advanced transceiver configurations, we conduct a network planning study using a physical-layer-aware algorithm for flexible-grid EONs, and present results for a national and a continental optical backbone network topology. Our research demonstrates that a rise in modulation rate adaptivity results in substantial savings in resources, decreasing the number of necessary lightpaths by as much as 20% in EONs. In contrast, increased symbol rate granularity only results in minor savings.
Multi-Goal Optimal Route Planning Using the Cell Mapping Technique
- Authors: Athanasios Karagounis
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2312.11025
- Pdf link: https://arxiv.org/pdf/2312.11025
- Abstract This manuscript explores the complexities of multi-objective path planning, aiming to optimize routes against a backdrop of conflicting performance criteria. The study integrates the cell mapping approach as its foundational concept. A two-pronged search strategy is introduced; initially, the cell mapping technique is utilized to develop a comprehensive database, encompassing all cells within the specified area. This database records the performance metrics for the most efficient routes from each cell to the designated target. The second phase involves analyzing this database to pinpoint the extent and count of all Pareto optimal routes from a selected starting cell to the target. This analysis contributes to solving the overarching multi-objective optimization challenge inherent in path planning. To validate this approach, case studies are included, and the results are benchmarked against the well-established multi-objective A* (MOA*) method. The study discovers that while the cell mapping method achieves similar outcomes to the MOA* method for routes originating from a single point, it demonstrates superior computational benefits, particularly when the starting and ending points are in separate, non-overlapping areas.
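- Illustration (not from the paper): the second phase described in the abstract identifies Pareto optimal routes, meaning routes for which no other route is at least as good in every criterion and strictly better in one. A minimal dominance filter is sketched below for cost vectors to be minimized. The function names and the example (time, energy) costs are hypothetical, and the cell mapping database itself is not modeled.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(costs):
    """Keep the cost vectors that no other vector dominates (minimization)."""
    return [c for c in costs if not any(dominates(other, c) for other in costs)]


# Hypothetical (travel time, energy) costs of five candidate routes.
routes = [(10, 5), (8, 7), (12, 4), (9, 9), (11, 6)]
front = pareto_front(routes)
print(front)
```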
UniGen: A Unified Generative Framework for Retrieval and Question Answering with Large Language Models
- Authors: Xiaoxi Li, Yujia Zhou, Zhicheng Dou
- Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2312.11036
- Pdf link: https://arxiv.org/pdf/2312.11036
- Abstract Generative information retrieval, encompassing two major tasks of Generative Document Retrieval (GDR) and Grounded Answer Generation (GAR), has gained significant attention in the area of information retrieval and natural language processing. Existing methods for GDR and GAR rely on separate retrieval and reader modules, which hinder simultaneous optimization. To overcome this, we present UniGen, a Unified Generative framework for retrieval and question answering that integrates both tasks into a single generative model leveraging the capabilities of large language models. UniGen employs a shared encoder and two distinct decoders for generative retrieval and question answering. To facilitate the learning of both tasks, we introduce connectors, generated by large language models, to bridge the gaps between query inputs and generation targets, as well as between document identifiers and answers. Furthermore, we propose an iterative enhancement strategy that leverages generated answers and retrieved documents to iteratively improve both tasks. Through extensive experiments on the MS MARCO and NQ datasets, we demonstrate the effectiveness of UniGen, showcasing its superior performance in both the retrieval and the question answering tasks.
Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking
- Authors: Shihao Feng, Pengpeng Liang, Jin Gao, Erkang Cheng
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2312.11051
- Pdf link: https://arxiv.org/pdf/2312.11051
- Abstract Point cloud-based 3D object tracking is an important task in autonomous driving. Though great advances regarding Siamese-based 3D tracking have been made recently, it remains challenging to learn the correlation between the template and search branches effectively with the sparse LIDAR point cloud data. Instead of performing correlation of the two branches at just one point in the network, in this paper, we present a multi-correlation Siamese Transformer network that has multiple stages and carries out feature correlation at the end of each stage based on sparse pillars. More specifically, in each stage, self-attention is first applied to each branch separately to capture the non-local context information. Then, cross-attention is used to inject the template information into the search area. This strategy allows the feature learning of the search area to be aware of the template while keeping the individual characteristics of the template intact. To enable the network to easily preserve the information learned at different stages and ease the optimization, for the search area, we densely connect the initial input sparse pillars and the output of each stage to all subsequent stages and the target localization network, which converts pillars to bird's eye view (BEV) feature maps and predicts the state of the target with a small densely connected convolution network. Deep supervision is added to each stage to further boost the performance as well. The proposed algorithm is evaluated on the popular KITTI, nuScenes, and Waymo datasets, and the experimental results show that our method achieves promising performance compared with the state-of-the-art. An ablation study showing the effectiveness of each component is provided as well. Code is available at https://github.com/liangp/MCSTN-3DSOT.
MA-BBOB: A Problem Generator for Black-Box Optimization Using Affine Combinations and Shifts
- Authors: Diederick Vermetten, Furong Ye, Thomas Bäck, Carola Doerr
- Subjects: Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2312.11083
- Pdf link: https://arxiv.org/pdf/2312.11083
- Abstract Choosing a set of benchmark problems is often a key component of any empirical evaluation of iterative optimization heuristics. In continuous, single-objective optimization, several sets of problems have become widespread, including the well-established BBOB suite. While this suite is designed to enable rigorous benchmarking, it is also commonly used for testing methods such as algorithm selection, which the suite was never designed around. We present the MA-BBOB function generator, which uses the BBOB suite as component functions in an affine combination. In this work, we describe the full procedure to create these affine combinations and highlight the trade-offs of several design decisions, specifically the choice to place the optimum uniformly at random in the domain. We then illustrate how this generator can be used to gain more low-level insight into the function landscapes through the use of exploratory landscape analysis. Finally, we show a potential use-case of MA-BBOB in generating a wide set of training and testing data for algorithm selectors. Using this setup, we show that the basic scheme of using a set of landscape features to predict the best algorithm does not lead to optimal results, and that an algorithm selector trained purely on the BBOB functions generalizes poorly to the affine combinations.
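The affine-combination idea in the MA-BBOB abstract above can be illustrated with a deliberately simplified sketch. It assumes two toy component functions (`sphere` and `ellipsoid`, stand-ins rather than the actual BBOB suite), a plain normalized weighted sum, and a shared shift `x_opt` drawn uniformly from an assumed domain of [-5, 5]. The paper's actual procedure (for example, how component values are rescaled before combining) is not reproduced here.

```python
import numpy as np

def sphere(z):
    # Toy component: minimum value 0 at z = 0.
    return float(np.sum(z ** 2))

def ellipsoid(z):
    # Toy component: axis-scaled quadratic, minimum value 0 at z = 0.
    d = z.size
    scales = 10.0 ** (2.0 * np.arange(d) / max(d - 1, 1))
    return float(np.sum(scales * z ** 2))

def affine_combination(components, weights, x_opt):
    """Build f(x) = sum_i w_i * g_i(x - x_opt) with weights normalized to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    def f(x):
        z = np.asarray(x, dtype=float) - x_opt  # shift so every component's optimum sits at x_opt
        return float(sum(w * g(z) for w, g in zip(weights, components)))

    return f

# Place the optimum uniformly at random in the (assumed) domain [-5, 5]^3.
rng = np.random.default_rng(0)
x_opt = rng.uniform(-5.0, 5.0, size=3)
f = affine_combination([sphere, ellipsoid], [0.3, 0.7], x_opt)
print(f(x_opt), f(x_opt + 1.0))
```

Because both components are shifted by the same `x_opt` and the weights are non-negative, the combined function keeps its optimum at `x_opt`; varying the weights and the shift yields a family of related problems.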
Colored Noise in PPO: Improved Exploration and Performance Through Correlated Action Sampling
- Authors: Jakob Hollenstein, Georg Martius, Justus Piater
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.11091
- Pdf link: https://arxiv.org/pdf/2312.11091
- Abstract Proximal Policy Optimization (PPO), a popular on-policy deep reinforcement learning method, employs a stochastic policy for exploration. In this paper, we propose a colored noise-based stochastic policy variant of PPO. Previous research highlighted the importance of temporal correlation in action noise for effective exploration in off-policy reinforcement learning. Building on this, we investigate whether correlated noise can also enhance exploration in on-policy methods like PPO. We discovered that correlated noise for action selection improves learning performance and outperforms the currently popular uncorrelated white noise approach in on-policy methods. Unlike off-policy learning, where pink noise was found to be highly effective, we found that a colored noise, intermediate between white and pink, performed best for on-policy learning in PPO. We examined the impact of varying the amount of data collected for each update by modifying the number of parallel simulation environments for data collection and observed that with a larger number of parallel environments, more strongly correlated noise is beneficial. Due to the significant impact and ease of implementation, we recommend switching to correlated noise as the default noise source in PPO.
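To make the white/pink distinction in the abstract above concrete, here is a minimal sketch of sampling temporally correlated ("colored") noise by shaping a random spectrum. The exponent name `beta` (0 for white noise, 1 for pink noise) and the FFT-based construction are assumptions for illustration; the abstract does not say how the authors generate their noise or how it is wired into the PPO policy.

```python
import numpy as np

def colored_noise(beta, n_steps, rng):
    """Sample a sequence whose power spectrum falls off roughly as 1 / f**beta."""
    freqs = np.fft.rfftfreq(n_steps)            # non-negative frequencies; freqs[0] == 0
    scale = np.ones_like(freqs)
    scale[1:] = freqs[1:] ** (-beta / 2.0)      # amplitude f^(-beta/2) gives power f^(-beta)
    scale[0] = 0.0                              # drop the DC term so the sequence has zero mean
    spectrum = scale * (rng.standard_normal(freqs.size)
                        + 1j * rng.standard_normal(freqs.size))
    x = np.fft.irfft(spectrum, n=n_steps)       # back to the time domain
    return x / x.std()                          # normalize to unit variance

def lag1_autocorrelation(x):
    # Correlation between consecutive samples; near 0 for white noise.
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

rng = np.random.default_rng(0)
for beta in (0.0, 0.5, 1.0):
    noise = colored_noise(beta, 4096, rng)
    print(beta, lag1_autocorrelation(noise))
```

Larger `beta` values produce stronger correlation between consecutive samples, which is the property the abstract associates with better exploration.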
Evaluation of Dataframe Libraries for Data Preparation on a Single Machine
- Authors: Angelo Mozzillo, Luca Zecchini, Luca Gagliardelli, Adeel Aslam, Sonia Bergamaschi, Giovanni Simonini
- Subjects: Databases (cs.DB)
- Arxiv link: https://arxiv.org/abs/2312.11122
- Pdf link: https://arxiv.org/pdf/2312.11122
- Abstract Data preparation is a trial-and-error process that typically involves countless iterations over the data to define the best pipeline of operators for a given task. With tabular data, practitioners often perform that burdensome activity on local machines by writing ad hoc scripts with libraries based on the Pandas dataframe API and testing them on samples of the entire dataset--the faster the library, the less idle time its users have. In this paper, we evaluate the most popular Python dataframe libraries in general data preparation use cases to assess how they perform on a single machine. To do so, we employ 4 real-world datasets and pipelines with distinct characteristics, covering a variety of scenarios. The insights gained with this experimentation are useful to data scientists who need to choose which of the dataframe libraries best suits their data preparation task at hand. In a nutshell, we found that: for small datasets, Pandas consistently proves to be the best choice with the richest API; when RAM is limited and there is no need for complete compatibility with the Pandas API, Polars is the go-to choice thanks to its resource and query optimization; when a GPU is available, CuDF often yields the best performance, while for very large datasets that cannot fit in the GPU memory and RAM, PySpark (thanks to a multi-thread execution and a query optimizer) and Vaex (exploiting a columnar data format) are the best options.
Efficiency-oriented approaches for self-supervised speech representation learning
- Authors: Luis Lugo, Valentin Vielzeuf
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2312.11142
- Pdf link: https://arxiv.org/pdf/2312.11142
- Abstract Self-supervised learning enables the training of large neural models without the need for large, labeled datasets. It has been generating breakthroughs in several fields, including computer vision, natural language processing, biology, and speech. In particular, the state-of-the-art in several speech processing applications, such as automatic speech recognition or speaker identification, are models where the latent representation is learned using self-supervised approaches. Several configurations exist in self-supervised learning for speech, including contrastive, predictive, and multilingual approaches. There is, however, a crucial limitation in most existing approaches: their high computational costs. These costs limit the deployment of models, the size of the training dataset, and the number of research groups that can afford research with large self-supervised models. Likewise, we should consider the environmental costs that high energy consumption implies. Efforts in this direction comprise optimization of existing models, neural architecture efficiency, improvements in finetuning for speech processing tasks, and data efficiency. But despite current efforts, more work could be done to address high computational costs in self-supervised representation learning.
OsmLocator: locating overlapping scatter marks by simulated annealing on clustering-based re-visualization
- Authors: Yuming Qiu, Aleksandra Pizurica, Qi Ming, Nicolas Nadisic
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.11146
- Pdf link: https://arxiv.org/pdf/2312.11146
- Abstract Automated mark localization in scatter images, greatly helpful for discovering knowledge and understanding enormous document images and reasoning in visual question answering AI systems, is a highly challenging problem because of the ubiquity of overlapping marks. Locating overlapping marks faces many difficulties, such as a lack of texture, little contextual information, hollow shapes, and tiny sizes. Here, we formulate it as a combinatorial optimization problem on clustering-based re-visualization, to locate scatter marks by finding the status of multi-variables when an objective function reaches a minimum. The objective function is constructed on the difference between binarized scatter images and the corresponding re-visualization based on their clustering. Fundamentally, re-visualization tries to redraw a new scatter graph only taking a rasterized scatter image as an input, and clustering is employed to provide the information for such re-visualization. This method could stably locate severely-overlapping, variable-size and variable-shape marks in scatter images without depending on any training dataset or reference. Meanwhile, we propose an adaptive variant of simulated annealing which can work on various connected regions. In addition, we built a dataset named SML2023 containing hundreds of scatter images with different markers and various levels of overlapping severity, and tested the proposed method and compared it to existing methods. The results show that it can accurately locate most marks in scatter images with different overlapping severity and marker types, with about 0.3 absolute increase on an assignment-cost-based metric in comparison with state-of-the-art methods. This work is of value to data mining on massive web pages and literature, and sheds new light on image measurement such as bubble counting.
A low-rank non-convex norm method for multiview graph clustering
- Authors: Alaeddine Zahir, Khalide Jbilou, Ahmed Ratnani
- Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2312.11157
- Pdf link: https://arxiv.org/pdf/2312.11157
- Abstract This study introduces a novel technique for multi-view clustering known as the "Consensus Graph-Based Multi-View Clustering Method Using Low-Rank Non-Convex Norm" (CGMVC-NC). Multi-view clustering is a challenging task in machine learning as it requires the integration of information from multiple data sources or views to cluster data points accurately. The suggested approach makes use of the structural characteristics of multi-view data tensors, introducing a non-convex tensor norm to identify correlations between these views. In contrast to conventional methods, this approach demonstrates superior clustering accuracy across several benchmark datasets. Despite the non-convex nature of the tensor norm used, the proposed method remains amenable to efficient optimization using existing algorithms. The approach provides a valuable tool for multi-view data analysis and has the potential to enhance our understanding of complex systems in various fields. Further research can explore the application of this method to other types of data and extend it to other machine-learning tasks.
State-action control barrier functions: Imposing safety on learning-based control with low online computational costs
- Authors: Kanghui He, Shengling Shi, Ton van den Boom, Bart De Schutter
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2312.11255
- Pdf link: https://arxiv.org/pdf/2312.11255
- Abstract Learning-based control with safety guarantees usually requires real-time safety certification and modifications of possibly unsafe learning-based policies. The control barrier function (CBF) method uses a safety filter containing a constrained optimization problem to produce safe policies. However, finding a valid CBF for a general nonlinear system requires a complex function parameterization, which in general, makes the policy optimization problem difficult to solve in real time. For nonlinear systems with nonlinear state constraints, this paper proposes the novel concept of state-action CBFs, which not only characterize the safety at each state but also evaluate the control inputs taken at each state. State-action CBFs, in contrast to CBFs, enable a flexible parameterization, resulting in a safety filter that involves a convex quadratic optimization problem. This, in turn, significantly alleviates the online computational burden. To synthesize state-action CBFs, we propose a learning-based approach exploiting Hamilton-Jacobi reachability. The effect of learning errors on the effectiveness of state-action CBFs is addressed by constraint tightening and introducing a new concept called contractive CBFs. These contributions ensure formal safety guarantees for learned CBFs and control policies, enhancing the applicability of learning-based control in real-time scenarios. Simulation results on an inverted pendulum with elastic walls validate the proposed CBFs in terms of constraint satisfaction and CPU time.
LLM-ARK: Knowledge Graph Reasoning Using Large Language Models via Deep Reinforcement Learning
- Authors: Yuxuan Huang
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.11282
- Pdf link: https://arxiv.org/pdf/2312.11282
- Abstract With the evolution of pre-training methods, large language models (LLMs) have exhibited exemplary reasoning capabilities via prompt engineering. However, the absence of Knowledge Graph (KG) environment awareness and the challenge of engineering viable optimization mechanisms for intermediary reasoning processes constrict the performance of LLMs on KG reasoning tasks compared to smaller models. We introduce LLM-ARK, an LLM-grounded KG reasoning agent designed to deliver precise and adaptable predictions on KG paths. LLM-ARK utilizes Full Textual Environment (FTE) prompts to assimilate state information for each step-sized intelligence. It leverages LLMs to richly encode and represent various types of inputs and further integrates the knowledge graph with path environment data before making the final decision. Reframing the Knowledge Graph (KG) multi-hop inference problem as a sequential decision-making issue, we optimize our model using the Proximal Policy Optimization (PPO) online policy gradient reinforcement learning algorithm, which allows the model to learn from a vast array of reward signals across diverse tasks and environments. We evaluate a state-of-the-art LLM (GPT-4) and our method, which uses open-source models of varying sizes, on the OpenDialKG dataset. Our experiment shows that LLaMA7B-ARK provides excellent results with a performance rate of 48.75% for the target@1 evaluation metric, far exceeding the current state-of-the-art model by 17.64 percentage points. Meanwhile, GPT-4 accomplished a score of only 14.91%, further highlighting the efficacy and complexity of our methodology. Our code is available on GitHub for further access.
Domain Invariant Learning for Gaussian Processes and Bayesian Exploration
- Authors: Xilong Zhao, Siyuan Bian, Yaoyun Zhang, Yuliang Zhang, Qinying Gu, Xinbing Wang, Chenghu Zhou, Nanyang Ye
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.11318
- Pdf link: https://arxiv.org/pdf/2312.11318
- Abstract Out-of-distribution (OOD) generalization has long been a challenging problem that remains largely unsolved. Gaussian processes (GP), as popular probabilistic model classes, especially in the small data regime, presume strong OOD generalization abilities. Surprisingly, their OOD generalization abilities have been under-explored before compared with other lines of GP research. In this paper, we identify that GP is not free from the problem and propose a domain invariant learning algorithm for Gaussian processes (DIL-GP) with a min-max optimization on the likelihood. DIL-GP discovers the heterogeneity in the data and forces invariance across partitioned subsets of data. We further extend the DIL-GP to improve Bayesian optimization's adaptability on changing environments. Numerical experiments demonstrate the superiority of DIL-GP for predictions on several synthetic and real-world datasets. We further demonstrate the effectiveness of the DIL-GP Bayesian optimization method on a PID parameters tuning experiment for a quadrotor. The full version and source code are available at: https://github.com/Billzxl/DIL-GP.
Density Descent for Diversity Optimization
- Authors: David H. Lee, Anishalakshmi V. Palaparthi, Matthew C. Fontaine, Bryon Tjanaka, Stefanos Nikolaidis
- Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2312.11331
- Pdf link: https://arxiv.org/pdf/2312.11331
- Abstract Diversity optimization seeks to discover a set of solutions that elicit diverse features. Prior work has proposed Novelty Search (NS), which, given a current set of solutions, seeks to expand the set by finding points in areas of low density in the feature space. However, to estimate density, NS relies on a heuristic that considers the k-nearest neighbors of the search point in the feature space, which yields a weaker stability guarantee. We propose Density Descent Search (DDS), an algorithm that explores the feature space via gradient descent on a continuous density estimate of the feature space that also provides stronger stability guarantee. We experiment with DDS and two density estimation methods: kernel density estimation (KDE) and continuous normalizing flow (CNF). On several standard diversity optimization benchmarks, DDS outperforms NS, the recently proposed MAP-Annealing algorithm, and other state-of-the-art baselines. Additionally, we prove that DDS with KDE provides stronger stability guarantees than NS, making it more suitable for adaptive optimizers. Furthermore, we prove that NS is a special case of DDS that descends a KDE of the feature space.
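The core mechanism named in the abstract above, gradient descent on a kernel density estimate, can be sketched in a few lines. This is a toy illustration only: it moves a point directly in feature space, uses an unnormalized Gaussian kernel, and the helper names, `bandwidth`, `lr`, and `steps` are all assumptions. The paper's DDS algorithm, which searches over solutions whose features are then evaluated, is not reproduced here.

```python
import numpy as np

def kde_density_and_grad(x, points, bandwidth):
    """Unnormalized Gaussian KDE at x and its gradient with respect to x."""
    diffs = x - points                                        # shape (n_points, dim)
    k = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * bandwidth ** 2))
    density = k.mean()                                        # constant normalizer omitted
    grad = -(k[:, None] * diffs).mean(axis=0) / bandwidth ** 2
    return density, grad

def density_descent(x0, points, bandwidth=1.0, lr=0.5, steps=50):
    """Follow the negative density gradient toward sparsely covered regions."""
    x = np.array(x0, dtype=float)                             # copy so x0 is left unchanged
    for _ in range(steps):
        _, grad = kde_density_and_grad(x, points, bandwidth)
        x -= lr * grad
    return x

# Toy archive: features of existing solutions clustered around the origin.
rng = np.random.default_rng(0)
archive = rng.normal(0.0, 0.3, size=(50, 2))
start = np.array([0.2, 0.1])
end = density_descent(start, archive)
print(kde_density_and_grad(start, archive, 1.0)[0],
      kde_density_and_grad(end, archive, 1.0)[0])
```

Each step moves the point away from crowded regions of the archive, so the estimated density at the final point is lower than at the starting point.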
Optimize and Reduce: A Top-Down Approach for Image Vectorization
- Authors: Or Hirschorn, Amir Jevnisek, Shai Avidan
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- Arxiv link: https://arxiv.org/abs/2312.11334
- Pdf link: https://arxiv.org/pdf/2312.11334
- Abstract Vector image representation is a popular choice when editability and flexibility in resolution are desired. However, most images are only available in raster form, making raster-to-vector image conversion (vectorization) an important task. Classical methods for vectorization are either domain-specific or yield an abundance of shapes which limits editability and interpretability. Learning-based methods that use differentiable rendering have revolutionized vectorization, at the cost of poor generalization to out-of-training distribution domains, and optimization-based counterparts are either slow or produce non-editable and redundant shapes. In this work, we propose Optimize & Reduce (O&R), a top-down approach to vectorization that is both fast and domain-agnostic. O&R aims to attain a compact representation of input images by iteratively optimizing Bézier curve parameters and significantly reducing the number of shapes, using a devised importance measure. We contribute a benchmark of five datasets comprising images from a broad spectrum of image complexities - from emojis to natural-like images. Through extensive experiments on hundreds of images, we demonstrate that our method is domain agnostic and outperforms existing works in both reconstruction and perceptual quality for a fixed number of shapes. Moreover, we show that our algorithm is $10\times$ faster than the state-of-the-art optimization-based method.
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
- Authors: Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
- Arxiv link: https://arxiv.org/abs/2312.11360
- Pdf link: https://arxiv.org/pdf/2312.11360
- Abstract We present Paint-it, a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting the Score-Distillation Sampling (SDS). We observe that directly applying SDS yields undesirable texture quality due to its noisy gradients. We reveal the importance of texture parameterization when using SDS. Specifically, we propose Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization. We show that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS. In experiments, Paint-it obtains remarkable quality PBR texture maps within 15 min., given only a text description. We demonstrate the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine. Project page: https://kim-youwang.github.io/paint-it
Orientation-Constrained System for Lamp Detection in Buildings Based on Computer Vision
- Authors: Francisco Troncoso-Pastoriza, Pablo Eguía-Oller, Rebeca P. Díaz-Redondo, Enrique Granada-Álvarez, Aitor Erkoreka
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.11380
- Pdf link: https://arxiv.org/pdf/2312.11380
- Abstract Computer vision is used in this work to detect lighting elements in buildings with the goal of improving the accuracy of previous methods to provide a precise inventory of the location and state of lamps. Using the framework developed in our previous works, we introduce two new modifications to enhance the system: first, a constraint on the orientation of the detected poses in the optimization methods for both the initial and the refined estimates based on the geometric information of the building information modelling (BIM) model; second, an additional reprojection error filtering step to discard the erroneous poses introduced with the orientation restrictions, keeping the identification and localization errors low while greatly increasing the number of detections. These enhancements are tested in five different case studies with more than 30,000 images, with results showing improvements in the number of detections, the percentage of correct model and state identifications, and the distance between detections and reference positions.
Path-aware optimistic optimization for a mobile robot
- Authors: Tudor Santejudean, Lucian Busoniu
- Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2312.11383
- Pdf link: https://arxiv.org/pdf/2312.11383
- Abstract We consider problems in which a mobile robot samples an unknown function defined over its operating space, so as to find a global optimum of this function. The path traveled by the robot matters, since it influences energy and time requirements. We consider a branch-and-bound algorithm called deterministic optimistic optimization, and extend it to the path-aware setting, obtaining path-aware optimistic optimization (OOPA). In this new algorithm, the robot decides how to move next via an optimal control problem that maximizes the long-term impact of the robot trajectory on lowering the upper bound, weighted by bound and function values to focus the search on the optima. An online version of value iteration is used to solve an approximate version of this optimal control problem. OOPA is evaluated in extensive experiments in two dimensions, where it does better than path-unaware and local-optimization baselines.
On Computing Optimal Temporal Branchings and Spanning Subgraphs
- Authors: Daniela Bubboloni, Costanza Catalano, Andrea Marino, Ana Silva
- Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
- Arxiv link: https://arxiv.org/abs/2312.11390
- Pdf link: https://arxiv.org/pdf/2312.11390
- Abstract In this work we extend the concept of out/in-branchings spanning the vertices of a digraph (also called directed spanning trees) to temporal graphs, which are digraphs where arcs are available only at prescribed times. While the literature has focused on minimum weight/earliest arrival time Temporal Out-Branchings (TOB), we solve the problem for other optimization criteria. In particular, we define five different types of TOBs based on the optimization of the travel duration (FT-TOB), of the departure time (LD-TOB), of the number of transfers (MT-TOB), of the total waiting time (MW-TOB), and of the travelling time (ST-TOB). For $D \in \{LD, MT, ST\}$, we provide necessary and sufficient conditions for the existence of a spanning D-TOB; when it does not exist, we characterize the maximum vertex set that a D-TOB can span. Moreover, we provide a log-linear algorithm for computing such branchings. For $D \in \{FT, MW\}$, we prove that deciding the existence of a spanning D-TOB is NP-complete; we also show that the same results hold for optimal temporal in-branchings. Finally, we investigate the related problem of computing a spanning temporal subgraph with the minimum number of arcs and optimizing a chosen criterion D. This problem turns out to be NP-hard for any D. The hardness results are quite surprising, as computing optimal paths between nodes can always be done in polynomial time.
MAG-Edit: Localized Image Editing in Complex Scenarios via $\underline{M}$ask-Based $\underline{A}$ttention-Adjusted $\underline{G}$uidance
- Authors: Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.11396
- Pdf link: https://arxiv.org/pdf/2312.11396
- Abstract Recent diffusion-based image editing approaches have exhibited impressive editing capabilities in images with simple compositions. However, localized editing in complex scenarios has not been well-studied in the literature, despite its growing real-world demands. Existing mask-based inpainting methods fall short of retaining the underlying structure within the edit region. Meanwhile, mask-free attention-based methods often exhibit editing leakage and misalignment in more complex compositions. In this work, we develop $\textbf{MAG-Edit}$, a training-free, inference-stage optimization method, which enables localized image editing in complex scenarios. In particular, MAG-Edit optimizes the noise latent feature in diffusion models by maximizing two mask-based cross-attention constraints of the edit token, which in turn gradually enhances the local alignment with the desired prompt. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method in achieving both text alignment and structure preservation for localized editing within complex scenarios.
Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF
- Authors: Wei Xiong, Hanze Dong, Chenlu Ye, Han Zhong, Nan Jiang, Tong Zhang
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2312.11456
- Pdf link: https://arxiv.org/pdf/2312.11456
- Abstract This paper studies the theoretical framework of the alignment process of generative models with Reinforcement Learning from Human Feedback (RLHF). We consider a standard mathematical formulation, the reverse-KL regularized contextual bandit for RLHF. Despite its widespread practical application, a rigorous theoretical analysis of this formulation remains open. We investigate its theoretical properties both in offline and online settings and propose efficient algorithms with finite-sample theoretical guarantees. Our work bridges the gap between theory and practice by linking our theoretical insights with existing practical alignment algorithms such as Direct Preference Optimization (DPO) and Rejection Sampling Optimization (RSO). Furthermore, these findings and connections also offer both theoretical and practical communities new tools and insights for future algorithmic design of alignment algorithms.
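For orientation, the reverse-KL regularized contextual bandit mentioned in the abstract above is commonly written as follows. The symbols here (prompt distribution $d_0$, reward $r$, reference policy $\pi_0$, regularization strength $\eta > 0$) are notation assumed for this note and may differ from the paper's own.

```latex
\max_{\pi} \;
\mathbb{E}_{x \sim d_0}\!\left[
  \mathbb{E}_{a \sim \pi(\cdot \mid x)}\big[ r(x, a) \big]
  - \eta \, \mathrm{KL}\!\big( \pi(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x) \big)
\right],
\qquad
\pi^{*}(a \mid x) = \frac{\pi_0(a \mid x)\, \exp\!\big( r(x, a) / \eta \big)}
                         {\sum_{a'} \pi_0(a' \mid x)\, \exp\!\big( r(x, a') / \eta \big)}
```

The maximizer is a Gibbs (Boltzmann) distribution that reweights the reference policy by the exponentiated reward, which is presumably where the "Gibbs sampling" in the title comes from.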
GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis
- Authors: Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.11458
- Pdf link: https://arxiv.org/pdf/2312.11458
- Abstract We propose a method for dynamic scene reconstruction using deformable 3D Gaussians that is tailored for monocular video. Building upon the efficiency of Gaussian splatting, our approach extends the representation to accommodate dynamic elements via a deformable set of Gaussians residing in a canonical space, and a time-dependent deformation field defined by a multi-layer perceptron (MLP). Moreover, under the assumption that most natural scenes have large regions that remain static, we allow the MLP to focus its representational power by additionally including a static Gaussian point cloud. The concatenated dynamic and static point clouds form the input for the Gaussian Splatting rasterizer, enabling real-time rendering. The differentiable pipeline is optimized end-to-end with a self-supervised rendering loss. Our method achieves results that are comparable to state-of-the-art dynamic neural radiance field methods while allowing much faster optimization and rendering. Project website: https://lynl7130.github.io/gaufre/index.html
Keyword: adam
On the Use of Walsh Domain Equalizer for Performance Enhancement of MIMO-OFDM Communication Systems
- Authors: Khaled Ramadan
- Subjects: Information Theory (cs.IT); Performance (cs.PF)
- Arxiv link: https://arxiv.org/abs/2312.10421
- Pdf link: https://arxiv.org/pdf/2312.10421
- Abstract The purpose of this article is to investigate the viability of Multi-Carrier Modulation (MCM) systems based on the Fast Walsh Hadamard Transform (FWHT). In addition, a nonlinear Joint Low-Complexity Optimized Zero Forcing Successive Interference Cancellation (JLCOZF-SIC) equalizer is proposed. To that end, general equations for the number of flops of the proposed equalizer and various other equalizers are given. This article discusses the use of Banded Matrix Approximation (BMA) as a technique for reducing complexity. The proposed equalizer uses BMA to accomplish both equalization and co-Carrier Frequency Offset (co-CFO) corrections. In addition, three cases involving the proposed equalizer were investigated. In the first case, diagonal compensation is used. In the second case, BMA compensation is used. In the third case, complete matrix compensation is used. In the presence of frequency offset, noise, and frequency-selective Rayleigh fading environments, analysis and simulation results show that the OFDM-FWHT system with the proposed equalizer outperforms the conventional OFDM system with various linear and nonlinear equalizers.
Keyword: gradient
Gradient-based Parameter Selection for Efficient Fine-Tuning
- Authors: Zhi Zhang, Qizhe Zhang, Zijun Gao, Renrui Zhang, Ekaterina Shutova, Shiji Zhou, Shanghang Zhang
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.10136
- Pdf link: https://arxiv.org/pdf/2312.10136
- Abstract With the growing size of pre-trained models, full fine-tuning and storing all the parameters for various downstream tasks is costly and infeasible. In this paper, we propose a new parameter-efficient fine-tuning method, Gradient-based Parameter Selection (GPS), demonstrating that only tuning a few selected parameters from the pre-trained model while keeping the remainder of the model frozen can generate similar or better performance compared with the full model fine-tuning method. Different from the existing popular and state-of-the-art parameter-efficient fine-tuning approaches, our method does not introduce any additional parameters and computational costs during both the training and inference stages. Another advantage is the model-agnostic and non-destructive property, which eliminates the need for any other design specific to a particular model. Compared with the full fine-tuning, GPS achieves 3.33% (91.78% vs. 88.45%, FGVC) and 9.61% (73.1% vs. 65.57%, VTAB) improvement of the accuracy with tuning only 0.36% parameters of the pre-trained model on average over 24 image classification tasks; it also demonstrates a significant improvement of 17% and 16.8% in mDice and mIoU, respectively, on medical image segmentation task. Moreover, GPS achieves state-of-the-art performance compared with existing PEFT methods.
RetailKLIP : Finetuning OpenCLIP backbone using metric learning on a single GPU for Zero-shot retail product image classification
- Authors: Muktabh Mayank Srivastava
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2312.10282
- Pdf link: https://arxiv.org/pdf/2312.10282
- Abstract Retail product or packaged grocery goods images need to be classified in various computer vision applications like self checkout stores, supply chain automation and retail execution evaluation. Previous works explore ways to finetune deep models for this purpose. But because finetuning a large model, or even a linear layer on a pretrained backbone, requires running at least a few epochs of gradient descent for every new retail product added to the classification range, frequent retraining is needed in a real-world scenario. In this work, we propose finetuning the vision encoder of a CLIP model in a way that its embeddings can be easily used for nearest neighbor based classification, while also getting accuracy close to or exceeding full finetuning. A nearest neighbor based classifier needs no incremental training for new products, thus saving resources and wait time.
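To make the "no incremental training" point above concrete, here is a minimal sketch of nearest neighbor classification over precomputed embeddings. The class name `NearestNeighborClassifier`, the use of cosine similarity, and the three-dimensional toy vectors are assumptions for illustration; in practice the embeddings would come from the finetuned vision encoder, which is not modeled here.

```python
import numpy as np

class NearestNeighborClassifier:
    """Cosine-similarity nearest neighbor lookup over stored embeddings."""

    def __init__(self):
        self.embeddings = []
        self.labels = []

    def add(self, embedding, label):
        # Registering a product only stores its normalized embedding.
        v = np.asarray(embedding, dtype=float)
        self.embeddings.append(v / np.linalg.norm(v))
        self.labels.append(label)

    def predict(self, embedding):
        q = np.asarray(embedding, dtype=float)
        q = q / np.linalg.norm(q)
        sims = np.stack(self.embeddings) @ q  # cosine similarity to each stored item
        return self.labels[int(np.argmax(sims))]

# Toy embeddings, made up for illustration.
clf = NearestNeighborClassifier()
clf.add([1.0, 0.0, 0.0], "cereal")
clf.add([0.0, 1.0, 0.0], "soda")
first = clf.predict([0.1, 0.9, 0.2])

# A new product is added without any gradient descent.
clf.add([0.0, 0.0, 1.0], "soap")
second = clf.predict([0.1, 0.2, 0.9])
```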
Spatial-Temporal DAG Convolutional Networks for End-to-End Joint Effective Connectivity Learning and Resting-State fMRI Classification
- Authors: Rui Yang, Wenrui Dai, Huajun She, Yiping P. Du, Dapeng Wu, Hongkai Xiong
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
- Arxiv link: https://arxiv.org/abs/2312.10317
- Pdf link: https://arxiv.org/pdf/2312.10317
- Abstract Building comprehensive brain connectomes has proved of fundamental importance in resting-state fMRI (rs-fMRI) analysis. Based on the foundation of brain networks, spatial-temporal-based graph convolutional networks have dramatically improved the performance of deep learning methods in rs-fMRI time series classification. However, existing works either pre-define the brain network as the correlation matrix derived from the raw time series or jointly learn the connectome and model parameters without any topology constraint. These methods could suffer from degraded classification performance caused by the deviation from the intrinsic brain connectivity and lack the biological interpretability needed to demonstrate the causal structure (i.e., effective connectivity) among brain regions. Moreover, most existing methods for effective connectivity learning are unaware of the downstream classification task and cannot sufficiently exploit useful rs-fMRI label information. To address these issues in an end-to-end manner, we model the brain network as a directed acyclic graph (DAG) to discover direct causal connections between brain regions and propose Spatial-Temporal DAG Convolutional Network (ST-DAGCN) to jointly infer effective connectivity and classify rs-fMRI time series by learning brain representations based on a nonlinear structural equation model. The optimization problem is formulated into a continuous program and solved with a score-based learning method via gradient descent. We evaluate ST-DAGCN on two public rs-fMRI databases. Experiments show that ST-DAGCN outperforms existing models by evident margins in rs-fMRI classification and simultaneously learns meaningful edges of effective connectivity that help understand brain activity patterns and pathological mechanisms in brain disease.
Material Point Methods on Unstructured Tessellations: A Stable Kernel Approach With Continuous Gradient Reconstruction
- Authors: Yadi Cao, Yidong Zhao, Minchen Li, Yin Yang, Jinhyun Choo, Demetri Terzopoulos, Chenfanfu Jiang
- Subjects: Computational Engineering, Finance, and Science (cs.CE)
- Arxiv link: https://arxiv.org/abs/2312.10338
- Pdf link: https://arxiv.org/pdf/2312.10338
- Abstract The Material Point Method (MPM) is a hybrid Eulerian-Lagrangian simulation technique for solid mechanics with significant deformation. Structured background grids are commonly employed in the standard MPM, but they may give rise to several accuracy problems in handling complex geometries. When using (2D) unstructured triangular or (3D) tetrahedral background elements, however, significant challenges arise (e.g., cell-crossing error). Substantial numerical errors develop due to the inherent $C^0$ continuity property of the interpolation function, which causes discontinuous gradients across element boundaries. Prior efforts in constructing $C^1$ continuous interpolation functions have either not been adapted for unstructured grids or have only been applied to 2D triangular meshes. In this study, an Unstructured Moving Least Squares MPM (UMLS-MPM) is introduced to accommodate 2D and 3D simplex tessellation. The central idea is to incorporate a diminishing function into the sample weights of the MLS kernel, ensuring an analytically continuous velocity gradient estimation. Numerical analyses confirm the method's capability in mitigating cell-crossing inaccuracies and realizing expected convergence.
Bayesian experimental design for head imaging by electrical impedance tomography
- Authors: N. Hyvönen, A. Jääskeläinen, R. Maity, A. Vavilov
- Subjects: Numerical Analysis (math.NA); Statistics Theory (math.ST)
- Arxiv link: https://arxiv.org/abs/2312.10383
- Pdf link: https://arxiv.org/pdf/2312.10383
- Abstract This work considers the optimization of electrode positions in head imaging by electrical impedance tomography. The study is motivated by maximizing the sensitivity of electrode measurements to conductivity changes when monitoring the condition of a stroke patient, which justifies adopting a linearized version of the complete electrode model as the forward model. The algorithm is based on finding a (locally) A-optimal measurement configuration via gradient descent with respect to the electrode positions. The efficient computation of the needed derivatives of the complete electrode model is one of the focal points. Two algorithms are introduced and numerically tested on a three-layer head model. The first one assumes a region of interest and a Gaussian prior for the conductivity in the brain, and it can be run offline, i.e., prior to taking any measurements. The second algorithm first computes a reconstruction of the conductivity anomaly caused by the stroke with an initial electrode configuration by combining lagged diffusivity iteration with sequential linearizations, which can be interpreted to produce an approximate Gaussian probability density for the conductivity perturbation. It then resorts to the first algorithm to find new, more informative positions for the available electrodes with the constructed density as the prior.
Take History as a Mirror in Heterogeneous Federated Learning
- Authors: Xiaorui Jiang, Hengwei Xu, Yu Gao, Yong Liao, Pengyuan Zhou
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10425
- Pdf link: https://arxiv.org/pdf/2312.10425
- Abstract Federated Learning (FL) allows several clients to cooperatively train machine learning models without disclosing the raw data. In practice, due to the system and statistical heterogeneity among devices, synchronous FL often encounters the straggler effect. In contrast, asynchronous FL can mitigate this problem, making it suitable for scenarios involving numerous participants. However, Non-IID data and stale models present significant challenges to asynchronous FL, as they would diminish the practicality of the global model and even lead to training failures. In this work, we propose a novel asynchronous FL framework called Federated Historical Learning (FedHist), which effectively addresses the challenges posed by both Non-IID data and gradient staleness. FedHist enhances the stability of local gradients by performing weighted fusion with historical global gradients cached on the server. Relying on hindsight, it assigns aggregation weights to each participant in a multi-dimensional manner during each communication round. To further enhance the efficiency and stability of the training process, we introduce an intelligent $\ell_2$-norm amplification scheme, which dynamically regulates the learning progress based on the $\ell_2$-norms of the submitted gradients. Extensive experiments demonstrate that FedHist outperforms state-of-the-art methods in terms of convergence performance and test accuracy.
Weight-Entanglement Meets Gradient-Based Neural Architecture Search
- Authors: Rhea Sanjay Sukthanker, Arjun Krishnakumar, Mahmoud Safari, Frank Hutter
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10440
- Pdf link: https://arxiv.org/pdf/2312.10440
- Abstract Weight sharing is a fundamental concept in neural architecture search (NAS), enabling gradient-based methods to explore cell-based architecture spaces significantly faster than traditional blackbox approaches. In parallel, weight entanglement has emerged as a technique for intricate parameter sharing among architectures within macro-level search spaces. Since weight-entanglement poses compatibility challenges for gradient-based NAS methods, these two paradigms have largely developed independently in parallel sub-communities. This paper aims to bridge the gap between these sub-communities by proposing a novel scheme to adapt gradient-based methods for weight-entangled spaces. This enables us to conduct an in-depth comparative assessment and analysis of the performance of gradient-based NAS in weight-entangled search spaces. Our findings reveal that this integration of weight-entanglement and gradient-based NAS brings forth the various benefits of gradient-based methods (enhanced performance, improved supernet training properties and superior any-time performance), while preserving the memory efficiency of weight-entangled spaces. The code for our work is openly accessible at https://anonymous.4open.science/r/TangleNAS-527C
Catastrophic Forgetting in Deep Learning: A Comprehensive Taxonomy
- Authors: Everton L. Aleixo, Juan G. Colonna, Marco Cristo, Everlandio Fernandes
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10549
- Pdf link: https://arxiv.org/pdf/2312.10549
- Abstract Deep Learning models have achieved remarkable performance in tasks such as image classification or generation, often surpassing human accuracy. However, they can struggle to learn new tasks and update their knowledge without access to previous data, leading to a significant loss of accuracy known as Catastrophic Forgetting (CF). This phenomenon was first observed by McCloskey and Cohen in 1989 and remains an active research topic. Incremental learning without forgetting is widely recognized as a crucial aspect in building better AI systems, as it allows models to adapt to new tasks without losing the ability to perform previously learned ones. This article surveys recent studies that tackle CF in modern Deep Learning models that use gradient descent as their learning algorithm. Although several solutions have been proposed, a definitive solution or consensus on assessing CF is yet to be established. The article provides a comprehensive review of recent solutions, proposes a taxonomy to organize them, and identifies research gaps in this area.
Amortized Reparametrization: Efficient and Scalable Variational Inference for Latent SDEs
- Authors: Kevin Course, Prasanth B. Nair
- Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2312.10550
- Pdf link: https://arxiv.org/pdf/2312.10550
- Abstract We consider the problem of inferring latent stochastic differential equations (SDEs) with a time and memory cost that scales independently with the amount of data, the total length of the time series, and the stiffness of the approximate differential equations. This is in stark contrast to typical methods for inferring latent differential equations which, despite their constant memory cost, have a time complexity that is heavily dependent on the stiffness of the approximate differential equation. We achieve this computational advancement by removing the need to solve differential equations when approximating gradients using a novel amortization strategy coupled with a recently derived reparametrization of expectations under linear SDEs. We show that, in practice, this allows us to achieve similar performance to methods based on adjoint sensitivities with more than an order of magnitude fewer evaluations of the model in training.
Resource Allocation for Secure Ultra-Reliable Low-Latency-Communication in IoT Applications
- Authors: Solmaz Sorkhi Asbaghi, Mahmood Mohassel Feghhi, Javad Musevi niya
- Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2312.10555
- Pdf link: https://arxiv.org/pdf/2312.10555
- Abstract The Internet of Things (IoT) has a significant demand in society due to its features, and it is constantly improving. In the context of wireless technology, Ultra-reliable and low-latency communication (URLLC) is one of the essential and challenging services in fifth-generation (5G) networks and beyond. The research on URLLC is still in its early stages due to its conflicting requirements regarding high reliability and low latency. In this paper, we study the performance of secure short-packet communications and resource allocation in IoT systems. To this end, we investigate a health center automation scenario, where the goal of the access point is to send critical messages to devices without eavesdropping. In this context, our goals are to maximize the weighted sum throughput and to minimize the total transmit power. Both problems are non-convex and hard to solve. To overcome this challenge, we use efficient mathematical techniques, such as the block coordinate descent (BCD) method and gradient ascent algorithm; we also use estimation methods such as Ralston, Heun, and forward-backward in the derivative part of the gradient ascent algorithm. The simulation results show the performance advantages of the BCD algorithm and gradient ascent in the short-packet transmission scheme, as well as the superiority of the proposed methods in most cases.
Single-Stage Optimization of Open-loop Stable Limit Cycles with Smooth, Symbolic Derivatives
- Authors: Muhammad Saud Ul Hassan, Christian Hubicki
- Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2312.10647
- Pdf link: https://arxiv.org/pdf/2312.10647
- Abstract Open-loop stable limit cycles are foundational to the dynamics of legged robots. They impart a self-stabilizing character to the robot's gait, thus alleviating the need for compute-heavy feedback-based gait correction. This paper proposes a general approach to rapidly generate limit cycles with explicit stability constraints for a given dynamical system. In particular, we pose the problem of open-loop limit cycle stability as a single-stage constrained-optimization problem (COP), and use Direct Collocation to transcribe it into a nonlinear program (NLP) with closed-form expressions for constraints, objectives, and their gradients. The COP formulations of stability are developed based (1) on the spectral radius of a discrete return map, and (2) on the spectral radius of the system's monodromy matrix, where the spectral radius is bounded using different constraint-satisfaction formulations of the eigenvalue problem. We compare the performance and solution qualities of each approach, but specifically highlight the Schur decomposition of the monodromy matrix as a formulation which boasts wider applicability through weaker assumptions and attractive numerical convergence properties. Moreover, we present results from our experiments on a spring-loaded inverted pendulum model of a robot, where our method generated actuation trajectories for open-loop stable hopping in under 2 seconds (on the Intel Core i7-6700K), and produced energy-minimizing actuation trajectories even under tight stability constraints.
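The stability criterion behind both formulations above is the standard one for discrete maps: a fixed point is locally asymptotically stable when the spectral radius of the map's Jacobian is below 1. The sketch below only evaluates that condition numerically for a given matrix; it does not reproduce the paper's constraint formulations or the Schur-based approach. The function names, the optional `margin` parameter, and the two example matrices are assumptions for illustration.

```python
import numpy as np

def spectral_radius(matrix):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(matrix))))

def is_open_loop_stable(return_map_jacobian, margin=0.0):
    """True when the spectral radius is below 1 - margin.

    `margin` is a hypothetical safety margin, not something from the abstract.
    """
    return spectral_radius(return_map_jacobian) < 1.0 - margin

# Made-up Jacobians of a discrete return map.
stable = np.array([[0.5, 0.2], [0.0, 0.3]])
unstable = np.array([[1.2, 0.0], [0.1, 0.4]])

rho_stable = spectral_radius(stable)
rho_unstable = spectral_radius(unstable)
```

In an optimization setting this check would become a constraint rather than a post-hoc test, which is where the different constraint-satisfaction formulations mentioned in the abstract come in.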
Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals
- Authors: Patrick Altmeyer, Mojtaba Farmanbar, Arie van Deursen, Cynthia C. S. Liem
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10648
- Pdf link: https://arxiv.org/pdf/2312.10648
- Abstract Counterfactual explanations offer an intuitive and straightforward way to explain black-box models and offer algorithmic recourse to individuals. To address the need for plausible explanations, existing work has primarily relied on surrogate models to learn how the input data is distributed. This effectively reallocates the task of learning realistic explanations for the data from the model itself to the surrogate. Consequently, the generated explanations may seem plausible to humans but need not necessarily describe the behaviour of the black-box model faithfully. We formalise this notion of faithfulness through the introduction of a tailored evaluation metric and propose a novel algorithmic framework for generating Energy-Constrained Conformal Counterfactuals that are only as plausible as the model permits. Through extensive empirical studies, we demonstrate that ECCCo reconciles the need for faithfulness and plausibility. In particular, we show that for models with gradient access, it is possible to achieve state-of-the-art performance without the need for surrogate models. To do so, our framework relies solely on properties defining the black-box model itself by leveraging recent advances in energy-based modelling and conformal prediction. To our knowledge, this is the first venture in this direction for generating faithful counterfactual explanations. Thus, we anticipate that ECCCo can serve as a baseline for future research. We believe that our work opens avenues for researchers and practitioners seeking tools to better distinguish trustworthy from unreliable models.
CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with Trajectory Optimization
- Authors: Elisa Alboni, Gianluigi Grandesso, Gastone Pietro Rosati Papini, Justin Carpentier, Andrea Del Prete
- Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2312.10666
- Pdf link: https://arxiv.org/pdf/2312.10666
- Abstract Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and complementary tools to solve optimal control problems. On the one hand, TO can efficiently compute locally-optimal solutions, but it tends to get stuck in local minima if the problem is not convex. On the other hand, RL is typically less sensitive to non-convexity, but it requires a much higher computational effort. Recently, we have proposed CACTO (Continuous Actor-Critic with Trajectory Optimization), an algorithm that uses TO to guide the exploration of an actor-critic RL algorithm. In turn, the policy encoded by the actor is used to warm-start TO, closing the loop between TO and RL. In this work, we present an extension of CACTO exploiting the idea of Sobolev learning. To make the training of the critic network faster and more data efficient, we enrich it with the gradient of the Value function, computed via a backward pass of the differential dynamic programming algorithm. Our results show that the new algorithm is more efficient than the original CACTO, reducing the number of TO episodes by a factor ranging from 3 to 10, and consequently the computation time. Moreover, we show that CACTO-SL helps TO to find better minima and to produce more consistent results.
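Analisis Eksploratif Dan Augmentasi Data NSL-KDD Menggunakan Deep Generative Adversarial Networks Untuk Meningkatkan Performa Algoritma Extreme Gradient Boosting Dalam Klasifikasi Jenis Serangan Siber (English: Exploratory Analysis and Data Augmentation of NSL-KDD Using Deep Generative Adversarial Networks to Improve the Performance of the Extreme Gradient Boosting Algorithm in Classifying Types of Cyber Attacks)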
- Authors: K. P. Santoso, F. A. Madany, H. Suryotrisongko
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10669
- Pdf link: https://arxiv.org/pdf/2312.10669
- Abstract This study proposes the implementation of Deep Generative Adversarial Networks (GANs) for augmenting the NSL-KDD dataset. The primary objective is to enhance the efficacy of eXtreme Gradient Boosting (XGBoost) in the classification of cyber-attacks on the NSL-KDD dataset. As a result, the method proposed in this research achieved an accuracy of 99.53% using the XGBoost model without data augmentation with GAN, and 99.78% with data augmentation using GAN.
Automatic Optimisation of Normalised Neural Networks
- Authors: Namhoon Cho, Hyo-Sang Shin
- Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2312.10672
- Pdf link: https://arxiv.org/pdf/2312.10672
- Abstract We propose automatic optimisation methods considering the geometry of matrix manifold for the normalised parameters of neural networks. Layerwise weight normalisation with respect to Frobenius norm is utilised to bound the Lipschitz constant and to enhance gradient reliability so that the trained networks are suitable for control applications. Our approach first initialises the network and normalises the data with respect to the $\ell^{2}$-$\ell^{2}$ gain of the initialised network. Then, the proposed algorithms take the update structure based on the exponential map on high-dimensional spheres. Given an update direction such as that of the negative Riemannian gradient, we propose two different ways to determine the stepsize for descent. The first algorithm utilises automatic differentiation of the objective function along the update curve defined on the combined manifold of spheres. The directional second-order derivative information can be utilised without requiring explicit construction of the Hessian. The second algorithm utilises the majorisation-minimisation framework via architecture-aware majorisation for neural networks. With these new developments, the proposed methods avoid manual tuning and scheduling of the learning rate, thus providing an automated pipeline for optimizing normalised neural networks.
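For readers unfamiliar with the update structure mentioned above, the following sketch shows one descent step on a unit sphere using the exponential map, with the Riemannian gradient obtained by projecting the Euclidean gradient onto the tangent space. The fixed stepsize, the made-up gradient, and the function names are assumptions for illustration; the paper's contribution is precisely the automatic choice of stepsize, which this sketch does not attempt.

```python
import numpy as np

def riemannian_grad(x, euclid_grad):
    """Project a Euclidean gradient onto the tangent space of the unit sphere at x."""
    return euclid_grad - np.dot(x, euclid_grad) * x

def exp_map(x, v):
    """Exponential map on the unit sphere: follow the geodesic from x along tangent v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)

# A unit-norm parameter vector (e.g. a flattened, Frobenius-normalised weight matrix).
x = np.array([1.0, 0.0, 0.0])
g = np.array([0.3, -2.0, 0.5])   # made-up Euclidean gradient
step = 0.1                       # fixed stepsize, an assumption for this sketch

tangent = riemannian_grad(x, g)
x_new = exp_map(x, -step * tangent)
```

Because the update follows a geodesic, `x_new` stays on the sphere exactly, so no re-normalisation step is needed after the update.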
Knowledge Trees: Gradient Boosting Decision Trees on Knowledge Neurons as Probing Classifier
- Authors: Sergey A. Saltykov
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10746
- Pdf link: https://arxiv.org/pdf/2312.10746
- Abstract To understand how well a large language model captures certain semantic or syntactic features, researchers typically apply probing classifiers. However, the accuracy of these classifiers is critical for the correct interpretation of the results. If a probing classifier exhibits low accuracy, this may be due either to the fact that the language model does not capture the property under investigation, or to shortcomings in the classifier itself, which is unable to adequately capture the characteristics encoded in the internal representations of the model. Consequently, for more effective diagnosis, it is necessary to use the most accurate classifiers possible for a particular type of task. Logistic regression on the output representation of the transformer neural network layer is most often used to probe the syntactic properties of the language model. We show that using gradient boosting decision trees at the Knowledge Neuron layer, i.e., at the hidden layer of the feed-forward network of the transformer, as a probing classifier for recognizing parts of a sentence is more advantageous than using logistic regression on the output representations of the transformer layer. This approach is also preferable to many other methods. The gain in error rate, depending on the preset, ranges from 9% to 54%.
Identification of Knowledge Neurons in Protein Language Models
- Authors: Divya Nori, Shivali Singireddy, Marina Ten Have
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Biomolecules (q-bio.BM)
- Arxiv link: https://arxiv.org/abs/2312.10770
- Pdf link: https://arxiv.org/pdf/2312.10770
- Abstract Neural language models have become powerful tools for learning complex representations of entities in natural language processing tasks. However, their interpretability remains a significant challenge, particularly in domains like computational biology where trust in model predictions is crucial. In this work, we aim to enhance the interpretability of protein language models, specifically the state-of-the-art ESM model, by identifying and characterizing knowledge neurons - components that express understanding of key information. After fine-tuning the ESM model for the task of enzyme sequence classification, we compare two knowledge neuron selection methods that preserve a subset of neurons from the original model. The two methods, activation-based and integrated gradient-based selection, consistently outperform a random baseline. In particular, these methods show that there is a high density of knowledge neurons in the key vector prediction networks of self-attention modules. Given that key vectors specialize in understanding different features of input sequences, these knowledge neurons could capture knowledge of different enzyme sequence motifs. In the future, the types of knowledge captured by each neuron could be characterized.
AEDFL: Efficient Asynchronous Decentralized Federated Learning with Heterogeneous Devices
- Authors: Ji Liu, Tianshi Che, Yang Zhou, Ruoming Jin, Huaiyu Dai, Dejing Dou, Patrick Valduriez
- Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10935
- Pdf link: https://arxiv.org/pdf/2312.10935
- Abstract Federated Learning (FL) has made significant advances recently, enabling collaborative model training on distributed data over edge devices. Iterative gradient or model exchanges between devices and the centralized server in the standard FL paradigm suffer from severe efficiency bottlenecks on the server. While enabling collaborative training without a central server, existing decentralized FL approaches either focus on the synchronous mechanism that deteriorates FL convergence or ignore device staleness with an asynchronous mechanism, resulting in inferior FL accuracy. In this paper, we propose an Asynchronous Efficient Decentralized FL framework, i.e., AEDFL, in heterogeneous environments with three unique contributions. First, we propose an asynchronous FL system model with an efficient model aggregation method for improving the FL convergence. Second, we propose a dynamic staleness-aware model update approach to achieve superior accuracy. Third, we propose an adaptive sparse training method to reduce communication and computation costs without significant accuracy degradation. Extensive experimentation on four public datasets and four models demonstrates the strength of AEDFL in terms of accuracy (up to 16.3% higher), efficiency (up to 92.9% faster), and computation costs (up to 42.3% lower).
A Multimodal Approach for Advanced Pest Detection and Classification
- Authors: Jinli Duan, Haoyu Ding, Sung Kim
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.10948
- Pdf link: https://arxiv.org/pdf/2312.10948
- Abstract This paper presents a novel multimodal deep learning framework for enhanced agricultural pest detection, combining tiny-BERT's natural language processing with R-CNN and ResNet-18's image processing. Addressing limitations of traditional CNN-based visual methods, this approach integrates textual context for more accurate pest identification. The R-CNN and ResNet-18 integration tackles deep CNN issues like vanishing gradients, while tiny-BERT ensures computational efficiency. Employing ensemble learning with linear regression and random forest models, the framework demonstrates superior discriminative ability, as shown in ROC and AUC analyses. This multimodal approach, blending text and image data, significantly boosts pest detection in agriculture. The study highlights the potential of multimodal deep learning in complex real-world scenarios, suggesting future expansions in diversity of datasets, advanced data augmentation, and cross-modal attention mechanisms to enhance model performance.
Exploring Gradient Explosion in Generative Adversarial Imitation Learning: A Probabilistic Perspective
- Authors: Wanying Wang, Yichen Zhu, Yirui Zhou, Chaomin Shen, Jian Tang, Zhiyuan Xu, Yaxin Peng, Yangchun Zhang
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.11214
- Pdf link: https://arxiv.org/pdf/2312.11214
- Abstract Generative Adversarial Imitation Learning (GAIL) stands as a cornerstone approach in imitation learning. This paper investigates the gradient explosion in two types of GAIL: GAIL with deterministic policy (DE-GAIL) and GAIL with stochastic policy (ST-GAIL). We begin with the observation that the training can be highly unstable for DE-GAIL at the beginning of the training phase and end up diverging. Conversely, the ST-GAIL training trajectory remains consistent, reliably converging. To shed light on these disparities, we provide an explanation from a theoretical perspective. By establishing a probabilistic lower bound for GAIL, we demonstrate that gradient explosion is an inevitable outcome for DE-GAIL due to occasionally large expert-imitator policy disparity, whereas ST-GAIL does not have this issue. To substantiate our assertion, we illustrate how modifications in the reward function can mitigate the gradient explosion challenge. Finally, we propose CREDO, a simple yet effective strategy that clips the reward function during the training phase, allowing the GAIL to enjoy high data efficiency and stable trainability.
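The reward-clipping idea above can be sketched in a few lines. The example assumes the reward takes the form $-\log(1 - D(s, a))$, which is one common choice in GAIL implementations; the abstract does not say which reward form or clipping bound CREDO uses, so the form, the bound `r_max`, and the function names here are illustrative assumptions only.

```python
import numpy as np

def gail_reward(d_prob, eps=1e-8):
    """Reward -log(1 - D) from discriminator outputs D in [0, 1].

    Assumption: this is one common GAIL reward form, not necessarily the paper's.
    """
    return -np.log(np.clip(1.0 - d_prob, eps, None))

def clipped_reward(d_prob, r_max=5.0):
    """Cap the reward at r_max so extreme discriminator outputs cannot blow it up."""
    return np.minimum(gail_reward(d_prob), r_max)

# Made-up discriminator outputs; the last one is close to 1 and yields a large raw reward.
d = np.array([0.5, 0.9, 0.999999])
raw = gail_reward(d)
clipped = clipped_reward(d)
```

Moderate rewards pass through unchanged, while the unbounded tail of the logarithm is capped, which is the mechanism by which clipping limits the size of reward-driven gradients.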
LLM-ARK: Knowledge Graph Reasoning Using Large Language Models via Deep Reinforcement Learning
- Authors: Yuxuan Huang
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2312.11282
- Pdf link: https://arxiv.org/pdf/2312.11282
- Abstract With the evolution of pre-training methods, large language models (LLMs) have exhibited exemplary reasoning capabilities via prompt engineering. However, the absence of Knowledge Graph (KG) environment awareness and the challenge of engineering viable optimization mechanisms for intermediary reasoning processes constrict the performance of LLMs on KG reasoning tasks compared to smaller models. We introduce LLM-ARK, an LLM-grounded KG reasoning agent designed to deliver precise and adaptable predictions on KG paths. LLM-ARK utilizes Full Textual Environment (FTE) prompts to assimilate state information for each step-sized intelligence. It leverages LLMs to richly encode and represent various types of inputs and further integrates the knowledge graph with path environment data before making the final decision. Reframing the KG multi-hop inference problem as a sequential decision-making problem, we optimize our model using the Proximal Policy Optimization (PPO) online policy gradient reinforcement learning algorithm, which allows the model to learn from a vast array of reward signals across diverse tasks and environments. We evaluate a state-of-the-art LLM (GPT-4) and our method, which uses open-source models of varying sizes, on the OpenDialKG dataset. Our experiments show that LLaMA7B-ARK provides excellent results with a performance rate of 48.75% for the target@1 evaluation metric, far exceeding the current state-of-the-art model by 17.64 percentage points. Meanwhile, GPT-4 accomplished a score of only 14.91%, further highlighting the efficacy and complexity of our methodology. Our code is available on GitHub for further access.
Density Descent for Diversity Optimization
- Authors: David H. Lee, Anishalakshmi V. Palaparthi, Matthew C. Fontaine, Bryon Tjanaka, Stefanos Nikolaidis
- Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2312.11331
- Pdf link: https://arxiv.org/pdf/2312.11331
- Abstract Diversity optimization seeks to discover a set of solutions that elicit diverse features. Prior work has proposed Novelty Search (NS), which, given a current set of solutions, seeks to expand the set by finding points in areas of low density in the feature space. However, to estimate density, NS relies on a heuristic that considers the k-nearest neighbors of the search point in the feature space, which yields a weaker stability guarantee. We propose Density Descent Search (DDS), an algorithm that explores the feature space via gradient descent on a continuous density estimate of the feature space and that also provides a stronger stability guarantee. We experiment with DDS and two density estimation methods: kernel density estimation (KDE) and continuous normalizing flow (CNF). On several standard diversity optimization benchmarks, DDS outperforms NS, the recently proposed MAP-Annealing algorithm, and other state-of-the-art baselines. Additionally, we prove that DDS with KDE provides stronger stability guarantees than NS, making it more suitable for adaptive optimizers. Furthermore, we prove that NS is a special case of DDS that descends a KDE of the feature space.
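The core idea in the abstract above, gradient descent on a continuous density estimate, can be illustrated with a small sketch. It is a simplification under stated assumptions: it uses a Gaussian kernel with a fixed, hypothetical bandwidth and moves a point directly in feature space, whereas the abstract does not specify the kernel, the bandwidth, or how DDS couples the density gradient to the solution optimizer. All function names are illustrative.

```python
import numpy as np

def kde_density(x, points, bandwidth=0.5):
    """Gaussian KDE at x (normalizing constant dropped; it does not affect descent direction)."""
    sq_dist = np.sum((points - x) ** 2, axis=1)
    return np.mean(np.exp(-sq_dist / (2.0 * bandwidth**2)))

def kde_gradient(x, points, bandwidth=0.5):
    """Gradient of kde_density with respect to x."""
    diffs = points - x
    weights = np.exp(-np.sum(diffs**2, axis=1) / (2.0 * bandwidth**2))
    # d/dx exp(-|p - x|^2 / (2 h^2)) = exp(...) * (p - x) / h^2
    return np.mean(weights[:, None] * diffs, axis=0) / bandwidth**2

def descend_density(x, points, lr=0.1, steps=50, bandwidth=0.5):
    """Plain gradient descent on the density estimate: moves x toward sparser regions."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        x -= lr * kde_gradient(x, points, bandwidth)
    return x

# Hypothetical archive of already-discovered feature vectors, clustered near the origin.
archive = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]])
start = np.array([0.3, 0.2])
end = descend_density(start, archive)
```

Because the gradient of each kernel points toward its archive point, stepping against it pushes the query away from the cluster, so `end` lies in a lower-density region than `start`.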
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
- Authors: Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
- Arxiv link: https://arxiv.org/abs/2312.11360
- Pdf link: https://arxiv.org/pdf/2312.11360
- Abstract We present Paint-it, a text-driven high-fidelity texture map synthesis method for 3D meshes via neural re-parameterized texture optimization. Paint-it synthesizes texture maps from a text description by synthesis-through-optimization, exploiting the Score-Distillation Sampling (SDS). We observe that directly applying SDS yields undesirable texture quality due to its noisy gradients. We reveal the importance of texture parameterization when using SDS. Specifically, we propose Deep Convolutional Physically-Based Rendering (DC-PBR) parameterization, which re-parameterizes the physically-based rendering (PBR) texture maps with randomly initialized convolution-based neural kernels, instead of a standard pixel-based parameterization. We show that DC-PBR inherently schedules the optimization curriculum according to texture frequency and naturally filters out the noisy signals from SDS. In experiments, Paint-it obtains remarkable quality PBR texture maps within 15 min., given only a text description. We demonstrate the generalizability and practicality of Paint-it by synthesizing high-quality texture maps for large-scale mesh datasets and showing test-time applications such as relighting and material control using a popular graphics engine. Project page: https://kim-youwang.github.io/paint-it
DiffTune-MPC: Closed-Loop Learning for Model Predictive Control
- Authors: Ran Tao, Sheng Cheng, Xiaofeng Wang, Shenlong Wang, Naira Hovakimyan
- Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2312.11384
- Pdf link: https://arxiv.org/pdf/2312.11384
- Abstract Model predictive control (MPC) has been applied to many platforms in robotics and autonomous systems for its capability to predict a system's future behavior while incorporating constraints that a system may have. To enhance the performance of a system with an MPC controller, one can manually tune the MPC's cost function. However, manual tuning can be challenging due to the possibly high dimension of the parameter space as well as the potential difference between the open-loop cost function in MPC and the overall closed-loop performance metric function. This paper presents DiffTune-MPC, a novel method for learning the cost function of an MPC in a closed-loop manner. The proposed framework is compatible with the scenario where the time interval for performance evaluation and MPC's planning horizon have different lengths. We show the auxiliary problem whose solution admits the analytical gradients of MPC and discuss its variations in different MPC settings. Simulation results demonstrate the capability of DiffTune-MPC and illustrate the influence of constraints (from actuation limits) on learning.
Keyword: super-resolution
Image Restoration Through Generalized Ornstein-Uhlenbeck Bridge
- Authors: Conghan Yue, Zhengwei Peng, Junlong Ma, Shiyan Du, Pengxu Wei, Dongyu Zhang
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2312.10299
- Pdf link: https://arxiv.org/pdf/2312.10299
- Abstract Diffusion models possess powerful generative capabilities enabling the mapping of noise to data using reverse stochastic differential equations. However, in image restoration tasks, the focus is on the mapping relationship from low-quality images to high-quality images. To address this, we introduce the Generalized Ornstein-Uhlenbeck Bridge (GOUB) model. By leveraging the natural mean-reverting property of the generalized OU process and further adjusting the variance of its steady-state distribution through Doob's h-transform, we achieve diffusion mappings from point to point with minimal cost. This allows for end-to-end training, enabling the recovery of high-quality images from low-quality ones. Additionally, we uncover the mathematical essence of some bridge models, all of which are special cases of GOUB, and empirically demonstrate the optimality of our proposed models. Furthermore, benefiting from our distinctive parameterization mechanism, we propose the Mean-ODE model, which is better at capturing pixel-level information and structural perceptions. Experimental results show that both models achieve state-of-the-art results in various tasks, including inpainting, deraining, and super-resolution. Code is available at https://github.com/Hammour-steak/GOUB.
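The mean-reverting property the abstract above relies on can be seen in a short simulation of a plain Ornstein-Uhlenbeck process, dx = theta * (mu - x) dt + sigma dW, using Euler-Maruyama steps. This is only the basic scalar OU process with constant, hypothetical coefficients; it does not include the h-transform bridge construction or the time-dependent coefficients of the GOUB model, and the function name is illustrative.

```python
import numpy as np

def simulate_ou(x0, mu, theta=2.0, sigma=0.3, dt=0.01, steps=500, seed=0):
    """Euler-Maruyama simulation of dx = theta * (mu - x) dt + sigma dW."""
    rng = np.random.default_rng(seed)
    x = np.empty(steps + 1)
    x[0] = x0
    for t in range(steps):
        # Drift pulls x toward mu; the noise term scales with sqrt(dt).
        noise = sigma * np.sqrt(dt) * rng.standard_normal()
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + noise
    return x

# With sigma = 0 the path decays deterministically toward mu.
noiseless_path = simulate_ou(x0=1.0, mu=0.0, sigma=0.0)
noisy_path = simulate_ou(x0=1.0, mu=0.0)
```

Whatever the starting value, the drift term pulls the path toward `mu`, after which the noisy path fluctuates around it with a variance set by `sigma` and `theta`. The abstract describes adjusting that steady-state variance so that the process ends at a single point.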