arxiv-updates
arxiv-updates copied to clipboard
New submissions for Mon, 20 Nov 23
Keyword: sgd
Prompt Pool based Class-Incremental Continual Learning for Dialog State Tracking
- Authors: Authors: Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2311.10271
- Pdf link: https://arxiv.org/pdf/2311.10271
- Abstract Continual learning is crucial for dialog state tracking (DST) in dialog systems, since requirements from users for new functionalities are often encountered. However, most of existing continual learning methods for DST require task identities during testing, which is a severe limit in real-world applications. In this paper, we aim to address continual learning of DST in the class-incremental scenario (namely the task identity is unknown in testing). Inspired by the recently emerging prompt tuning method that performs well on dialog systems, we propose to use the prompt pool method, where we maintain a pool of key-value paired prompts and select prompts from the pool according to the distance between the dialog history and the prompt keys. The proposed method can automatically identify tasks and select appropriate prompts during testing. We conduct experiments on Schema-Guided Dialog dataset (SGD) and another dataset collected from a real-world dialog application. Experiment results show that the prompt pool method achieves much higher joint goal accuracy than the baseline. After combining with a rehearsal buffer, the model performance can be further improved.
Keyword: optimization
Smart Traffic Management of Vehicles using Faster R-CNN based Deep Learning Method
- Authors: Authors: Arindam Chaudhuri
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
- Arxiv link: https://arxiv.org/abs/2311.10099
- Pdf link: https://arxiv.org/pdf/2311.10099
- Abstract With constant growth of civilization and modernization of cities all across the world since past few centuries smart traffic management of vehicles is one of the most sorted after problem by research community. It is a challenging problem in computer vision and artificial intelligence domain. Smart traffic management basically involves segmentation of vehicles, estimation of traffic density and tracking of vehicles. The vehicle segmentation from traffic videos helps realization of niche applications such as monitoring of speed and estimation of traffic. When occlusions, background with clutters and traffic with density variations are present, this problem becomes more intractable in nature. Keeping this motivation in this research work, we investigate Faster R-CNN based deep learning method towards segmentation of vehicles. This problem is addressed in four steps viz minimization with adaptive background model, Faster R-CNN based subnet operation, Faster R-CNN initial refinement and result optimization with extended topological active nets. The computational framework uses ideas of adaptive background modeling. It also addresses shadow and illumination related issues. Higher segmentation accuracy is achieved through topological active net deformable models. The topological and extended topological active nets help to achieve stated deformations. Mesh deformation is achieved with minimization of energy. The segmentation accuracy is improved with modified version of extended topological active net. The experimental results demonstrate superiority of this computational framework
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture
- Authors: Authors: Lincong Feng, Muyu Wang, Maoyu Wang, Kuo Xu, Xiaoli Liu
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2311.10123
- Pdf link: https://arxiv.org/pdf/2311.10123
- Abstract Generative models for 3D object synthesis have seen significant advancements with the incorporation of prior knowledge distilled from 2D diffusion models. Nevertheless, challenges persist in the form of multi-view geometric inconsistencies and slow generation speeds within the existing 3D synthesis frameworks. This can be attributed to two factors: firstly, the deficiency of abundant geometric a priori knowledge in optimization, and secondly, the entanglement issue between geometry and texture in conventional 3D generation methods.In response, we introduce MetaDreammer, a two-stage optimization approach that leverages rich 2D and 3D prior knowledge. In the first stage, our emphasis is on optimizing the geometric representation to ensure multi-view consistency and accuracy of 3D objects. In the second stage, we concentrate on fine-tuning the geometry and optimizing the texture, thereby achieving a more refined 3D object. Through leveraging 2D and 3D prior knowledge in two stages, respectively, we effectively mitigate the interdependence between geometry and texture. MetaDreamer establishes clear optimization objectives for each stage, resulting in significant time savings in the 3D generation process. Ultimately, MetaDreamer can generate high-quality 3D objects based on textual prompts within 20 minutes, and to the best of our knowledge, it is the most efficient text-to-3D generation method. Furthermore, we introduce image control into the process, enhancing the controllability of 3D generation. Extensive empirical evidence confirms that our method is not only highly efficient but also achieves a quality level that is at the forefront of current state-of-the-art 3D generation techniques.
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
- Authors: Authors: Yunshan Zhong, Jiawei Hu, Mingbao Lin, Mengzhao Chen, Rongrong Ji
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2311.10126
- Pdf link: https://arxiv.org/pdf/2311.10126
- Abstract Albeit the scalable performance of vision transformers (ViTs), the dense computational costs (training & inference) undermine their position in industrial applications. Post-training quantization (PTQ), tuning ViTs with a tiny dataset and running in a low-bit format, well addresses the cost issue but unluckily bears more performance drops in lower-bit cases. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) Quantization inefficiency in the prevalent log2 quantizer for post-Softmax activations; (2) Rugged and magnified loss landscape in coarse-grained quantization granularity for post-LayerNorm activations. Then, I&S-ViT addresses these issues by introducing: (1) A novel shift-uniform-log2 quantizer (SULQ) that incorporates a shift mechanism followed by uniform quantization to achieve both an inclusive domain representation and accurate distribution approximation; (2) A three-stage smooth optimization strategy (SOS) that amalgamates the strengths of channel-wise and layer-wise quantization to enable stable learning. Comprehensive evaluations across diverse vision tasks validate I&S-ViT' superiority over existing PTQ of ViTs methods, particularly in low-bit scenarios. For instance, I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
Improving Unimodal Inference with Multimodal Transformers
- Authors: Authors: Kateryna Chumachenko, Alexandros Iosifidis, Moncef Gabbouj
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.10170
- Pdf link: https://arxiv.org/pdf/2311.10170
- Abstract This paper proposes an approach for improving performance of unimodal models with multimodal training. Our approach involves a multi-branch architecture that incorporates unimodal models with a multimodal transformer-based branch. By co-training these branches, the stronger multimodal branch can transfer its knowledge to the weaker unimodal branches through a multi-task objective, thereby improving the performance of the resulting unimodal models. We evaluate our approach on tasks of dynamic hand gesture recognition based on RGB and Depth, audiovisual emotion recognition based on speech and facial video, and audio-video-text based sentiment analysis. Our approach outperforms the conventionally trained unimodal counterparts. Interestingly, we also observe that optimization of the unimodal branches improves the multimodal branch, compared to a similar multimodal model trained from scratch.
Layer-to-Layer Melt Pool Control in Laser Power Bed Fusion
- Authors: Authors: Dominic Liao-McPherson, Efe C. Balta, Mohamadreza Afrasiabi, Alisa Rupenyan, Markus Bambach, John Lygeros
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.10218
- Pdf link: https://arxiv.org/pdf/2311.10218
- Abstract Additive manufacturing processes are flexible and efficient technologies for producing complex geometries. However, ensuring reliability and repeatability is challenging due to the complex physics and various sources of uncertainty in the process. In this work, we investigate closed-loop control of the melt pool dimensions in a laser powder bed fusion (LPBF) process. We propose a trajectory optimization-based layer-to-layer controller that adjusts the laser power input to the next layer to track a desired melt pool depth and validate our controller by placing it in closed-loop high-fidelity multi-layer smoothed particle hydrodynamics simulator of a 2D LPBF process. Detailed numerical case studies demonstrate successful regulation of the melt pool depth on brick and overhang geometries and provide first of its kind results on the effectiveness of layer-to-layer input optimization for the LPBF process as well as detailed insight into the physics of the controlled process. Computational complexity and process performance results illustrate the method's effectiveness and provide an outlook for its implementation onto real systems.
Data-Driven LQR using Reinforcement Learning and Quadratic Neural Networks
- Authors: Authors: Soroush Asri, Luis Rodrigues
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.10235
- Pdf link: https://arxiv.org/pdf/2311.10235
- Abstract This paper introduces a novel data-driven approach to design a linear quadratic regulator (LQR) using a reinforcement learning (RL) algorithm that does not require a system model. The key contribution is to perform policy iteration (PI) by designing the policy evaluator as a two-layer quadratic neural network (QNN). This network is trained through convex optimization. To the best of our knowledge, this is the first time that a QNN trained through convex optimization is employed as the Q-function approximator (QFA). The main advantage is that the QNN's input-output mapping has an analytical expression as a quadratic form, which can then be used to obtain an analytical expression for policy improvement. This is in stark contrast to the available techniques in the literature that must train a second neural network to obtain policy improvement. The article establishes the convergence of the learning algorithm to the optimal control, provided the system is controllable and one starts from a stabilitzing policy. A quadrotor example demonstrates the effectiveness of the proposed approach.
Stable Differentiable Causal Discovery
- Authors: Authors: Achille Nazaret, Justin Hong, Elham Azizi, David Blei
- Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
- Arxiv link: https://arxiv.org/abs/2311.10263
- Pdf link: https://arxiv.org/pdf/2311.10263
- Abstract Inferring causal relationships as directed acyclic graphs (DAGs) is an important but challenging problem. Differentiable Causal Discovery (DCD) is a promising approach to this problem, framing the search as a continuous optimization. But existing DCD methods are numerically unstable, with poor performance beyond tens of variables. In this paper, we propose Stable Differentiable Causal Discovery (SDCD), a new method that improves previous DCD methods in two ways: (1) It employs an alternative constraint for acyclicity; this constraint is more stable, both theoretically and empirically, and fast to compute. (2) It uses a training procedure tailored for sparse causal graphs, which are common in real-world scenarios. We first derive SDCD and prove its stability and correctness. We then evaluate it with both observational and interventional data and on both small-scale and large-scale settings. We find that SDCD outperforms existing methods in both convergence speed and accuracy and can scale to thousands of variables.
Sobol Sequence Optimization for Hardware-Efficient Vector Symbolic Architectures
- Authors: Authors: Sercan Aygun, M. Hassan Najafi
- Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET)
- Arxiv link: https://arxiv.org/abs/2311.10277
- Pdf link: https://arxiv.org/pdf/2311.10277
-
Abstract
Hyperdimensional computing (HDC) is an emerging computing paradigm with significant promise for efficient and robust learning. In HDC, objects are encoded with high-dimensional vector symbolic sequences called hypervectors. The quality of hypervectors, defined by their distribution and independence, directly impacts the performance of HDC systems. Despite a large body of work on the processing parts of HDC systems, little to no attention has been paid to data encoding and the quality of hypervectors. Most prior studies have generated hypervectors using inherent random functions, such as MATLAB
s or Python
s random function. This work introduces an optimization technique for generating hypervectors by employing quasi-random sequences. These sequences have recently demonstrated their effectiveness in achieving accurate and low-discrepancy data encoding in stochastic computing systems. The study outlines the optimization steps for utilizing Sobol sequences to produce high-quality hypervectors in HDC systems. An optimization algorithm is proposed to select the most suitable Sobol sequences for generating minimally correlated hypervectors, particularly in applications related to symbol-oriented architectures. The performance of the proposed technique is evaluated in comparison to two traditional approaches of generating hypervectors based on linear-feedback shift registers and MATLAB random function. The evaluation is conducted for two applications: (i) language and (ii) headline classification. Our experimental results demonstrate accuracy improvements of up to 10.79%, depending on the vector size. Additionally, the proposed encoding hardware exhibits reduced energy consumption and a superior area-delay product.
Leveraging Function Space Aggregation for Federated Learning at Scale
- Authors: Authors: Nikita Dhawan, Nicole Mitchell, Zachary Charles, Zachary Garrett, Gintare Karolina Dziugaite
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.10291
- Pdf link: https://arxiv.org/pdf/2311.10291
- Abstract The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
How Much Time is Required for Phase Shift Delivery in RIS-Aided Wireless Systems?
- Authors: Authors: Hao Xie, Dong Li
- Subjects: Information Theory (cs.IT)
- Arxiv link: https://arxiv.org/abs/2311.10295
- Pdf link: https://arxiv.org/pdf/2311.10295
- Abstract Reconfigurable intelligent surface (RIS) has become a focal point of extensive research due to its remarkable "squared gain". However, achieving a substantial beamforming gain typically requires a significant number of elements, which leads to a non-negligible overhead that forwards the coherent phase shift to the RIS. Different from previous works, which primarily focus on the information transmission phase, we consider the phase delivery overhead during the phase-shift delivery phase to explore the trade-off between performance and overhead. To reduce the phase delivery overhead via the control link, we introduce a hybrid phase shift mechanism, encompassing both the coherent and fixed phase shifts. Specifically, a beamforming problem is formulated for maximizing the throughput. In light of the intractability of the problem, we develop an alternating optimization-based iterative algorithm by combining quadratic transformation and successive convex approximation. To gain more insights, we derive the closed-form expression of the number of elements adopting the coherent phase shift in the large signal-to-noise ratio region. This expression serves as a valuable guide for the practical implementation of the RIS technology. Our simulation results conclusively demonstrate the effectiveness of the proposed algorithm in achieving a favorable trade-off between throughput and overhead. Furthermore, the introduction of the hybrid phase shift approach significantly reduces phase delivery overhead while concurrently enhancing the system throughput.
Imagination-augmented Hierarchical Reinforcement Learning for Safe and Interactive Autonomous Driving in Urban Environments
- Authors: Authors: Sang-Hyun Lee, Yoonjae Jung, Seung-Woo Seo
- Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2311.10309
- Pdf link: https://arxiv.org/pdf/2311.10309
- Abstract Hierarchical reinforcement learning (HRL) has led to remarkable achievements in diverse fields. However, existing HRL algorithms still cannot be applied to real-world navigation tasks. These tasks require an agent to perform safety-aware behaviors and interact with surrounding objects in dynamic environments. In addition, an agent in these tasks should perform consistent and structured exploration as they are long-horizon and have complex structures with diverse objects and task-specific rules. Designing HRL agents that can handle these challenges in real-world navigation tasks is an open problem. In this paper, we propose imagination-augmented HRL (IAHRL), a new and general navigation algorithm that allows an agent to learn safe and interactive behaviors in real-world navigation tasks. Our key idea is to train a hierarchical agent in which a high-level policy infers interactions by interpreting behaviors imagined with low-level policies. Specifically, the high-level policy is designed with a permutation-invariant attention mechanism to determine which low-level policy generates the most interactive behavior, and the low-level policies are implemented with an optimization-based behavior planner to generate safe and structured behaviors following task-specific rules. To evaluate our algorithm, we introduce five complex urban driving tasks, which are among the most challenging real-world navigation tasks. The experimental results indicate that our hierarchical agent performs safety-aware behaviors and properly interacts with surrounding vehicles, achieving higher success rates and lower average episode steps than baselines in urban driving tasks.
Federated Knowledge Graph Completion via Latent Embedding Sharing and Tensor Factorization
- Authors: Authors: Maolin Wang, Dun Zeng, Zenglin Xu, Ruocheng Guo, Xiangyu Zhao
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.10341
- Pdf link: https://arxiv.org/pdf/2311.10341
- Abstract Knowledge graphs (KGs), which consist of triples, are inherently incomplete and always require completion procedure to predict missing triples. In real-world scenarios, KGs are distributed across clients, complicating completion tasks due to privacy restrictions. Many frameworks have been proposed to address the issue of federated knowledge graph completion. However, the existing frameworks, including FedE, FedR, and FEKG, have certain limitations. = FedE poses a risk of information leakage, FedR's optimization efficacy diminishes when there is minimal overlap among relations, and FKGE suffers from computational costs and mode collapse issues. To address these issues, we propose a novel method, i.e., Federated Latent Embedding Sharing Tensor factorization (FLEST), which is a novel approach using federated tensor factorization for KG completion. FLEST decompose the embedding matrix and enables sharing of latent dictionary embeddings to lower privacy risks. Empirical results demonstrate FLEST's effectiveness and efficiency, offering a balanced solution between performance and privacy. FLEST expands the application of federated tensor factorization in KG completion tasks.
FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification
- Authors: Authors: Wenqing Wu
- Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.10359
- Pdf link: https://arxiv.org/pdf/2311.10359
- Abstract Highly parallelized workloads like machine learning training, inferences and general HPC tasks are greatly accelerated using GPU devices. In a cloud computing cluster, serving a GPU's computation power through multi-tasks sharing is highly demanded since there are always more task requests than the number of GPU available. Existing GPU sharing solutions focus on reducing task-level waiting time or task-level switching costs when multiple jobs competing for a single GPU. Non-stopped computation requests come with different priorities, having non-symmetric impact on QoS for sharing a GPU device. Existing work missed the kernel-level optimization opportunity brought by this setting. To address this problem, we present a novel kernel-level scheduling strategy called FIKIT: Filling Inter-kernel Idle Time. FIKIT incorporates task-level priority information, fine-grained kernel identification, and kernel measurement, allowing low priorities task's execution during high priority task's inter-kernel idle time. Thereby, filling the GPU's device runtime fully, and reduce overall GPU sharing impact to cloud services. Across a set of ML models, the FIKIT based inference system accelerated high priority tasks by 1.33 to 14.87 times compared to the JCT in GPU sharing mode, and more than half of the cases are accelerated by more than 3.5 times. Alternatively, under preemptive sharing, the low-priority tasks have a comparable to default GPU sharing mode JCT, with a 0.84 to 1 times ratio. We further limit the kernel measurement and runtime fine-grained kernel scheduling overhead to less than 10%.
Designing and Evaluating an Adaptive Virtual Reality System using EEG Frequencies to Balance Internal and External Attention States
- Authors: Authors: Francesco Chiossi, Changkun Ou, Carolina Gerhardt, Felix Putze, Sven Mayer
- Subjects: Human-Computer Interaction (cs.HC)
- Arxiv link: https://arxiv.org/abs/2311.10447
- Pdf link: https://arxiv.org/pdf/2311.10447
- Abstract Virtual reality finds various applications in productivity, entertainment, and training scenarios requiring working memory and attentional resources. Working memory relies on prioritizing relevant information and suppressing irrelevant information through internal attention, which is fundamental for successful task performance and training. Today, virtual reality systems do not account for the impact of working memory loads resulting in over or under-stimulation. In this work, we designed an adaptive system based on EEG correlates of external and internal attention to support working memory task performance. Here, participants engaged in a visual working memory N-Back task, and we adapted the visual complexity of distracting surrounding elements. Our study first demonstrated the feasibility of EEG frontal theta and parietal alpha frequency bands for dynamic visual complexity adjustments. Second, our adaptive system showed improved task performance and diminished perceived workload compared to a reverse adaptation. Our results show the effectiveness of the proposed adaptive system, allowing for the optimization of distracting elements in high-demanding conditions. Adaptive systems based on alpha and theta frequency bands allow for the regulation of attentional and executive resources to keep users engaged in a task without resulting in cognitive overload.
A Novel VAPT Algorithm: Enhancing Web Application Security Trough OWASP top 10 Optimization
- Authors: Authors: Rui Ventura, Daniel Jose Franco, Omar Khasro Akram
- Subjects: Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2311.10450
- Pdf link: https://arxiv.org/pdf/2311.10450
- Abstract This research study is built upon cybersecurity audits and investigates the optimization of an Open Web Application Security Project (OWASP) Top 10 algorithm for Web Applications (WA) security audits using Vulnerability Assessment and Penetration Testing (VAPT) processes. The study places particular emphasis on enhancing the VAPT process by optimizing the OWASP algorithm. To achieve this, the research utilizes desk documents to gain knowledge of WA cybersecurity audits and their associated tools. It also delves into archives to explore VAPT processes and identify techniques, methods, and tools for VAPT automation. Furthermore, the research proposes a prototype optimization that streamlines the two steps of VAPT using the OWASP Top 10 algorithm through an experimental procedure. The results are obtained within a virtual environment, which employs black box testing methods as the primary means of data acquisition and analysis. In this experimental setting, the OWASP algorithm demonstrates an impressive level of precision, achieving a precision rate exceeding 90%. It effectively covers all researched vulnerabilities, thus justifying its optimization. This research contributes significantly to the enhancement of the OWASP algorithm and benefits the offensive security community. It plays a crucial role in ensuring compliance processes for professionals and analysts in the security and software development fields.
Accurate and Fast Fischer-Tropsch Reaction Microkinetics using PINNs
- Authors: Authors: Harshil Patel, Aniruddha Panda, Tymofii Nikolaienko, Stanislav Jaso, Alejandro Lopez, Kaushic Kalyanaraman
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Chemical Physics (physics.chem-ph); Computational Physics (physics.comp-ph)
- Arxiv link: https://arxiv.org/abs/2311.10456
- Pdf link: https://arxiv.org/pdf/2311.10456
- Abstract Microkinetics allows detailed modelling of chemical transformations occurring in many industrially relevant reactions. Traditional way of solving the microkinetics model for Fischer-Tropsch synthesis (FTS) becomes inefficient when it comes to more advanced real-time applications. In this work, we address these challenges by using physics-informed neural networks(PINNs) for modelling FTS microkinetics. We propose a computationally efficient and accurate method, enabling the ultra-fast solution of the existing microkinetics models in realistic process conditions. The proposed PINN model computes the fraction of vacant catalytic sites, a key quantity in FTS microkinetics, with median relative error (MRE) of 0.03%, and the FTS product formation rates with MRE of 0.1%. Compared to conventional equation solvers, the model achieves up to 1E+06 times speed-up when running on GPUs, thus being fast enough for multi-scale and multi-physics reactor modelling and enabling its applications in real-time process control and optimization.
Mixed Reality UI Adaptations with Inaccurate and Incomplete Objectives
- Authors: Authors: Christoph Albert Johns, João Marcelo Evangelista Belo
- Subjects: Human-Computer Interaction (cs.HC)
- Arxiv link: https://arxiv.org/abs/2311.10466
- Pdf link: https://arxiv.org/pdf/2311.10466
- Abstract This position paper outlines a new approach to adapting 3D user interface (UI) layouts given the complex nature of end-user preferences. Current optimization techniques, which mainly rely on weighted sum methods, can be inflexible and result in unsatisfactory adaptations. We propose using multi-objective optimization and interactive preference elicitation to provide semi-automated, flexible, and effective adaptations of 3D UIs. Our approach is demonstrated using an example of single-element 3D layout adaptation with ergonomic objectives. Future work is needed to address questions around the presentation and selection of optimal solutions, the impact on cognitive load, and the integration of preference learning. We conclude that, to make adaptive 3D UIs truly effective, we must acknowledge the limitations of our optimization objectives and techniques and emphasize the importance of user control.
Regions are Who Walk Them: a Large Pre-trained Spatiotemporal Model Based on Human Mobility for Ubiquitous Urban Sensing
- Authors: Authors: Ruixing Zhang, Liangzhe Han, Leilei Sun, Yunqi Liu, Jibin Wang, Weifeng Lv
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.10471
- Pdf link: https://arxiv.org/pdf/2311.10471
- Abstract User profiling and region analysis are two tasks of significant commercial value. However, in practical applications, modeling different features typically involves four main steps: data preparation, data processing, model establishment, evaluation, and optimization. This process is time-consuming and labor-intensive. Repeating this workflow for each feature results in abundant development time for tasks and a reduced overall volume of task development. Indeed, human mobility data contains a wealth of information. Several successful cases suggest that conducting in-depth analysis of population movement data could potentially yield meaningful profiles about users and areas. Nonetheless, most related works have not thoroughly utilized the semantic information within human mobility data and trained on a fixed number of the regions. To tap into the rich information within population movement, based on the perspective that Regions Are Who walk them, we propose a large spatiotemporal model based on trajectories (RAW). It possesses the following characteristics: 1) Tailored for trajectory data, introducing a GPT-like structure with a parameter count of up to 1B; 2) Introducing a spatiotemporal fine-tuning module, interpreting trajectories as collection of users to derive arbitrary region embedding. This framework allows rapid task development based on the large spatiotemporal model. We conducted extensive experiments to validate the effectiveness of our proposed large spatiotemporal model. It's evident that our proposed method, relying solely on human mobility data without additional features, exhibits a certain level of relevance in user profiling and region analysis. Moreover, our model showcases promising predictive capabilities in trajectory generation tasks based on the current state, offering the potential for further innovative work utilizing this large spatiotemporal model.
A large and natural Class of $Σ^p_2$- and $Σ^p_3$-complete Problems in Bilevel and Robust Optimization
- Authors: Authors: Christoph Grüne, Lasse Wulf
- Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2311.10540
- Pdf link: https://arxiv.org/pdf/2311.10540
- Abstract Because $\Sigma^p_2$- and $\Sigma^p_3$-hardness proofs are usually tedious and difficult, not so many complete problems for these classes are known. This is especially true in the areas of min-max regret robust optimization, network interdiction, most vital vertex problems, blocker problems, and two-stage adjustable robust optimization problems. Even though these areas are well-researched for over two decades and one would naturally expect many (if not most) of the problems occurring in these areas to be complete for the above classes, almost no completeness results exist in the literature. We address this lack of knowledge by introducing over 70 new $\Sigma^p_2$-complete and $\Sigma^p_3$-complete problems. We achieve this result by proving a new meta-theorem, which shows $\Sigma^p_2$- and $\Sigma^p_3$-completeness simultaneously for a huge class of problems. The majority of all earlier publications on $\Sigma^p_2$- and $\Sigma^p_3$-completeness in said areas are special cases of our meta-theorem. Our precise result is the following: We introduce a large list of problems for which the meta-theorem is applicable (including clique, vertex cover, knapsack, TSP, facility location and many more). For every problem on this list, we show: The interdiction/minimum cost blocker/most vital nodes problem (with element costs) is $\Sigma^p_2$-complete. The min-max-regret problem with interval uncertainty is $\Sigma^p_2$-complete. The two-stage adjustable robust optimization problem with discrete budgeted uncertainty is $\Sigma^p_3$-complete. In summary, our work reveals the interesting insight that a large amount of NP-complete problems have the property that their min-max versions are 'automatically' $\Sigma^p_2$-complete.
A Universal Framework for Multiport Network Analysis of Reconfigurable Intelligent Surfaces
- Authors: Authors: Matteo Nerini, Shanpu Shen, Hongyu Li, Marco Di Renzo, Bruno Clerckx
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.10561
- Pdf link: https://arxiv.org/pdf/2311.10561
- Abstract Reconfigurable intelligent surface (RIS) is an emerging paradigm able to control the propagation environment in wireless systems. Most of the research on RIS has been dedicated to system-level optimization and, with the advent of beyond diagonal RIS (BD-RIS), to RIS architecture design. However, developing general and unified electromagnetic (EM)-compliant models for RIS-aided systems remains an open problem. In this study, we propose a universal framework for the multiport network analysis of RIS-aided systems. With our framework, we model RIS-aided systems and RIS architectures through impedance, admittance, and scattering parameter analysis. Based on these analyses, three equivalent models are derived accounting for the effects of impedance mismatching and mutual coupling. The three models are then simplified by assuming large transmission distances, perfect matching, and no mutual coupling to understand the role of the RIS in the communication model. The derived simplified models are consistent with the model used in related literature, although we show that an additional approximation is commonly considered in the literature. We discuss the benefits of each analysis in characterizing and optimizing the RIS and how to select the most suitable parameters according to the needs. Numerical results provide additional evidence of the equivalence of the three analyses.
Implicit Maximum a Posteriori Filtering via Adaptive Optimization
- Authors: Authors: Gianluca M. Bencomo, Jake C. Snell, Thomas L. Griffiths
- Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.10580
- Pdf link: https://arxiv.org/pdf/2311.10580
- Abstract Bayesian filtering approximates the true underlying behavior of a time-varying system by inverting an explicit generative model to convert noisy measurements into state estimates. This process typically requires either storage, inversion, and multiplication of large matrices or Monte Carlo estimation, neither of which are practical in high-dimensional state spaces such as the weight spaces of artificial neural networks. Here, we frame the standard Bayesian filtering problem as optimization over a time-varying objective. Instead of maintaining matrices for the filtering equations or simulating particles, we specify an optimizer that defines the Bayesian filter implicitly. In the linear-Gaussian setting, we show that every Kalman filter has an equivalent formulation using K steps of gradient descent. In the nonlinear setting, our experiments demonstrate that our framework results in filters that are effective, robust, and scalable to high-dimensional systems, comparing well against the standard toolbox of Bayesian filtering solutions. We suggest that it is easier to fine-tune an optimizer than it is to specify the correct filtering equations, making our framework an attractive option for high-dimensional filtering problems.
Active Inference on the Edge: A Design Study
- Authors: Authors: Boris Sedlak, Victor Casamayor Pujol, Praveen Kumar Donta, Schahram Dustdar
- Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.10607
- Pdf link: https://arxiv.org/pdf/2311.10607
- Abstract Machine Learning (ML) is a common tool to interpret and predict the behavior of distributed computing systems, e.g., to optimize the task distribution between devices. As more and more data is created by Internet of Things (IoT) devices, data processing and ML training are carried out by edge devices in close proximity. To ensure Quality of Service (QoS) throughout these operations, systems are supervised and dynamically adapted with the help of ML. However, as long as ML models are not retrained, they fail to capture gradual shifts in the variable distribution, leading to an inaccurate view of the system state. Moreover, as the prediction accuracy decreases, the reporting device should actively resolve uncertainties to improve the model's precision. Such a level of self-determination could be provided by Active Inference (ACI) -- a concept from neuroscience that describes how the brain constantly predicts and evaluates sensory information to decrease long-term surprise. We encompassed these concepts in a single action-perception cycle, which we implemented for distributed agents in a smart manufacturing use case. As a result, we showed how our ACI agent was able to quickly and traceably solve an optimization problem while fulfilling QoS requirements.
Online Calibration of Deep Learning Sub-Models for Hybrid Numerical Modeling Systems
- Authors: Authors: Said Ouala, Bertrand Chapron, Fabrice Collard, Lucile Gaultier, Ronan Fablet
- Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
- Arxiv link: https://arxiv.org/abs/2311.10665
- Pdf link: https://arxiv.org/pdf/2311.10665
- Abstract Artificial intelligence and deep learning are currently reshaping numerical simulation frameworks by introducing new modeling capabilities. These frameworks are extensively investigated in the context of model correction and parameterization where they demonstrate great potential and often outperform traditional physical models. Most of these efforts in defining hybrid dynamical systems follow {offline} learning strategies in which the neural parameterization (called here sub-model) is trained to output an ideal correction. Yet, these hybrid models can face hard limitations when defining what should be a relevant sub-model response that would translate into a good forecasting performance. End-to-end learning schemes, also referred to as online learning, could address such a shortcoming by allowing the deep learning sub-models to train on historical data. However, defining end-to-end training schemes for the calibration of neural sub-models in hybrid systems requires working with an optimization problem that involves the solver of the physical equations. Online learning methodologies thus require the numerical model to be differentiable, which is not the case for most modeling systems. To overcome this difficulty and bypass the differentiability challenge of physical models, we present an efficient and practical online learning approach for hybrid systems. The method, called EGA for Euler Gradient Approximation, assumes an additive neural correction to the physical model, and an explicit Euler approximation of the gradients. We demonstrate that the EGA converges to the exact gradients in the limit of infinitely small time steps. Numerical experiments are performed on various case studies, including prototypical ocean-atmosphere dynamics. Results show significant improvements over offline learning, highlighting the potential of end-to-end online learning for hybrid modeling.
Optimal Path Planning for Aerial Load Transportation in Complex Environments using PSO-Improved Artificial Potential Fields
- Authors: Authors: Ali Akbar Rezaei Lori
- Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.10675
- Pdf link: https://arxiv.org/pdf/2311.10675
- Abstract In this article, we investigate the optimal path planning for aerial load transportation in complex, dynamic, and static environments using Particle Swarm Optimization (PSO). A hierarchical optimal control system is designed for a quadrotor equipped with a cable-suspended payload, employing Euler-Lagrange equations of motion. To navigate through obstacles, an improved artificial potential field combined with the PSO algorithm is used to determine the shortest path for a virtual point, acting as a leader. This leader guides the system toward the target point while avoiding collisions with both fixed and moving obstacles. The gravitational and repulsion coefficient forces using various PSO methods are fine-tuned to achieve the best trajectory and minimize time duration. The identified point serves as the desired location for quadrotor position control, based on a sliding mode strategy. Finally, we present numerical results to demonstrate the successful transportation of the payload by the system.
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
- Authors: Authors: Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2311.10702
- Pdf link: https://arxiv.org/pdf/2311.10702
- Abstract Since the release of T"ULU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques. We test and incorporate a number of these advances into T"ULU, resulting in T"ULU 2, a suite of improved T"ULU models for advancing the understanding and best practices of adapting pretrained language models to downstream tasks and user preferences. Concretely, we release: (1) T"ULU-V2-mix, an improved collection of high-quality instruction datasets; (2) T"ULU 2, LLAMA-2 models finetuned on the V2 mixture; (3) T"ULU 2+DPO, T"ULU 2 models trained with direct preference optimization (DPO), including the largest DPO-trained model to date (T"ULU 2+DPO 70B); (4) CODE T"ULU 2, CODE LLAMA models finetuned on our V2 mix that outperform CODE LLAMA and its instruction-tuned variant, CODE LLAMA-Instruct. Our evaluation from multiple perspectives shows that the T"ULU 2 suite achieves state-of-the-art performance among open models and matches or exceeds the performance of GPT-3.5-turbo-0301 on several benchmarks. We release all the checkpoints, data, training and evaluation code to facilitate future open efforts on adapting large language models.
Multimodal Representation Learning by Alternating Unimodal Adaptation
- Authors: Authors: Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao
- Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2311.10707
- Pdf link: https://arxiv.org/pdf/2311.10707
- Abstract Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle with challenges where some modalities appear more dominant than others during multimodal learning, resulting in suboptimal performance. To address this challenge, we propose MLA (Multimodal Learning with Alternating Unimodal Adaptation). MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process, thereby minimizing interference between modalities. Simultaneously, it captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities. This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information. During the inference phase, MLA utilizes a test-time uncertainty-based model fusion mechanism to integrate multimodal information. Extensive experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities. These experiments demonstrate the superiority of MLA over competing prior approaches.
Keyword: adam
Minimum Star Partitions of Simple Polygons in Polynomial Time
- Authors: Authors: Mikkel Abrahamsen, Joakim Blikstad, André Nusser, Hanwen Zhang
- Subjects: Computational Geometry (cs.CG)
- Arxiv link: https://arxiv.org/abs/2311.10631
- Pdf link: https://arxiv.org/pdf/2311.10631
- Abstract We devise a polynomial-time algorithm for partitioning a simple polygon $P$ into a minimum number of star-shaped polygons. The question of whether such an algorithm exists has been open for more than four decades [Avis and Toussaint, Pattern Recognit., 1981] and it has been repeated frequently, for example in O'Rourke's famous book [Art Gallery Theorems and Algorithms, 1987]. In addition to its strong theoretical motivation, the problem is also motivated by practical domains such as CNC pocket milling, motion planning, and shape parameterization. The only previously known algorithm for a non-trivial special case is for $P$ being both monotone and rectilinear [Liu and Ntafos, Algorithmica, 1991]. For general polygons, an algorithm was only known for the restricted version in which Steiner points are disallowed [Keil, SIAM J. Comput., 1985], meaning that each corner of a piece in the partition must also be a corner of $P$. Interestingly, the solution size for the restricted version may be linear for instances where the unrestricted solution has constant size. The covering variant in which the pieces are star-shaped but allowed to overlap--known as the Art Gallery Problem--was recently shown to be $\exists\mathbb R$-complete and is thus likely not in NP [Abrahamsen, Adamaszek and Miltzow, STOC 2018 & J. ACM 2022]; this is in stark contrast to our result. Arguably the most related work to ours is the polynomial-time algorithm to partition a simple polygon into a minimum number of convex pieces by Chazelle and Dobkin~[STOC, 1979 & Comp. Geom., 1985].
Keyword: gradient
Implicit Maximum a Posteriori Filtering via Adaptive Optimization
- Authors: Authors: Gianluca M. Bencomo, Jake C. Snell, Thomas L. Griffiths
- Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.10580
- Pdf link: https://arxiv.org/pdf/2311.10580
- Abstract Bayesian filtering approximates the true underlying behavior of a time-varying system by inverting an explicit generative model to convert noisy measurements into state estimates. This process typically requires either storage, inversion, and multiplication of large matrices or Monte Carlo estimation, neither of which are practical in high-dimensional state spaces such as the weight spaces of artificial neural networks. Here, we frame the standard Bayesian filtering problem as optimization over a time-varying objective. Instead of maintaining matrices for the filtering equations or simulating particles, we specify an optimizer that defines the Bayesian filter implicitly. In the linear-Gaussian setting, we show that every Kalman filter has an equivalent formulation using K steps of gradient descent. In the nonlinear setting, our experiments demonstrate that our framework results in filters that are effective, robust, and scalable to high-dimensional systems, comparing well against the standard toolbox of Bayesian filtering solutions. We suggest that it is easier to fine-tune an optimizer than it is to specify the correct filtering equations, making our framework an attractive option for high-dimensional filtering problems.
Online Calibration of Deep Learning Sub-Models for Hybrid Numerical Modeling Systems
- Authors: Authors: Said Ouala, Bertrand Chapron, Fabrice Collard, Lucile Gaultier, Ronan Fablet
- Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
- Arxiv link: https://arxiv.org/abs/2311.10665
- Pdf link: https://arxiv.org/pdf/2311.10665
- Abstract Artificial intelligence and deep learning are currently reshaping numerical simulation frameworks by introducing new modeling capabilities. These frameworks are extensively investigated in the context of model correction and parameterization where they demonstrate great potential and often outperform traditional physical models. Most of these efforts in defining hybrid dynamical systems follow {offline} learning strategies in which the neural parameterization (called here sub-model) is trained to output an ideal correction. Yet, these hybrid models can face hard limitations when defining what should be a relevant sub-model response that would translate into a good forecasting performance. End-to-end learning schemes, also referred to as online learning, could address such a shortcoming by allowing the deep learning sub-models to train on historical data. However, defining end-to-end training schemes for the calibration of neural sub-models in hybrid systems requires working with an optimization problem that involves the solver of the physical equations. Online learning methodologies thus require the numerical model to be differentiable, which is not the case for most modeling systems. To overcome this difficulty and bypass the differentiability challenge of physical models, we present an efficient and practical online learning approach for hybrid systems. The method, called EGA for Euler Gradient Approximation, assumes an additive neural correction to the physical model, and an explicit Euler approximation of the gradients. We demonstrate that the EGA converges to the exact gradients in the limit of infinitely small time steps. Numerical experiments are performed on various case studies, including prototypical ocean-atmosphere dynamics. Results show significant improvements over offline learning, highlighting the potential of end-to-end online learning for hybrid modeling.
Multimodal Representation Learning by Alternating Unimodal Adaptation
- Authors: Authors: Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao
- Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2311.10707
- Pdf link: https://arxiv.org/pdf/2311.10707
- Abstract Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle with challenges where some modalities appear more dominant than others during multimodal learning, resulting in suboptimal performance. To address this challenge, we propose MLA (Multimodal Learning with Alternating Unimodal Adaptation). MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process, thereby minimizing interference between modalities. Simultaneously, it captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities. This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information. During the inference phase, MLA utilizes a test-time uncertainty-based model fusion mechanism to integrate multimodal information. Extensive experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities. These experiments demonstrate the superiority of MLA over competing prior approaches.
Keyword: super-resolution
There is no result