arxiv-updates icon indicating copy to clipboard operation
arxiv-updates copied to clipboard

New submissions for Wed, 22 Nov 23

Open zoq opened this issue 1 year ago • 0 comments

Keyword: sgd

There is no result

Keyword: optimization

Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation

  • Authors: Authors: Zhuoran Zhou, Zhongyu Jiang, Wenhao Chai, Cheng-Yen Yang, Lei Li, Jenq-Neng Hwang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2311.12043
  • Pdf link: https://arxiv.org/pdf/2311.12043
  • Abstract Although 3D human pose estimation has gained impressive development in recent years, only a few works focus on infants, that have different bone lengths and also have limited data. Directly applying adult pose estimation models typically achieves low performance in the infant domain and suffers from out-of-distribution issues. Moreover, the limitation of infant pose data collection also heavily constrains the efficiency of learning-based models to lift 2D poses to 3D. To deal with the issues of small datasets, domain adaptation and data augmentation are commonly used techniques. Following this paradigm, we take advantage of an optimization-based method that utilizes generative priors to predict 3D infant keypoints from 2D keypoints without the need of large training data. We further apply a guided diffusion model to domain adapt 3D adult pose to infant pose to supplement small datasets. Besides, we also prove that our method, ZeDO-i, could attain efficient domain adaptation, even if only a small number of data is given. Quantitatively, we claim that our model attains state-of-the-art MPJPE performance of 43.6 mm on the SyRIP dataset and 21.2 mm on the MINI-RGBD dataset.

3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing

  • Authors: Authors: Haoran Li, Long Ma, Yong Liao, Lechao Cheng, Yanbin Hao, Pengyuan Zhou
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2311.12050
  • Pdf link: https://arxiv.org/pdf/2311.12050
  • Abstract The current GAN inversion methods typically can only edit the appearance and shape of a single object and background while overlooking spatial information. In this work, we propose a 3D editing framework, 3D-GOI, to enable multifaceted editing of affine information (scale, translation, and rotation) on multiple objects. 3D-GOI realizes the complex editing function by inverting the abundance of attribute codes (object shape/appearance/scale/rotation/translation, background shape/appearance, and camera pose) controlled by GIRAFFE, a renowned 3D GAN. Accurately inverting all the codes is challenging, 3D-GOI solves this challenge following three main steps. First, we segment the objects and the background in a multi-object image. Second, we use a custom Neural Inversion Encoder to obtain coarse codes of each object. Finally, we use a round-robin optimization algorithm to get precise codes to reconstruct the image. To the best of our knowledge, 3D-GOI is the first framework to enable multifaceted editing on multiple objects. Both qualitative and quantitative experiments demonstrate that 3D-GOI holds immense potential for flexible, multifaceted editing in complex multi-object scenes.

Security Fence Inspection at Airports Using Object Detection

  • Authors: Authors: Nils Friederich, Andreas Specker, Jürgen Beyerer
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2311.12064
  • Pdf link: https://arxiv.org/pdf/2311.12064
  • Abstract To ensure the security of airports, it is essential to protect the airside from unauthorized access. For this purpose, security fences are commonly used, but they require regular inspection to detect damages. However, due to the growing shortage of human specialists and the large manual effort, there is the need for automated methods. The aim is to automatically inspect the fence for damage with the help of an autonomous robot. In this work, we explore object detection methods to address the fence inspection task and localize various types of damages. In addition to evaluating four State-of-the-Art (SOTA) object detection models, we analyze the impact of several design criteria, aiming at adapting to the task-specific challenges. This includes contrast adjustment, optimization of hyperparameters, and utilization of modern backbones. The experimental results indicate that our optimized You Only Look Once v5 (YOLOv5) model achieves the highest accuracy of the four methods with an increase of 6.9% points in Average Precision (AP) compared to the baseline. Moreover, we show the real-time capability of the model. The trained models are published on GitHub: https://github.com/N-Friederich/airport_fence_inspection.

DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation

  • Authors: Authors: Yifei Li, Hsiao-yu Chen, Egor Larionov, Nikolaos Sarafianos, Wojciech Matusik, Tuur Stuyck
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2311.12194
  • Pdf link: https://arxiv.org/pdf/2311.12194
  • Abstract The realism of digital avatars is crucial in enabling telepresence applications with self-expression and customization. A key aspect of this realism originates from the physical accuracy of both a true-to-life body shape and clothing. While physical simulations can produce high-quality, realistic motions for clothed humans, they require precise estimation of body shape and high-quality garment assets with associated physical parameters for cloth simulations. However, manually creating these assets and calibrating their parameters is labor-intensive and requires specialized expertise. To address this gap, we propose DiffAvatar, a novel approach that performs body and garment co-optimization using differentiable simulation. By integrating physical simulation into the optimization loop and accounting for the complex nonlinear behavior of cloth and its intricate interaction with the body, our framework recovers body and garment geometry and extracts important material parameters in a physically plausible way. Our experiments demonstrate that our approach generates realistic clothing and body shape that can be easily used in downstream applications.

Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation

  • Authors: Authors: Chenyang Gao, Yue Gu, Ivan Marsic
  • Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2311.12199
  • Pdf link: https://arxiv.org/pdf/2311.12199
  • Abstract In supervised speech separation, permutation invariant training (PIT) is widely used to handle label ambiguity by selecting the best permutation to update the model. Despite its success, previous studies showed that PIT is plagued by excessive label assignment switching in adjacent epochs, impeding the model to learn better label assignments. To address this issue, we propose a novel training strategy, dynamic sample dropout (DSD), which considers previous best label assignments and evaluation metrics to exclude the samples that may negatively impact the learned label assignments during training. Additionally, we include layer-wise optimization (LO) to improve the performance by solving layer-decoupling. Our experiments showed that combining DSD and LO outperforms the baseline and solves excessive label assignment switching and layer-decoupling issues. The proposed DSD and LO approach is easy to implement, requires no extra training sets or steps, and shows generality to various speech separation tasks.

Day-Ahead Programming of Energy Communities Participating in Pay-as-Bid Service Markets

  • Authors: Authors: F. Conte, S. Massucco, G. Natrella, M. Saviozzi, F. Silvestro
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2311.12203
  • Pdf link: https://arxiv.org/pdf/2311.12203
  • Abstract This paper proposes an optimal strategy for a Renewable Energy Community participating in the Italian pay-as-bid ancillary service market. The community is composed by a group of residential customers sharing a common facility equipped with a PV power plant and a battery energy storage system. This battery is employed to maximize the community cash flow obtained by the participation in the service market. A scenario-based optimization problem is defined to size the bids to be submitted to the market and to define the corresponding optimal battery energy exchange profile for the day ahead. The proposed optimization scheme is able to take in to account the probability of acceptance of the service market bids and the uncertainties on the forecasts of PV generation and energy demands. Results prove the effectiveness of the approach and the economic advantages on participating in the service market.

Fast Inner-Product Algorithms and Architectures for Deep Neural Network Accelerators

  • Authors: Authors: Trevor E. Pogue, Nicola Nicolici
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2311.12224
  • Pdf link: https://arxiv.org/pdf/2311.12224
  • Abstract We introduce a new algorithm called the Free-pipeline Fast Inner Product (FFIP) and its hardware architecture that improve an under-explored fast inner-product algorithm (FIP) proposed by Winograd in 1968. Unlike the unrelated Winograd minimal filtering algorithms for convolutional layers, FIP is applicable to all machine learning (ML) model layers that can mainly decompose to matrix multiplication, including fully-connected, convolutional, recurrent, and attention/transformer layers. We implement FIP for the first time in an ML accelerator then present our FFIP algorithm and generalized architecture which inherently improve FIP's clock frequency and, as a consequence, throughput for a similar hardware cost. Finally, we contribute ML-specific optimizations for the FIP and FFIP algorithms and architectures. We show that FFIP can be seamlessly incorporated into traditional fixed-point systolic array ML accelerators to achieve the same throughput with half the number of multiply-accumulate (MAC) units, or it can double the maximum systolic array size that can fit onto devices with a fixed hardware budget. Our FFIP implementation for non-sparse ML models with 8 to 16-bit fixed-point inputs achieves higher throughput and compute efficiency than the best-in-class prior solutions on the same type of compute platform.

InteraSSort: Interactive Assortment Planning Using Large Language Models

  • Authors: Authors: Saketh Reddy Karra, Theja Tulabandhula
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2311.12241
  • Pdf link: https://arxiv.org/pdf/2311.12241
  • Abstract Assortment planning, integral to multiple commercial offerings, is a key problem studied in e-commerce and retail settings. Numerous variants of the problem along with their integration into business solutions have been thoroughly investigated in the existing literature. However, the nuanced complexities of in-store planning and a lack of optimization proficiency among store planners with strong domain expertise remain largely overlooked. These challenges frequently necessitate collaborative efforts with multiple stakeholders which often lead to prolonged decision-making processes and significant delays. To mitigate these challenges and capitalize on the advancements of Large Language Models (LLMs), we propose an interactive assortment planning framework, InteraSSort that augments LLMs with optimization tools to assist store planners in making decisions through interactive conversations. Specifically, we develop a solution featuring a user-friendly interface that enables users to express their optimization objectives as input text prompts to InteraSSort and receive tailored optimized solutions as output. Our framework extends beyond basic functionality by enabling the inclusion of additional constraints through interactive conversation, facilitating precise and highly customized decision-making. Extensive experiments demonstrate the effectiveness of our framework and potential extensions to a broad range of operations management challenges.

The limitation of neural nets for approximation and optimization

  • Authors: Authors: Tommaso Giovannelli, Oumaima Sohab, Luis Nunes Vicente
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2311.12253
  • Pdf link: https://arxiv.org/pdf/2311.12253
  • Abstract We are interested in assessing the use of neural networks as surrogate models to approximate and minimize objective functions in optimization problems. While neural networks are widely used for machine learning tasks such as classification and regression, their application in solving optimization problems has been limited. Our study begins by determining the best activation function for approximating the objective functions of popular nonlinear optimization test problems, and the evidence provided shows that~SiLU has the best performance. We then analyze the accuracy of function value, gradient, and Hessian approximations for such objective functions obtained through interpolation/regression models and neural networks. When compared to interpolation/regression models, neural networks can deliver competitive zero- and first-order approximations (at a high training cost) but underperform on second-order approximation. However, it is shown that combining a neural net activation function with the natural basis for quadratic interpolation/regression can waive the necessity of including cross terms in the natural basis, leading to models with fewer parameters to determine. Lastly, we provide evidence that the performance of a state-of-the-art derivative-free optimization algorithm can hardly be improved when the gradient of an objective function is approximated using any of the surrogate models considered, including neural networks.

How AI-driven Digital Twins Can Empower Mobile Networks

  • Authors: Authors: Tong Li, Fenyu Jiang, Qiaohong Yu, Wenzhen Huang, Tao Jiang, Depeng Jin
  • Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2311.12273
  • Pdf link: https://arxiv.org/pdf/2311.12273
  • Abstract The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which serves as validation for the optimizer's decision outcomes, is used explicitly to train artificial intelligence (AI) empowered optimizers iteratively. In practice, we develop a network digital twin prototype system leveraging data-driven technology to accurately model the behaviors of mobile network elements (e.g., mobile users and base stations), wireless environments, and network performance. An AI-powered network optimizer has been developed based on the deployed MNDT prototype system for providing reliable and optimized network configurations. The results of the experiments demonstrate that the proposed MNDT infrastructure can provide practical network optimization solutions while adapting to the more complex environment.

Micro Energy-Water-Hydrogen Nexus: Data-driven Real-time Optimal Operation

  • Authors: Authors: Mostafa Goodarzi, Qifeng Li
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2311.12274
  • Pdf link: https://arxiv.org/pdf/2311.12274
  • Abstract This paper extends a new concept of energy-water-hydrogen (EWH) nexus, which was recently developed as a solution for reducing carbon emissions from the generation side of power systems, to the distribution side. Under the concept of distribution-level EWH (micro EWH) nexus, renewable energy sources (RES) are utilized to meet the energy needs of a small community. To avoid the uncertainty caused by RESs, this paper aims to investigate the real-time optimal operation of the micro EWH nexus which is however a challenging optimization problem. First, such a large-scale mixed-integer nonlinear programming problem is relaxed into a mixed-integer convex program (MICP) by leveraging the effective convex-hull relaxation technique. Second, a fast data-driven solution method based on active constraint and integer variable prediction is presented, which can solve the MICP problem very fast since it utilizes historical optimization data to quickly predict binary variable values and a limited set of active constraints.

Orthogonally weighted $\ell_{2,1}$ regularization for rank-aware joint sparse recovery: algorithm and analysis

  • Authors: Authors: Armenak Petrosyan, Konstantin Pieper, Hoang Tran
  • Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2311.12282
  • Pdf link: https://arxiv.org/pdf/2311.12282
  • Abstract We propose and analyze an efficient algorithm for solving the joint sparse recovery problem using a new regularization-based method, named orthogonally weighted $\ell_{2,1}$ ($\mathit{ow}\ell_{2,1}$), which is specifically designed to take into account the rank of the solution matrix. This method has applications in feature extraction, matrix column selection, and dictionary learning, and it is distinct from commonly used $\ell_{2,1}$ regularization and other existing regularization-based approaches because it can exploit the full rank of the row-sparse solution matrix, a key feature in many applications. We provide a proof of the method's rank-awareness, establish the existence of solutions to the proposed optimization problem, and develop an efficient algorithm for solving it, whose convergence is analyzed. We also present numerical experiments to illustrate the theory and demonstrate the effectiveness of our method on real-life problems.

Adapting LLMs for Efficient, Personalized Information Retrieval: Methods and Implications

  • Authors: Authors: Samira Ghodratnama, Mehrdad Zakershahrak
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2311.12287
  • Pdf link: https://arxiv.org/pdf/2311.12287
  • Abstract The advent of Large Language Models (LLMs) heralds a pivotal shift in online user interactions with information. Traditional Information Retrieval (IR) systems primarily relied on query-document matching, whereas LLMs excel in comprehending and generating human-like text, thereby enriching the IR experience significantly. While LLMs are often associated with chatbot functionalities, this paper extends the discussion to their explicit application in information retrieval. We explore methodologies to optimize the retrieval process, select optimal models, and effectively scale and orchestrate LLMs, aiming for cost-efficiency and enhanced result accuracy. A notable challenge, model hallucination-where the model yields inaccurate or misinterpreted data-is addressed alongside other model-specific hurdles. Our discourse extends to crucial considerations including user privacy, data optimization, and the necessity for system clarity and interpretability. Through a comprehensive examination, we unveil not only innovative strategies for integrating Language Models (LLMs) with Information Retrieval (IR) systems, but also the consequential considerations that underline the need for a balanced approach aligned with user-centric principles.

Stable Diffusion For Aerial Object Detection

  • Authors: Authors: Yanan Jian, Fuxun Yu, Simranjit Singh, Dimitrios Stamoulis
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2311.12345
  • Pdf link: https://arxiv.org/pdf/2311.12345
  • Abstract Aerial object detection is a challenging task, in which one major obstacle lies in the limitations of large-scale data collection and the long-tail distribution of certain classes. Synthetic data offers a promising solution, especially with recent advances in diffusion-based methods like stable diffusion (SD). However, the direct application of diffusion methods to aerial domains poses unique challenges: stable diffusion's optimization for rich ground-level semantics doesn't align with the sparse nature of aerial objects, and the extraction of post-synthesis object coordinates remains problematic. To address these challenges, we introduce a synthetic data augmentation framework tailored for aerial images. It encompasses sparse-to-dense region of interest (ROI) extraction to bridge the semantic gap, fine-tuning the diffusion model with low-rank adaptation (LORA) to circumvent exhaustive retraining, and finally, a Copy-Paste method to compose synthesized objects with backgrounds, providing a nuanced approach to aerial object detection through synthetic data.

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

  • Authors: Authors: Yunpeng Huang, Jingwei Xu, Zixu Jiang, Junyu Lai, Zenan Li, Yuan Yao, Taolue Chen, Lijuan Yang, Zhou Xin, Xiaoxing Ma
  • Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2311.12351
  • Pdf link: https://arxiv.org/pdf/2311.12351
  • Abstract With the bomb ignited by ChatGPT, Transformer-based Large Language Models (LLMs) have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been applied in diverse areas as knowledge bases, human interfaces, and dynamic agents. However, a prevailing limitation exists: many current LLMs, constrained by resources, are primarily pre-trained on shorter texts, rendering them less effective for longer-context prompts, commonly encountered in real-world settings. In this paper, we present a comprehensive survey focusing on the advancement of model architecture in Transformer-based LLMs to optimize long-context capabilities across all stages from pre-training to inference. We firstly delineate and analyze the problems of handling long-context input and output with the current Transformer-based models. Then, we mainly offer a holistic taxonomy to navigate the landscape of Transformer upgrades on architecture to solve these problems. Afterward, we provide the investigation on wildly used evaluation necessities tailored for long-context LLMs, including datasets, metrics, and baseline models, as well as some amazing optimization toolkits like libraries, systems, and compilers to augment LLMs' efficiency and efficacy across different stages. Finally, we further discuss the predominant challenges and potential avenues for future research in this domain. Additionally, we have established a repository where we curate relevant literature with real-time updates at https://github.com/Strivin0311/long-llms-learning.

RFTrans: Leveraging Refractive Flow of Transparent Objects for Surface Normal Estimation and Manipulation

  • Authors: Authors: Tutian Tang, Jiyu Liu, Jieyi Zhang, Haoyuan Fu, Wenqiang Xu, Cewu Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2311.12398
  • Pdf link: https://arxiv.org/pdf/2311.12398
  • Abstract Transparent objects are widely used in our daily lives, making it important to teach robots to interact with them. However, it's not easy because the reflective and refractive effects can make RGB-D cameras fail to give accurate geometry measurements. To solve this problem, this paper introduces RFTrans, an RGB-D-based method for surface normal estimation and manipulation of transparent objects. By leveraging refractive flow as an intermediate representation, RFTrans circumvents the drawbacks of directly predicting the geometry (e.g. surface normal) from RGB images and helps bridge the sim-to-real gap. RFTrans integrates the RFNet, which predicts refractive flow, object mask, and boundaries, followed by the F2Net, which estimates surface normal from the refractive flow. To make manipulation possible, a global optimization module will take in the predictions, refine the raw depth, and construct the point cloud with normal. An analytical grasp planning algorithm, ISF, is followed to generate the grasp poses. We build a synthetic dataset with physically plausible ray-tracing rendering techniques to train the networks. Results show that the RFTrans trained on the synthetic dataset can consistently outperform the baseline ClearGrasp in both synthetic and real-world benchmarks by a large margin. Finally, a real-world robot grasping task witnesses an 83% success rate, proving that refractive flow can help enable direct sim-to-real transfer. The code, data, and supplementary materials are available at https://rftrans.robotflow.ai.

nach0: Multimodal Natural and Chemical Languages Foundation Model

  • Authors: Authors: Micha Livne, Zulfat Miftahutdinov, Elena Tutubalina, Maksim Kuznetsov, Daniil Polykovskiy, Annika Brundyn, Aastha Jhunjhunwala, Anthony Costa, Alex Aliper, Alex Zhavoronkov
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2311.12410
  • Pdf link: https://arxiv.org/pdf/2311.12410
  • Abstract Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthesis, attributes prediction, and others. nach0 is a multi-domain and multi-task encoder-decoder LLM pre-trained on unlabeled text from scientific literature, patents, and molecule strings to incorporate a range of chemical and linguistic knowledge. We employed instruction tuning, where specific task-related instructions are utilized to fine-tune nach0 for the final set of tasks. To train nach0 effectively, we leverage the NeMo framework, enabling efficient parallel optimization of both base and large model versions. Extensive experiments demonstrate that our model outperforms state-of-the-art baselines on single-domain and cross-domain tasks. Furthermore, it can generate high-quality outputs in molecular and textual formats, showcasing its effectiveness in multi-domain setups.

Constellation Shaping under Phase Noise Impairment for Sub-THz Communications

  • Authors: Authors: Dileepa Marasinghe, Le Hang Nguyen, Jafar Mohammadi, Yejian Chen, Thorsten Wild, Nandana Rajatheva
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2311.12433
  • Pdf link: https://arxiv.org/pdf/2311.12433
  • Abstract The large untapped spectrum in the sub-THz allows for ultra-high throughput communication to realize many seemingly impossible applications in 6G. One of the challenges in radio communications in sub-THz is the hardware impairments. Specifically, phase noise is one key hardware impairment, which is accentuated as we increase the frequency and bandwidth. Furthermore, the modest output power of the sub-THz power amplifier demands limits on peak to average power ratio (PAPR) signal design. Single carrier frequency domain equalization (SC-FDE) waveform has been identified as a suitable candidate for sub-THz, although some challenges such as phase noise and PAPR still remain to be tackled. In this work, we design a phase noise robust, low PAPR SC-FDE waveform by geometrically shaping the constellation under practical conditions. We formulate the waveform optimization problem in its augmented Lagrangian form and use a back-propagation-inspired technique to obtain a constellation design that is numerically robust to phase noise, while maintaining a low PAPR.

Cooperative RIS and STAR-RIS assisted mMIMO Communication: Analysis and Optimization

  • Authors: Authors: Anastasios Papazafeiropoulos, Ahmet M. Elbir, Pandelis Kourtessis, Ioannis Krikidis, Symeon Chatzinotas
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2311.12473
  • Pdf link: https://arxiv.org/pdf/2311.12473
  • Abstract Reconfigurable intelligent surface (RIS) has emerged as a cost-effective and promising solution to extend the wireless signal coverage and improve the performance via passive signal reflection. Different from existing works which do not account for the cooperation between RISs or do not provide full space coverage, we propose the marriage of cooperative double-RIS with simultaneously transmitting and reflecting RIS (STAR-RIS) technologies denoted as RIS/STAR-RIS under correlated Rayleigh fading conditions to assist the communication in a massive multiple-input multiple-output (mMIMO) setup. The proposed architecture is superior since it enjoys the benefits of the individual designs. We introduce a channel estimation approach of the cascaded channels with reduced overhead. Also, we obtain the deterministic equivalent (DE) of the downlink achievable sum spectral efficiency (SE) in closed form based on large-scale statistics. Notably, relied on statistical channel state information (CSI), we optimise both surfaces by means of the projected gradient ascent method (PGAM), and obtain the gradients in closed form. The proposed optimization achieves to maximise the sum SE of such a complex system, and has low complexity and low overhead since it can be performed at every several coherence intervals. Numerical results show the benefit of the proposed architecture and verify the analytical framework. In particular, we show that the RIS/STAR-RIS architecture outperforms the conventional double-RIS or its single-RIS counterparts.

Uncertainty-Aware Test Prioritization: Approaches and Empirical Evaluation

  • Authors: Authors: Man Zhang, Jiahui Wu, Shaukat Ali, Tao Yue
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2311.12484
  • Pdf link: https://arxiv.org/pdf/2311.12484
  • Abstract Complex software systems, e.g., Cyber-Physical Systems (CPSs), interact with the real world; thus, they often behave unexpectedly in uncertain environments. Testing such systems is challenging due to limited resources, time, complex testing infrastructure setup, and the inherent uncertainties in their operating environment. Devising uncertainty-aware testing solutions supported with test optimization techniques can be considered as a mandate for tackling this challenge. This paper proposes an uncertainty-aware and time-aware test case prioritization approach, named UncerPrio, for optimizing a sequence of tests to execute with a multi-objective search. To guide the prioritization with uncertainty, we identify four uncertainty measures: uncertainty measurement (AUM), uncertainty space (PUS), the number of uncertainties (ANU), and uncertainty coverage (PUU). Based on these measures and their combinations, we proposed 10 uncertainty-aware and multi-objective test case prioritization problems, and each problem was additionally defined with one cost objective (execution cost, PET) to be minimized and one effective measure (model coverage, PTR) to be maximized. Moreover, considering time constraints for test executions (i.e., time-aware), we defined 10 time budgets for all the 10 problems for identifying the best strategy in solving uncertainty-aware test prioritization. In our empirical study, we employed four well-known Multi-Objective Search Algorithms (MuOSAs): NSGA-II, MOCell, SPEA2, and CellDE with five use cases from two industrial CPS subject systems, and used Random Algorithm (RS) as the comparison baseline. Results show that all the MuOSAs significantly outperformed RS. The strategy of Prob.6 f(PET,PTR,AUM,ANU) (i.e., the problem with uncertainty measures AUM and ANU combined) achieved the overall best performance in observing uncertainty when using 100% time budget.

Hyb-NeRF: A Multiresolution Hybrid Encoding for Neural Radiance Fields

  • Authors: Authors: Yifan Wang, Yi Gong, Yuan Zeng
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2311.12490
  • Pdf link: https://arxiv.org/pdf/2311.12490
  • Abstract Recent advances in Neural radiance fields (NeRF) have enabled high-fidelity scene reconstruction for novel view synthesis. However, NeRF requires hundreds of network evaluations per pixel to approximate a volume rendering integral, making it slow to train. Caching NeRFs into explicit data structures can effectively enhance rendering speed but at the cost of higher memory usage. To address these issues, we present Hyb-NeRF, a novel neural radiance field with a multi-resolution hybrid encoding that achieves efficient neural modeling and fast rendering, which also allows for high-quality novel view synthesis. The key idea of Hyb-NeRF is to represent the scene using different encoding strategies from coarse-to-fine resolution levels. Hyb-NeRF exploits memory-efficiency learnable positional features at coarse resolutions and the fast optimization speed and local details of hash-based feature grids at fine resolutions. In addition, to further boost performance, we embed cone tracing-based features in our learnable positional encoding that eliminates encoding ambiguity and reduces aliasing artifacts. Extensive experiments on both synthetic and real-world datasets show that Hyb-NeRF achieves faster rendering speed with better rending quality and even a lower memory footprint in comparison to previous state-of-the-art methods.

Multi-Objective Reinforcement Learning based on Decomposition: A taxonomy and framework

  • Authors: Authors: Florian Felten, El-Ghazali Talbi, Grégoire Danoy
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2311.12495
  • Pdf link: https://arxiv.org/pdf/2311.12495
  • Abstract Multi-objective reinforcement learning (MORL) extends traditional RL by seeking policies making different compromises among conflicting objectives. The recent surge of interest in MORL has led to diverse studies and solving methods, often drawing from existing knowledge in multi-objective optimization based on decomposition (MOO/D). Yet, a clear categorization based on both RL and MOO/D is lacking in the existing literature. Consequently, MORL researchers face difficulties when trying to classify contributions within a broader context due to the absence of a standardized taxonomy. To tackle such an issue, this paper introduces Multi-Objective Reinforcement Learning based on Decomposition (MORL/D), a novel methodology bridging RL and MOO literature. A comprehensive taxonomy for MORL/D is presented, providing a structured foundation for categorizing existing and potential MORL works. The introduced taxonomy is then used to scrutinize MORL research, enhancing clarity and conciseness through well-defined categorization. Moreover, a flexible framework derived from the taxonomy is introduced. This framework accommodates diverse instantiations using tools from both RL and MOO/D. Implementation across various configurations demonstrates its versatility, assessed against benchmark problems. Results indicate MORL/D instantiations achieve comparable performance with significantly greater versatility than current state-of-the-art approaches. By presenting the taxonomy and framework, this paper offers a comprehensive perspective and a unified vocabulary for MORL. This not only facilitates the identification of algorithmic contributions but also lays the groundwork for novel research avenues in MORL, contributing to the continued advancement of this field.

Hyena: Optimizing Homomorphically Encrypted Convolution for Private CNN Inference

  • Authors: Authors: Hyeri Roh, Woo-Seok Choi
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2311.12519
  • Pdf link: https://arxiv.org/pdf/2311.12519
  • Abstract Processing convolution layers remains a huge bottleneck for private deep convolutional neural network (CNN) inference for large datasets. To solve this issue, this paper presents a novel homomorphic convolution algorithm that provides speedup, communication cost, and storage saving. We first note that padded convolution provides the advantage of model storage saving, but it does not support channel packing, thereby increasing the amount of computation and communication. We address this limitation by proposing a novel plaintext multiplication algorithm using the Walsh-Hadamard matrix. Furthermore, we propose the optimization techniques to significantly reduce the latency of the proposed convolution by selecting the optimal encryption parameters and applying lazy reduction. It achieves 1.6-3.8x speedup and reduces the weight storage by 2000-8000x compared to the conventional convolution. When the proposed convolution is employed for CNNs like VGG-16, ResNet-20, and MobileNetV1 on ImageNet, it reduces the end-to-end latency by 1.3-2.6x, the memory usage by 2.1-7.9x and communication cost by 1.7-2.0x compared to conventional method.

Neural Network Pruning by Gradient Descent

  • Authors: Authors: Zhang Zhang, Ruyi Tao, Jiang Zhang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2311.12526
  • Pdf link: https://arxiv.org/pdf/2311.12526
  • Abstract The rapid increase in the parameters of deep learning models has led to significant costs, challenging computational efficiency and model interpretability. In this paper, we introduce a novel and straightforward neural network pruning framework that incorporates the Gumbel-Softmax technique. This framework enables the simultaneous optimization of a network's weights and topology in an end-to-end process using stochastic gradient descent. Empirical results demonstrate its exceptional compression capability, maintaining high accuracy on the MNIST dataset with only 0.15% of the original network parameters. Moreover, our framework enhances neural network interpretability, not only by allowing easy extraction of feature importance directly from the pruned network but also by enabling visualization of feature symmetry and the pathways of information propagation from features to outcomes. Although the pruning strategy is learned through deep learning, it is surprisingly intuitive and understandable, focusing on selecting key representative features and exploiting data patterns to achieve extreme sparse pruning. We believe our method opens a promising new avenue for deep learning pruning and the creation of interpretable machine learning systems.

MetaStore: High-Performance Metagenomic Analysis via In-Storage Computing

  • Authors: Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Ma, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu
  • Subjects: Hardware Architecture (cs.AR); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM)
  • Arxiv link: https://arxiv.org/abs/2311.12527
  • Pdf link: https://arxiv.org/pdf/2311.12527
  • Abstract Metagenomics has led to significant advancements in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases containing information on different species' genomes. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system to the rest of the system. In-storage processing can be a fundamental solution for reducing data movement overhead. However, designing an in-storage processing system for metagenomics is challenging because none of the existing approaches can be directly implemented in storage effectively due to the hardware limitations of modern SSDs. We propose MetaStore, the first in-storage processing system designed to significantly reduce the data movement overhead of end-to-end metagenomic analysis. MetaStore is enabled by our lightweight and cooperative design that effectively leverages and orchestrates processing inside and outside the storage system. Through our detailed analysis of the end-to-end metagenomic analysis pipeline and careful hardware/software co-design, we address in-storage processing challenges for metagenomics via specialized and efficient 1) task partitioning, 2) data/computation flow coordination, 3) storage technology-aware algorithmic optimizations, 4) light-weight in-storage accelerators, and 5) data mapping. Our evaluation shows that MetaStore outperforms the state-of-the-art performance- and accuracy-optimized software metagenomic tools by 2.7-37.2$\times$ and 6.9-100.2$\times$, respectively, while matching the accuracy of the accuracy-optimized tool. MetaStore achieves 1.5-5.1$\times$ speedup compared to the state-of-the-art metagenomic hardware-accelerated tool, while achieving significantly higher accuracy.

Oasis: Data Curation and Assessment System for Pretraining of Large Language Models

  • Authors: Authors: Tong Zhou, Yubo Chen, Pengfei Cao, Kang Liu, Jun Zhao, Shengping Liu
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2311.12537
  • Pdf link: https://arxiv.org/pdf/2311.12537
  • Abstract Data is one of the most critical elements in building a large language model. However, existing systems either fail to customize a corpus curation pipeline or neglect to leverage comprehensive corpus assessment for iterative optimization of the curation. To this end, we present a pretraining corpus curation and assessment platform called Oasis -- a one-stop system for data quality improvement and quantification with user-friendly interactive interfaces. Specifically, the interactive modular rule filter module can devise customized rules according to explicit feedback. The debiased neural filter module builds the quality classification dataset in a negative-centric manner to remove the undesired bias. The adaptive document deduplication module could execute large-scale deduplication with limited memory resources. These three parts constitute the customized data curation module. And in the holistic data assessment module, a corpus can be assessed in local and global views, with three evaluation means including human, GPT-4, and heuristic metrics. We exhibit a complete process to use Oasis for the curation and assessment of pretraining data. In addition, an 800GB bilingual corpus curated by Oasis is publicly released.

Multi-Session Budget Optimization for Forward Auction-based Federated Learning

  • Authors: Authors: Xiaoli Tang, Han Yu
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2311.12548
  • Pdf link: https://arxiv.org/pdf/2311.12548
  • Abstract Auction-based Federated Learning (AFL) has emerged as an important research field in recent years. The prevailing strategies for FL model users (MUs) assume that the entire team of the required data owners (DOs) for an FL task must be assembled before training can commence. In practice, an MU can trigger the FL training process multiple times. DOs can thus be gradually recruited over multiple FL model training sessions. Existing bidding strategies for AFL MUs are not designed to handle such scenarios. Therefore, the problem of multi-session AFL remains open. To address this problem, we propose the Multi-session Budget Optimization Strategy for forward Auction-based Federated Learning (MultiBOS-AFL). Based on hierarchical reinforcement learning, MultiBOS-AFL jointly optimizes inter-session budget pacing and intra-session bidding for AFL MUs, with the objective of maximizing the total utility. Extensive experiments on six benchmark datasets show that it significantly outperforms seven state-of-the-art approaches. On average, MultiBOS-AFL achieves 12.28% higher utility, 14.52% more data acquired through auctions for a given budget, and 1.23% higher test accuracy achieved by the resulting FL model compared to the best baseline. To the best of our knowledge, it is the first budget optimization decision support method with budget pacing capability designed for MUs in multi-session forward auction-based federated learning

KNVQA: A Benchmark for evaluation knowledge-based VQA

  • Authors: Authors: Sirui Cheng, Siyu Zhang, Jiayi Wu, Muchen Lan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2311.12639
  • Pdf link: https://arxiv.org/pdf/2311.12639
  • Abstract Within the multimodal field, large vision-language models (LVLMs) have made significant progress due to their strong perception and reasoning capabilities in the visual and language systems. However, LVLMs are still plagued by the two critical issues of object hallucination and factual accuracy, which limit the practicality of LVLMs in different scenarios. Furthermore, previous evaluation methods focus more on the comprehension and reasoning of language content but lack a comprehensive evaluation of multimodal interactions, thereby resulting in potential limitations. To this end, we propose a novel KNVQA-Eval, which is devoted to knowledge-based VQA task evaluation to reflect the factuality of multimodal LVLMs. To ensure the robustness and scalability of the evaluation, we develop a new KNVQA dataset by incorporating human judgment and perception, aiming to evaluate the accuracy of standard answers relative to AI-generated answers in knowledge-based VQA. This work not only comprehensively evaluates the contextual information of LVLMs using reliable human annotations, but also further analyzes the fine-grained capabilities of current methods to reveal potential avenues for subsequent optimization of LVLMs-based estimators. Our proposed VQA-Eval and corresponding dataset KNVQA will facilitate the development of automatic evaluation tools with the advantages of low cost, privacy protection, and reproducibility. Our code will be released upon publication.

FedDRO: Federated Compositional Optimization for Distributionally Robust Learning

  • Authors: Authors: Prashant Khanduri, Chengyin Li, Rafi Ibn Sultan, Yao Qiang, Joerg Kliewer, Dongxiao Zhu
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2311.12652
  • Pdf link: https://arxiv.org/pdf/2311.12652
  • Abstract Recently, compositional optimization (CO) has gained popularity because of its applications in distributionally robust optimization (DRO) and many other machine learning problems. Large-scale and distributed availability of data demands the development of efficient federated learning (FL) algorithms for solving CO problems. Developing FL algorithms for CO is particularly challenging because of the compositional nature of the objective. Moreover, current state-of-the-art methods to solve such problems rely on large batch gradients (depending on the solution accuracy) not feasible for most practical settings. To address these challenges, in this work, we propose efficient FedAvg-type algorithms for solving non-convex CO in the FL setting. We first establish that vanilla FedAvg is not suitable to solve distributed CO problems because of the data heterogeneity in the compositional objective at each client which leads to the amplification of bias in the local compositional gradient estimates. To this end, we propose a novel FL framework FedDRO that utilizes the DRO problem structure to design a communication strategy that allows FedAvg to control the bias in the estimation of the compositional gradient. A key novelty of our work is to develop solution accuracy-independent algorithms that do not require large batch gradients (and function evaluations) for solving federated CO problems. We establish $\mathcal{O}(\epsilon^{-2})$ sample and $\mathcal{O}(\epsilon^{-3/2})$ communication complexity in the FL setting while achieving linear speedup with the number of clients. We corroborate our theoretical findings with empirical studies on large-scale DRO problems.

Hand-Eye Calibration

  • Authors: Authors: Radu Horaud, Fadi Dornaika
  • Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2311.12655
  • Pdf link: https://arxiv.org/pdf/2311.12655
  • Abstract Whenever a sensor is mounted on a robot hand it is important to know the relationship between the sensor and the hand. The problem of determining this relationship is referred to as hand-eye calibration, which is important in at least two types of tasks: (i) map sensor centered measurements into the robot workspace and (ii) allow the robot to precisely move the sensor. In the past some solutions were proposed in the particular case of a camera. With almost no exception, all existing solutions attempt to solve the homogeneous matrix equation AX=XB. First we show that there are two possible formulations of the hand-eye calibration problem. One formulation is the classical one that we just mentioned. A second formulation takes the form of the following homogeneous matrix equation: MY=M'YB. The advantage of the latter is that the extrinsic and intrinsic camera parameters need not be made explicit. Indeed, this formulation directly uses the 3 by 4 perspective matrices (M and M') associated with two positions of the camera. Moreover, this formulation together with the classical one cover a wider range of camera-based sensors to be calibrated with respect to the robot hand. Second, we develop a common mathematical framework to solve for the hand-eye calibration problem using either of the two formulations. We present two methods, (i) a rotation then translation and (ii) a non-linear solver for rotation and translation. Third, we perform a stability analysis both for our two methods and for the classical linear method developed. In the light of this comparison, the non-linear optimization method, that solves for rotation and translation simultaneously, seems to be the most robust one with respect to noise and to measurement errors.

From Concept to Manufacturing: Evaluating Vision-Language Models for Engineering Design

  • Authors: Authors: Cyril Picard, Kristen M. Edwards, Anna C. Doris, Brandon Man, Giorgio Giannone, Md Ferdous Alam, Faez Ahmed
  • Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2311.12668
  • Pdf link: https://arxiv.org/pdf/2311.12668
  • Abstract Engineering Design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed with the release of multimodal vision language models, such as GPT-4V, enabling AI to impact many more types of tasks. In light of these advancements, this paper presents a comprehensive evaluation of GPT-4V, a vision language model, across a wide spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Our study assesses GPT-4V's capabilities in design tasks such as sketch similarity analysis, concept selection using Pugh Charts, material selection, engineering drawing analysis, CAD generation, topology optimization, design for additive and subtractive manufacturing, spatial reasoning challenges, and textbook problems. Through this structured evaluation, we not only explore GPT-4V's proficiency in handling complex design and manufacturing challenges but also identify its limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision language models, emphasizing their immense potential for innovating and enhancing the engineering design and manufacturing landscape. It also contributes a set of benchmark testing datasets, with more than 1000 queries, for ongoing advancements and applications in this field.

BundleMoCap: Efficient, Robust and Smooth Motion Capture from Sparse Multiview Videos

  • Authors: Authors: Georgios Albanis, Nikolaos Zioulis, Kostas Kolomvatsos
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2311.12679
  • Pdf link: https://arxiv.org/pdf/2311.12679
  • Abstract Capturing smooth motions from videos using markerless techniques typically involves complex processes such as temporal constraints, multiple stages with data-driven regression and optimization, and bundle solving over temporal windows. These processes can be inefficient and require tuning multiple objectives across stages. In contrast, BundleMoCap introduces a novel and efficient approach to this problem. It solves the motion capture task in a single stage, eliminating the need for temporal smoothness objectives while still delivering smooth motions. BundleMoCap outperforms the state-of-the-art without increasing complexity. The key concept behind BundleMoCap is manifold interpolation between latent keyframes. By relying on a local manifold smoothness assumption, we can efficiently solve a bundle of frames using a single code. Additionally, the method can be implemented as a sliding window optimization and requires only the first frame to be properly initialized, reducing the overall computational burden. BundleMoCap's strength lies in its ability to achieve high-quality motion capture results with simplicity and efficiency. More details can be found at https://moverseai.github.io/bundle/.

Convexifying Regulation Market Clearing of State-of-Charge Dependent Bid

  • Authors: Authors: Siying Li, Cong Chen, Lang Tong
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2311.12690
  • Pdf link: https://arxiv.org/pdf/2311.12690
  • Abstract We consider the problem of merchant storage participating in the regulation market with state-of-charge (SoC) dependent bids. Because storage can simultaneously provide regulation up and regulation down capacities, the market-clearing engine faces the computation challenge of evaluating storage costs under different regulation scenarios. One approach is to employ a bilevel optimization that minimizes the worst-case storage cost among all potential regulation events. However, subproblems of such a bilevel optimization are nonconvex, resulting in prohibitive computation challenges for the real-time clearing of the regulation market. We show that the complex nonconvex market clearing problem can be convexified by a simple restriction on the SoC-dependent bid, rendering the intractable market clearing computation to standard linear programs. Numerical simulations demonstrate that SoC-dependent bids satisfying the convexification conditions increase the profits of merchant storage owners by 12.32-77.38% compared with SoC-independent bids.

A Data-Driven Integrated Framework for Fast-Charging Facility Planning using Multi-Period Bi-Objective Optimization

  • Authors: Authors: Mingjia He, Panchamy Krishnakumari, Ding Luo, Jiaqi Chen
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2311.12700
  • Pdf link: https://arxiv.org/pdf/2311.12700
  • Abstract With the electrification in freight transportation, the availability of fast-charging facilities becomes essential to facilitate en-route charging for freight electric vehicles. Most studies focus on planning charging facilities based on mathematical modeling and hypothetical scenarios. This study aims to develop a data-driven integrated framework for fast-charging facility planning. By leveraging the highway traffic data, we extracted, analyzed, and compared spatial and temporal flow patterns of general traffic and freight traffic. Furthermore, graph theory-based network evaluation methods are employed to identify traffic nodes within the highway network that play a significant role in accommodating charging infrastructure. A candidate selection method is proposed to obtain potential deployment locations for charging stations and to-go chargers. Based on this, we present a multi-period bi-objective optimization model to provide optimal solutions for the placement of charging facilities, with the objectives of minimizing investment cost and maximizing demand coverage. The case study on the Amsterdam highway network shows how existing traffic data can be used to generate more realistic charging demand scenarios and how it can be integrated and evaluated within the optimization framework for facility planning. The study also shows that the proposed model can leverage the potential of early investment in improving the charging demand coverage.

Exploring Graph Classification Techniques Under Low Data Constraints: A Comprehensive Study

  • Authors: Authors: Kush Kothari, Bhavya Mehta, Reshmika Nambiar, Seema Shrawne
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2311.12737
  • Pdf link: https://arxiv.org/pdf/2311.12737
  • Abstract This survey paper presents a brief overview of recent research on graph data augmentation and few-shot learning. It covers various techniques for graph data augmentation, including node and edge perturbation, graph coarsening, and graph generation, as well as the latest developments in few-shot learning, such as meta-learning and model-agnostic meta-learning. The paper explores these areas in depth and delves into further sub classifications. Rule based approaches and learning based approaches are surveyed under graph augmentation techniques. Few-Shot Learning on graphs is also studied in terms of metric learning techniques and optimization-based techniques. In all, this paper provides an extensive array of techniques that can be employed in solving graph processing problems faced in low-data scenarios.

Learn to Augment Network Simulators Towards Digital Network Twins

  • Authors: Authors: Yuru Zhang, Ming Zhao, Qiang Liu
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2311.12745
  • Pdf link: https://arxiv.org/pdf/2311.12745
  • Abstract Digital network twin (DNT) is a promising paradigm to replicate real-world cellular networks toward continual assessment, proactive management, and what-if analysis. Existing discussions have been focusing on using only deep learning techniques to build DNTs, which raises widespread concerns regarding their generalization, explainability, and transparency. In this paper, we explore an alternative approach to augment network simulators with context-aware neural agents. The main challenge lies in the non-trivial simulation-to-reality (sim-to-real) discrepancy between offline simulators and real-world networks. To solve the challenge, we propose a new learn-to-bridge algorithm to cost-efficiently bridge the sim-to-real discrepancy in two alternative stages. In the first stage, we select states to query performances in real-world networks by using newly-designed cost-aware Bayesian optimization. In the second stage, we train the neural agent to learn the state context and bridge the probabilistic discrepancy based on Bayesian neural networks (BNN). In addition, we build a small-scale end-to-end network testbed based on OpenAirInterface RAN and Core with USRP B210 and a smartphone, and replicate the network in NS-3. The evaluation results show that, our proposed solution substantially outperforms existing methods, with more than 92% reduction in the sim-to-real discrepancy.

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatially Relation Matching

  • Authors: Authors: Meng Chu, Zhedong Zheng, Wei Ji, Tat-Seng Chua
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2311.12751
  • Pdf link: https://arxiv.org/pdf/2311.12751
  • Abstract Drone navigation through natural language commands remains a significant challenge due to the lack of publicly available multi-modal datasets and the intricate demands of fine-grained visual-text alignment. In response to this pressing need, we present a new human-computer interaction annotation benchmark called GeoText-1652, meticulously curated through a robust Large Language Model (LLM)-based data generation framework and the expertise of pre-trained vision models. This new dataset seamlessly extends the existing image dataset, \ie, University-1652, with spatial-aware text annotations, encompassing intricate image-text-bounding box associations. Besides, we introduce a new optimization objective to leverage fine-grained spatial associations, called blending spatial matching, for region-level spatial relation matching. Extensive experiments reveal that our approach maintains an exceptional recall rate under varying description complexities. This underscores the promising potential of our approach in elevating drone control and navigation through the seamless integration of natural language commands in real-world scenarios.

Estimating time of arrival of vehicle fleets with GCN based traffic prediction

  • Authors: Authors: Shivika Sharma, Nandini Mawane, Dhruthick Gowda M, Mayur Taware, Chetan Kumar, Yash Chandrashekhar Dixit, Rakshit Ramesh
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2311.12758
  • Pdf link: https://arxiv.org/pdf/2311.12758
  • Abstract This paper presents an effective framework for estimating time of arrival of vehicles (buses) in an Intelligent Transit Management System (ITMS) having sparse position updates. Our contributions towards this is firstly in implementing a constrained optimization based road linestring segmenting framework ensuring ideal segment lengths and segments with sufficient density of vehicle position measurements which will result in valid statistics for scenarios involving sparse position measurements. Over this we propose a comprehensive approach for predicting traffic delays and estimated time of vehicle arrival addressing both the spatial and temporal dependencies of traffic. The traffic delay model is built on top of the T-GCN architecture on which we optimally augment an adjacency matrix which models a complexly connected road network considering the degree of influence between road segments, enabling the traffic delay model to look beyond physical road connectivity in predicting traffic delays and therefore producing better estimates of arrival times to points along the designated route of the vehicles.

The T-Complexity Costs of Error Correction for Control Flow in Quantum Computation

  • Authors: Authors: Charles Yuan, Michael Carbin
  • Subjects: Programming Languages (cs.PL); Quantum Physics (quant-ph)
  • Arxiv link: https://arxiv.org/abs/2311.12772
  • Pdf link: https://arxiv.org/pdf/2311.12772
  • Abstract Numerous quantum algorithms require the use of quantum error correction to overcome the intrinsic unreliability of physical qubits. However, error correction imposes a unique performance bottleneck, known as T-complexity, that can make an implementation of an algorithm as a quantum program run more slowly than on idealized hardware. In this work, we identify that programming abstractions for control flow, such as the quantum if-statement, can introduce polynomial increases in the T-complexity of a program. If not mitigated, this slowdown can diminish the computational advantage of a quantum algorithm. To enable reasoning about the costs of control flow, we present a cost model, using which a developer can analyze the T-complexity of a program under quantum error correction and pinpoint the sources of slowdown. We also present a set of program-level optimizations, using which a developer can rewrite a program to reduce its T-complexity, predict the T-complexity of the optimized program using the cost model, and then compile it to an efficient circuit via a straightforward strategy. We implement the program-level optimizations in Spire, an extension of the Tower quantum compiler. Using a set of 11 benchmark programs that use control flow, we show that the cost model is accurate, and that Spire's optimizations recover programs that are asymptotically efficient, meaning their runtime T-complexity under error correction is equal to their time complexity on idealized hardware. Our results show that optimizing a program before it is compiled to a circuit can yield better results than compiling the program to an inefficient circuit and then invoking a quantum circuit optimizer found in prior work. For our benchmarks, only 2 of 8 existing circuit optimizers recover circuits with asymptotically efficient T-complexity. Compared to these 2 optimizers, Spire uses 54x to 2400x less compile time.

SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering

  • Authors: Authors: Antoine Guédon, Vincent Lepetit
  • Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2311.12775
  • Pdf link: https://arxiv.org/pdf/2311.12775
  • Abstract We propose a method to allow precise and extremely fast mesh extraction from 3D Gaussian Splatting. Gaussian Splatting has recently become very popular as it yields realistic rendering while being significantly faster to train than NeRFs. It is however challenging to extract a mesh from the millions of tiny 3D gaussians as these gaussians tend to be unorganized after optimization and no method has been proposed so far. Our first key contribution is a regularization term that encourages the gaussians to align well with the surface of the scene. We then introduce a method that exploits this alignment to extract a mesh from the Gaussians using Poisson reconstruction, which is fast, scalable, and preserves details, in contrast to the Marching Cubes algorithm usually applied to extract meshes from Neural SDFs. Finally, we introduce an optional refinement strategy that binds gaussians to the surface of the mesh, and jointly optimizes these Gaussians and the mesh through Gaussian splatting rendering. This enables easy editing, sculpting, rigging, animating, compositing and relighting of the Gaussians using traditional softwares by manipulating the mesh instead of the gaussians themselves. Retrieving such an editable mesh for realistic rendering is done within minutes with our method, compared to hours with the state-of-the-art methods on neural SDFs, while providing a better rendering quality.

Finding Adversarial Inputs for Heuristics using Multi-level Optimization

  • Authors: Authors: Pooria Namyar, Behnaz Arzani, Ryan Beckett, Santiago Segarra, Himanshu Raj, Umesh Krishnaswamy, Ramesh Govindan, Srikanth Kandula
  • Subjects: Networking and Internet Architecture (cs.NI); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2311.12779
  • Pdf link: https://arxiv.org/pdf/2311.12779
  • Abstract Production systems use heuristics because they are faster or scale better than their optimal counterparts. Yet, practitioners are often unaware of the performance gap between a heuristic and the optimum or between two heuristics in realistic scenarios. We present MetaOpt, a system that helps analyze heuristics. Users specify the heuristic and the optimal (or another heuristic) as input, and MetaOpt automatically encodes these efficiently for a solver to find performance gaps and their corresponding adversarial inputs. Its suite of built-in optimizations helps it scale its analysis to practical problem sizes. To show it is versatile, we used MetaOpt to analyze heuristics from three domains (traffic engineering, vector bin packing, and packet scheduling). We found a production traffic engineering heuristic can require 30% more capacity than the optimal to satisfy realistic demands. Based on the patterns in the adversarial inputs MetaOpt produced, we modified the heuristic to reduce its performance gap by 12.5$\times$. We examined adversarial inputs to a vector bin packing heuristic and proved a new lower bound on its performance.

Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models

  • Authors: Authors: David Stotko, Nils Wandel, Reinhard Klein
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2311.12796
  • Pdf link: https://arxiv.org/pdf/2311.12796
  • Abstract 3D reconstruction of dynamic scenes is a long-standing problem in computer graphics and increasingly difficult the less information is available. Shape-from-Template (SfT) methods aim to reconstruct a template-based geometry from RGB images or video sequences, often leveraging just a single monocular camera without depth information, such as regular smartphone recordings. Unfortunately, existing reconstruction methods are either unphysical and noisy or slow in optimization. To solve this problem, we propose a novel SfT reconstruction algorithm for cloth using a pre-trained neural surrogate model that is fast to evaluate, stable, and produces smooth reconstructions due to a regularizing physics simulation. Differentiable rendering of the simulated mesh enables pixel-wise comparisons between the reconstruction and a target video sequence that can be used for a gradient-based optimization procedure to extract not only shape information but also physical parameters such as stretching, shearing, or bending stiffness of the cloth. This allows to retain a precise, stable, and smooth reconstructed geometry while reducing the runtime by a factor of 400-500 compared to $\phi$-SfT, a state-of-the-art physics-based SfT approach.

Keyword: adam

Hyena: Optimizing Homomorphically Encrypted Convolution for Private CNN Inference

  • Authors: Authors: Hyeri Roh, Woo-Seok Choi
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2311.12519
  • Pdf link: https://arxiv.org/pdf/2311.12519
  • Abstract Processing convolution layers remains a huge bottleneck for private deep convolutional neural network (CNN) inference for large datasets. To solve this issue, this paper presents a novel homomorphic convolution algorithm that provides speedup, communication cost, and storage saving. We first note that padded convolution provides the advantage of model storage saving, but it does not support channel packing, thereby increasing the amount of computation and communication. We address this limitation by proposing a novel plaintext multiplication algorithm using the Walsh-Hadamard matrix. Furthermore, we propose the optimization techniques to significantly reduce the latency of the proposed convolution by selecting the optimal encryption parameters and applying lazy reduction. It achieves 1.6-3.8x speedup and reduces the weight storage by 2000-8000x compared to the conventional convolution. When the proposed convolution is employed for CNNs like VGG-16, ResNet-20, and MobileNetV1 on ImageNet, it reduces the end-to-end latency by 1.3-2.6x, the memory usage by 2.1-7.9x and communication cost by 1.7-2.0x compared to conventional method.

Keyword: gradient

The limitation of neural nets for approximation and optimization

  • Authors: Authors: Tommaso Giovannelli, Oumaima Sohab, Luis Nunes Vicente
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2311.12253
  • Pdf link: https://arxiv.org/pdf/2311.12253
  • Abstract We are interested in assessing the use of neural networks as surrogate models to approximate and minimize objective functions in optimization problems. While neural networks are widely used for machine learning tasks such as classification and regression, their application in solving optimization problems has been limited. Our study begins by determining the best activation function for approximating the objective functions of popular nonlinear optimization test problems, and the evidence provided shows that~SiLU has the best performance. We then analyze the accuracy of function value, gradient, and Hessian approximations for such objective functions obtained through interpolation/regression models and neural networks. When compared to interpolation/regression models, neural networks can deliver competitive zero- and first-order approximations (at a high training cost) but underperform on second-order approximation. However, it is shown that combining a neural net activation function with the natural basis for quadratic interpolation/regression can waive the necessity of including cross terms in the natural basis, leading to models with fewer parameters to determine. Lastly, we provide evidence that the performance of a state-of-the-art derivative-free optimization algorithm can hardly be improved when the gradient of an objective function is approximated using any of the surrogate models considered, including neural networks.

Federated Learning via Consensus Mechanism on Heterogeneous Data: A New Perspective on Convergence

  • Authors: Authors: Shu Zheng, Tiandi Ye, Xiang Li, Ming Gao
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2311.12358
  • Pdf link: https://arxiv.org/pdf/2311.12358
  • Abstract Federated learning (FL) on heterogeneous data (non-IID data) has recently received great attention. Most existing methods focus on studying the convergence guarantees for the global objective. While these methods can guarantee the decrease of the global objective in each communication round, they fail to ensure risk decrease for each client. In this paper, to address the problem,we propose FedCOME, which introduces a consensus mechanism to enforce decreased risk for each client after each training round. In particular, we allow a slight adjustment to a client's gradient on the server side, which generates an acute angle between the corrected gradient and the original ones of other clients. We theoretically show that the consensus mechanism can guarantee the convergence of the global objective. To generalize the consensus mechanism to the partial participation FL scenario, we devise a novel client sampling strategy to select the most representative clients for the global data distribution. Training on these selected clients with the consensus mechanism could empirically lead to risk decrease for clients that are not selected. Finally, we conduct extensive experiments on four benchmark datasets to show the superiority of FedCOME against other state-of-the-art methods in terms of effectiveness, efficiency and fairness. For reproducibility, we make our source code publicly available at: \url{https://github.com/fedcome/fedcome}.

Post-Training Quantization with Low-precision Minifloats and Integers on FPGAs

  • Authors: Authors: Shivam Aggarwal, Alessandro Pappalardo, Hans Jakob Damsgaard, Giuseppe Franco, Thomas B. Preußer, Michaela Blott, Tulika Mitra
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Machine Learning (cs.LG); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2311.12359
  • Pdf link: https://arxiv.org/pdf/2311.12359
  • Abstract Post-Training Quantization (PTQ) is a powerful technique for model compression, reducing the precision of neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point quantization (FP8) in the context of PTQ for model inference. However, the exploration of floating-point formats smaller than 8 bits and their comparison with integer quantization remains relatively limited. In this work, we present minifloats, which are reduced-precision floating-point formats capable of further reducing the memory footprint, latency, and energy cost of a model while approaching full-precision model accuracy. Our work presents a novel PTQ design-space exploration, comparing minifloat and integer quantization schemes across a range of 3 to 8 bits for both weights and activations. We examine the applicability of various PTQ techniques to minifloats, including weight equalization, bias correction, SmoothQuant, gradient-based learned rounding, and the GPTQ method. Our experiments validate the effectiveness of low-precision minifloats when compared to their integer counterparts across a spectrum of accuracy-precision trade-offs on a set of reference deep learning vision workloads. Finally, we evaluate our results against an FPGA-based hardware cost model, showing that integer quantization often remains the Pareto-optimal option, given its relatively smaller hardware resource footprint.

Infinite forecast combinations based on Dirichlet process

  • Authors: Authors: Yinuo Ren, Feng Li, Yanfei Kang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2311.12379
  • Pdf link: https://arxiv.org/pdf/2311.12379
  • Abstract Forecast combination integrates information from various sources by consolidating multiple forecast results from the target time series. Instead of the need to select a single optimal forecasting model, this paper introduces a deep learning ensemble forecasting model based on the Dirichlet process. Initially, the learning rate is sampled with three basis distributions as hyperparameters to convert the infinite mixture into a finite one. All checkpoints are collected to establish a deep learning sub-model pool, and weight adjustment and diversity strategies are developed during the combination process. The main advantage of this method is its ability to generate the required base learners through a single training process, utilizing the decaying strategy to tackle the challenge posed by the stochastic nature of gradient descent in determining the optimal learning rate. To ensure the method's generalizability and competitiveness, this paper conducts an empirical analysis using the weekly dataset from the M4 competition and explores sensitivity to the number of models to be combined. The results demonstrate that the ensemble model proposed offers substantial improvements in prediction accuracy and stability compared to a single benchmark model.

Cooperative RIS and STAR-RIS assisted mMIMO Communication: Analysis and Optimization

  • Authors: Authors: Anastasios Papazafeiropoulos, Ahmet M. Elbir, Pandelis Kourtessis, Ioannis Krikidis, Symeon Chatzinotas
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2311.12473
  • Pdf link: https://arxiv.org/pdf/2311.12473
  • Abstract Reconfigurable intelligent surface (RIS) has emerged as a cost-effective and promising solution to extend the wireless signal coverage and improve the performance via passive signal reflection. Different from existing works which do not account for the cooperation between RISs or do not provide full space coverage, we propose the marriage of cooperative double-RIS with simultaneously transmitting and reflecting RIS (STAR-RIS) technologies denoted as RIS/STAR-RIS under correlated Rayleigh fading conditions to assist the communication in a massive multiple-input multiple-output (mMIMO) setup. The proposed architecture is superior since it enjoys the benefits of the individual designs. We introduce a channel estimation approach of the cascaded channels with reduced overhead. Also, we obtain the deterministic equivalent (DE) of the downlink achievable sum spectral efficiency (SE) in closed form based on large-scale statistics. Notably, relied on statistical channel state information (CSI), we optimise both surfaces by means of the projected gradient ascent method (PGAM), and obtain the gradients in closed form. The proposed optimization achieves to maximise the sum SE of such a complex system, and has low complexity and low overhead since it can be performed at every several coherence intervals. Numerical results show the benefit of the proposed architecture and verify the analytical framework. In particular, we show that the RIS/STAR-RIS architecture outperforms the conventional double-RIS or its single-RIS counterparts.

Neural Network Pruning by Gradient Descent

  • Authors: Authors: Zhang Zhang, Ruyi Tao, Jiang Zhang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2311.12526
  • Pdf link: https://arxiv.org/pdf/2311.12526
  • Abstract The rapid increase in the parameters of deep learning models has led to significant costs, challenging computational efficiency and model interpretability. In this paper, we introduce a novel and straightforward neural network pruning framework that incorporates the Gumbel-Softmax technique. This framework enables the simultaneous optimization of a network's weights and topology in an end-to-end process using stochastic gradient descent. Empirical results demonstrate its exceptional compression capability, maintaining high accuracy on the MNIST dataset with only 0.15% of the original network parameters. Moreover, our framework enhances neural network interpretability, not only by allowing easy extraction of feature importance directly from the pruned network but also by enabling visualization of feature symmetry and the pathways of information propagation from features to outcomes. Although the pruning strategy is learned through deep learning, it is surprisingly intuitive and understandable, focusing on selecting key representative features and exploiting data patterns to achieve extreme sparse pruning. We believe our method opens a promising new avenue for deep learning pruning and the creation of interpretable machine learning systems.

Differentiable Sampling of Categorical Distributions Using the CatLog-Derivative Trick

  • Authors: Authors: Lennert De Smet, Emanuele Sansone, Pedro Zuidberg Dos Martires
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2311.12569
  • Pdf link: https://arxiv.org/pdf/2311.12569
  • Abstract Categorical random variables can faithfully represent the discrete and uncertain aspects of data as part of a discrete latent variable model. Learning in such models necessitates taking gradients with respect to the parameters of the categorical probability distributions, which is often intractable due to their combinatorial nature. A popular technique to estimate these otherwise intractable gradients is the Log-Derivative trick. This trick forms the basis of the well-known REINFORCE gradient estimator and its many extensions. While the Log-Derivative trick allows us to differentiate through samples drawn from categorical distributions, it does not take into account the discrete nature of the distribution itself. Our first contribution addresses this shortcoming by introducing the CatLog-Derivative trick - a variation of the Log-Derivative trick tailored towards categorical distributions. Secondly, we use the CatLog-Derivative trick to introduce IndeCateR, a novel and unbiased gradient estimator for the important case of products of independent categorical distributions with provably lower variance than REINFORCE. Thirdly, we empirically show that IndeCateR can be efficiently implemented and that its gradient estimates have significantly lower bias and variance for the same number of samples compared to the state of the art.

FedDRO: Federated Compositional Optimization for Distributionally Robust Learning

  • Authors: Authors: Prashant Khanduri, Chengyin Li, Rafi Ibn Sultan, Yao Qiang, Joerg Kliewer, Dongxiao Zhu
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2311.12652
  • Pdf link: https://arxiv.org/pdf/2311.12652
  • Abstract Recently, compositional optimization (CO) has gained popularity because of its applications in distributionally robust optimization (DRO) and many other machine learning problems. Large-scale and distributed availability of data demands the development of efficient federated learning (FL) algorithms for solving CO problems. Developing FL algorithms for CO is particularly challenging because of the compositional nature of the objective. Moreover, current state-of-the-art methods to solve such problems rely on large batch gradients (depending on the solution accuracy) not feasible for most practical settings. To address these challenges, in this work, we propose efficient FedAvg-type algorithms for solving non-convex CO in the FL setting. We first establish that vanilla FedAvg is not suitable to solve distributed CO problems because of the data heterogeneity in the compositional objective at each client which leads to the amplification of bias in the local compositional gradient estimates. To this end, we propose a novel FL framework FedDRO that utilizes the DRO problem structure to design a communication strategy that allows FedAvg to control the bias in the estimation of the compositional gradient. A key novelty of our work is to develop solution accuracy-independent algorithms that do not require large batch gradients (and function evaluations) for solving federated CO problems. We establish $\mathcal{O}(\epsilon^{-2})$ sample and $\mathcal{O}(\epsilon^{-3/2})$ communication complexity in the FL setting while achieving linear speedup with the number of clients. We corroborate our theoretical findings with empirical studies on large-scale DRO problems.

On the Out-of-Distribution Coverage of Combining Split Conformal Prediction and Bayesian Deep Learning

  • Authors: Authors: Paul Scemama, Ariel Kapusta
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2311.12688
  • Pdf link: https://arxiv.org/pdf/2311.12688
  • Abstract Bayesian deep learning and conformal prediction are two methods that have been used to convey uncertainty and increase safety in machine learning systems. We focus on combining Bayesian deep learning with split conformal prediction and how this combination effects out-of-distribution coverage; particularly in the case of multiclass image classification. We suggest that if the model is generally underconfident on the calibration set, then the resultant conformal sets may exhibit worse out-of-distribution coverage compared to simple predictive credible sets. Conversely, if the model is overconfident on the calibration set, the use of conformal prediction may improve out-of-distribution coverage. We evaluate prediction sets as a result of combining split conformal methods and neural networks trained with (i) stochastic gradient descent, (ii) deep ensembles, and (iii) mean-field variational inference. Our results suggest that combining Bayesian deep learning models with split conformal prediction can, in some cases, cause unintended consequences such as reducing out-of-distribution coverage.

LiFi: Learn to Incentivize Federated Learning in Automotive Edge Computing

  • Authors: Authors: Ming Zhao, Yuru Zhang, Qiang Liu
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2311.12720
  • Pdf link: https://arxiv.org/pdf/2311.12720
  • Abstract Federated learning (FL) is the promising privacy-preserve approach to continually update the central machine learning (ML) model (e.g., object detectors in edge servers) by aggregating the gradients obtained from local observation data in distributed connected and automated vehicles (CAVs). The incentive mechanism is to incentivize individual selfish CAVs to participate in FL towards the improvement of overall model accuracy. It is, however, challenging to design the incentive mechanism, due to the complex correlation between the overall model accuracy and unknown incentive sensitivity of CAVs, especially under the non-independent and identically distributed (Non-IID) data of individual CAVs. In this paper, we propose a new learn-to-incentivize algorithm to adaptively allocate rewards to individual CAVs under unknown sensitivity functions. First, we gradually learn the unknown sensitivity function of individual CAVs with accumulative observations, by using compute-efficient Gaussian process regression (GPR). Second, we iteratively update the reward allocation to individual CAVs with new sampled gradients, derived from GPR. Third, we project the updated reward allocations to comply with the total budget. We evaluate the performance of extensive simulations, where the simulation parameters are obtained from realistic profiling of the CIFAR-10 dataset and NVIDIA RTX 3080 GPU. The results show that our proposed algorithm substantially outperforms existing solutions, in terms of accuracy, scalability, and adaptability.

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

  • Authors: Authors: Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2311.12786
  • Pdf link: https://arxiv.org/pdf/2311.12786
  • Abstract Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy. Despite its clear importance, there has been minimal work that explains how fine-tuning alters the underlying capabilities learned by a model during pretraining: does fine-tuning yield entirely novel capabilities or does it just modulate existing ones? We address this question empirically in synthetic, controlled settings where we can use mechanistic interpretability tools (e.g., network pruning and probing) to understand how the model's underlying capabilities are changing. We perform an extensive analysis of the effects of fine-tuning in these settings, and show that: (i) fine-tuning rarely alters the underlying model capabilities; (ii) a minimal transformation, which we call a 'wrapper', is typically learned on top of the underlying model capabilities, creating the illusion that they have been modified; and (iii) further fine-tuning on a task where such hidden capabilities are relevant leads to sample-efficient 'revival' of the capability, i.e., the model begins reusing these capability after only a few gradient steps. This indicates that practitioners can unintentionally remove a model's safety wrapper merely by fine-tuning it on a, e.g., superficially unrelated, downstream task. We additionally perform analysis on language models trained on the TinyStories dataset to support our claims in a more realistic setup.

Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models

  • Authors: Authors: David Stotko, Nils Wandel, Reinhard Klein
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2311.12796
  • Pdf link: https://arxiv.org/pdf/2311.12796
  • Abstract 3D reconstruction of dynamic scenes is a long-standing problem in computer graphics and increasingly difficult the less information is available. Shape-from-Template (SfT) methods aim to reconstruct a template-based geometry from RGB images or video sequences, often leveraging just a single monocular camera without depth information, such as regular smartphone recordings. Unfortunately, existing reconstruction methods are either unphysical and noisy or slow in optimization. To solve this problem, we propose a novel SfT reconstruction algorithm for cloth using a pre-trained neural surrogate model that is fast to evaluate, stable, and produces smooth reconstructions due to a regularizing physics simulation. Differentiable rendering of the simulated mesh enables pixel-wise comparisons between the reconstruction and a target video sequence that can be used for a gradient-based optimization procedure to extract not only shape information but also physical parameters such as stretching, shearing, or bending stiffness of the cloth. This allows to retain a precise, stable, and smooth reconstructed geometry while reducing the runtime by a factor of 400-500 compared to $\phi$-SfT, a state-of-the-art physics-based SfT approach.

Keyword: super-resolution

Efficient Model Agnostic Approach for Implicit Neural Representation Based Arbitrary-Scale Image Super-Resolution

  • Authors: Authors: Young Jae Oh, Jihun Kim, Tae Hyun Kim
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2311.12077
  • Pdf link: https://arxiv.org/pdf/2311.12077
  • Abstract Single image super-resolution (SISR) has experienced significant advancements, primarily driven by deep convolutional networks. Traditional networks, however, are limited to upscaling images to a fixed scale, leading to the utilization of implicit neural functions for generating arbitrarily scaled images. Nevertheless, these methodologies have imposed substantial computational demands as they involve querying every target pixel to a single resource-intensive decoder. In this paper, we introduce a novel and efficient framework, the Mixture of Experts Implicit Super-Resolution (MoEISR), which enables super-resolution at arbitrary scales with significantly increased computational efficiency without sacrificing reconstruction quality. MoEISR dynamically allocates the most suitable decoding expert to each pixel using a lightweight mapper module, allowing experts with varying capacities to reconstruct pixels across regions with diverse complexities. Our experiments demonstrate that MoEISR successfully reduces up to 73% in floating point operations (FLOPs) while delivering comparable or superior peak signal-to-noise ratio (PSNR).

HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

  • Authors: Authors: Sang-Hoon Lee, Ha-Yeong Choi, Seung-Bin Kim, Seong-Whan Lee
  • Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2311.12454
  • Pdf link: https://arxiv.org/pdf/2311.12454
  • Abstract Large language models (LLM)-based speech synthesis has been widely adopted in zero-shot speech synthesis. However, they require a large-scale data and possess the same limitations as previous autoregressive speech models, including slow inference speed and lack of robustness. This paper proposes HierSpeech++, a fast and strong zero-shot speech synthesizer for text-to-speech (TTS) and voice conversion (VC). We verified that hierarchical speech synthesis frameworks could significantly improve the robustness and expressiveness of the synthetic speech. Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios. For text-to-speech, we adopt the text-to-vec framework, which generates a self-supervised speech representation and an F0 representation based on text representations and prosody prompts. Then, HierSpeech++ generates speech from the generated vector, F0, and voice prompt. We further introduce a high-efficient speech super-resolution framework from 16 kHz to 48 kHz. The experimental results demonstrated that the hierarchical variational autoencoder could be a strong zero-shot speech synthesizer given that it outperforms LLM-based and diffusion-based models. Moreover, we achieved the first human-level quality zero-shot speech synthesis. Audio samples and source code are available at https://github.com/sh-lee-prml/HierSpeechpp.

zoq avatar Nov 22 '23 07:11 zoq