Large Language Model (LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization (2024)

Fengxiao Tang
Central South University
tangfengxiao@csu.edu.cn
  Xiaonan Wang
Xinjiang University
107552304984@stu.xju.edu.cn
  Xun Yuan
Central South University
yuan.xun@csu.edu.cn
  Linfeng Luo
Central South University
luolinfeng@csu.edu.cn
  Ming Zhao
Central South University
meanzhao@csu.edu.cn
  Nei Kato
Tohoku University
kato@it.is.tohoku.ac.jp

Abstract

Network device and system health management is the foundation of modern network operations and maintenance. Traditional health management methods, relying on expert identification or simple rule-based algorithms, struggle to cope with dynamic heterogeneous network (DHN) environments. Moreover, current state-of-the-art distributed anomaly detection methods, which utilize specific machine learning techniques, lack multi-scale adaptivity for heterogeneous device information, resulting in unsatisfactory diagnostic accuracy for DHNs. In this paper, we develop an LLM-assisted end-to-end intelligent network health management framework. The framework first employs a proposed Multi-Scale Semanticized Anomaly Detection Model (MSADM), which incorporates semantic rule trees with an attention mechanism, to address the multi-scale anomaly detection problem in DHNs. Secondly, a chain-of-thought-based large language model is embedded downstream to adaptively analyze the fault detection results and produce an analysis report with detailed fault information and optimization strategies. Experimental results show that the accuracy of our proposed MSADM for heterogeneous network entity anomaly detection reaches 91.31%.

1 Introduction

With the development of communication technology and unmanned control technology towards B5G/6G, dynamic heterogeneous networks (DHNs)[36] play an increasingly important role in many key areas such as emergency communication, transportation, and military administration[11]. As shown in Fig.1, DHNs consist of various types of communication devices such as base stations, drones, and mobile phones, which, having been deployed in harsh and dynamically changing environments for long periods[30], are prone to various anomalies and faults[33]. Therefore, to enhance the availability and reliability of DHNs, it is essential to perform timely health management to detect network anomalies and diagnose network faults[8].

Modern health management is a comprehensive analysis technique that not only presents and visualizes anomalous data but also uncovers the fault types and root causes behind the abnormal data across the whole network, so that a series of decisions can be made to mitigate the problem[9].

A typical health management life cycle includes at least three phases: (1) Anomaly Detection[19]: a monitor performs anomaly detection on multivariate time series data (e.g., packet loss, byte errors). (2) Fault Detection[17]: network managers (NMs) assess various aspects of the event and engage in several rounds of communication to pinpoint the cause of the anomaly. (3) Mitigation[1]: the NMs implement several actions to mitigate the incident and restore the health of the communication service. The accuracy of anomaly detection and fault detection is the foundation of the health management life cycle. However, the increasing variety and dynamicity of DHNs pose two key challenges for their health management[20]: (1) how to accurately infer faults from local information when global information is difficult to obtain in real time; and (2) how to accurately locate faults in heterogeneous devices that differ in information scale and fault mechanisms.


Traditional Bayesian health management methods are widely used in network fault detection; they establish connections between network anomalies and their root causes for performance diagnosis[2]. However, Bayesian methods rely on directed acyclic graphs that lack scalability, making them unsuitable for DHNs. Meanwhile, frequent changes in topology make it difficult for traditional distributed anomaly detection algorithms to detect local or minor anomalies in DHNs[13].


Recently, machine learning-based health management methods have been widely researched and are recognized as state-of-the-art algorithms for network fault detection[23, 15, 32, 35, 5]. However, these machine learning-based algorithms either rely on global network information or ignore the non-uniform Key Performance Indicators (KPIs) and state information of heterogeneous nodes. Moreover, these diagnostic algorithms do not cover the complete health management life cycle and still rely on NMs to perform manual troubleshooting to mitigate anomalies after detection, which not only fails to utilize anomaly data efficiently but also significantly increases the time and complexity of anomaly handling.

To address the above problems, we develop an LLM-assisted end-to-end intelligent network health management framework. In the framework, we first propose a Multi-Scale Semanticized Anomaly Detection Model (MSADM) to deal with the non-uniform KPI and state information problem, and then integrate an LLM to perform full-life-cycle, end-to-end health management.

Unlike existing models that can only handle specific faults of specific devices, MSADM combines multi-scale semantic rule trees with a Transformer to unify and standardize abnormal text reports according to the abnormality degrees of various nodes. Thus, MSADM can be deployed on heterogeneous entities to automatically identify abnormal communication entities and generate unified, standardized expressions of abnormal information.

As shown in Fig.1, to perform end-to-end health management, we integrate an LLM into the health management framework to cover the full life cycle and employ MSADM as the facilitating agent for the LLM. This strategic integration facilitates the collection and initial processing of abnormal data, thereby effectively preventing diagnostic errors caused by inconsistent data representations. This preliminary processing also significantly reduces the computational demands on the LLM. As shown in Fig.2, the effectiveness of this approach is evident in the detailed diagnostic results generated by the LLM. These results succinctly outline the abnormal status and potential causes for each network entity, underscoring the robust capability of our proposed health management scheme. The main contributions of this paper are summarized as follows:

  • We propose an end-to-end health management framework for DHNs. This framework manages network health through only local and neighboring information and covers the full stages of the health management life cycle, including anomaly detection, fault detection, and mitigation.

  • We propose a Multi-Scale Semanticized Anomaly Detection Model (MSADM) to deal with the non-uniform KPI and state information problem. This model standardizes abnormal information from various DHN devices, addressing the inefficiencies inherent in traditional distributed anomaly detection information sharing.

  • We incorporate an LLM into the network health management process to cover the full life cycle of end-to-end health management. By employing chain-of-thought prompting, the LLM not only analyzes abnormal situations but also offers mitigation solutions.

2 Background and Motivation

In this section, we first review the current research status of anomaly detection models. We then identify the shortcomings of existing methods for DHN health management. Finally, we explore the potential benefits of integrating semanticization into the health management process of wireless heterogeneous networks.

2.1 Related Work

The traditional anomaly detection algorithm detects anomalies by monitoring wireless measurement data and comparing it with established norms [29]. However, this approach overly depends on expert annotations and proves both time-consuming and labor-intensive. Concurrently, researchers also attempt to validate their findings using both simulated and actual data. Yet, these studies typically rely on a single KPI, such as the call drop rate, to classify anomalies, thereby constraining diagnostic precision to a degree [14]. The Bayesian-based classification method, extensively explored in [2][3], uses probability and graph theories to correlate network anomalies with their root causes. Despite its widespread application, the efficacy of this method significantly hinges on a substantial corpus of historical anomaly data since the causal graphs it generates demand extensive prior knowledge. Moreover, the Bayesian approach faces challenges in scalability and adaptability, struggling to perform well in dynamic, heterogeneous wireless network environments.

Machine learning, recognized as a powerful analytical tool, can effectively mine and perceive potential information in data and sharply detect subtle changes in network status and KPIs, thus enabling faster and more precise network anomaly detection[23]. Researchers propose a diagnostic method based on a supervised genetic fuzzy algorithm[15]. This method employs a genetic algorithm to learn a fuzzy rule base from a combined dataset of simulated and real data containing 72 records. Its accuracy heavily relies on the labeled training set. The Deep Transformer-based temporal anomaly detection model, TranAD[32], incorporates an attention sequence encoder and leverages broader temporal trend knowledge to swiftly conduct anomaly detection. DCdetector[35] masters the representation of abnormal samples using a dual attention mechanism and contrastive learning. While machine learning methods have advanced in feature learning and enhanced their generalization capabilities, they face challenges in wireless networks. Abnormalities are sporadic, and scarce abnormal samples make the models prone to overfitting. Moreover, modeling only the entire network fails to adapt to dynamic DHNs.

Although research on distributed anomaly detection solutions is extensive[5], practical applications suffer from inconsistent network entity feature representation, which weakens detection capability[25]. Moreover, using machine learning to model each device individually is both time-consuming and labor-intensive, and such models struggle to capture the interaction information of communication devices. In addition, existing distributed fault detection methods often treat abnormal situations as a whole, neglecting the specific abnormal representation of individual communication entities and thereby complicating the rapid detection of abnormal nodes by NMs.


2.2 Problem Statement and Our Objectives

Within DHNs, the diverse range of communication devices poses challenges for domain experts in gathering data encompassing all device anomaly types for model training. Furthermore, these models typically lack autonomous learning capabilities. Consequently, the emergence of new communication devices or technologies within the network often detracts from the detection efficacy of the model, leading to performance degradation.

In addition to the aforementioned shortcomings, existing anomaly detection research often emphasizes enhancing detection accuracy or model interpretability, while coverage of the entire health management life cycle is seldom taken into account. For anomalies detected by a model, the prevalent approach involves NMs extracting information and experience from satisfactorily resolved and archived cases (i.e., marked cases) to alleviate the anomalies[24]. Undoubtedly, this significantly diminishes the efficiency of anomaly mitigation.

We incorporate LLM into the health management life cycle, leveraging its reasoning capabilities to identify the root causes of abnormal situations, thereby furnishing NMs with end-to-end anomaly resolutions. Moreover, LLM’s learning capability enables rapid adaptation to new abnormal information from communication entities. To facilitate LLM in gathering anomaly information, we devised MSADM, deployed on communication entities to execute anomaly detection and information collection. Given the distributed deployment of MSADM, our scheme offers entity-level visibility, contrasting with prior distributed anomaly detection models. In the subsequent section, we will elaborate on our solution scheme in detail.

3 System Architecture

We introduce an end-to-end health management scheme for DHNs. Fig.3 displays the architecture of this scheme. An essential component of our solution is processing time-series data from various devices through a rule base to generate a status list with a uniform scale. We elaborate on the creation and use of the rule base in Section 3.1. Once the status list with unified scales is established, MSADM can pinpoint anomalies using a built-in rule-enhanced Transformer time-series classification model (Section 3.2) and create anomaly descriptions by integrating semantic rule trees (Section 3.3). Additionally, we have developed a statement processing structure equipped with prompts to support the LLM in analyzing these anomaly descriptions. This structure aids the LLM in identifying the causes of anomalies and devising mitigation strategies. The LLM's output acts as the anomaly report for the network, which NMs use to swiftly address anomalies and ensure network health (Section 3.4). Below, we introduce each part of our scheme in detail.

3.1 Construction of Rule Base

In this section, we use the packet loss rate (PLR) as an example to illustrate the shortcomings of existing distributed approaches. We compute a normal-distribution interval for the average PLR over a period $T$ across all devices. Next, we place the average value of each device into this interval; the resulting distribution appears in Fig.4. The distribution of PLR varies significantly across devices, and if such a dataset were used directly for model training, the model would struggle to adapt to this multi-scale anomalous behavior. Fig.5 shows the change in anomaly detection accuracy for different devices before and after using the rule base. Next, we describe in detail how the rule base is designed and used.

We analyze the KPIs[16] common to multiple devices within the simulated network and construct the rule base accordingly. A comprehensive list of KPI types and contents is detailed in Appendix A. For each device type, we analyze the collected data to ascertain the distribution of each KPI across various dimensions. Subsequently, we compare the actual KPI changes of these devices against their respective distributions to pinpoint anomalous statuses.

We represent the network background information within $T$ under normal conditions as $\mathcal{N}_{normal} = (N_f, E_f, T)$. $N_f$ denotes the attributes of the node itself, expressed as $N_f = \{f_{N1}, f_{N2}, \ldots, f_{Nn}\}$, while $E_f$ represents the attributes of the communication link and, similarly, is given by $E_f = \{f_{E1}, f_{E2}, \ldots, f_{En}\}$. $T$ indicates the period over which network information is recorded.

We collected a substantial number of $\mathcal{N}_{normal}$ records for homogeneous entities to enhance our analysis. For each KPI, we calculate its average value (Avg, $F_a$), fluctuation value (Jitter, $F_j$), variance (Variance, $F_v$), and trend (Trend, $F_t$). The average represents the center of the dataset and aids in understanding the general performance level. The fluctuation value represents the dispersion of values in the dataset, calculated as the average of the differences between adjacent data points. Variance, the average of the squared differences of each data point from the mean, measures the extent to which individual data points deviate from the mean. The trend describes how the data changes over time.
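As a concrete illustration, the first three evaluation dimensions can be computed directly from the sampled KPI values within $T$. The following is a minimal Python sketch of that computation, assuming evenly sampled values and interpreting the fluctuation value as the mean absolute difference of adjacent points; the function name is illustrative rather than taken from MSADM.

```python
import numpy as np

def kpi_dimensions(values):
    """Compute Avg (F_a), Jitter (F_j), and Variance (F_v) for one KPI
    series collected within a window T. The Trend (F_t) dimension is
    derived separately via extreme-point counting (see Formulas (1)-(3))."""
    v = np.asarray(values, dtype=float)
    avg = v.mean()                      # central level of the KPI
    jitter = np.abs(np.diff(v)).mean()  # average change between adjacent samples
    variance = v.var()                  # squared deviation from the mean
    return avg, jitter, variance
```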


We can readily compute the numerical distribution of the first three dimensions, thereby obtaining a set of intervals $Dist$ that characterizes the abnormality of the performance indicator. According to the distribution, an interval closer to the peak indicates that the dimensional data aligns more closely with normal data and should be considered more normal. Since the trend falls into categories such as rise, fall, and fluctuation, its calculation is different: we assess the instantaneous performance and overall trend of the network based on the number of extreme points obtained. The data within $T$ is subdivided into $n$ small periods $t$. By taking the average value within each $t$, the continuous time data is converted into discrete values $v = \{v_1, v_2, \ldots, v_n\}$.

To mitigate noise interference and facilitate smoother data processing, we introduce a threshold $h$ during the identification of maxima and minima. If a value and its adjacent value differ by no more than $h$, we do not classify it as an extreme point. The presence of multiple maxima and minima signifies a fluctuating trend. Conversely, a single minimum suggests a sudden drop, whereas a single maximum indicates a sudden rise. The threshold is derived from the distribution of fluctuation values among the $n$ discrete data points under normal conditions. This methodology enables us to ascertain the trend status of each performance indicator.

We apply Formula (1) to determine the number of maxima and minima in this set of discrete data, taking the trend of the PLR as an illustrative example. The formula is expressed as follows:

$$N_{extrema} = \sum_{i=2}^{n-1}\big(\phi_{max}(i) + \phi_{min}(i)\big), \qquad (1)$$

where the formula for determining the extreme point is as follows:

$$\phi_{max}(i) = \begin{cases} 1, & (v_i > v_{i-1}) \land (v_i > v_{i+1}) \land \big(\min(|v_i - v_{i-1}|,\, |v_i - v_{i+1}|) > h\big) \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$$

$$\phi_{min}(i) = \begin{cases} 2, & (v_i < v_{i-1}) \land (v_i < v_{i+1}) \land \big(\min(|v_i - v_{i-1}|,\, |v_i - v_{i+1}|) > h\big) \\ 0, & \text{otherwise}. \end{cases} \qquad (3)$$

As demonstrated in Formula (2), when a point $v_i$ is larger than both of its neighboring points and the absolute differences to both neighbors exceed $h$, we classify the point as a maximum. The determination of minima is shown in Formula (3).
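To make the trend rule concrete, the following is a minimal Python sketch of the threshold-filtered extreme-point counting of Formulas (1)-(3) and the trend labelling described above. The "stable" label for series without qualifying extrema is our assumption, since the text only names fluctuation, sudden drop, and sudden rise.

```python
def count_extrema(v, h):
    """Count maxima and minima in the discretized series v, keeping only
    points that differ from both neighbours by more than the threshold h
    (Formulas (1)-(3))."""
    n_max = n_min = 0
    for i in range(1, len(v) - 1):
        gap = min(abs(v[i] - v[i - 1]), abs(v[i] - v[i + 1]))
        if gap <= h:
            continue  # too close to its neighbours: treated as noise
        if v[i] > v[i - 1] and v[i] > v[i + 1]:
            n_max += 1
        elif v[i] < v[i - 1] and v[i] < v[i + 1]:
            n_min += 1
    return n_max, n_min

def classify_trend(v, h):
    """Map extreme-point counts to the trend categories used in the rule base."""
    n_max, n_min = count_extrema(v, h)
    if n_max + n_min > 1:
        return "fluctuation"
    if n_min == 1:
        return "sudden drop"
    if n_max == 1:
        return "sudden rise"
    return "stable"  # assumption: no qualifying extrema
```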

Algorithm 1 outlines the procedure for computing the four evaluation dimensions from our rule base and obtaining the KPI status list:

Algorithm 1: Obtaining the KPI status list

1: Input: performance indicator data list within time T: data;
2:        range of the indicator configuration file list: intervals;
3:        time T;
4: avg, jitter, variance ← getAttributeRate();
5: for i ← 0 to len(intervals) − 1 do
6:     if avg is in intervals[i] then
7:         status ← i;
8:         break;
9:     end if
10: end for
11: trend ← getTrend();
12: status.add(trend);
13: return status[4]
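For readers who prefer code, below is one way Algorithm 1 could be realized in Python. It reuses the kpi_dimensions and classify_trend sketches above and assumes the pseudocode's interval lookup (shown there only for Avg) is applied to each numeric dimension so that a four-entry status list results; the helper names are illustrative, not MSADM's actual API.

```python
def kpi_status(data, intervals, h):
    """Return the status list [avg_status, jitter_status, variance_status, trend]
    for one KPI, in the spirit of Algorithm 1.

    `intervals` is assumed to hold, per dimension, a list of (low, high) ranges
    ordered from most normal to most abnormal."""
    status = []
    for value, bins in zip(kpi_dimensions(data), intervals):
        idx = next((i for i, (lo, hi) in enumerate(bins) if lo <= value < hi),
                   len(bins))  # values outside every range get the worst status
        status.append(idx)
    status.append(classify_trend(list(data), h))
    return status
```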


We also explored the possibility of using a machine learning-based classification model to categorize data trends. However, if new features or wireless access technologies emerge in the future and affect the KPI performance data, the dataset would need to be recollected and relabeled to retrain such a model. In contrast, with the rule-based method, we only need to gather sufficient data and update the thresholds using the built-in script to refresh the rule base. Therefore, the rule base offers superior scalability and adaptability.

3.2 Anomaly Information Learning and Detection

We designed an anomaly detection architecture for KPI time-series data in MSADM. Fig.6 illustrates the structure of the anomaly detection model. In this framework, the time-series data first passes through a convolutional layer that captures time-series features within a specific segment, followed by a two-layer Transformer that fully perceives changes in the KPIs. To enhance the model's robustness, we embed the rule-filtered status list before the data enters the fully connected layers. Because our goal is for MSADM to recognize the anomaly type while performing anomaly detection, a four-layer fully connected network is employed: the first two layers sense the data associations, while the latter two layers handle the detection and classification tasks. The remainder of this section details the specific model design.

For anomaly detection tasks, certain sequence segments often carry more anomaly-related features. Convolutional Neural Networks (CNNs) improve classification accuracy by extracting local features from time series[10]. However, the order of elements and their interdependencies are essential for time series analysis. While CNNs excel at focusing on local features, their capability to model global dependencies is comparatively limited[22]. In time series classification tasks that require a global perspective, this limitation may reduce model accuracy.

The Transformer, via its self-attention mechanism, can process sequences of any length[21]. This feature efficiently captures global dependencies within sequences, effectively overcoming CNN’s limitations in global modeling.

After applying the rule-embedded Transformer, we obtain the attention output $a$. We incorporate the KPI status list obtained through rule filtering into the model's learning dimensions; this status list helps the model better distinguish between abnormal and normal situations. Therefore, before feeding data into the fully connected layers (FCL), we utilize the linear transformation function $f_1$ to combine the status representation $s$ with the attention output $a$. The interactive representation $I_{sa}$ of the KPI statuses with the attention output can be denoted as:

$$I_{sa} = f_1(W_1[s,\, a] + b), \qquad (4)$$

where $W_1$ and $b$ are trainable parameters, and $f_1$ is the activation function, for which we use ReLU.

The fully connected layers gradually transform the extracted features into classification probabilities that identify anomalies. The model goes beyond merely outputting these probabilities; it also specifies the anomaly type of the abnormal entity. Consequently, we split the final fully connected layer to obtain both the anomaly detection result and the anomaly type through distinct linear heads.

During training, given the dual tasks of classification and detection, we formulate the overall loss function as the sum of two cross-entropy losses. The loss (5) is as follows:

$$\text{loss} = -\sum_{i}^{n} y_{ci}\log(p_{ci}) - \sum_{i}^{n} y_{di}\log(p_{di}), \qquad (5)$$

where the log is applied to softmax outputs, $y_{ci}$ and $y_{di}$ are the ground-truth values, $p_{ci}$ and $p_{di}$ are the predicted values, and $n$ is the size of the output.
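The following PyTorch sketch shows one way the rule-enhanced detector described above could be wired together: a convolutional front end, a two-layer Transformer encoder, fusion of the status list with the attention output as in Eq. (4), split heads for detection and fault classification, and the summed cross-entropy loss of Eq. (5). The layer widths, number of attention heads, and mean pooling over time are our assumptions, since the paper does not specify them.

```python
import torch
import torch.nn as nn

class MSADMDetector(nn.Module):
    """Conv front end + two-layer Transformer encoder + status fusion (Eq. 4)
    + split heads for anomaly detection and fault classification (Eq. 5)."""
    def __init__(self, n_kpis, n_status, n_fault_types, d_model=64):
        super().__init__()
        self.conv = nn.Conv1d(n_kpis, d_model, kernel_size=3, padding=1)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fuse = nn.Linear(d_model + n_status, d_model)   # W1[s, a] + b
        self.shared = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                    nn.Linear(d_model, d_model), nn.ReLU())
        self.det_head = nn.Linear(d_model, 2)                 # normal / abnormal
        self.cls_head = nn.Linear(d_model, n_fault_types)     # anomaly type

    def forward(self, x, status):
        # x: (batch, time, n_kpis); status: (batch, n_status), float-encoded
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)      # local temporal features
        a = self.encoder(h).mean(dim=1)                       # global attention output a
        i_sa = torch.relu(self.fuse(torch.cat([status, a], dim=-1)))  # Eq. (4)
        z = self.shared(i_sa)
        return self.det_head(z), self.cls_head(z)

def msadm_loss(det_logits, cls_logits, y_det, y_cls):
    """Sum of two cross-entropy terms, mirroring Eq. (5)."""
    ce = nn.CrossEntropyLoss()
    return ce(det_logits, y_det) + ce(cls_logits, y_cls)
```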


3.3 Semantic Rule Tree Structure


In Section 3.1, we obtained a list of statuses $S$ for the KPIs of anomalous network entities, filtered according to predefined rules. Utilizing these status lists, MSADM generates detailed anomaly reports for anomalous network entities via a semantic rule tree.

We explored logical semantics, distributed semantics, and hybrid semantics for NLG models, as well as a Knowledge Graph-based replication mechanism for sentence generation[4][18]. These models require large amounts of high-quality textual training data. However, since our method generates sentences from a status list, training such models becomes highly inefficient once a significant number of events accumulate, and the generated utterances are slow to produce and filled with superfluous information. Moreover, the dataset would need to be expanded and the model retrained whenever a new description of an anomaly manifestation arises.

Our goal is to generate timely, accurate, and concise sentences. Therefore, after careful consideration, we opted for a template-like approach to sentence generation. Given the limited variety of statuses in the status list, we select words that correspond to the possible statuses of each KPI evaluation dimension. Unlike traditional template-based approaches, we use a tree structure with a one-to-many configuration that effectively captures the abnormal statuses of KPIs under various evaluation metrics. This structure is not only flexible and extensible but also facilitates the future integration of new evaluation metrics and statuses. We employ this tree structure to generate sentences for each KPI, which are then compiled into a comprehensive anomaly report.

As shown in Fig.7, we maintain a vocabulary describing KPI performance metrics and status levels, together with a lexicalized tree adjoining grammar (LTAG) representing the lexicality of words. MSADM can take the evaluation dimensions of an arbitrary KPI as the root, connect syntactic trees to form the syntactic part of a sentence, and construct a sentence tree by positioning fixed vocabulary at the leaves. Meanwhile, to further speed up sentence generation, we prune the vocabulary and LTAGs beforehand, keeping only the words related to the current KPI.

The specific build process is as follows: MSADM traverses the sentence tree starting from the root, organized by KPI type with a list of evaluated dimensions and statuses. Each traversal from the root to the leaves yields a semanticized description corresponding to the current KPI statuses. Considering that the actual KPI data may be more precise than the status description, we incorporate a judgment step into the sentence generation process: when a KPI exhibits significant abnormalities, we add its actual values, such as the mean, variance, and jitter within the timeframe $T$, to enrich the information content of the sentence. The process is shown in Algorithm 2.

Algorithm 2: Sentence tree construction

1: Input: words WList; grammars GRs; KPIs K;
2: R ← pruneGrammar(GRs);
3: Ws ← pruneWList(WList, R);
4: sentenceT ← Tree();   /* init sentence tree */
5: for each k in K do
6:     for each w in Ws do
7:         isLexical, index ← lexicalRequirements(w, R);
8:         if isLexical == True then
9:             node ← Tree(w, R[index]);
10:            sentenceT.append(node);
11:        end if
12:    end for
13: end for
14: return sentenceT;
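As a rough illustration of this template-style generation, the sketch below builds one sentence per KPI from its four-entry status list and appends raw statistics only when the status is severe. The phrase table and severity threshold are placeholders of our own; the real system draws words from the pruned vocabulary and LTAGs rather than fixed strings.

```python
# Placeholder phrase table; MSADM selects words from its pruned vocabulary/LTAGs.
STATUS_WORDS = ["normal", "slightly abnormal", "abnormal", "severely abnormal"]

def describe_kpi(kpi_name, status, raw_stats, severe_level=2):
    """Build one sentence from a KPI's status list [avg, jitter, variance, trend],
    enriching it with actual values when the abnormality is significant."""
    avg_s, jit_s, var_s, trend = status
    parts = [
        f"the average of {kpi_name} is {STATUS_WORDS[min(avg_s, 3)]}",
        f"its jitter is {STATUS_WORDS[min(jit_s, 3)]}",
        f"its variance is {STATUS_WORDS[min(var_s, 3)]}",
        f"and it shows a {trend} trend",
    ]
    sentence = "Within this period, " + ", ".join(parts) + "."
    if max(avg_s, jit_s, var_s) >= severe_level:  # significant abnormality
        sentence += (f" (avg={raw_stats['avg']:.3f}, "
                     f"jitter={raw_stats['jitter']:.3f}, "
                     f"variance={raw_stats['variance']:.3f})")
    return sentence
```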

After compiling all abnormal sentence expressions from a node and considering the input constraints of the LLM, we strike a balance between the simplicity of the report and the completeness of the information. We then assess the need to further refine the entity information collected in the sentences based on the report’s length and the severity of the KPIs anomalies. We use regular expressions to optimize the report content while ensuring that essential and critical anomaly information is retained.

Table 1: Comparison of MSADM with baseline models.

Model            | Classification Accuracy | Detection Accuracy | Recall | FNR   | FPR   | Detection Time/ms
SR-CNN           | 59.36                   | 87.88              | 94.48  | 5.52  | 52.78 | 2.69
CL-MPPCA         | 69.69                   | 86.56              | 89.41  | 10.61 | 30.92 | 19.05
AnomalyBERT      | 66.53                   | 86.78              | 95.75  | 4.25  | 68.48 | 13.15
LSTM-transformer | 72.02                   | 88.87              | 96.10  | 3.89  | 55.74 | 25.21
MSADM            | 76.73                   | 91.31              | 96.28  | 4.72  | 33.15 | 19.89

3.4 Information Integration

The LLM’s powerful natural language processing capabilities allow it to deeply understand semantic information and derive meaningful features and patterns[6]. Simultaneously, LLM’s continuous learning ability enables it to adapt and respond effectively to evolving event types, showcasing remarkable scalability and rapid adaptability in complex scenarios[28].

In the information integration phase, we compile the abnormal reports of communication entities within the DHN and generate tailored prompt text that the LLM can understand.

LLMs often struggle with complex and in-depth reasoning due to their reliance on patterns in data rather than true understanding, leading to difficulties in consistently generating accurate, contextually appropriate responses that require deep domain knowledge or logical consistency[34]. In our integration process, we have bootstrapped the LLM to assist in generating anomaly reports that better align with the requirements of NMs, based on the life cycle of health management.

The structure of the prompt is illustrated in Fig.8. We provide the model with context, a question, and options. The context enables the LLM to comprehend the network anomaly information. The question addresses the needs of NMs, specifically the types of abnormalities that may have occurred and the associated mitigation plans. The options constrain the LLM's inference results to the specified types of anomalies, thereby enhancing the accuracy of the inferences; naturally, the options also include an "other" category.

Given that large models face input length limitations, and that the anomaly context must encompass all relevant information of abnormal entities within the local network at the time of the anomaly, a requirement that significantly exceeds the input capacity of existing models, the anomaly context cannot be directly embedded in the prompt text. We therefore collate the collected contextual information on entity anomalies, use the abnormal statuses to pinpoint KPIs exhibiting significant abnormalities within network entities, and describe such KPIs in detail, whereas KPIs exhibiting minor abnormalities are summarized in a consolidated manner. Furthermore, we incorporate the anomaly detection results obtained in Section 3.2 into the report, providing the LLM with additional dimensions of information to focus on.
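A minimal sketch of how such a prompt could be assembled from the per-entity MSADM reports is given below. The option list and the wording of the question are placeholders for illustration only, not the exact prompt or fault taxonomy used in the paper.

```python
# Placeholder option list; the actual anomaly categories are given in Appendix C.
FAULT_OPTIONS = ["application crash", "malicious traffic",
                 "hardware failure", "other"]

def build_prompt(entity_reports, detected_types):
    """Assemble a context/question/options prompt (Fig. 8) from MSADM reports."""
    context = "\n".join(f"- {report}" for report in entity_reports)
    hints = ", ".join(detected_types)
    question = ("Based on the anomaly context above and the preliminary detection "
                f"results ({hints}), reason step by step about the most likely root "
                "cause for each abnormal entity and propose mitigation steps.")
    options = "Choose each root cause from: " + "; ".join(FAULT_OPTIONS) + "."
    return f"Context:\n{context}\n\nQuestion:\n{question}\n\nOptions:\n{options}"
```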

4 Experimentation


We implemented MSADM using Python 3.7 and Torch 1.13.1. Due to resource constraints, we used eight RTX 4090 GPUs with 24 GB of memory on Ubuntu 22.04 for data simulation, model training, and testing. We implemented the techniques and algorithms according to the system architecture (Fig.6).

We employ NS-3[27] for network simulation. We simulated four different communication entities by varying the transmit power, bandwidth, movement speed, and other configurations (see Appendix B for the device parameters), categorized network anomalies into six distinct categories, and injected these anomalies into the simulation (see Appendix C for the anomaly types). Based on these devices, we built a heterogeneous network and captured the resulting KPI changes. We accumulated nearly 20,000 labeled data entries across seven network scenarios. We release an open-source demo and dataset of MSADM to illustrate this workflow (https://github.com/SmallFlame/MSADM).

We will evaluate our scheme from two perspectives to demonstrate its effectiveness. Firstly, we will illustrate the superior accuracy and efficiency of MSADM in anomaly detection models. Secondly, we will present the anomaly report, along with the diagnostic results and scheme descriptions provided by LLM, to verify the feasibility of our approach.

4.1 MSADM Evaluations

We surveyed several popular time series classification models that utilize various technologies. CL-MPPCA employs both neural networks and probabilistic clustering to enhance anomaly detection performance[31]. SR-CNN integrates SR and CNN models to boost the accuracy of time series anomaly detection[26]. AnomalyBERT, built on the Transformer architecture, is designed to discern temporal contexts and identify unnatural sequences[12]. LSTM-transformer introduces a hybrid architecture combining LSTM and Transformer, tailored for multi-task real-time prediction[7]. We compare these models with the anomaly detection module of MSADM, training all models on the same equipment and conducting a comprehensive comparison.

Fig.9 depicts the evolution of classification accuracy, detection accuracy, and the cross-entropy loss over increasing iterations. Notably, our model consistently achieves the highest accuracy, ultimately converging to 91.3%, an approximately 3% lead over the runner-up model, LSTM-transformer. Additionally, the cross-entropy loss of our model is substantially lower than that of the other models upon final convergence.

In Table 1, we conduct a comparative analysis between MSADM and the other models in terms of fault classification accuracy, anomaly detection accuracy, recall, false negative rate (FNR), false positive rate (FPR), and detection time. The results show that MSADM surpasses the other models on most performance indicators. It is worth mentioning that MSADM's detection time is marginally longer than that of the LSTM-based method without rule embedding, which is attributable to the initial rule-filtering step.

ROC curves plot the true positive rate (TPR) against the false positive rate (FPR) under different threshold settings[2]. To compare the robustness and reliability of the models, we plotted their ROC curves. As shown in Fig.10, the ROC curve of MSADM lies above those of the other models most of the time, and the AUC of MSADM is 0.1 higher than that of the currently popular LSTM-transformer architecture.
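For reproducibility, the curves and AUC values can be obtained from the detection scores with standard tooling; the snippet below is a generic scikit-learn sketch rather than the paper's actual plotting code.

```python
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

def plot_roc(y_true, scores, label):
    """y_true: 0/1 anomaly labels; scores: the model's abnormal-class probability."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    plt.plot(fpr, tpr, label=f"{label} (AUC = {auc(fpr, tpr):.2f})")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
```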

Due to the anomaly’s limited range of influence, enlarging the network size might result in overlooking the anomaly. Fig.11 illustrates the variation in model accuracy corresponding to changes in network size. In both scenarios with a small and large number of nodes, the MSADM model outperforms other models in both anomaly detection and classification accuracy.

Fig.12 illustrates the confusion matrix analysis of the anomaly detection results produced by MSADM on the test set. Fig.12 (a) primarily assesses the model's accuracy in identifying various anomalies; the results underscore the model's high accuracy across most anomaly-type classification tasks. Fig.12 (b) depicts the accuracy of anomaly detection. Our identification accuracy for abnormal samples reaches 95%, implying that we can analyze and collect information from almost all abnormal network entities within the network. As for normal samples that are incorrectly flagged, since we gauge the degree of abnormality when generating anomaly reports, the resulting minor abnormal information does not excessively consume reporting resources.

4.2 Semanticization Evaluations

In this section, we present the text generation component of MSADM to showcase the quality of our semantic generation. We also highlight segments of the LLM output to underscore the benefits of our thought prompts in guiding LLM reasoning. Due to space constraints, we display only a portion of the anomaly report and LLM output; the complete textual content is available in the appendices.


In the event of a node application crash, the node becomes unable to request and respond to data packets due to the application anomaly, yet it retains its functionality as a packet-forwarding relay. We use this scenario as an example to demonstrate the practicality of the generated statements.

The results are depicted in Fig.13, which shows part of the anomaly report generated by a single network entity when the anomaly occurs. This section includes descriptions of packet loss rates, bit error rates, and latencies, along with the anomalies diagnosed by the model. It is evident from the report that the PLR and bit error rate of the node are notably high, whereas the PLR and bit error rate of the communication link remain relatively unaffected, aligning with the real-world scenario. See Appendix D for the complete report.


We input the analyzed data from the collected reports into the LLM to generate the corresponding report and conclusions. The solution produced by the LLM appears in Fig.14. Guided by chain-of-thought-based prompts, the LLM assesses various factors that may have contributed to the anomaly, including software and hardware issues, as well as troubleshooting and resolution strategies. This exception report, enhanced by the LLM's insights, goes well beyond traditional operations and maintenance documentation by reducing the reliance on operator experience that often leads to incorrect exception handling. At the same time, the proposed solution enables NMs to rapidly mitigate anomalies and maintain network health. The comprehensive exception analysis report is detailed in Appendix E.

5 Discussion

We have illustrated the advantages of our scheme for assisting network operators with health management in DHNs. In this section, we explore potential future directions in conjunction with our scheme.

Modeling Stateful Behaviors: To better adapt to the diverse communicating entities in DHNs, we deliberately made trade-offs to enhance the model's scalability. Currently, we model the KPIs commonly owned by each entity. However, this approach overlooks the intricate interactions at higher layers, such as the transport protocols in use, network-layer TM mechanisms, and potential device interactions. A promising future direction is to leverage MSADM to model the stateful behavior of higher-level network participants (e.g., web servers, SQL servers), such as those at the application layer, and to integrate them with our scheme to form an anomaly detection solution for microservice-architecture networks.

Self-evolution of the LLM: In this article, we utilize the LLM to generate the final anomaly inference results. However, this process is one-way and provides no feedback to the large model itself. In the future, we posit that self-evolution methods can be employed to help the LLM learn, improve, and evolve from the experience it generates. Simultaneously, the evolved LLM can assist MSADM in augmenting and maintaining the semantic rule trees to enrich the vocabulary and enhance the quality of the generated sentences.

6 Conclusion

We introduce semantic expression into wireless networks for the first time and develop an LLM-assisted end-to-end health management scheme for DHNs. Our model automatically processes collected anomaly data, predicts anomaly categories, and offers mitigation options. To address the inability of algorithms that depend on expert input or basic rule-based systems to adapt to multi-device environments, we propose the MSADM. MSADM utilizes a predefined rule base to monitor the state of entity communication KPIs, conducts anomaly detection and classification through a rule-enhanced Transformer structure, and produces unified and standardized textual representations of anomalies using a semantic rule tree. Furthermore, the inclusion of a chain-of-thought-based LLM in the diagnostic process not only enhances fault detection but also generates detailed reports that pinpoint faults and recommend optimization strategies. Experiments demonstrate that MSADM surpasses current mainstream models in anomaly detection accuracy. Additionally, the experimentally generated anomaly reports and solutions highlight our approach’s potential to boost the efficiency and accuracy of intelligent operations and maintenance analysis in distributed networks.

References

  • [1]Toufique Ahmed, Supriyo Ghosh, Chetan Bansal, Thomas Zimmermann, Xuchao Zhang, and Saravan Rajmohan.Recommending root-cause and mitigation steps for cloud incidents using large language models.In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 1737–1749, 2023.
  • [2]Raquel Barco, Pedro Lázaro, Luis Díez, and Volker Wille.Continuous versus discrete model in autodiagnosis systems for wireless networks.IEEE Transactions on Mobile Computing, 7(6):673–681, 2008.
  • [3]Raquel Barco, Volker Wille, Luis Díez, and Matías Toril.Learning of model parameters for fault diagnosis in wireless networks.Wireless Networks, 16:255–271, 2010.
  • [4]Connor Baumler and Soumya Ray.Hybrid semantics for goal-directed natural language generation.In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1936–1946, 2022.
  • [5]Francesca Boem, AlexanderJ Gallo, DavideM Raimondo, and Thomas Parisini.Distributed fault-tolerant control of large-scale systems: An active fault diagnosis approach.IEEE Transactions on Control of Network Systems, 7(1):288–301, 2019.
  • [6]Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, JaredD Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, etal.Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020.
  • [7]Kangjie Cao, Ting Zhang, and Jueqiao Huang.Advanced hybrid lstm-transformer architecture for real-time multi-task prediction in engineering systems.Scientific Reports, 14(1):4890, 2024.
  • [8]Xuehan Chen, Jingjing Tan, Litian Kang, Fengxiao Tang, Ming Zhao, and Nei Kato.Frequency selective surface towards 6g communication systems: A contemporary survey.IEEE Communications Surveys & Tutorials, pages 1–1, 2024.
  • [9]Yinfang Chen, Huaibing Xie, Minghua Ma, YuKang, Xin Gao, Liu Shi, Yunjie Cao, Xuedong Gao, Hao Fan, Ming Wen, etal.Automatic root cause analysis via large language models for cloud incidents.In Proceedings of the Nineteenth European Conference on Computer Systems, pages 674–688, 2024.
  • [10]Jiezhu Cheng, Kaizhu Huang, and Zibin Zheng.Towards better forecasting by fusing near and distant future visions.In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34(04), pages 3593–3600, 2020.
  • [11]Samira Hayat, Evşen Yanmaz, and Raheeb Muzaffar.Survey on unmanned aerial vehicle networks for civil applications: A communications viewpoint.IEEE Communications Surveys & Tutorials, 18(4):2624–2661, 2016.
  • [12]Yungi Jeong, Eunseok Yang, JungHyun Ryu, Imseong Park, and Myungjoo Kang.Anomalybert: Self-supervised transformer for time series anomaly detection using data degradation scheme.arXiv preprint arXiv:2305.04468, 2023.
  • [13]Ruofan Jin, Bing Wang, Wei Wei, Xiaolan Zhang, Xian Chen, Yaakov Bar-Shalom, and Peter Willett.Detecting node failures in mobile wireless networks: A probabilistic approach.IEEE Transactions on Mobile Computing, 15(7):1647–1660, 2015.
  • [14]RanaM Khanafer, Beatriz Solana, Jordi Triola, Raquel Barco, Lars Moltsen, Zwi Altman, and Pedro Lazaro.Automated diagnosis for umts networks using bayesian network approach.IEEE Transactions on vehicular technology, 57(4):2451–2461, 2008.
  • [15]EmilJ Khatib, Raquel Barco, Ana Gómez-Andrades, and Inmaculada Serrano.Diagnosis based on genetic fuzzy algorithms for lte self-healing.IEEE Transactions on vehicular technology, 65(3):1639–1651, 2015.
  • [16]Slawomir Kukliński and Lechosław Tomaszewski.Key performance indicators for 5g network slicing.In 2019 IEEE conference on network softwarization (NetSoft), pages 464–471. IEEE, 2019.
  • [17]Chen Luo, Jian-Guang Lou, Qingwei Lin, Qiang Fu, Rui Ding, Dongmei Zhang, and Zhe Wang.Correlating events with time series for incident diagnosis.In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, Aug 2014.
  • [18]Ziyu Lyu, Yue Wu, Junjie Lai, Min Yang, Chengming Li, and Wei Zhou.Knowledge enhanced graph neural networks for explainable recommendation.IEEE Transactions on Knowledge and Data Engineering, page 1–1, Jan 2022.
  • [19]Minghua Ma, Shenglin Zhang, Junjie Chen, Jim Xu, Haozhe Li, Yongliang Lin, Xin Nie, Bo Zhou, Yong Wang, and Dan Pei.Jump-starting multivariate time series anomaly detection for online service systems.In USENIX Annual Technical Conference (USENIX ATC), 2021.
  • [20]Malgorzata Steinder and Adarshpal S. Sethi.A survey of fault localization techniques in computer networks.Science of Computer Programming, 53(2):165–194, 2004.
  • [21]Matthew Middlehurst, Patrick Schäfer, and Anthony Bagnall.Bake off redux: a review and experimental evaluation of recent time series classification algorithms.Data Mining and Knowledge Discovery, pages 1–74, 2024.
  • [22]Navid MohammadiFoumani, Lynn Miller, ChangWei Tan, GeoffreyI. Webb, Germain Forestier, and Mahsa Salehi.Deep learning for time series classification and extrinsic regression: A current survey.ACM Comput. Surv., 56(9), apr 2024.
  • [23]IsaacKofi Nti, JuanitaAhia Quarcoo, Justice Aning, and GodfredKusi Fosu.A mini-review of machine learning in big data analytics: Applications, challenges, and prospects.Big Data Mining and Analytics, 5(2):81–97, 2022.
  • [24]Gopika Premsankar, Mario DiFrancesco, and Tarik Taleb.Edge computing for the internet of things: A case study.IEEE Internet of Things Journal, 5(2):1275–1284, 2018.
  • [25]Bing Qian and Shun Lu.Detection of mobile network abnormality using deep learning models on massive network measurement data.Computer Networks, 201:108571, 2021.
  • [26]Hansheng Ren, Bixiong Xu, Yujing Wang, Chao Yi, Congrui Huang, Xiaoyu Kou, Tony Xing, Mao Yang, Jie Tong, and QiZhang.Time-series anomaly detection service at microsoft.In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 3009–3017, 2019.
  • [27]GeorgeF. Riley and ThomasR. Henderson.The ns-3 Network Simulator, pages 15–34.Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
  • [28]Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu AwalMd Shoeb, Abubakar Abid, Adam Fisch, AdamR Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, etal.Beyond the imitation game: Quantifying and extrapolating the capabilities of language models.arXiv preprint arXiv:2206.04615, 2022.
  • [29]Péter Szilágyi and Szabolcs Nováczki.An automatic detection and diagnosis framework for mobile communication systems.IEEE transactions on Network and Service Management, 9(2):184–197, 2012.
  • [30]Fengxiao Tang, Xuehan Chen, TiagoKoketsu Rodrigues, Ming Zhao, and Nei Kato.Survey on digital twin edge networks (diten) toward 6g.IEEE Open Journal of the Communications Society, 3:1360–1381, 2022.
  • [31]Shahroz Tariq, Sangyup Lee, Youjin Shin, MyeongShin Lee, Okchul Jung, Daewon Chung, and SimonS Woo.Detecting anomalies in space using multivariate convolutional lstm with mixtures of probabilistic pca.In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 2123–2133, 2019.
  • [32]Shreshth Tuli, Giuliano Casale, and NicholasR Jennings.Tranad: Deep transformer networks for anomaly detection in multivariate time series data.arXiv preprint arXiv:2201.07284, 2022.
  • [33]Xianbin Wang, Jie Mei, Shuguang Cui, Cheng-Xiang Wang, and XueminSherman Shen.Realizing 6g: The operational goals, enabling technologies of future networks, and value-oriented intelligent multi-dimensional multiple access.IEEE Network, 37(1):10–17, Jan 2023.
  • [34]Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, brian ichter, Fei Xia, EdChi, QuocV Le, and Denny Zhou.Chain-of-thought prompting elicits reasoning in large language models.In S.Koyejo, S.Mohamed, A.Agarwal, D.Belgrave, K.Cho, and A.Oh, editors, Advances in Neural Information Processing Systems, volume35, pages 24824–24837. Curran Associates, Inc., 2022.
  • [35]Yiyuan Yang, Chaoli Zhang, Tian Zhou, Qingsong Wen, and Liang Sun.Dcdetector: Dual attention contrastive representation learning for time series anomaly detection.In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3033–3045, 2023.
  • [36]Xun Yuan, Fengxiao Tang, Ming Zhao, and Nei Kato.Joint rate and coverage optimization for the thz/rf multi-band communications of space-air-ground integrated network in 6g.IEEE Transactions on Wireless Communications, pages 1–1, 2023.

Acknowledgments

Appendix A Evaluation of network attributes and performance metrics

We use KPIs from both communication nodes and communication links as rule-based filtering features and machine-learning features to detect and classify anomalies. The specific features considered are shown in Table 2 below.

Table 2: Network entities and the performance indicators collected for each.

Network Entity     Performance Indicators
Node Attributes    Packet Loss Rate
                   Bit Error Rate
                   Neighboring Nodes Number
                   Routing Table Number
                   Cache Size
Link Attributes    Packet Loss Rate
                   Bit Error Rate
                   Transmission Delay
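
To make the feature construction concrete, the following is a minimal Python sketch, assuming each KPI in Table 2 is collected as a fixed-length time series, of how one KPI window could be reduced to the mean level, fluctuation, and trend statistics that the semanticized reports describe. The identifier names, statistics, and the synthetic demo window are illustrative assumptions, not the exact rules used by MSADM.

    import numpy as np

    # KPIs from Table 2 (illustrative identifiers, not fixed field names).
    NODE_KPIS = ["packet_loss_rate", "bit_error_rate", "neighbor_count",
                 "routing_table_count", "cache_size"]
    LINK_KPIS = ["packet_loss_rate", "bit_error_rate", "transmission_delay"]

    def summarize_kpi(series):
        """Reduce one KPI time series to mean level, fluctuation, and trend."""
        series = np.asarray(series, dtype=float)
        slope = np.polyfit(np.arange(len(series)), series, deg=1)[0]
        return {
            "mean": float(series.mean()),        # average level of the window
            "fluctuation": float(series.std()),  # volatility of the window
            "trend": "up" if slope > 0 else ("down" if slope < 0 else "flat"),
        }

    def extract_features(samples):
        """Summarize every collected KPI of one node or link."""
        return {name: summarize_kpi(values) for name, values in samples.items()}

    # Example: a 150-sample packet-loss window (30 s sampled every 200 ms).
    window = np.clip(np.random.normal(0.44, 0.2, 150), 0.0, 1.0)
    print(extract_features({"packet_loss_rate": window}))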

Appendix B Network Node Parameters

Table 3: Configuration of the four device types in the simulated ad-hoc network.

Device Name    Transmitting Power   Bandwidth   Communication Protocol   Range   Speed
Mobile Phone   23 dBm               20 MHz      LTE                      200 m   10 m/s
Vehicle        30 dBm               10 MHz      802.11p                  200 m   20 m/s
UAV            20 dBm               5 MHz       802.11ac                 400 m   15 m/s
Base Station   43 dBm               100 MHz     LTE                      500 m   0 m/s (stationary)

On the ns-3 network simulation platform, we designed and configured four different device types to build a virtual ad-hoc network (see Table 3 for the specific device configurations). The network consists of 9 to 20 nodes. We set a data collection duration of 30 seconds with a collection period of 200 ms.
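
As a concrete illustration, the following is a small Python sketch of how the Table 3 parameters and the collection schedule could be expressed as a plain configuration object in front of the ns-3 scenario. The dictionary keys and variable names are hypothetical; this is not ns-3 API code.

    # Device parameters from Table 3 (key names are illustrative).
    DEVICES = {
        "mobile_phone": {"tx_power_dbm": 23, "bandwidth_mhz": 20,  "protocol": "LTE",
                         "range_m": 200, "speed_mps": 10},
        "vehicle":      {"tx_power_dbm": 30, "bandwidth_mhz": 10,  "protocol": "802.11p",
                         "range_m": 200, "speed_mps": 20},
        "uav":          {"tx_power_dbm": 20, "bandwidth_mhz": 5,   "protocol": "802.11ac",
                         "range_m": 400, "speed_mps": 15},
        "base_station": {"tx_power_dbm": 43, "bandwidth_mhz": 100, "protocol": "LTE",
                         "range_m": 500, "speed_mps": 0},  # stationary
    }

    COLLECTION_DURATION_S = 30.0  # total data collection per run
    COLLECTION_PERIOD_S = 0.2     # one KPI sample every 200 ms

    samples_per_run = int(COLLECTION_DURATION_S / COLLECTION_PERIOD_S)
    print(samples_per_run)        # 150 samples per KPI per run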

Appendix C Anomaly Categories

Table 4: Anomaly categories by protocol layer.

Application Layer
  Application Down: Application failures leave the node unable to request and respond to packets; however, it can still act as a relay for packet forwarding, so network connectivity is preserved.
  Malicious Traffic: The node sends and requests a large amount of data in a short period.
Transport Layer
  Network Congestion: Traffic in the network exceeds the processing capacity of network devices or links.
Data Link Layer
  Communication Obstacles: Obstacles block the line of sight between nodes, obstructing wireless transmission.
  Out-of-Range: Node mobility takes the node out of communication range.
Physical Layer
  Network Node Crash: The node loses all network communication capability due to hardware failure.
When using traditional machine learning techniques for fault detection, obtaining sufficient labeled negative samples is a particular concern. DHNs exhibit a wide range of anomaly types, so a careful classification of the common fault types is crucial. Table 4 shows our final classification of these anomaly types.
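
For illustration, a minimal Python sketch of this taxonomy follows, assuming one short label per fault class. The label strings (e.g., "appdown") mirror the wording of the anomaly reports in Appendix D, but the structure itself is a hypothetical convenience, not part of the framework's code.

    # Hypothetical fault taxonomy mirroring Table 4: one short label per class,
    # grouped by the protocol layer the fault originates from.
    FAULT_TAXONOMY = {
        "application": ["appdown", "malicious_traffic"],
        "transport": ["network_congestion"],
        "data_link": ["communication_obstacles", "out_of_range"],
        "physical": ["node_crash"],
    }

    # Flat label set used when organizing labeled (negative) training samples.
    ALL_FAULT_LABELS = [label for labels in FAULT_TAXONOMY.values() for label in labels]
    print(ALL_FAULT_LABELS)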

Appendix D Complete Anomaly Report

The remainder of this section shows an example anomaly report produced by our framework:

Current Network Context: The current node0 status is as follows: the packet loss rate shows a very high average value of 44.43%, with extremely volatile fluctuation, and a trend that fell sharply and then rose. The information about the communication links of the current node is as follows: the current node may have an appdown fault!

The current node1 status is as follows: the number of neighboring nodes is seriously above average, shows minor fluctuation, and has an upward trend; the number of routing table caches is seriously above average, shows minor fluctuation, and has an upward trend. The information about the communication links of the current node is as follows: the current node may have a malicious traffic fault!

The current node2 status is as follows: the number of neighboring nodes is seriously above average, shows minor fluctuation, and has an upward trend; the number of routing table caches is seriously above average, shows minor fluctuation, and has an upward trend. The information about the communication links of the current node is as follows: the current node may have an appdown fault!

Questions: According to the preceding description, if similar historical fault information exists, identify the fault type and provide a solution. If not, identify the current fault type and provide the optimal solution. Select a fault type from the options. The fault type mentioned above may not be correct; determine and confirm the fault according to the information in the context. If you have a different view on the fault, state the cause.

Options: Please select the anomaly type that best matches the context's performance from the following: a: Node Down; b: Malicious Traffic; c: Network Congestion; d: Communication Obstacles; e: Out-of-Range; f: Network Node Crash.
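
For clarity, the following is a minimal Python sketch of how the three parts of the report above (Context, Questions, Options) could be assembled into a single prompt string for the downstream chain-of-thought LLM. The function name and formatting details are illustrative assumptions rather than the framework's exact prompt builder.

    # Option labels copied from the Options section above.
    OPTIONS = {
        "a": "Node Down", "b": "Malicious Traffic", "c": "Network Congestion",
        "d": "Communication Obstacles", "e": "Out-of-Range", "f": "Network Node Crash",
    }

    QUESTION = (
        "According to the preceding description, if similar historical fault "
        "information exists, identify the fault type and provide a solution. "
        "If not, identify the current fault type and provide the optimal solution. "
        "Select a fault type from the options; the fault type mentioned above may "
        "not be correct, so confirm it against the context. If you have a different "
        "view on the fault, state the cause."
    )

    def build_prompt(network_context):
        """Concatenate the semanticized report, the question, and the options."""
        options_text = "; ".join(f"{key}: {name}" for key, name in OPTIONS.items())
        return (f"Current Network Context: {network_context}\n\n"
                f"Questions: {QUESTION}\n\n"
                f"Options: Please select the anomaly type that best matches the "
                f"context's performance from the following: {options_text}.")

    # Usage: build_prompt(report_text) returns the string sent to the LLM.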

Appendix E Complete LLM output

The remainder of this section shows the output of the LLM based on the anomaly report:

E.1 Node Analysis

Based on the provided network context, here’s the detailed analysis:

Node0:
Status: High packet loss rate averaging 44.43%.
Communication Link Info: Indicates a possible appdown fault.
Analysis: The high packet loss rate and the indication of an appdown fault suggest a significant issue with the application layer on Node0. This could be due to software errors, resource limitations, or misconfigurations.
Recommended Action: Investigate the application running on Node0, check for any software updates, review configurations, and allocate additional resources if necessary to mitigate packet loss.

Node1:
Status: The number of neighboring nodes and routing table caches is significantly above average, with minor fluctuations and an upward trend.
Communication Link Info: Indicates possible malicious activity.
Analysis: The increasing number of neighboring nodes and routing table caches, along with the indication of malicious activity, suggests a potential security breach or an attack targeting Node1.
Recommended Action: Conduct a thorough security audit, implement intrusion detection systems, and strengthen network security measures to detect and mitigate malicious activity targeting Node1.

Node2:
Status: Similar to Node1, with an increasing number of neighboring nodes and routing table caches showing minor fluctuations and an upward trend.
Communication Link Info: Also indicates a possible appdown fault.
Analysis: The pattern of increasing neighboring nodes and routing table caches, along with the appdown fault indication, points to an issue similar to Node0, possibly related to application failures or disruptions.
Recommended Action: Conduct a detailed investigation into the application running on Node2, review configurations, and address any software issues or resource constraints to prevent further disruptions.

E.2 Overall Network Analysis

The network is experiencing multiple issues across different nodes, including high packet loss rates, indications of malicious activity, and potential application failures. There might be underlying systemic issues affecting the stability and security of the network, such as insufficient resources, misconfigurations, or vulnerabilities. Addressing these issues promptly is crucial to prevent service disruptions, security breaches, and network performance degradation.

Recommended Actions:
  • Perform comprehensive network diagnostics to identify and address underlying issues.
  • Implement robust security measures to mitigate malicious activities, including firewalls, intrusion detection systems, and access controls.
  • Optimize resource allocation and configurations to improve network performance and reliability.
  • Regularly monitor and maintain the network to detect and prevent future issues.
  • If historical fault information exists, it should be analyzed to identify patterns and potential root causes, aiding in developing more effective long-term solutions.
