Studentische Arbeiten
Distributed Computation of the GIPS Framework for Virtual Network Embedding in Data Centers
Type of thesis:
Master-Thesis
State: Laufende Arbeiten
Possible start of thesis: 01.12.2024
Tutor: M.Sc. Maximilian Kratz
Student: Janik Stracke
Motivation
The increasing digitization in recent years means that processes in everyday professional and private life are being carried out more and more frequently via digital infrastructure.
To keep pace with the progress, developing new and flexible software solutions is necessary, for which the discipline of software engineering provides adequate methodologies and technologies.
Among those, the Model-driven Software Engineering (MdSE) approaches became increasingly popular in recent years to solve complex problems in various domains, such as consistency and model checks, resource allocation, or task scheduling.
The present state of digitization demands for huge capacities of computing power in data centers.
The cloud operators, e.g. Amazon Web Services, Google Cloud Platform or Microsoft Azure, meet this requirement by continuously installing new hardware in their data centers.
However, running applications right on the appliances leads to suboptimal resource utilization.
To overcome these shortages, Virtual Machines (VMs) and Virtual Networks (VNs) are introduced as an encapsulating middle layer to easily distribute and manage the load throughout the Substrate Machines (SMs) in the data center [7].
Any virtual appliances pose new challenges for us, demanding for the optimal placement of new VMs and VNs in the data center.
The associated resource allocation challenge is usually referred to as Virtual Network Embedding (VNE).
In a comprehensive analysis, it was shown that it is NP-complete and even efficiently finding approximate solutions is infeasible.
Model-driven Virtual Network Embedding (MdVNE) is only one of the broad variety of problems which the Graph-based (M)ILP Problem Specification (GIPS) framework aims to solve.
GIPS is an unopinionated tool to help modelling solutions, unaware and independent of their specific problem domain.
Therefore, it constructs an Integer Linear Programming (ILP) problem based on the graph-representation of the specific domain model.
GIPS also takes into account any global constraints placed on the model as well as an objective optimization function.
Using a general-purpose ILP solver, the corresponding ILP can be solved accordingly.
Afterwards, GIPS applies given Graph Transformation (GT) rules based on the results to transition the initial graph into the modified model.
Due to its NP-complete character, the execution time for GIPS increases exponentially with each additional VN to allocate.
Furthermore, scattering the VMs of a VN too far across a data center might unnecessarily increase the latency of any requests within the virtual network and therefore they should be placed together as close as possible.
Task
The GIPS framework uses a defined workflow for computing and solving the given graph-based ILP problem as depicted in the figure above.
In the beginning, it is initialized with the model and the domain-specific specification of the problem.
A pattern matcher selects those elements of the graph that are affected by any GT rule matches.
Then, the GIPS framework constructs the actual ILP problem and passes it to the ILP solver.
Finally, the returned solution determines how the GT component applies the rules to modify the model towards the optimal solution.
In an initial attempt, this workflow should be executed simultaneously on several GIPS instances, each of which only processes one sub-model.
Later, depending on the results, the workflow could be gradually extended to include first-class support for sub-problems and distributed computation.
Therefore, the extension points E1, E2, E3 and E4 should be introduced to the basic workflow.
Their respective locations are marked in the figure above.
- E1 First, an additional set of rules should be introduced into the GIPS specification to define the criteria for clustering nodes in sub-models.
- E2 Second, the model should be divided into several sub-models based on the rules introduced in the GIPS specification. The resulting sub-models are parts of the overall problem to which all specified global constraints and objectives are applied.
- E3 Third, instead of only one global GIPS instance, it should be possible to run several instances simultaneously, one for each sub-model. By operating only on a reduced graph set, the pattern matching could be accelerated. However, limiting the scatter of rule matches on the sub-model can impair the quality.
- E4 Last, the calculations of all the instances should be collected and a conclusive consensus should be drawn. Depending on the outcome, different solutions might be considered or discarded, depending on the overall objectives. Next steps as appropriate, e.g. recalculating based on the global model, should be taken to guarantee the validity of the solution w.r.t. to defined constraints. The presented concept is just one of various parallelization options and extension points. Depending on the results during implementation, other concepts are also conceivable.
Objectives
By extending the GIPS framework, it should be possible to split a given problem into smaller sub-problems and to distribute its computations across several nodes.
An examination of this approach should answer these questions:
- (RQ1) How does the quality of this distributed approach compare to a single-threaded instance?
- (RQ2) Which are the resulting performance implications, especially for execution time, compared to a single-threaded instance?
- (RQ3) How does the distributed approach scale for different distributions of SMs, VMs and sub-model sizes?
Preconditions
- Very good Java programming skills
- Previous knowledge of ILP
- Previous knowledge in the field of distributed systems
- Previous experience in model-driven software development (helpful but optional)