TSP algorithms: 2-opt, 3-opt in Python

The 2-opt and 3-opt algorithms are used to obtain approximate solutions to the travelling salesman problem (TSP). In the previous post I explained what the TSP is and included an implementation of the Christofides algorithm.

# 2-opt algorithm

The 2-opt algorithm is one of the most basic and widely used heuristics for obtaining approximate solutions to the TSP. 2-opt starts with a random initial tour and improves it incrementally by exchanging two edges in the tour for two other edges.

[Figure: a 2-opt move]

In each step, the 2-opt algorithm deletes two edges (u_1, u_2) and (v_1, v_2), where u_1, u_2, v_1, v_2 are distinct (i.e. the two removed edges must not be incident to the same node), creating 2 subtours, and reconnects the tour with the edges (u_1, v_1) and (u_2, v_2) if the replacement reduces the length of the tour. The cost change of a single 2-opt move can be expressed as \Delta = c(u_1, u_2) + c(v_1, v_2) - (c(u_1, v_1) + c(u_2, v_2)).

The algorithm ends in a local optimum, in which no further improving move can be found. One full 2-opt pass has time complexity \mathcal{O}(n^2): in the worst case, for each edge we remove we have to check the (n-2) other edges that could be removed together with it, which is \mathcal{O}(n), and this has to be done for all n edges, giving n \cdot \mathcal{O}(n) = \mathcal{O}(n^2).
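As an illustration, a minimal first-improvement 2-opt pass might look like the following sketch (assuming a symmetric distance matrix dist indexed by node; this is a simplified illustration, not the exact code from the repository):

def two_opt(route, dist):
    # First-improvement 2-opt: keep applying improving moves until none is found.
    # `route` is a list of node indices, `dist` a symmetric distance matrix.
    best = route[:]
    improved = True
    while improved:
        improved = False
        n = len(best)
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                a, b = best[i - 1], best[i]          # first removed edge (a, b)
                c, d = best[j], best[(j + 1) % n]    # second removed edge (c, d)
                # cost change of replacing (a, b), (c, d) with (a, c), (b, d)
                delta = dist[a][c] + dist[b][d] - (dist[a][b] + dist[c][d])
                if delta < 0:
                    best[i:j + 1] = reversed(best[i:j + 1])
                    improved = True
    return best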

# 3-opt algorithm

The 3-opt algorithm works in a similar fashion. Instead of breaking the tour by removing 2 edges, it removes three edges, breaking the tour into 3 separate sub-tours that can be reconnected in 8 possible ways (including the original tour), as shown in the following figure.

Let's assume we have a tour that we split into 3 subtours: A = (4,5), B = (6,1), C = (2,3). There are 8 possible ways to reconnect the original tour ABC:

a)\ ABC, \quad b)\ A'BC, \quad c)\ ABC', \quad d)\ A'BC', \quad e)\ A'B'C, \quad f)\ AB'C, \quad g)\ AB'C', \quad h)\ A'B'C' (the symbol ' denotes a reversed segment).

[Figure: the 8 ways of reconnecting three sub-tours]

As you can see, there are only four pure 3-opt moves, namely d)\ A'BC', e)\ A'B'C, g)\ AB'C' and h)\ A'B'C'; a) is the original tour, and the rest are 2-opt moves (one of the three broken edges is reinserted).
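The snippets below refer to an OptCase enumeration of these eight reconnection cases. A minimal sketch consistent with the comments in the code below (the exact definition in the repository may differ):

from enum import Enum

class OptCase(Enum):
    opt_case_1 = 1  # ABC (original tour)
    opt_case_2 = 2  # A'BC
    opt_case_3 = 3  # ABC'
    opt_case_4 = 4  # A'BC'
    opt_case_5 = 5  # A'B'C
    opt_case_6 = 6  # AB'C
    opt_case_7 = 7  # AB'C'
    opt_case_8 = 8  # A'B'C'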

We choose the best of all 8 cases using the following function:

def get_solution_cost_change(graph, route, case, i, j, k):
    A, B, C, D, E, F = route[i - 1], route[i], route[j - 1], route[j], route[k - 1], route[k % len(route)]

    if case == OptCase.opt_case_1:
        return 0
    elif case == OptCase.opt_case_2:
        # second case is the case A'BC
        return graph[A, B] + graph[E, F] - (graph[B, F] + graph[A, E])
    elif case == OptCase.opt_case_3:
        # ABC'
        return graph[C, D] + graph[E, F] - (graph[D, F] + graph[C, E])
    elif case == OptCase.opt_case_4:
        # A'BC'
        return graph[A, B] + graph[C, D] + graph[E, F] - (graph[A, D] + graph[B, F] + graph[E, C])
    elif case == OptCase.opt_case_5:
        # A'B'C
        return graph[A, B] + graph[C, D] + graph[E, F] - (graph[C, F] + graph[B, D] + graph[E, A])
    elif case == OptCase.opt_case_6:
        # AB'C
        return graph[B, A] + graph[D, C] - (graph[C, A] + graph[B, D])
    elif case == OptCase.opt_case_7:
        # AB'C'
        return graph[A, B] + graph[C, D] + graph[E, F] - (graph[B, E] + graph[D, F] + graph[C, A])
    elif case == OptCase.opt_case_8:
        # A'B'C'
        return graph[A, B] + graph[C, D] + graph[E, F] - (graph[A, D] + graph[C, F] + graph[B, E])

Then, after we determine which way of reconnecting the tour is the best, we need to reverse the segments accordingly:

if (i - 1) < (k % len(route)):
    first_segment = route[k % len(route):] + route[:i]
else:
    first_segment = route[k % len(route):i]
second_segment = route[i:j]
third_segment = route[j:k]

if case == OptCase.opt_case_1:
    # first case is the current solution ABC
    pass
elif case == OptCase.opt_case_2:
    # A'BC
    solution = list(reversed(first_segment)) + second_segment + third_segment
elif case == OptCase.opt_case_3:
    # ABC'
    solution = first_segment + second_segment + list(reversed(third_segment))
elif case == OptCase.opt_case_4:
    # A'BC'
    solution = list(reversed(first_segment)) + second_segment + list(reversed(third_segment))
elif case == OptCase.opt_case_5:
    # A'B'C
    solution = list(reversed(first_segment)) + list(reversed(second_segment)) + third_segment
elif case == OptCase.opt_case_6:
    # AB'C
    solution = first_segment + list(reversed(second_segment)) + third_segment
elif case == OptCase.opt_case_7:
    # AB'C'
    solution = first_segment + list(reversed(second_segment)) + list(reversed(third_segment))
elif case == OptCase.opt_case_8:
    # A'B'C'
    solution = list(reversed(first_segment)) + list(reversed(second_segment)) + list(reversed(third_segment))
return solution

After that, we reiterate the process until no further improvement can be made.
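Putting it together, the outer loop might look like the following sketch. It assumes the segment-reversal snippet above is wrapped in a helper, here hypothetically named reverse_segments(route, case, i, j, k), and relies on get_solution_cost_change returning a positive value for an improving move:

from itertools import combinations

def three_opt(graph, route):
    # Keep applying the first improving 3-opt move found until none remains.
    # `graph` is a symmetric distance matrix indexed as graph[u, v].
    improved = True
    while improved:
        improved = False
        # all ways of choosing three cut points 1 <= i < j < k <= len(route)
        for i, j, k in combinations(range(1, len(route) + 1), 3):
            for case in OptCase:
                gain = get_solution_cost_change(graph, route, case, i, j, k)
                if gain > 0:
                    route = reverse_segments(route, case, i, j, k)
                    improved = True
                    break
            if improved:
                break
    return route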

The whole code, in Python, can be found on my github page. If you see any possible improvements, feel free to post them in the comments or open a PR directly!

PS: It is not within the scope of this post to describe 3-opt fully; it is meant to provide the implementation and a brief description of how the code works. See [2] for more information on this topic.

Sources:

[1] https://www.sciencedirect.com/science/article/abs/pii/S0377221799002842

[2] https://tsp-basics.blogspot.com/

Christofides algorithm in Python

The travelling salesman problem (TSP) is a famous mathematical problem, first mentioned back in 1832.

Basically, it asks the following question: given a list of cities and the distances between them, what is the shortest route that visits them all and returns to the original city?

Its mathematical formulation goes back to W. R. Hamilton and the British mathematician Thomas Kirkman.

Christofides algorithm

The Christofides algorithm is an approximation algorithm for the travelling salesman problem. Beware: it works only on instances where the distances form a metric space, i.e. the distance matrix is symmetric and obeys the triangle inequality.

The Christofides algorithm has the best approximation ratio proven so far for the metric TSP: the solution it provides is guaranteed to be within 3/2 of the optimum. The proof can be found in various other sources.

The first step of the Christofides algorithm is to find a minimal spanning tree (MST).

#1 Minimal spanning tree

Given an undirected connected graph G = (V, E), a minimal spanning tree S of G is a subgraph of G that connects all vertices together, without cycles, with the minimum possible total edge weight.

The two best-known simple algorithms that can find an MST are Kruskal's and Prim's. I implemented the latter; nevertheless, if you want to contribute to this open source project, feel free to make a pull request for Kruskal's. The Prim implementation can be found here on github.
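For reference, a compact Prim sketch over a complete graph given as a distance matrix might look like this (an illustration only, not the repository version):

import heapq

def prim_mst(dist):
    # Prim's algorithm on a complete graph given as an n x n symmetric matrix.
    # Returns the MST as a list of edges (u, v).
    n = len(dist)
    visited = [False] * n
    visited[0] = True
    mst_edges = []
    # candidate edges (weight, from, to), starting from vertex 0
    heap = [(dist[0][v], 0, v) for v in range(1, n)]
    heapq.heapify(heap)
    while heap and len(mst_edges) < n - 1:
        w, u, v = heapq.heappop(heap)
        if visited[v]:
            continue
        visited[v] = True
        mst_edges.append((u, v))
        for t in range(n):
            if not visited[t]:
                heapq.heappush(heap, (dist[v][t], v, t))
    return mst_edges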

#2 Get odd degree vertices

From my point of view, this step is pretty clear: find the vertices of odd degree in the MST and save them into a list.
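With the MST represented as an edge list (as in the sketch above), the odd-degree vertices can be collected like this:

from collections import Counter

def odd_degree_vertices(mst_edges):
    # count how many MST edges touch each vertex and keep the odd ones
    degree = Counter()
    for u, v in mst_edges:
        degree[u] += 1
        degree[v] += 1
    return [v for v, d in degree.items() if d % 2 == 1]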

#3 From odd degree vertices, make minimal-weight perfect matching

The third step of the Christofides algorithm is to find a minimal-weight perfect matching. A matching M in a graph G is a set of edges without common vertices. A perfect matching is a matching that covers all vertices of the graph; that is, every vertex of the graph is incident to exactly one edge of the matching M.

Only a few algorithms that can find a minimal-weight perfect matching have been published. One of them is the original Edmonds blossom algorithm [1]. Multiple improvements to it have been made, such as Blossom IV by Cook and Rohe, and Blossom V (as far as I know, the latest one) by Kolmogorov [2].

I tried to read the papers on those algorithms, but after a long study I decided to use the max_weight_matching algorithm that is already available in the networkx library. How? By simply negating the signs in the adjacency matrix. Note that this procedure works only if we search for a maximum cardinality matching! Otherwise, the edges with negative weights would simply be ignored by the algorithm. Another approach would be to subtract each weight from a very large number.
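A sketch of this trick, building a complete graph on the odd-degree vertices with negated weights (the function and variable names here are mine, not the repository's):

import networkx as nx

def min_weight_perfect_matching(dist, odd_vertices):
    # minimum-weight perfect matching via max_weight_matching on negated weights
    G = nx.Graph()
    for idx, u in enumerate(odd_vertices):
        for v in odd_vertices[idx + 1:]:
            G.add_edge(u, v, weight=-dist[u][v])  # negate: maximum weight == minimum distance
    # maxcardinality=True forces a perfect matching despite the negative weights
    return nx.max_weight_matching(G, maxcardinality=True)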

#4 Add edges from #3 into the minimal spanning tree

The fourth step is to unite the matching M and the minimal spanning tree MST to form an Eulerian multigraph. Multigraph means that repeated edges may exist. NetworkX provides a suitable data structure called MultiGraph.

#5 Create an Euler tour

Again, there is no need to implement an algorithm for finding an Euler tour from scratch. Fortunately, networkx provides a method for that.
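Steps 4 and 5 together, as a sketch (reusing the edge lists from the sketches above):

import networkx as nx

def euler_tour(mst_edges, matching):
    # Step 4: unite the MST and the matching in a multigraph (repeated edges are allowed)
    M = nx.MultiGraph()
    M.add_edges_from(mst_edges)
    M.add_edges_from(matching)
    # Step 5: every vertex now has even degree, so an Eulerian circuit exists
    circuit = nx.eulerian_circuit(M)
    return [u for u, v in circuit]  # the tour as a sequence of vertices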

#6 Remove repeated vertices

The last step in the algorithm is to remove the repeated vertices that were included in the Euler tour (shortcutting, which does not increase the tour length thanks to the triangle inequality). If path is a list, then the duplicate vertices may be removed this way:

path = list(dict.fromkeys(path))  # dict preserves insertion order, so duplicates are removed in order
path.append(starting_node)  # close the tour by returning to the starting node

The whole code can be read on my github.


Sources:

[1] https://stanford.edu/~rezab/classes/cme323/S16/projects_reports/shoemaker_vare.pdf

[2] https://pub.ist.ac.at/~vnk/papers/blossom5.pdf

Feature Selection process

#1 Motivation for use of feature selection methods

[Figure: Feature selection process]

Nowadays, we live in a world where smart devices produce huge amounts of data. In real-world applications we encounter a big issue known as the curse of dimensionality: it is easy to find datasets with a high number of features but without an adequate number of observation points, which can (and probably will) cause over-fitting of the model.

To implement data mining methods effectively, feature selection methods are often used in the preprocessing stage.

#2 What is feature selection?

Feature selection, also known as variable selection, is the process of selecting a subset of relevant features for use in the subsequent model. It is used to speed up the learning process and to improve model interpretability.

Note: irrelevant features decrease prediction accuracy and, at the same time, increase learning time (due to the high dimensionality).

#3 Feature selection method approaches

FS methods are commonly divided into 3 main approaches:

  1. Filter methods have the smallest computational complexity. They select the subset of features regardless of the model (classifier). The importance of a variable is calculated by various statistical tests relating it to the output variable. State-of-the-art methods: Relief, mRMR. Filter methods can be further divided into 2 categories: univariate FS methods, which consider each feature completely in isolation, and multivariate FS methods, which also take feature dependencies into consideration.
  2. Wrapper methods try to select a subset of features and train a model using them. After the model is trained, some conclusions are drawn and, based on them, we either remove features from or add features to that subset. Wrapper methods are really computationally expensive, because the problem is basically an exhaustive search over all subsets. The final subset is chosen as the subset with the highest achieved score.
  3. Embedded methods try to combine the best parts of filter and wrapper methods: they perform feature selection and classification simultaneously. (A short scikit-learn sketch of a filter and a wrapper method follows after this list.)
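A minimal scikit-learn illustration of a filter method and a wrapper method (the dataset and the parameter choices are arbitrary):

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Filter: score each feature against the target independently, keep the 5 best
X_filter = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

# Wrapper: recursively eliminate features based on a fitted model
X_wrapper = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape)  # both (200, 5)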


Python ML classifiers: AdaBoost

Nowadays, there are a lot of classifiers that perform very similarly and use more or less the same system resources. How do you decide which of these classifiers you should use? Is it AdaBoost, Naive Bayes, or another one?

Well, the most obvious answer is that you have to try all of them and then pick the best one! (It varies with your dataset, though.)

In my next post I will describe the Naive Bayes classifier and provide a working example using the scikit-learn framework in Python.


 

#1 AdaBoost: Short description

AdaBoost is short for Adaptive Boosting. Boosting is one of the most powerful learning ideas, originally designed for classification problems.

Consider a two-class problem, with the output variable coded as Y \in \{-1, 1\} (this coding makes the weighted majority vote below a simple sign function). Given a vector of predictor variables X, a classifier C(X) returns a prediction taking one of the two values \{-1, 1\}. The error rate on the training sample is defined as follows:

Err = \frac{1}{N} \sum_{i=1}^{N} I(y_i \neq C(x_i))

Boosting is an iterative procedure that combines the results of many weak classifiers C_m(x), m = 1, 2, \dots, M, fitted to modified versions of the data, to produce a powerful classifier. Starting with the unweighted training sample, AdaBoost builds a first weak classifier C_1(x), for example a decision stump, that produces class labels.

A weak classifier is defined as a classifier whose error rate is only slightly better than random guessing. The weight of every observation point misclassified by the current classifier is increased, and the next classifier is built using the new weights. The predictions from all the classifiers are then combined by a weighted majority vote, producing the final prediction:

C(x) = \mathrm{sign}\left(\sum_{m=1}^{M} a_m C_m(x)\right)

where a_1, a_2, \dots, a_M are weights computed by the boosting algorithm to give more influence to the more accurate classifiers in the sequence. Each boosting step consists of applying weights w_1, w_2, \dots, w_N to the training data points (x_i, y_i), i = 1, 2, \dots, N.
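For intuition, the weight-update loop described above (AdaBoost.M1) can be sketched as follows; this is a didactic sketch, not the scikit-learn implementation, and it assumes labels coded as -1/+1:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, M=50):
    # y must contain labels -1/+1; returns the fitted stumps and their weights a_m
    N = len(y)
    w = np.full(N, 1.0 / N)                      # start with uniform observation weights
    stumps, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = np.clip(np.sum(w * miss) / np.sum(w), 1e-10, 1 - 1e-10)
        a_m = np.log((1 - err) / err)            # weight of this classifier in the vote
        w *= np.exp(a_m * miss)                  # increase weights of misclassified points
        stumps.append(stump)
        alphas.append(a_m)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # weighted majority vote: C(x) = sign(sum_m a_m * C_m(x))
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))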

 


#2 AdaBoost implementation in sci-kit learn

There exist a lot of different variants of AdaBoost, but I have chosen AdaBoost SAMME, which supports multi-class classification and is already implemented in scikit-learn.

The AdaBoostClassifier class actually doesn't require any parameters to run (it has only optional ones), and they are:

  • base_estimator: the default estimator is DecisionTreeClassifier(max_depth=1), i.e. the decision stump I mentioned in the previous section
  • n_estimators: Mease and Wyner (2009) say that 1000 estimators should be enough. I believe that sometimes you do not need that many estimators; personally I used 500 estimators in my latest simulations.
  • learning_rate: I haven't tried tuning this parameter, so I keep the default of 1. There is a minor trade-off between the number of estimators and the learning rate; this parameter lowers the contribution of each classifier.
  • algorithm: SAMME.R (the default)
  • random_state: set it if you want reproducible results

The actual code is pretty simple:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_full, y_full = make_classification(n_features=2, n_redundant=0, n_informative=2,
                                     random_state=1, n_clusters_per_class=1)
X_train, X_test, y_train, y_test = train_test_split(X_full, y_full, train_size=0.8)
scaler = StandardScaler().fit(X_train)

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(scaler.transform(X_train), y_train)
score = clf.score(scaler.transform(X_test), y_test)
print('RESULTS AdaBoost: ' + str(score))

As you can see, you don't need to tune many parameters, in comparison with, for example, an SVM (so there is no need for GridSearchCV).