TSP algorithms: 2-opt, 3-opt in python

2-opt and 3-opt algorithms are used to get approximative solution of the travelling salesman problem. In the previous post I explained what the TSP problem is and I also included the implementation of Christofides algorithm.

# 2-opt algorithm

2-opt algorithm is one of the most basic and widely used heuristic for obtaining approximative solution of TSP problem. 2-opt starts with random initial tour and it improves the tour incrementally by exchanging 2 edges in the tour with two other edges.

2-opt move

In each step, 2-opt algorithm deletes two edges (u_1, u_2) and (v_1, v_2), where u_1, v_1, u_2, v_2 are distinct , thus creating 2 subtours and reconnects the tour with edges (u_1, v_1) and (u_2, v_2) in case that the replacement reduces the length of the tour (Note: beware, that 2 edges cannot be incident to the same node). The final solution cost of the single 2-opt move can be expressed as \Delta_{ij} = c(u_1, u_2) + c(v_1, v_2) - (c(u_1, v_1) +. c(u_2, v_2))

Algorithm ends in the local optimum, in which no further improving step is found. One two-opt move has time complexity of \mathcal{O}(n^2). In one two opt move in the worst case we need to check for one broke edge e (n-2) other edges that could be broken to improve the tour - thus the \mathcal{O}(n). We need to check it for all the edges, therefore we get the equation n * \mathcal{O}(n) = \mathcal{O}(n^2).

3-opt algorithm

3-opt algorithm works in the similar fashion. Instead of breaking the tour by removing 2 edges, it removes three edges, therefore breaking the tour into 3 separate sub-tours that can be reconnected in 8 possible ways (including the original tour) as shown on the following figure.

Let's assume that we got a tour that we split up to 3 different subtours. A = (4,5), B=(6,1), C = (2,3). There are 8 possible ways how to reconnect the original subtour ABC.

 a)\ \ ABC,  b) \ \ A'BC,  c) \ \ ABC',  d) \ \ A'BC',  e) \ \ A'B'C,  f) \ \ AB'C,  g) \ \ AB'C',  h) \ \ A'B'C' (Note: Symbol ' defines the reversed segment).



As you can see, there exists only 4 3-opt moves, namely d) \ A'BC',  e) \ A'B'C, g) \ AB'C' and h) \ A'B'C',  a) is the original tour and the rest are 2-opt moves (one of three broken edges was reinserted back).

We will choose the best case from all 8 moves, particulary through one function:

def get_solution_cost_change(graph, route, case, i, j, k):
    A, B, C, D, E, F = route[i - 1], route[i], route[j - 1], route[j], route[k - 1], route[k % len(route)]

   if case == OptCase.opt_case_1:
        return 0
   elif case == OptCase.opt_case_2:
        # second case is the case A'BC
        return graph[A, B] + graph[E, F] - (graph[B, F] + graph[A, E])
    elif case == OptCase.opt_case_3:
        # ABC'
        return graph[C, D] + graph[E, F] - (graph[D, F] + graph[C, E])
    elif case == OptCase.opt_case_4:
        # A'BC'
        return graph[A, B] + graph[C, D] + graph[E, F] - (graph[A, D] + graph[B, F] + graph[E, C])
    elif case == OptCase.opt_case_5:
        # A'B'C
        return graph[A, B] + graph[C, D] + graph[E, F] - (graph[C, F] + graph[B, D] + graph[E, A])
    elif case == OptCase.opt_case_6:
        # AB'C
        return graph[B, A] + graph[D, C] - (graph[C, A] + graph[B, D])
    elif case == OptCase.opt_case_7:
        # AB'C'
        return graph[A, B] + graph[C, D] + graph[E, F] - (graph[B, E] + graph[D, F] + graph[C, A])
    elif case == OptCase.opt_case_8:
        # A'B'C'
        return graph[A, B] + graph[C, D] + graph[E, F] - (graph[A, D] + graph[C, F] + graph[B, E])

Then, after we get which case of reconnecting the tour is the best, we will need to reverse the segments accordingly:

if (i - 1) < (k % len(route)):
    first_segment = route[k% len(route):] + route[:i]
    first_segment = route[k % len(route):i]
second_segment = route[i:j]
third_segment = route[j:k]

if case == OptCase.opt_case_1:
    # first case is the current solution ABC
elif case == OptCase.opt_case_2:
    # A'BC
    solution = list(reversed(first_segment)) + second_segment + third_segment
elif case == OptCase.opt_case_3:
    # ABC'
    solution = first_segment + second_segment + list(reversed(third_segment))
elif case == OptCase.opt_case_4:
    # A'BC'
    solution = list(reversed(first_segment)) + second_segment + list(reversed(third_segment))
elif case == OptCase.opt_case_5:
    # A'B'C
    solution = list(reversed(first_segment)) + list(reversed(second_segment)) + third_segment
elif case == OptCase.opt_case_6:
    # AB'C
    solution = first_segment + list(reversed(second_segment)) + third_segment
elif case == OptCase.opt_case_7:
    # AB'C'
    solution = first_segment + list(reversed(second_segment)) + list(reversed(third_segment))
elif case == OptCase.opt_case_8:
    # A'B'C
    solution = list(reversed(first_segment)) + list(reversed(second_segment)) + list(reversed(third_segment))
return solution

After that we reiterate the process until no improvements can be made.

Whole code can be found in Python on my github page. If you see any improvements can be made, feel free to post it into the comments or make PR directly!



PS: It is not in the scope of this post to describe the 3-opt fully. It is meant to provide the implementation and at least a little bit of description how the code is working. See [2] for more information on this topic.


[1] https://www.sciencedirect.com/science/article/abs/pii/S0377221799002842

[2] https://tsp-basics.blogspot.com/

Christofides algorithm in Python

Travelling salesman problem (a.k.a chinese postman) is a famous matematical problem firstly mentioned back in 1832.

Basically, it asks the following question: Given a list of cities and distance between them what is the shortest path to visit them all and return back to the original city?

Mathematical definition was laid by W.R. Hamilton and by the British mathematician Thomas Kirkman.

Christofides algorithm

Christofides algorithm is an approximative algorithm for finding solution for travelling salesman problem. Beware, this algorithm works only on instances, where the distances form a metric space. Therefore, it works only if the instance obeys the trinagle inequality and the distance matrix are symmetric.

Christofides algorithm has the best approximation ratio that has been proven so far. The solution provided by Christofides guarantees to be withing 3/2 of the optimum solution. Proof can be read on various other sources.

The first step of the Christofides algorithm is to find minimal spanning tree.

#1 Minimal spanning tree

Given an undirected connected graph G = (V, E) a spanning tree S of graph G is a subgraph of G that connects all vertices together without cycles with minimum possible total edge weight.

The most known two simple algorithms that can find MST are Kruskal and Prim. I implemented the latter, but nevertheless if you want to contribute to this open source project, feel free to make a pull request for Kruskal. Prim can be found here on github.

#2 Get odd degree vertices

From my point of view, this step is pretty clear. Find odd degree vertices and save them into a list.

#3 From odd degree vertices, make minimal-weight perfect matching

Third step of the Christofides algorithm is to find minimal-weight perfect matching. Matching M in a graph G is a set of edges E without common vertices. Perfect matching is a matching which matches all vertices of the graph. That also means that every vertex of graph is incident to exactly one edge of the matching M.

Only few of algorithms were published that can find minimal-weight perfect matching. One of them is original Edmond's Blossom algorithm [1]. There were multiple improvements done in that algorithm like Blossom IV by Cook and Rohe and Blossom V (as far as I know, this is the latest one) by Kolmogorov [2].

I tried to read the papers regarding those algorithms, but after long study I decided that I can use algorithm that is already in the networkx library called max_weight_matching. How? Simply negating the signs in the adjacency matrix. Note that this procedure works only if we are searching for maximum cardinality matching! Otherwise, the edges with negative weights would be simply ignored by the algorithm. Another approach would be to substract each weight from very large number.

#4 Add edges from 3/ into minimal spanning tree

The fourth step is to unite matching M and minimal spanning tree MST to form an Eulerian multigraph. Multigraph means that there may exists repetetive edges. NetworkX provides a nice data structure called Multigraph.

#5 Create euler tour

Again, there is no need to implement algorithm for finding euler tour from the scratch. Fortunately, networkx provide a method for that.

#6 Remove repeated vertices

The last step in the algorithm is to remove repeated vertices that were included in the euler tour. If path is a list then vertices may be removed this way:

path = list(dict.fromkeys(path).keys())

The whole code can be read on my github.



[1]:  https://stanford.edu/~rezab/classes/cme323/S16/projects_reports/shoemaker_vare.pdf

[2]: https://pub.ist.ac.at/~vnk/papers/blossom5.pdf

Django with Docker and Gitlab Runner example

In the second post of this series (check the first one!) we will talk about technologies that will be used in our day-to-day development process. A friend of mine wanted to participate on this project and therefore I decided to set-up my old laptop to be a full-time development server that will be running the newest version of my application.


I would like to make a CI/CD pipeline that will look something like this:

  1. Run pip freeze >> requirements.txt only if you added/removed/changed versions of any libraries in your virtual environment,
  2. Push new commit into gitlab with some changes,
  3. Gitlab runner should automatically run tests against the changes,
  4. If the tests pass successfully, gitlab runner should build the docker image,
  5. Gitlab runner should deploy the new version (called staging deployment environment).


#1 Export all python libraries with their versions defined to the requirements.txt file

Let's say we added/removed/modified the libraries we use in the current project. For example, if I update Django library to the newer version in my virtual environment and use any function from that new version, my colleagues that don't have the updated version of the library will be unable to run the project. We can solve this issue with the of help pip freeze >> requirements.txt command.

This may be a reason why the development server won't work after commit. We need to avoid this!


Command pip freeze does one simple thing. It lists all installed libraries in the current virtual environment. The >> symbol in Linux means that the output of the previous command will be redirected into a file. In our case the file is called requirements.txt. Then we can push the updated file into git.


It is really easy to install all libraries from requirements.txt file. Just use the following command: pip install -r requirements.txt . We will use this command in dockerfile as you will see in a second.


#2 Create docker file

Docker is more or less a new container technology. This technology is beyond scope of this post, so if you don't understand what docker is and you would like to know something more just click here.


# Official Python image
FROM python:3.6

# create root directory for project, set the working directory and move all files
RUN mkdir /cryptoapp
WORKDIR /cryptoapp
ADD . /cryptoapp

# Web server will listen to this port

# Install all libraries we saved to requirements.txt file
RUN pip install -r requirements.txt

The FROM instruction tells docker which image should be used as base image. In this particular case, we specified python 3.6. Although don't forget to specify the version you've been using on your development machine.

The RUN instruction will execute any commands in a new layer on top of the current image and commit the results. That means every RUN command will create a new docker image that will be used in subsequent commands.

The ADD command copies files from local folder (in this case whole folder where Dockerfile is located) and copies it to filesystem of the image.

EXPOSE command informs the docker that the container will listen to port 8000.

Now let's move to create docker-compose.yml file.  My pick for database is postgresql so the first service will be db with image: postgres. The second service is django. The most important part is anything behind "command". python manage.py runserver starts python webserver.


version: "2"
    image: postgres
    build: .
    restart: always
    container_name: cryptoapp
    command: python manage.py runserver
      - "8000:8000"
      - db
#3 Set up gitlab runner

The last step to achieve our goal is to set up gitlab runner. Gitlab runner has its configuration stored in a file called "gitlab-ci.yml". Let's create the basic config (as you can see, docker-compose command is used, so you need to include the python package docker-compose in your requirements.txt):

  - test
  - build
  - deploy

  stage: test
  script: echo "Testing the app"

  stage: build
  - echo "Building the app"
  - docker-compose build

  stage: deploy
  - master
  - echo "Deploying the app"
  - docker-compose up -d

As you can see have 3 separate stages.

The first one is just a testing phase, currently just dummy (the script just prints out the "Testing the app" string. The second one is build phase, where we build docker image.

The third phase is the phase where magic happens, that means we run the docker container up and our server should be ready with the database turn on as well.


#4 Summary

To sum this up completely, we have working CI/CD pipeline for our project. After every commit to gitlab, this pipeline will start. All steps that gitlab runner makes are recoreded in .gitlab-ci.yml file.

01: Crypto dev info: Idea & Choosing technologies

#1 – A Short Introductory to Cryptocurrency

Bitcoin in the past few years has skyrocketed, reaching nearly $20,000 in value in late 2017. Today, it remains at a steady $7,000 but the value of digital assets has significantly gone up over the past few years. Cryptocurrency was invented by a person or team we only know as Satoshi Nakamoto. The official Wikipedia on Satoshi Nakamoto lists 13 different people as being the official creator for Bitcoin.


Not long into the creation of Bitcoin, Satoshi had vanished from the project and was never to be seen or heard from again. Since the creation of Bitcoin, cryptocurrency has surged well above anyone’s original expectations and since then, a lot of new digital assets have been created as well. We’ll discuss the top three digital currencies at this point in time.


#2 – The Top Three Most Famous Cryptocurrencies


#2.1 – Ethereum

Designed by Vitalik Buterin in 2013, Ethereum has risen quickly among the ranks of digital coins. In 2016, a failed DAO project resulted in Ethereum being separated into two different blockchains. There is now Ethereum Classic and Ethereum. Ethereum is still maintained by Vitalik Buterin but Ethereum Classic doesn’t seem to be going anywhere anytime soon with stagnant development.


 #2.2 – XRP

cryptocurrencyXRP was released in 2012 but today, the coin is officially owned by Ripple. Ripple today focuses more on payment processing services and financial institutions. If you want more information on Ripple, you can go to the following blog: https://xrphodor.wordpress.com/or you can go to the official Ripple website.


#2.3 – Bitcoin Cash

In 2017, a separate blockchain was created when a small team of developers wanted to increase the blocksize of Bitcoin and they wanted to change the initial code. As a result, a hard fork occurred and this split the coin into two different entities. Now, there’s the original Bitcoin and Bitcoin Cash.


#3 – An Overview of The Current Market Evaluation and Progress

For the evaluation of the market and cryptocurrency’s total assets, we can use Market Cap. Market Cap is simply a formula of the total number of coins and the price of each coin. Each individual coin has its own value and we can use Market Cap to evaluate this.


For example: If Bitcoin is currently valued at $7,376.41 (changes daily) and there’s 17,249,375 coins in circulation, then the total market cap of BTC is valued at $127,238,394,412.


As of 4/9/2018, the total market cap was at $240,129,298,509 which is a significant change from January of 2018 which had a total market cap of $800,000,000,000. The price change can be attributed to the fundamentals of FUD, which are Fear, Uncertainty and Doubt. This is still a huge price jump from 2013 though, because the total market cap back then was only $1,000,000,000.


#4 – Market Manipulation and ICO Frauds

However, with the success and rise of cryptocurrency, there’s bound to be some projects that are questionable and some that are downright scams. There’s been a lot of promising projects that turned out to either be not worth their own weight in salt or just outright scams. (Bitconnect comes to mind).


My application doesn’t have a name yet but my main mission is to provide development information for all crypto coins. Think of it as a general Wiki for everything cryptocurrency based with information such as how many commits are made, the number of contributors, the number of releases and more. For contributors, that doesn’t just mean the inventor, that means how many software engineers are on the project and how many people are actively working on it at the time being.


The main benefit of this project will be so that investors can look up information on a coin, learn a little about their history and decide whether or not this is a good investment for them. With any investment opportunity, you should always due your diligent research but this project will serve as a nice starting point.


If you have any new ideas or features you’d like to see in this project, please let me know and I will discuss these ideas with my team to see if we can bring them to life.


#5 – The Technology Involved

For the next few posts on this blog, I want to discuss the technology and fundamentals that’ll be going into this project. This will be to provide useful information to developers and people like minded who may be interested in the project. The backend will be programmed in Python with the Django framework. The UI is nice and clean, which was made possible thanks to the use of proper HTML formatting, clean CSS, clean JS scripting and the Bootstrap framework.

Python ML classifiers: AdaBoost

Nowadays, there are a lot of classifiers, that performs very similarly and they more or less the same system resources. How do you decide which of these classifiers you should use? Is it AdaBoost, Naive-Bayes or other?

Well, the most obvious answer is that you have to try all of them and then pick the best one! (It varies with your dataset though).

In my next post I will describe Naive-Bayes classifier and I will provide a working example using sci-kit learn framework in python.


#1 AdaBoost: Short description

Adaboost is a shorter term for Adaptive Boosting. Boosting is one of the most powerful learning ideas, originally designed for classification problems.

Consider a two-class problem, with the output variable coded as Y = \{0,1\}. Given a vector of predictor variables X, a classifier C(X) returns prediction of one of the two values \{0,1\}. The error is defined as follows:

Err = \frac{1}{N} \sum_{i=1}^{n}I(y_i \neq C(x_i))

The boosting is an iterative procedure that combines the results of many weak classifiers (C_m(x), m = 1,2 \dots , M) on modified data to produce a powerful classifier. Starting with the unweighted training sample, AdaBoost builds a first weak classifier C_1(x), for example Decision Stump that produces class labels.

weak classifier is defined as a classifier whose error rate is slightly better than random pick. If at least one misclassification was produced by one of the classifiers,  the weight of that observation point is increased. Subsequently, the second classifier is build using the new weights. The predictions from all of them are then combined by a weighted majority voting technique, thus producing final prediction:

C(x) = sign(\sum_{i=1}^{M}{a_m C_m(x)})

, where a_1, a_2, \dots a_M are numbers returned by boosting algorithm to increase influence to the better classifiers in the sequence. Each boosting step consists of applying weights w_1, w_2, ..., w_N to each of the training data points (x_i, y_i), i = 1, 2, 3, ..., N).


#2 AdaBoost implementation in sci-kit learn

There exist a lot of different algorithms for AdaBoost but I have chosen AdaBoost SAMME that supports multi-class classification and it is already implemented in scikit-learn.

AdaBoostClassifier class actually doesn't require any parameters to run (it has only optional ones) and they are:

  • base_estimator: the default estimator is DecisionTreeClassifier (I called it Decision Stump in the previous section)
  • n_estimators: Mease and Wyner (2009) says, that 1000 estimators should be enough. I believe that sometimes you do not need that much of estimators. Personally I used 500 estimators in my latest simulations.
  • learning_rate: I haven't tried tuning this parameter, so the default is one. There is a minor trade-off between the number of estimators and learning rate. This parameters lower the significance of the contribution of each classifier.
  • algorithm: SAMME.R 
  • random_state:

The actual code is pretty simple:

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)
X, X_test, y, y_test = train_test_split(X_full, y_full, train_size=0.8)
scaler = StandardScaler().fit(X_train, y_train)
clf = AdaBoostClassifier(n_estimators=100)
clf.fit(scaler.transform(X_train, y_train), y_train)
score = clf.score(scaler.transform(X_test, y_test)
print 'RESULTS AdaBoost: ' str(score)

As you can see, you don't need to tune parameters in comparison with, for example SVM (thus not needing to use GridSearchCV).