Colorless Green Ideas Sleep Furiously



Package and deploy Python code

Packaging and deploying Python code is very easy.

I found this tutorial quite thorough and easy to follow.

One thing the tutorial does not mention is the entry point feature of setuptools, which installs a runnable command on the system path.

For that, refer to a more detailed document.


How to learn

Quite often I feel depressed when I find that I keep forgetting things I spent a lot of time learning. I know that human beings are made this way. We can’t remember things forever unless we keep repeating them. People even forget their mother tongue when they stop using it for ages. But I still feel depressed. I wonder how I can progress if I keep throwing valuable things away while I am learning new ones.

I feel the same way when I learn a new language. But there I usually collect all of the phrases and words I have learnt and review them from time to time. This way I find that I learn new languages much faster and better. But this method cannot be used for everything, especially not for very complex technical material which involves creativity.

Recently I felt somewhat relieved when I read this blog post by ying wang. He claims that people can never learn the essence of anything if they cannot *reinvent* it. Now I have a brand new view of the occasions when I can’t recall something. I treasure these moments, because I can try to reinvent the thing using my limited and broken memory. If in the end I cannot reinvent it, I pick up a textbook and find out where I got stuck. After two or three rounds of this process, I find I can hardly forget the same concept again. This works pretty well when I go through CLRS.

Besides this, I also have some other learning strategies that I always follow:

  1. Always remind yourself of the starting point, the motivation. You will definitely get lost if you don’t even know why you have reached the place you are in.
  2. Try to connect new knowledge with old knowledge you already know very well. When I learned the KMP algorithm, which is a pretty complex string matching algorithm, I found that my knowledge of DFAs made the algorithm very intuitive.
  3. Try to abstract the details away. Try to learn the theory and view things from a higher level. When I was in primary school, I often struggled with a type of math problem called *Rabbits and chickens in the same cage*. That is a literal translation of its Chinese name. I googled it and found that pupils outside China don’t seem to have to go through this kind of brain teaser. The problem tells you the total number of heads as well as the total number of legs, and you are required to calculate the number of chickens and rabbits. At that time we hadn’t learned any algebra, so my teacher taught me some weird thinking processes, such as imagining that all of the rabbits suddenly stand up on only two of their four feet. Though I was viewed as a smart student for being good at these questions, I didn’t like them at all. I am telling this story to illustrate that theory is indeed important. Thinking at the level of algebra makes a huge class of those questions simple. In a similar way, knowing DFA theory makes regular expressions a piece of cake. And you won’t waste effort trying to parse HTML with regular expressions.
  4. Try to get it better explained. Different explanations and views make a huge difference. When I entered college, I didn’t like linear algebra at all. The lecturer started by teaching us determinants without even explaining why we need them. Then we were required to prove a few properties of the determinant. But everything changed when I found the linear algebra open course given by Gilbert Strang. He is such a great teacher. He makes every point in linear algebra so clear and well explained. I now feel that linear algebra is the most beautiful subject I’ve ever learned. Another story is about red-black trees. I could never remember all of the operations even though I read through the subject in CLRS several times. But things totally changed when I heard how one of its inventors, Robert Sedgewick, presents it. He starts from the 2-3 tree, which is conceptually much simpler than the RB tree, and then describes the RB tree as a mimic of the 2-3 tree. The implementation is so short and concise. It’s definitely a piece of art in computer science. So when you find some concepts really hard to understand, maybe it’s not your fault but the teacher’s. Try to find out how the most knowledgeable person explains it.
  5. Try to go down the rabbit hole. Don’t be satisfied with the surface, especially for engineering-related concepts in computer science. Don’t try to memorize the time complexity of different sorting algorithms or of the hash table. If you know how they are implemented, you will never forget those. C++ is known for its complexity and traps. But if you know the details under the hood, you will find the language much easier. I know some people who claim they are masters of certain programming languages because they learned the language specification by heart. I think that even if this method works, it’s really inefficient. Thinking about how one could implement certain language features makes it much simpler. I often find people confused by lexical scope and dynamic scope. But if you know the evaluation model, it’s really hard to conflate them.
  6. Try to reinvent the stuff when you forget the details. As I said at the beginning, treasure the opportunities when you forget something you have learned. It means you didn’t really understand it in the first place.
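
To make the algebra in point 3 concrete, here is a small worked version. The symbols $c$ (chickens), $r$ (rabbits), $h$ (heads) and $l$ (legs), and the numbers in the example, are chosen here only for illustration.

```latex
\begin{aligned}
c + r &= h \\
2c + 4r &= l \\
\Rightarrow\quad r &= \frac{l - 2h}{2}, \qquad c = h - r
\end{aligned}
```

For example, with $h = 35$ and $l = 94$ we get $r = (94 - 70)/2 = 12$ and $c = 35 - 12 = 23$. Subtracting $2h$ is the same step as letting every rabbit stand on two feet: the legs left over, $l - 2h$, all belong to rabbits, two per rabbit.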


Three strategies to implement a linked data structure

There are many ways to map a conceptually linked structure onto the language-level structures provided by C.

Linked structures include simple linked lists, trees, graphs and so on.

A data object normally contains a key and some satellite data. The key differentiates the object from others. It can be an integer, but we will see that it can also be something else. A data object in a linked structure always contains at least one link which points to another object in the data structure. A link can be a C pointer, but we will see that links can also be other things.

Here I will show three typical strategies to implement a linked data structure, using a binary tree as the example.

  1. Pure static data structure. We can use three arrays to represent the data, the left children and the right children. Here the keys are the subscripts of the arrays. Links are also modeled by subscripts. All of the arrays can live on the stack if we ignore the stack size limit. That’s why I call it a pure static data structure.
  2. Pure dynamic data structure. Each node is modeled as a C structure containing two pointers to other nodes. Normally in this strategy all of the nodes are dynamically allocated on the heap. The keys of the objects are modeled by their addresses in memory. So we need to take care of freeing the memory, otherwise we get a typical memory leak.
  3. Hybrid approach. We can also allocate all of the data on the stack. A typical way to do this is to allocate on the stack an array which holds all the objects. But we still use pointers to speed up travelling through the nodes. Of course you can also put the whole array on the heap, but you see the point: the key (or better, the identity) is characterized by both subscripts and addresses.

Quite often a subscript-based approach gives us a very simple way to iterate through all of the objects. A pointer-based approach gives us a natural way to access the neighbours.
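
The three strategies could be sketched in C roughly as follows. This is only a minimal illustration on a three-node tree. The names (`NIL`, `node_new`, `tree_free`, `sum_static`, `sum_ptr`, `sum_hybrid`) and the choice of summing the stored values are assumptions made for the example.

```c
#include <stdlib.h>

#define NIL (-1)  /* hypothetical sentinel meaning "no child" for index links */

/* 1. Pure static: three parallel arrays; a node's identity is its subscript. */
enum { N = 3 };
static int data_[N]  = { 10, 5, 20 };
static int left_[N]  = { 1, NIL, NIL };   /* node 0 has left child 1 */
static int right_[N] = { 2, NIL, NIL };   /* node 0 has right child 2 */

static int sum_static(int i) {
    if (i == NIL) return 0;
    return data_[i] + sum_static(left_[i]) + sum_static(right_[i]);
}

/* 2. Pure dynamic: heap-allocated nodes; a node's identity is its address. */
struct node {
    int data;
    struct node *left, *right;
};

static struct node *node_new(int data, struct node *l, struct node *r) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->data = data;
    n->left = l;
    n->right = r;
    return n;
}

/* Every malloc needs a matching free, otherwise the tree leaks. */
static void tree_free(struct node *n) {
    if (n == NULL) return;
    tree_free(n->left);
    tree_free(n->right);
    free(n);
}

static int sum_ptr(const struct node *n) {
    if (n == NULL) return 0;
    return n->data + sum_ptr(n->left) + sum_ptr(n->right);
}

/* 3. Hybrid: nodes live in one array (here on the stack) but are linked
 *    by pointers into that array, so both views are available. */
static int sum_hybrid(void) {
    struct node pool[3];
    pool[0] = (struct node){ 10, &pool[1], &pool[2] };
    pool[1] = (struct node){ 5, NULL, NULL };
    pool[2] = (struct node){ 20, NULL, NULL };

    int by_index = 0;                      /* subscript view: plain loop */
    for (int i = 0; i < 3; i++) by_index += pool[i].data;

    int by_links = sum_ptr(&pool[0]);      /* pointer view: follow the links */
    return by_index == by_links ? by_index : -1;
}
```

The hybrid function shows both access patterns from the paragraph above: the loop over subscripts visits every node without following any link, while `sum_ptr` reaches the neighbours through the pointers.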


Tips for implementing algorithms correctly and fast

Recently I have been preparing for an interview with Google. The Google interview is well known for its difficulty. Applicants are required to write code on a whiteboard or in Google Docs to solve complex questions. Not only is a deep understanding of various algorithms important, but the ability to implement them correctly and fast is also necessary. It would be a big plus if one can write correct and clear code in one go.

I believe that practice makes perfect. After several days of coding practice, I noticed some general rules I follow when I implement a complex algorithm.

  1. When dealing with arrays, be careful with the boundaries. My suggestion is to always use a half-open interval, [a, b), as the processing unit. The advantage is that processing units never overlap. Besides, the termination condition is easy to check. I will demonstrate this with quicksort in the future.
  2. Always model problems recursively if possible. Even if one implements an algorithm using loops, modeling the problem recursively helps make the code cleaner and conceptually simpler.
  3. Think about states rather than a lot of variables. By a state I mean a tuple of many variables. One should be clear about how the states transition.
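
As a possible illustration of the first two rules, here is a minimal quicksort sketch in C that works on half-open intervals. It is just one way to write it, assuming a Lomuto-style partition with the middle element as pivot. The function names `swap_int` and `quicksort` are made up for this example.

```c
#include <stddef.h>

static void swap_int(int *a, int *b) {
    int t = *a;
    *a = *b;
    *b = t;
}

/* Sort the half-open interval a[lo, hi): lo is included, hi is excluded.
 * The caller must pass lo <= hi. */
static void quicksort(int *a, size_t lo, size_t hi) {
    if (hi - lo < 2) return;             /* length 0 or 1: already sorted */

    size_t p = lo + (hi - lo) / 2;       /* pivot position (an arbitrary choice) */
    int pivot = a[p];
    swap_int(&a[p], &a[hi - 1]);         /* park the pivot in the last slot */

    size_t mid = lo;                     /* invariant: a[lo, mid) < pivot */
    for (size_t i = lo; i < hi - 1; i++) {
        if (a[i] < pivot) swap_int(&a[i], &a[mid++]);
    }
    swap_int(&a[mid], &a[hi - 1]);       /* pivot lands at index mid */

    /* The sub-intervals [lo, mid) and [mid + 1, hi) cannot overlap. */
    quicksort(a, lo, mid);
    quicksort(a, mid + 1, hi);
}
```

To sort a whole array of `n` elements one would call `quicksort(a, 0, n)`. The termination test is a single subtraction, and the two recursive calls reuse `mid` without any plus-or-minus-one adjustment on the left side.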

To be continued.


Read it better

After using a read-it-later service such as Pocket, I think that bookmarks should die. The problem with bookmarks is obvious: they are not tied to the content. It’s very hard for users to find their way back to infrequently used bookmarks. Quite often I have totally forgotten why I bookmarked some strange website.

Read-it-later services don’t suffer from this problem. The solution is very simple: search. Bookmarking is URL oriented, while read-it-later is closely connected to the content. In my experience, I often run into technical tricks whose solutions I found on certain webpages. However, I keep forgetting those tricks. I often end up guessing the queries I used to reach those webpages on Google. With the help of Pocket, I can recall the information much more easily.

I also absorb information from reader services like Google Reader and now Digg Reader. These services are so-called source based. The sources I trust can push new information to me, even though they sometimes also produce low-quality information.

Another star website I’ve started using is claimed to be very intelligent. It automatically pushes content based on the interests the user provides. I’m generally satisfied with the content. However, it often makes mistakes on disambiguation.

So how can we do better?

I like the idea of marking useful web pages ourselves. I think it can model users’ real interests much more accurately than relying on a few keywords provided by users (which is what getprismatic does). That’s what read-it-later already gives us. But we can go further with those marks. We can use the content to model users’ interests and push other webpages they may be interested in. With crowdsourcing, the way users add bookmarks can also be put to effective use.

I’m looking for this kind of service. Or I will create one!


How to ease ssh connection

My daily work often requires connecting to some clusters via ssh. The organization of the clusters is typical:

  • a server dedicated to login is provided (I will call it the login server in the rest of this post)
  • several different servers for computing can be reached from the login server.
  • the login server and all computing servers share the same global home directory and data directory.

This organization makes a lot of daily work easy. In this post I will describe how I make it handy to work from my local laptop using ssh and a network filesystem.

The motivation for this is quite obvious:

  • I often need to log in to a certain server to run some programs
  • I often need to copy bulk data from machine to machine
  • I would rather use the tools and programs on my local machine, simply because they are usually more up to date. (For instance, vim on my remote server is still at version 7.2, which makes some extensions not work. And I prefer zsh to bash, but I can’t set my default shell to zsh because I lack the privileges. I also don’t want to waste time configuring color themes and a lot of other stuff.)

So in this blog post I will present some techniques I’m using to make life much easier. They fit exactly for people who work in the computational linguistics department of Saarland University or at the Max Planck Institute in Saarbrucken, but the methods are also easy to adapt to other environments.

Connect to a server without typing the tedious password every time

To keep the system secure, the administrator will quite often ask you to set a rather complicated password, which is painful to type every time you want to log in.
Here public-key cryptography comes to the rescue. To make the concept less technical, you can think of the whole process in this way:

  • first on your own laptop, use ssh-keygen to generate a key and a lock,
    ssh-keygen -t rsa

You need to press Enter several times to get the key and the lock with the default names. The “lock” is usually named “” and is what you will put on the server. When you try to access the server, it asks you for the key and checks whether the key you provide matches the lock. The common name for the “lock” in other tutorials is “public key”, which is not so intuitive, huh?

Connect to the computing servers directly

Mount remote filesystem on local machine

