Colorless Green Ideas Sleep Furiously


Leave a comment

Valuable tools for writing robust shell script

I hate programming in shell script because there are so many pitfalls. However, sometimes I get stuck with writing it.
I find that the following two tools are very helpful for writing solid and robust scripts:
* shellcheck
* explainshell


Leave a comment

Some thoughts on digtital signature

I am rather new to the topic of cryptography. Those terms such as Digital Certificate, Digital signature, encryption, decryption sometimes confuse and scare me. Here are some thoughts which makes me understand the big picture better.

First, all that I am talking about is based on asymmetric cryptography.  It’s a class of cryptographic algorithms which requires two separate keys. e.g, the secret key and the public key. Then I won’t bother too much about what encrypt or decrypt means. These terms are coined from an aspect from the applications.

Blank Flowchart - New Page

All you should know is that you can use either the Private Key to transform Data1 to Data2, or transform Data2 to Data1. I am not going in to the details how this transform works(Check RSA page on Wikipedia to get the basic idea). There are some important facts to know in order understand the big picture. First of all, If you don’t know the Private Key, it’s very hard to do the transformation from Data1 to Data2. Secondly, if the Private Key is known, it’s trivial to generate a Public Key according to it.

Both data encryption and digital signature relies on theses facts. For data encryption, suppose User1 wants to send Data2 to User2, he will first use the public key to transform Data2 to Data1, then send Data1 on an insecure Channel, or even broadcast it. No user other than User2 has Private Key. Thus only User2 is able to transform Data1 to Data2 to see the original data.

For digital signature, suppose User1 wants to sign Data1  and send it to User2. How can User2 make sure the Data1 he gets is really from User1 and intact? User1 uses his Private Key to transform Data1 to Data2, and sends both Data1 and Data2 to User2. Here Data2 is called the signature. User2 uses the Public Key to transform Data2 back to Data1 and check whether it’s identical to the Data1 he gets. If not, it means Data1 is modified or Data2 is fake. In reality, people don’t generate the signature directly from data because the transformation is expensive. They use a hash function to get a hash value from data then generate the signature from it.

So far so good. But there is still one problem in this picture, how can User2 be sure that the Public Key he holds is really from User1? The obvious way would be that they meet in person. However, it’s not feasible in most cases. The other solution is they both trust in a third person. The third person writes a certificate says “hereby I confirm that the public key belongs to User1” and signs it. The third person is so-called certificate authority. The problem is that how User1 and User2 can trust this certificate? They should both hold a reliable public key of the third person. So we are chasing our own tails now. The solution is that we have some pre-installed certificates on both sides.

Leave a comment

Package and deploy python code

Packaging and deploying python code is very easy.

I found this tutorial quite thorough and easy to follow.

The thing missing in this tutorial is that one feature of setuptools, e.g. entrypoint, which adds a runnable binary to the system path, is not mentioned.

Then refer to a more detailed document

Leave a comment

How to learn

Quite often I feel depressed when I find out that I keep forgetting stuff which I have spent a lot of time learning.  I know that human beings are created this way. We can’t remember things forever unless you keep repeating. People even forget mother languages when they stop using them for ages. But I still feel depressed. I have a feeling that how I can progress if I keep throwing valuable things away when I am learning new knowledge.

I feel the same way when I learn a new language. But I normally use to collect all of the phrases and words l have learnt. And I review them from time to time. In this way I find myself learn new languages much faster and better. But this method can not be used for all of the stuff, especially for very complex technical materials which involves creativity. 

Till recently I feel kind of relieved when I read  this blog post by ying wang. He claims that people can never learn the essence of anything if they can not *reinvent* it. Now I have a brand new view of the occasions when I can’t recall. I treasure these moments as I can try to reinvent them using my limited and broken memory. Even if at last I can not reinvent it, then I pick up a textbook and find out where I got stucked. After two or three rounds of this process, I find myself can hardly forget the same concept again. This works pretty well when I go through the CLRS.

Besides this, I also have some other learning strategies that I always follow,

  1. always remind yourself about the starting point, the motivation.  You will definitely get lost if you don’t even know why you have reached place. 
  2. Try to connect the new knowledge with the old ones you already know very well.  When I learn KMP algorithm which is a pretty complex sting machting algorithm,  I found that my knowledge about DFA makes the algorithm very intuitive.
  3. Thy to abstract details out. Try to learn the theory and view things from a higher level. When I was in  the primary school, I was often struggling with a type of math problem which is called *Rabbits and chicken in the same cage*. It’s a literate translation from its Chinese name. I googled it and find out that it seems pupils other than Chinese don’t need to go through these kind of brain teasers. The problem tells you the total number of heads as well as that of legs, then we are required to calculate the number of chickens and rabbits. At that time we haven’t learned any algebra, so my teacher taught me some weried thinking process such as imagining all of the rabbits suddenly stand up only on two of their four feets. Though I was viewed as a smart student by being good at these questions, I don’t like them at all. I am telling this story to illustrate that theory is indeed important. Thinking in the level of algebra makes a huge type of thoses questions so simple. In a similar way, knowing about the DFA theory makes regular expression a piece of cake. And you won’t struggle yourself to parse HTML using regular expressions.
  4. Try to get it better explained. Different explanations and views make a huge difference. When I entered the college, I didn’t like linear algebra at all. The lecturer started from teaching us determinant even without explaing why we need it. They we are required to prove a few of properties of derterminant. But everything changed when I found the algebra open course given by Gilbert Strang. He is such a great teacher. He makes every point in algebra so clear and well explained. I now have a feeling linear algebra is the most beatiful subject I’ve learned ever. Another story is about Red-black tree. I can never remember all of the operations even I read though the subject in CLRS several times. But things totally changed when I heared how its inventer, Robert Sedgewick present it. He started from 2-3 tree which is conceptually much much simpler thant RB tree, then he described RB tree as a mimic of 2-3 tree. The implementation is so short and concise. It’s definitely a piece of art in computer science. So when you find some cocepts really hard to understand, maybe it’s not your fault but the teacher’s. Try to find out how the best knowledgable person explain it.
  5. Try to dig the rabbit hole. Don’t be satisified with the surface especially for the concepts related to engineering in computer science. Don’t try to memrize the time complexity of different sorting algorithm or the hash table. If you know how they are implemented, you can never forget those. C++ is known for its comlexity and traps. But if you know the details under the hood, you will find the language much easier. I know some people who claim thay are master of certain programming languages by learning the language specification by heart. I think even this method works, it’s really inefficient. Thinking about how one can implement certain language features would make it much simpler. I often find some people get confused by lexical scope and dynamic scope. But if you know the evaluation model, it’s really hard to conflate them.
  6. Try to reinvent the stuff when you forget the details. As I said at the beginning, treasure the opptunities when you forget about the stuff you have learned. It means you don’t really understand them at the first place. 

Leave a comment

Three strategies to implement link data structure

There are a lot of methods to map a conceptually linked structure into language level structures provided by C.

Linked stucture includes simple linkedlist, trees, graphs and so on.

A data object normally contains a key and some satellite data. A keys differentiates the object from others. It can be an integer, but we will see that it can also be other stuff. A data object in a linked structure always contains at least one link which points to other objects in the data structure. A link can be a C porinter, but we will see that links can also be other things.

Here I will show three typical strategies to implement a linked data structure using binary tree as example.

  1. Pure static data structure. We can use three arrays to represent data, and left children and right children. Here the keys are characterized by the subscripts of the arrays. Links are also modeled by the subscripts. All of the array can be hosted on the stack if we omit the stack size limit. That’s why I would call it a pure static data structure.
  2. Pure dynamic data strucure. Each node is modeled using a C structure in which we have two pointers point to other Nodes. Normally in this strategy all of the nodes are dynamically allocated on the heap. The kesys of objects are modeled using their address in the memory. So we need to take care of the memory, otherwise this would lead to a typical memory leak. 
  3. Hybrid approach. We can also allocated all of the data on the stack. A typical way to do this is to allocate an array which holds all the objects on the stack. But we still use pointers to help speed up the travelling through the nodes. Of course you can also put the whole array on the heap, but you see the point, the key(or better called Identitty care characterized  using both subscripts and adresses.

Quite often a subsript based approach can give us a very simple way to iterate through all of the objects.  A pointer based approach gives us a natural approach to access the neighbours.

Leave a comment

Tips of implementing algorithms correct and fast

Recently I am preparing for the interview with google. Google interview is well known for its complexity. Applicants are required to write code on a white board or in google docs to solve complex questions. Not only having a deep understanding of various algorithms is important, but also the ability of implememting them correct and fast is neccessory.  It would be a big plus if one can write correct and clear code in one run.

I believe that practice makes perfect. After several days of coding training, I realized some general rules I follow when I implement complex algorithm.

  1. when dealing with arrays, be careful with the boundaries. My suggestion is to always use a half open interval([,)) as a processing unit. The advantage is that there would never be any overlap between processing units. Besides the terminal condition is easy to check. I would demonstrate this with quicksort in the future.
  2. Always modeling problems recursively if possible. Even if one is implementing algorithms using loops, modeling the problem in a recursive way would help make the code cleaner and conceptually simpler. 
  3. Think about states rather than a lot of variables. Here by states I mean a tuple of a lot variables. One should make it clear in mind how states are transfering. 

To be continued.