Tuesday, September 14, 2010

Visual Studio shortcuts

Some shortcuts that I find indispensable!

Ctrl + ] - jump to the matching bracket
Shift + Alt + Enter - maximize the code window to full screen
Ctrl + Tab - switch between open code files
F12 - go to definition (class or method). In the default key bindings, Ctrl + F3 instead searches for the next occurrence of the selected text.
Ctrl + K, Ctrl + C and Ctrl + K, Ctrl + U - comment and uncomment the selected code; a great pair
Ctrl + K, Ctrl + F - auto-indent (format) the selected code

Download the poster from this MSDN link; it lists all the essential shortcuts and will be handy for any Visual Studio developer.
download link

cheers!

Saturday, August 28, 2010

When things go completely right - there is something wrong somewhere

Things never go the way you want. This hypothesis is nearly always true. But to me, "when things go better than you want, there is some problem somewhere".

I have changed my job and joined THE big software firm. As always, the problem with a big firm is 'process'. Everything has its own process, so for everything, be it a request or a suggestion, the turnaround time is usually long. Adding to this, the release cycle is near its end and I am left with no work. Idle for more than a month, I decided to go ahead and implement some challenging stuff.

As my interest in machine learning and information retrieval is growing exponentially, I decided to try "text classification". The algorithm assigns a given text to a category. Given a web page or document, it automatically classifies it into a topic such as news, sports, religion or celebrities.

It’s basically a machine learning algorithm. It generates a learning function from the training data. Using this learning function, we classify the test data into one of the categories. I went ahead and implemented the naïve Bayes classification algorithm as prescribed in the Information Retrieval textbook. My aim was to build the prototype first, then do extensive tuning and find out what actually improves the classifier’s accuracy. While I was building the prototype, I started reading papers online to see how this algorithm can be tuned. My major interest is in finding a better tuning algorithm. But what happened was a miracle...
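To make the "learning function" idea concrete, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not the code from this project: the function names, the whitespace tokenizer and the four toy training sentences are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Learn class counts and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, for illustration only.
TRAINING = [
    ("the team won the match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the government passed a new law", "news"),
    ("the election results were announced", "news"),
]
MODEL = train(TRAINING)
print(classify("who scored the goal", MODEL))  # sports
```

Working in log space keeps the product of many small probabilities from underflowing, which is why the scores are summed rather than multiplied.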

As always, I don’t normally make programming errors, and the code ran the first time. Guess what: I found the fundamental model very simple, so I had implemented the tuning stuff as well. After the program ran, the results were weird. The accuracy was below 40%. I couldn’t spot the error. After struggling for a considerable amount of time, I decided this was not going to work and, just for the sake of it, removed the extra tuning algorithm and ran the simple one. And there it was: 100% accurate classification. I wasn’t happy. I didn’t jump in the air because it was 100%. I am quite sure classification algorithms can’t predict that accurately, and certainly not at 100%. This is one of those moments where scoring 100 is not good. If you score a hundred, it is almost certainly due to some programming bug. Adding to my woes, I had removed my so-called tuning algorithm, which defeated the very purpose of this project: to find some heuristics on top of the existing model. Disappointed, I realized that "scoring 100 is not always good". After all that effort, the simple model alone gave 100% accuracy. Now how would I find the enthusiasm to work on and learn complex algorithms? The simple one did everything it had to do. Dejected. I felt something was wrong, something fundamentally wrong.

Then I started thinking about why it was 100%. How could it predict so accurately? The problem was that I had not tested with a large enough number of test cases. My few carefully selected test cases had somehow done the trick. Finally I tested the whole program with many more test cases, and the accuracy came down to 92%. Happy! Delighted. Because I believe this is the right number, and also because it makes me work on advanced heuristic algorithms to increase the classifier’s accuracy.
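Accuracy here is just the fraction of test documents whose predicted category matches the true one. A minimal sketch, assuming the predictions and true labels are held in two parallel lists; the labels below are hypothetical and only show the arithmetic.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one test case")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels: a tiny test set can look perfect by chance.
small_predicted = ["news", "sports", "news"]
small_actual = ["news", "sports", "news"]
print(accuracy(small_predicted, small_actual))  # 1.0

# A larger hypothetical test set with one mistake out of ten.
large_predicted = ["news"] * 5 + ["sports"] * 5
large_actual = ["news"] * 5 + ["sports"] * 4 + ["news"]
print(accuracy(large_predicted, large_actual))  # 0.9
```

With only a handful of test cases every single mistake moves the score by a large step, so a perfect score on a small set says little.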

It was one moment where the result exceeded my expectations, and still I wasn’t happy with what I got! Whether you achieve more or less than you expect, there is some problem somewhere! Now that it’s 92%, I have the greatest opportunity to explore advanced methods and heuristics and move towards 100%! I never want to reach the ultimate 100, as that would end my opportunities to learn more. There should always be another milestone!

note:

The information retrieval book is available here: link
This was actually done using details from the assignment given at Stanford for the IR course - link

Friday, August 20, 2010

Interests: You got to find what you really want to do!

I have spent a number of days deciding what I truly want to do. I have decided that my domain of work will be search and search advertising, which involve information retrieval, machine learning and statistical models. So here it is...


Information retrieval

Index construction and compression
Boolean and probabilistic retrieval models - BPT
Vector space model - Linear algebra
Scoring and ranking in search system
Evaluation and feedback systems
Text classification and categorization
Web search additional challenges

Machine learning (Tom Mitchell)

Decision Tree Learning
Artificial Neural Networks
Bayesian Learning
Instance-Based Learning
Genetic Algorithms
Learning Sets of Rules
Analytical Learning
Reinforcement Learning

Graph theory

Enumeration
Path problems
Coloring, covering and partitioning
Network flow
Properties and different graphs

Game theory
Optimization
Natural language processing
Data mining

Mathematics

Probability and statistics
Discrete mathematics
Linear algebra
Calculus

Algorithms and Data structures

I feel lucky to have identified my areas of interest. Currently I am reading IR and will follow up with machine learning and statistics.

I already have some good knowledge of algorithms and graph theory.
These areas are closely related to the web, search, information retrieval and audience intelligence algorithms.

I hope I keep getting opportunities to work in these areas all through my career, and I want to touch every topic on this list in the near future! Right now, I am doing vector space models and text classification algorithms in IR.

Thursday, August 19, 2010

Thursday, July 22, 2010

Primer : Probability

This post is very basic, and yes, everybody in the world knows this.

P(A) = n(A)/n(S)
P(A u B) = P(A)+P(B)-P(A n B)

Probability - how likely an event is to occur. If we toss a coin, we expect a head about once in two tosses. But this may not hold for any given pair: two consecutive tosses may both yield tails. So why 1/2? If the experiment is repeated a huge number of times, the fraction of tosses that come up heads is observed to approach 1/2, assuming of course that the coin is unbiased.
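The "repeat it a huge number of times" idea is easy to try out. Below is a minimal simulation sketch; the function name, the fixed seed and the toss counts are my own choices for illustration.

```python
import random


def head_frequency(num_tosses, seed=0):
    """Simulate fair coin tosses and return the fraction that came up heads."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses


# The fraction wanders for small counts and settles near 0.5 for large ones.
for n in (2, 100, 100_000):
    print(n, head_frequency(n))
```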

Two events are said to be independent if the outcome of one doesn’t affect the outcome of the other. Let’s say there are two coins and I toss them simultaneously: the outcome of one coin has no effect on the other, and hence the two outcomes are independent.

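Now what’s the probability of getting heads on both coins when two coins are tossed simultaneously? It’s 1/4.

P(A n B), the probability of getting both heads, is P(A).P(B) = 1/2 . 1/2 = 1/4, where A and B are the events that the first and the second coin show heads.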
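The 1/4 can be checked by listing all four equally likely outcomes of the two coins. A small sketch; the helper name `probability` is just an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two coins: HH, HT, TH, TT.
OUTCOMES = list(product("HT", repeat=2))


def probability(event):
    """Count the outcomes satisfying the event and divide by the total."""
    favourable = [o for o in OUTCOMES if event(o)]
    return Fraction(len(favourable), len(OUTCOMES))


p_first_head = probability(lambda o: o[0] == "H")
p_second_head = probability(lambda o: o[1] == "H")
p_both_heads = probability(lambda o: o == ("H", "H"))

print(p_both_heads)                  # 1/4
print(p_first_head * p_second_head)  # 1/4
```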

Two events are said to be mutually exclusive if the occurrence of one rules out the occurrence of the other: if event A occurs, event B doesn’t, and vice versa. Here P(A n B) = 0.


Conditional Probability

Defn: the probability that event A occurs, given that event B has occurred. It is most interesting when A and B are not independent.

Let’s take an example: rolling a die.
A : number 5
B : odd number

Let’s say it’s given that B has occurred in an experiment. Now what is the probability of A? It’s 1/3. How did we get that? Since B has occurred, the sample space shrinks to the three odd numbers {1, 3, 5}, and 5 is one among those three.

P(A|B) {A given B} = P(A n B)/P(B)
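The die example can be checked by counting, with A = {5} and B = {1, 3, 5} as defined above. A short sketch; the helper names are illustrative.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}
A = {5}        # the number 5
B = {1, 3, 5}  # an odd number


def prob(event):
    """Probability of an event on a fair die, by counting outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def conditional(event, given):
    """P(event | given) = P(event n given) / P(given)."""
    return prob(event & given) / prob(given)


print(conditional(A, B))  # 1/3
```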

If two events are independent, then P(A n B) = P(A).P(B), so
P(A|B) = P(A).P(B)/P(B) = P(A). This says that the occurrence of B does not change the probability of A; it’s still P(A) regardless.

If two events are mutually exclusive, then the probability of event A given that event B has occurred is P(A|B) = 0. If B has occurred then A cannot occur; that is what mutually exclusive means.

Bayes theorem

In simple terms, it is the relation between the conditional probability and its inverse.

P(A|B)=(P(B|A)*P(A))/P(B)
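As a quick check, the formula can be applied to the die example from the conditional probability section (A: the number 5, B: an odd number). Here P(B|A) = 1 because a 5 is always odd. A minimal sketch; the function name `bayes` is an illustrative choice.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) computed from P(B|A), P(A) and P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


p_a = Fraction(1, 6)       # P(A): rolling a 5
p_b = Fraction(1, 2)       # P(B): rolling an odd number
p_b_given_a = Fraction(1)  # P(B|A): a 5 is always odd

print(bayes(p_b_given_a, p_a, p_b))  # 1/3
```

The result matches the 1/3 obtained earlier by shrinking the sample space to the odd numbers.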

Just brushing up on the basics again, one final time!

Tuesday, July 13, 2010

Classical todo list

ABSTRACTION

We are right now working in the topmost layer, say the .NET Framework, Java or something similar. We don’t need to worry about how the code is actually executed by the machine, how it is loaded into memory, how our thread gets scheduled, when virtual memory swapping happens, or anything else. It’s all abstraction. You don’t need to know what’s happening at the processor level; you just know it happens. But it’s always essential to understand some of the inner details to have the complete picture. This is essential to becoming a complete software engineer, and it’s a prerequisite for any architect. I have been in many discussions where some guy starts talking about the internals, I have no clue at that level, and so the other guy kind of gets the upper hand. And so...

Here is my list of to-dos: click this link.
Currently I am focused on understanding some architecture and operating systems concepts.