1. Natural Language Toolkit (NLTK)

NLTK is an essential library that supports tasks such as classification, stemming, tagging, parsing, semantic reasoning, and tokenization in Python. It’s basically your main tool for natural language processing in Python. Today it serves as an educational foundation for Python developers who are dipping their toes in this field (and machine learning).

The library was developed by Steven Bird and Edward Loper at the University of Pennsylvania and played a key role in breakthrough NLP research. Many universities around the globe now use NLTK, Python libraries, and other tools in their courses.

This library is pretty versatile, but…

What is a word vector?

At one level, it’s simply a vector of weights. In a simple 1-of-N (or ‘one-hot’) encoding every element in the vector is associated with a word in the vocabulary. The encoding of a given word is simply the vector in which the corresponding element is set to one, and all other elements are zero.

Suppose our vocabulary has only five words: King, Queen, Man, Woman, and Child. We could encode the word ‘Queen’ as:

[0, 1, 0, 0, 0], with positions in the order King, Queen, Man, Woman, Child.

Using such an encoding, there’s no meaningful comparison we can make between word vectors other than equality testing.
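
To make the point about equality testing concrete, here is a minimal sketch in plain Python. The helper names one_hot and dot are made up for this illustration and are not part of any library. It builds 1-of-N vectors for the five-word vocabulary and shows that the dot product between two different words is always zero, so the encoding says nothing about how related two words are.

```python
def one_hot(word, vocabulary):
    """Return the 1-of-N encoding of `word` as a list of 0s and 1s."""
    if word not in vocabulary:
        raise ValueError(f"{word!r} is not in the vocabulary")
    return [1 if entry == word else 0 for entry in vocabulary]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# The five-word vocabulary from the text, in the order it is listed.
vocabulary = ["King", "Queen", "Man", "Woman", "Child"]

queen = one_hot("Queen", vocabulary)
king = one_hot("King", vocabulary)
woman = one_hot("Woman", vocabulary)

print(queen)               # [0, 1, 0, 0, 0]
print(dot(queen, queen))   # 1: a word only matches itself
print(dot(queen, king))    # 0
print(dot(queen, woman))   # 0: 'Queen' is no closer to 'Woman' than to 'King'
```

Every pair of distinct words comes out equally unrelated, which is the limitation the paragraph above describes.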

In word2vec, a distributed representation of a word…

Logistic regression is a widely used algorithm for classification. It is very easy to use and performs very well when the classes are linearly separable. It is based on the probability that a sample belongs to a class. These probabilities must be continuous and bounded in the interval (0, 1). The sigmoid (or logistic) function maps the model’s score into that interval, and a threshold on the resulting probability makes the final decision.
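
As a rough sketch of how the sigmoid and the threshold fit together, the plain-Python snippet below computes a probability from a linear score and turns it into a class label. The weights, bias, and the 0.5 threshold are made-up values for illustration only; a real model would learn the weights from data.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability that the sample belongs to the positive class."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def predict(features, weights, bias, threshold=0.5):
    """Return 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical parameters, chosen only for this example.
weights = [1.5, -2.0]
bias = 0.25

print(sigmoid(0.0))                        # 0.5
print(predict([2.0, 0.5], weights, bias))  # 1 (score is 2.25)
print(predict([0.0, 1.0], weights, bias))  # 0 (score is -1.75)
```

Moving the threshold away from 0.5 trades false positives against false negatives without changing the probabilities themselves.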

Important Points:

  • Logistic regression is widely used for classification problems
  • Logistic regression doesn’t require a linear relationship between the dependent and independent variables. …

Decision trees use multiple algorithms to decide how to split a node into two or more sub-nodes. Creating sub-nodes increases the homogeneity of the resulting sub-nodes. Decision tree algorithms come in several types:

1. CHAID (Chi-square Automatic Interaction Detector)

2. CART (Classification and Regression Tree)

3. The Iterative Dichotomiser 3 (ID3)

4. C4.5

5. C5.0
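
The idea that a split increases the homogeneity of the sub-nodes can be sketched with Gini impurity, one common homogeneity measure and the one CART is usually described with for classification. The snippet below is a plain-Python illustration with made-up labels, not the implementation of any particular library. It compares the impurity of a parent node with the size-weighted impurity of its two children.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 0.0 means every label is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_gini(children):
    """Impurity of a split: child impurities weighted by child size."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)


# Made-up example: a perfectly mixed parent and a candidate split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(gini(parent))                  # 0.5
print(weighted_gini([left, right]))  # roughly 0.32, lower than the parent
```

A lower weighted impurity after the split means the sub-nodes are more homogeneous than the parent, so a tree builder would prefer the candidate split with the largest drop.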

In this post we will discuss CART. It is an algorithm to find out the statistical significance of the differences between sub-nodes and the parent node. …

