Discriminative models
Model P(Y|X)
Construct a single model for all the data
Predict labels based on observed data
Requires fewer data, fast, good performance even with small dataset
Generative models
Model P(X,Y)
Need to construct a model for each label (yi). During prediction, select yi based on the highest probability from each of the model of yi. So no boundary.
Need a lot of data to build the model, but the information is complete. Can do more than predicting labels.