Finding the Most Important Sentences Using TF-IDF in Python

Newt Tan
1 min read · Apr 29, 2020


TF-IDF is normally applied to words rather than sentences. This implementation is part of my college research project, so the dataset isn't provided here for privacy reasons.

I read an article that implements this idea in JavaScript, and it is quite good. Parts of the code could be improved, though, and rewritten in Python, so I wrote this article to share my version with you.

The output looks like this:

[Figure: sentence importance ranking]
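For readers without the notebook open, a ranking like the one pictured could be produced roughly as follows. This is only a minimal sketch, not the notebook's code: it assumes each sentence is treated as its own "document", scores a sentence by summing the TF-IDF weights of its distinct words, and uses hypothetical helper names (`tokenize`, `rank_sentences`).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rank_sentences(sentences):
    """Return (score, sentence) pairs sorted from highest to lowest score.

    Assumption: every sentence counts as one document for the IDF part,
    and a sentence's score is the sum of TF-IDF over its distinct words.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)

    # Document frequency: in how many sentences does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    ranked = []
    for sentence, tokens in zip(sentences, tokenized):
        if not tokens:
            # A sentence with no word tokens gets a neutral score.
            ranked.append((0.0, sentence))
            continue
        term_freq = Counter(tokens)
        score = sum(
            (count / len(tokens)) * math.log(n / doc_freq[word])
            for word, count in term_freq.items()
        )
        ranked.append((score, sentence))

    return sorted(ranked, key=lambda pair: pair[0], reverse=True)


for score, sentence in rank_sentences(["the cat", "the cat", "the zebra"]):
    print(f"{score:.3f}  {sentence}")
```

Words that occur in every sentence get an IDF of zero here, so sentences made only of common words sink to the bottom of the ranking.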

Here is part of the code:

[Figure: InverseDocumentFreq code]
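Since the code above is only available as an image, here is a rough sketch of what an inverse document frequency function might look like. It assumes the common `log(N / df)` form without smoothing; the function name `inverse_document_freq` and its signature are hypothetical, and the notebook's actual version may differ.

```python
import math


def inverse_document_freq(word, documents):
    """Return log(N / df) for `word`; `documents` is a list of token lists.

    N is the number of documents and df is how many of them contain the word.
    Returning 0.0 for a word found in no document is an assumed convention.
    """
    containing = sum(1 for tokens in documents if word in tokens)
    if containing == 0:
        return 0.0
    return math.log(len(documents) / containing)


docs = [["the", "cat"], ["the", "dog"], ["a", "bird"]]
print(inverse_document_freq("the", docs))   # log(3 / 2)
print(inverse_document_freq("bird", docs))  # log(3 / 1)
```

A rarer word ("bird") gets a larger value than a common one ("the"), which is what lets TF-IDF down-weight filler words.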

If you want to know more about the principles behind the algorithm, you can read the reference article.

Here are the links:

Source code: https://github.com/Wapiti08/Algorithms_on_Feature_Engineering/blob/master/TF-IDF-Sen.ipynb

Reference: https://hackernoon.com/finding-the-most-important-sentences-using-nlp-tf-idf-3065028897a3

