Normally, TF-IDF is applied to words rather than sentences. This implementation is in fact part of my college research project. The dataset won't be provided here for privacy reasons.
I read an article about a JavaScript implementation of this idea, which is quite good. But parts of the code could be improved, so I rewrote it in Python. I wrote this article to share it with you.
The performance looks like this:
Part of the code is:
If you want to know more about the principles of the algorithm, you can read the reference article. Feel free to take a look.
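For a quick feel of the idea, here is a minimal sketch in plain Python. It is not the code from the notebook, only an illustration under some assumptions: each sentence is treated as one "document", IDF is the unsmoothed log of (number of sentences / number of sentences containing the word), and a sentence's score is the mean TF-IDF of its distinct words. The function names (split_sentences, tokenize, score_sentences) and the naive regex sentence splitter are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    # Lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def score_sentences(text):
    """Return (sentence, score) pairs, treating each sentence as a document."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: in how many sentences does each word appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # TF = count / sentence length, IDF = log(n / df), no smoothing (assumption).
        total = sum(
            (count / len(tokens)) * math.log(n / df[word])
            for word, count in tf.items()
        )
        # Sentence score: mean TF-IDF over its distinct words (assumption).
        scores.append(total / len(tf))
    return list(zip(sentences, scores))


if __name__ == "__main__":
    sample = "The cat sat. The cat sat. Quantum entanglement puzzles physicists."
    ranked = sorted(score_sentences(sample), key=lambda pair: pair[1], reverse=True)
    for sentence, score in ranked:
        print(f"{score:.4f}  {sentence}")
```

Sentences made of words that are rare across the text get higher scores, while a word that occurs in every sentence contributes nothing because its IDF is zero. A real project would likely swap in a proper sentence tokenizer and stop-word handling.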
Here are the links:
Source code: https://github.com/Wapiti08/Algorithms_on_Feature_Engineering/blob/master/TF-IDF-Sen.ipynb
Reference: https://hackernoon.com/finding-the-most-important-sentences-using-nlp-tf-idf-3065028897a3