American Scientific Research Journal for Engineering Technology Sciences

Closest Match Based Information Retrieval and Recommendation Engine using Signature-Trees and Fuzzy Relevance Sorting Algorithm

Aug 1, 2018


This paper proposes a recommendation technique that avoids running an exhaustive search over a database of thousands of records before concluding that a recommended item matches a significant percentage of what was initially desired. Such searches often involve not only simple index-based full-match lookups but also partial or near-match searches, in which the percentage of match between entities determines whether an item is relevant enough to recommend. These problems are usually tackled with methods such as fuzzy operations, regular-expression searches, clustering, and similarity analysis, each with its own effectiveness and efficiency. Our goal was to create a search and recommendation system that performs fuzzy search and fuzzy similarity analysis with near-match percentages in an effective, efficient, and user-friendly manner on thousands of records, files, or rows with hundreds of attributes, features, or columns. Inspired by Google’s image search algorithm, which searches on the basis of signatures built by extracting features from each image, we have created a Match engine that reads the schema of the data or files, compiles encoded signatures, and stores them as an index. That index is then converted into a tree (S-Tree) on the basis of the relevance of each field or column and the observed data frequency. Once compilation is done, the system can search for and recommend the best matches very efficiently. For further optimization we use heuristics such as dividing the feature set into hard filters and soft filters: the former demand a full match and the latter a fuzzy match. Once even one best match has been found, the other matches can be retrieved without further searching.
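The hard-filter and soft-filter idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names, the weights, the `signature`, `build_index`, `match_percentage`, and `recommend` helpers, and the sample records are all hypothetical. A plain dictionary keyed on the hard-filter fields stands in for the S-Tree, and `difflib.SequenceMatcher` from the standard library stands in for whatever fuzzy similarity measure the paper uses.

```python
from difflib import SequenceMatcher

# Hypothetical schema: the hard/soft split and the relevance weights are assumptions.
HARD_FIELDS = ("category",)                   # must match exactly
SOFT_WEIGHTS = {"brand": 2.0, "color": 1.0}   # fuzzy match, weighted by relevance


def signature(record):
    """Encode a record as a tuple of normalized field values (a stand-in signature)."""
    fields = HARD_FIELDS + tuple(SOFT_WEIGHTS)
    return tuple(str(record.get(f, "")).strip().lower() for f in fields)


def build_index(records):
    """Group signatures by their hard-filter values so a query scans only one bucket."""
    index = {}
    for rec_id, record in records.items():
        sig = signature(record)
        index.setdefault(sig[:len(HARD_FIELDS)], []).append((rec_id, sig))
    return index


def match_percentage(query_sig, candidate_sig):
    """Weighted average of per-field string similarity over the soft fields, in percent."""
    soft_q = query_sig[len(HARD_FIELDS):]
    soft_c = candidate_sig[len(HARD_FIELDS):]
    score = sum(
        weight * SequenceMatcher(None, q, c).ratio()
        for weight, q, c in zip(SOFT_WEIGHTS.values(), soft_q, soft_c)
    )
    return 100.0 * score / sum(SOFT_WEIGHTS.values())


def recommend(index, query, top_k=3):
    """Return up to top_k (record id, match percentage) pairs, best match first."""
    q_sig = signature(query)
    bucket = index.get(q_sig[:len(HARD_FIELDS)], [])
    ranked = sorted(
        ((match_percentage(q_sig, sig), rec_id) for rec_id, sig in bucket),
        reverse=True,
    )
    return [(rec_id, round(pct, 1)) for pct, rec_id in ranked[:top_k]]


# Illustrative data only.
records = {
    1: {"category": "phone", "brand": "acme", "color": "black"},
    2: {"category": "phone", "brand": "acme", "color": "blue"},
    3: {"category": "laptop", "brand": "acme", "color": "black"},
}
query = {"category": "Phone", "brand": "Acme", "color": "black"}
results = recommend(build_index(records), query)
```

In this sketch, record 3 is never scored because it fails the hard filter, and the remaining candidates in the same bucket are ranked by their weighted near-match percentage. Bucketing by hard-filter values is only a rough analogue of the claim that, once one best match is located, related matches can be read off without a further search.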