Get started on the first variant of building a search engine
Get started on the first variant of building a search engine
Welcome to the first challenge in the Search Engine Series, a step-by-step guide to building a fully functional and complex search engine. This introductory problem lays the groundwork by implementing a simple search engine based on an inverted index.
You will create a REST API that includes two essential functionalities:
No submissions yet, start by making your first submission
Welcome to the first problem in the Search Engine Series, a journey to learning how to build a functional and complex search engine from scratch. You will discover step by step some of the different techniques and algorithms used in making search engines.
In this introductory problem, you are going to task is to implement a simple search engine based on an inverse index. It will be in the form of a Rest API with the following functionalities:
This problem introduces the concept of an inverted index, which you will use to build your solution.
An inverted index is a data structure designed for efficient text searching. It maps each unique word (or token) in a collection of documents to the list of document IDs where the word appears. This allows for rapid retrieval of relevant documents based on search queries. Here's a breakdown of how it works:
The inverted index is built by processing each document in the system and recording the occurrence of each word within it. In a real-world scenario, you would typically perform preprocessing on the text (e.g., removing stop words, tokenization, stemming). However, for simplicity in this project, we assume each word in the text is a valid token without additional preprocessing.
Consider two documents:
"hello world hello"
"hello there"
The inverted index would look like this:
{
"hello": [1, 2],
"world": [1],
"there": [2]
}
Here:
"hello"
appears in Document 1 and Document 2."world"
appears in Document 1."there"
appears in Document 2.The inverted index is used to efficiently retrieve documents that match a search query. Here's how:
Using the inverted index from the previous example:
"hello"
→ Lookup "hello"
→ Result: [1, 2]
"world"
→ Lookup "world"
→ Result: [1]
"hello world"
:
[1, 2]
(contain either "hello"
or "world"
)[1]
(contains both "hello"
and "world"
)An inverted index is a highly effective data structure for fast and accurate text searching. It maps words to documents and enables efficient retrieval of relevant results. This foundational concept will serve as the building block for more advanced search engine features, such as ranking and handling complex queries, in subsequent problems of this series.
You need to implement an API with the following routes:
POST /documents
{ "id": integer, // Unique identifier for the document "text": "string" // The text content of the document }
{ "message": "Document added successfully." }
id
already exists, update its text.Endpoint: GET /search
Query Parameters:
query
: A string containing the search query.mode
: A string specifying the search mode, either "union"
or "intersection"
.Response:
{ "results": [ { "id": integer, "text": "string" } ] }
Functionality:
10,000
characters and will not contain any punctuation.10,000
.500
characters.