Langchain 1 - preface, pdf.ai (Eng)

by monsangter 2024. 7. 8.

pdf.ai takes a multi-page PDF and, when a user asks a question, answers it based on the contents of that document.

 

 

How does this process work?

 

First, consider the simplest approach: select and copy all of the text from the PDF, then append the user's question at the end.

This is not a good idea, because GPT can accept only a limited amount of text at once. Sending a large amount of text also increases costs.
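A rough sketch of why the naive approach breaks down. It assumes about 4 characters per token (a common rule of thumb for English text, not a real tokenizer) and a hypothetical 8,192-token context limit; actual limits and tokenizers vary by model. The helper names are illustrative only.

```python
# Rough heuristic (assumption): about 4 characters of English text per token.
CHARS_PER_TOKEN = 4
# Hypothetical context limit; real limits depend on the model being used.
CONTEXT_LIMIT_TOKENS = 8192


def estimate_tokens(text: str) -> int:
    """Approximate the token count by rounding characters / CHARS_PER_TOKEN up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def naive_prompt(pdf_text: str, question: str) -> str:
    """Stuff the entire document into the prompt, then append the question."""
    return f"{pdf_text}\n\nQuestion: {question}"


def fits_in_context(prompt: str, limit: int = CONTEXT_LIMIT_TOKENS) -> bool:
    """Check whether the estimated prompt size stays within the limit."""
    return estimate_tokens(prompt) <= limit


if __name__ == "__main__":
    # Stand-in for a 100-page PDF at roughly 3,000 characters per page.
    pdf_text = "x" * (100 * 3000)
    prompt = naive_prompt(pdf_text, "What is the main conclusion?")
    print(estimate_tokens(prompt), fits_in_context(prompt))
```

Because API pricing is typically per token, the same estimate also shows why cost grows with every page that is pasted into the prompt.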

 

Another method is as follows.

When a user uploads a PDF, extract all of the text, divide it into segments, and store summaries of these segments. Later, the user's question is appended to the relevant summaries. Each text segment is converted into a 1536-dimensional vector (a list of 1536 real numbers) by an embedding algorithm and stored in a specialized database. (1536 is, for example, the output size of OpenAI's text-embedding-ada-002 model.)
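The ingestion step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: `chunk_text`, `toy_embed`, and `SimpleVectorStore` are hypothetical names, and `toy_embed` uses a hashing trick as a stand-in for a real embedding model, so its vectors have the right shape (1536 numbers) but none of a learned model's semantic quality. A real system would call an embedding API and write to an actual vector database.

```python
import hashlib
import math
import re

EMBEDDING_DIM = 1536  # vector size mentioned in the text


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is not fully contained in the previous one.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str, dim: int = EMBEDDING_DIM) -> list[float]:
    """Hashing-trick embedding: a stand-in (assumption) for a real embedding model."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        # Map each word to a fixed bucket and count occurrences.
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalize to unit length so cosine similarity reduces to a dot product.
    return [v / norm for v in vec] if norm else vec


class SimpleVectorStore:
    """Keeps (vector, chunk) pairs in memory instead of a real vector database."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.chunks: list[str] = []

    def add_texts(self, texts: list[str]) -> None:
        for t in texts:
            self.vectors.append(toy_embed(t))
            self.chunks.append(t)


if __name__ == "__main__":
    document = "LangChain loads documents. " * 100
    store = SimpleVectorStore()
    store.add_texts(chunk_text(document))
    print(len(store.chunks), len(store.vectors[0]))
```

The small overlap between chunks is a common design choice so that a sentence cut at a chunk boundary still appears whole in at least one segment.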

 

When a question arrives, it is embedded in the same way. The system then calculates the similarity between the question vector and the stored vectors, appends the most relevant segments to the question, and sends the combined query to the model.
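The query step can be sketched as cosine similarity plus top-k selection. This is a minimal illustration, assuming cosine similarity as the metric (a common choice; vector databases may use others) and hand-made 3-dimensional vectors in place of real 1536-dimensional embeddings. `top_k` and `build_prompt` are hypothetical helper names, and the prompt wording is only an example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: list[tuple[list[float], str]], k: int = 2) -> list[str]:
    """Return the k stored chunks whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), chunk) for vec, chunk in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Append the retrieved chunks and the user's question into one prompt."""
    context = "\n---\n".join(chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy 3-dimensional vectors stand in for 1536-dimensional embeddings.
    stored = [
        ([0.9, 0.1, 0.0], "Chapter 1 covers installation."),
        ([0.1, 0.9, 0.0], "Chapter 2 explains pricing."),
        ([0.0, 0.2, 0.9], "Chapter 3 lists known bugs."),
    ]
    question_vec = [0.2, 0.8, 0.1]  # pretend embedding of "How much does it cost?"
    print(build_prompt("How much does it cost?", top_k(question_vec, stored, k=1)))
```

Only the selected chunks travel to the model, so the prompt stays small no matter how long the original PDF is.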

This is how the entire process works.

 

Can this process be implemented with LangChain?

 

LangChain provides libraries that can parse and extract text from the sources you need, including PDFs, JSON files, and even S3 buckets. It also offers libraries for storing this text in a vector store.

 

Given the current focus on vector stores, many companies, startups, and projects, including Postgres and Redis, now provide these services. LangChain offers interchangeable libraries for these platforms, with similar signatures for parameters, methods, and so on.
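The idea of interchangeable backends can be sketched with a shared interface. The classes below are hypothetical stand-ins, not LangChain code or real Postgres/Redis clients; their method names (`add_texts`, `similarity_search`) loosely mirror those on LangChain's `VectorStore` base class. As a simplifying assumption, relevance is scored by word overlap rather than embeddings so the example runs with no dependencies.

```python
from typing import Protocol


def _overlap(a: str, b: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


class VectorStore(Protocol):
    """The shared signature every backend agrees on."""

    def add_texts(self, texts: list[str]) -> None: ...

    def similarity_search(self, query: str, k: int = 4) -> list[str]: ...


class ListBackedStore:
    """Hypothetical backend #1: keeps texts in a list."""

    def __init__(self) -> None:
        self._texts: list[str] = []

    def add_texts(self, texts: list[str]) -> None:
        self._texts.extend(texts)

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        # sorted() is stable, so ties keep insertion order.
        return sorted(self._texts, key=lambda t: _overlap(query, t), reverse=True)[:k]


class DictBackedStore:
    """Hypothetical backend #2: keeps texts in a dict keyed by insertion id."""

    def __init__(self) -> None:
        self._texts: dict[int, str] = {}

    def add_texts(self, texts: list[str]) -> None:
        for t in texts:
            self._texts[len(self._texts)] = t

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        ranked = sorted(self._texts.items(), key=lambda item: (-_overlap(query, item[1]), item[0]))
        return [t for _, t in ranked[:k]]


def retrieve(store: VectorStore, docs: list[str], question: str) -> list[str]:
    """Application code depends only on the interface, not on the backend."""
    store.add_texts(docs)
    return store.similarity_search(question, k=1)


if __name__ == "__main__":
    docs = ["Redis is an in-memory store", "Postgres is a relational database"]
    for store in (ListBackedStore(), DictBackedStore()):
        print(type(store).__name__, retrieve(store, docs, "relational database"))
```

Switching providers then means changing only the line that constructs the store; `retrieve` and everything built on it stay the same.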

 

 

Therefore, LangChain's goals are as follows:

1. Load, parse, store, and query data, and pass it through models like GPT.

2. Integrate the various services provided by numerous companies into a single interface.

3. Simplify switching between different providers, so that swapping one model or service for another is straightforward.
