| word | d1 | d2 | d3 | d4 | d5 |
|---|---|---|---|---|---|
| iPhone | 1 | 0 | 0 | 0 | 0 |
| Galaxy | 0 | 1 | 0 | 0 | 0 |
| Apple Watch | 0 | 0 | 1 | 0 | 0 |
| MacBook | 0 | 0 | 0 | 1 | 0 |
| Galaxy Book | 0 | 0 | 0 | 0 | 1 |
Word data can be represented as one-hot vectors of the form shown above.
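The one-hot scheme above is easy to reproduce in plain Python; a minimal sketch, using the English spellings of the product names and no external libraries:

```python
# Build one-hot vectors for a small vocabulary, mirroring the table above.
words = ["iPhone", "Galaxy", "Apple Watch", "MacBook", "Galaxy Book"]

one_hot = {
    word: [1 if i == idx else 0 for i in range(len(words))]
    for idx, word in enumerate(words)
}

print(one_hot["iPhone"])   # → [1, 0, 0, 0, 0]
print(one_hot["MacBook"])  # → [0, 0, 0, 1, 0]
```

Each word gets exactly one `1` in its own dimension, which is why one-hot vectors carry no notion of similarity between words; dense embeddings, shown next, fix that.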
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings

# Load the PDF and split it into page-level documents.
loader = PyPDFLoader("https://snuac.snu.ac.kr/2015_snuac/wp-content/uploads/2015/07/asiabrief_3-26.pdf")
pages = loader.load_and_split()

# Split the pages further into small chunks.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=0)
splits = text_splitter.split_documents(pages)

# embed_documents expects a list of strings, so pass each chunk's text.
embeddings_model = OpenAIEmbeddings(openai_api_key=OPENAI_KEY)
embeddings = embeddings_model.embed_documents([doc.page_content for doc in splits])
print(embeddings[0])
# output
[-0.00743497050557687,
-0.01563150953761078,
0.0015626668057332756,
-0.019168283291034625,
-0.018213096531648893,
0.014663413716354573,
-0.008596684559761767,
0.004595224625870433,
-0.005343885021779546,
-0.0034141486086942847,
0.011862392228291914, ...]
Embedding a query
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(openai_api_key=OPENAI_KEY)
embeddings_query = embeddings_model.embed_query("안녕!")
print(len(embeddings_query))
print(embeddings_query[:5])
# output
1536
[-0.00743497050557687, -0.01563150953761078, 0.0015626668057332756, -0.019168283291034625, -0.018213096531648893]
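With documents and queries embedded into the same 1536-dimensional space, relevance can be scored with cosine similarity. A stdlib-only sketch, with toy 3-dimensional vectors standing in for real embeddings (the values are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.1, 0.9, 0.2]         # toy "query" embedding
doc_vecs = {
    "빨간색 공": [0.1, 0.8, 0.3],    # toy "document" embeddings
    "파란색 공": [0.9, 0.1, 0.1],
}

# Pick the stored document whose vector is closest to the query.
best = max(doc_vecs, key=lambda k: cosine_similarity(query_vec, doc_vecs[k]))
print(best)  # → 빨간색 공
```

This nearest-vector lookup is exactly what a vector store automates over many chunks, as the Chroma example below shows.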
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(openai_api_key=OPENAI_KEY)

sample_texts = [
    "안녕!",
    "빨간색 공",
    "파란색 공",
    "붉은색 공",
    "푸른색 공",
]
# Wrap each string in a Document before storing it in the vector store.
documents = [Document(page_content=text) for text in sample_texts]
db = Chroma.from_documents(documents=documents, embedding=embeddings_model)
query = "레드"
docs = db.similarity_search(query)
print(docs[0].page_content)
# output : 빨간색 공
query = "인사"
docs = db.similarity_search(query)
print(docs[0].page_content)
# output : 안녕!
import sys
sys.path.append('/Users/jinwook/Documents/langchain/quick_start/')
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from chat_pdf_template import *
from langchain_openai import ChatOpenAI
from langchain.schema.runnable import RunnablePassthrough
from langchain_const import *
loader = PyPDFLoader("https://snuac.snu.ac.kr/2015_snuac/wp-content/uploads/2015/07/asiabrief_3-26.pdf")
# Load the PDF and split it into page-level documents.
pages = loader.load_and_split()
# Split the resulting pages into smaller chunks.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=0)
splits = text_splitter.split_documents(pages)
# Embed the chunks and store the vectors in the vector store.
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings(openai_api_key=OPENAI_KEY)) # text-embedding-ada-002
retriever = vectorstore.as_retriever()
llmChatOpenAi = ChatOpenAI(model_name=GPT_MODEL, temperature=0, api_key=OPENAI_KEY)
# TODO: study RunnablePassthrough
ragChain = {"ragContext": retriever, "question": RunnablePassthrough()} | ragPromptCustom | llmChatOpenAi
print(ragChain.invoke('한국의 저출산 원인이 뭐야?'))
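The dict at the head of ragChain fans the same input out to both values: the retriever fills ragContext with matching chunks, while RunnablePassthrough forwards the raw question unchanged, and the merged dict feeds the prompt template. A stdlib-only sketch of that behavior (fake_retriever and run_parallel are hypothetical stand-ins, not the LangChain API):

```python
def fake_retriever(question):
    # Stand-in for vectorstore.as_retriever(): returns relevant chunks.
    return ["chunk about " + question]

def passthrough(question):
    # What RunnablePassthrough does: forward the input unchanged.
    return question

def run_parallel(steps, value):
    # Each named step receives the SAME input; results merge into one dict.
    return {name: fn(value) for name, fn in steps.items()}

inputs = run_parallel(
    {"ragContext": fake_retriever, "question": passthrough},
    "한국의 저출산 원인이 뭐야?",
)
print(inputs["question"])  # → 한국의 저출산 원인이 뭐야?
```

In the real chain, LangChain coerces the dict into a RunnableParallel automatically when it is piped into the prompt.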