Using AutoGluon-RAG to generate embeddings.
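from agrag.agrag import AutoGluonRAG

agrag = AutoGluonRAG(
    data_dir="path/to/data",
    preset_quality="medium_quality",  # or path to config file
)
# Initialize only the modules needed for embedding generation
agrag.initialize_data_module()
agrag.initialize_embedding_module()
# Process the documents in data_dir, then embed the processed chunks
processed_data = agrag.process_data()
embeddings = agrag.generate_embeddings(processed_data=processed_data)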
Here, instead of calling initialize_rag_pipeline, which initializes the entire pipeline, we initialize only the data and embedding modules, since those are all that embedding generation requires.
generate_embeddings returns a pandas DataFrame with the following columns: "doc_id", "chunk_id", "text", "embedding", "all_embeddings_hidden_dim".
You can extract the embeddings themselves as a NumPy array as follows:
import numpy as np

embeddings_list = embeddings["embedding"].tolist()
embeddings_array = np.array(embeddings_list)
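To make the conversion concrete without running the full pipeline, here is a minimal sketch that uses a hand-built DataFrame as a stand-in for the output of generate_embeddings. The values, the 3-dimensional vectors, and the assumption that each "embedding" entry is a flat list of floats of equal length are illustrative only; the real output may store vectors differently, and the "all_embeddings_hidden_dim" column is omitted here. The cosine-similarity step is just one example of what the resulting array can be used for.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by generate_embeddings.
# The values are made up; real embeddings have far more dimensions.
embeddings = pd.DataFrame(
    {
        "doc_id": [0, 0, 1],
        "chunk_id": [0, 1, 0],
        "text": ["first chunk", "second chunk", "third chunk"],
        "embedding": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
    }
)

# Stack the per-row vectors into a single (num_chunks, hidden_dim) array.
embeddings_array = np.array(embeddings["embedding"].tolist(), dtype=np.float32)

# Normalize each row to unit length, then compare every chunk to the first one.
norms = np.linalg.norm(embeddings_array, axis=1, keepdims=True)
normalized = embeddings_array / norms
similarity_to_first = normalized @ normalized[0]
```

If the vectors in your DataFrame are nested (for example, each entry has a leading dimension of 1), np.array will produce a 3-D array, so it is worth checking embeddings_array.shape before using it downstream.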