Usage#

To use this framework, you must first install AutoGluon RAG:

git clone https://github.com/autogluon/autogluon-rag
cd autogluon-rag

# Create a Virtual Environment (using Python, or conda if you prefer)
python3 -m virtualenv venv
source venv/bin/activate

#Install the package
pip install -e .

You can now use the package in two ways.

Use AutoGluon-RAG through the command line as agrag:#

AutoGluon-RAG


usage: agrag [-h] --config_file

AutoGluon-RAG - Retrieval-Augmented Generation Pipeline

options:
  -h, --help        show this help message and exit
  --config_file        Path to the configuration file 

Use AutoGluon-RAG through code:#

from agrag.agrag import AutoGluonRAG


def ag_rag():
    agrag = AutoGluonRAG(
        preset_quality="medium_quality", # or path to config file
        web_urls=["https://auto.gluon.ai/stable/index.html"], # List of URLs to use for RAG
        base_urls=["https://auto.gluon.ai/stable/"], # List of base URLs to use when processing web     
                                                     # URLs. Only Web URLs that stem from a base URL 
                                                     # will be processed.
        parse_urls_recursive=True, # Whether to recursively parse all URLs from the provided web url list
        data_dir="s3://autogluon-rag-github-dev/autogluon_docs/" # Directory containing files to use for RAG
    )
    agrag.initialize_rag_pipeline() # Initializes all modules in the RAG pipeline
    agrag.generate_response("What is AutoGluon?") # Generator


if __name__ == "__main__":
    ag_rag()

Configuring Parameters for AutoGluon-RAG:#

Using AutoGluonRAG class#

For a list of configurable parameters that can be passed into the AutoGluonRAG class, refer to the tutorial here.

Using Configuration File#

You can also use a configuration file with AutoGluonRAG. The configuration file contains the specific parameters to use for each module in the RAG pipeline. For an example of a config file, please refer to example_config.yaml in src/agrag/configs/. For specific details about the parameters in each individual module, refer to the README files in each module in src/agrag/modules/.

There is also a shared section in the config file for parameters that do not refer to a specific module. Currently, the parameters in shared are:

pipeline_batch_size: Optional batch size to use for pre-processing stage (Data Processing, Embedding, Vector DB Module). This represents the number of files in each batch. The default value is 20.