module medagentgym.agent

The MedAgent class is an agent designed to interact with Electronic Health Record (EHR) environments using Large Language Models (LLMs). It dynamically selects the appropriate model configuration, manages conversation history, and processes environment observations to generate actions drawn from a set of permitted actions.


dataclass LLMConfig

A dataclass defining basic configuration parameters for the LLM:

Args:

  • model_path (str): Path to the LLM model.
  • tokenizer_path (str): Path to the tokenizer.
  • max_length (int): Maximum token length for generated sequences.
  • device (str): Device specification (e.g., "cpu", "cuda").
  • port (int): Network port for model deployment (used primarily for vLLM). The default value is 8000.
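The fields above could be declared as a dataclass roughly like the following (a sketch based on the parameter list; only the port default is stated in this documentation, so any other defaults would be assumptions):

```python
from dataclasses import dataclass

@dataclass
class LLMConfig:
    """Basic configuration parameters for the LLM."""
    model_path: str      # path to the LLM model
    tokenizer_path: str  # path to the tokenizer
    max_length: int      # maximum token length for generated sequences
    device: str          # device specification, e.g. "cpu" or "cuda"
    port: int = 8000     # network port for model deployment (used primarily for vLLM)
```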

class MedAgent

Initialization: Initializes the agent with the provided configuration and permitted actions:

  • agent_config (dict): Configuration dictionary containing model specifications, retry logic, and other operational parameters.
  • permitted_actions (list): List of permitted actions the agent can perform.

The initialization dynamically sets up an LLM model based on the specified type (OpenAI, Azure, or vLLM). It also initializes:

  • Conversation history management.
  • Prompt handling with DynamicPrompt.
  • LLM response parsing.
  • Cost tracking.

Supported LLM Types:

  • OpenAI: Standard OpenAI models.
  • Azure: Azure-hosted models with configurable deployment parameters.
  • vLLM: Locally hosted vLLM instances on specific ports.

Raises an error if an unsupported model type is provided.
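The model-type dispatch can be sketched as follows. The helper name, return values, and config keys beyond those shown in the usage example below are illustrative assumptions, not the module's actual identifiers:

```python
def build_llm(llm_config: dict):
    """Select an LLM backend from the configured model type (sketch)."""
    model_type = llm_config.get("model_type")
    if model_type == "OpenAI":
        # standard OpenAI models, addressed by model name
        return ("openai", llm_config["model_name"])
    elif model_type == "Azure":
        # Azure-hosted models with configurable deployment parameters
        return ("azure", llm_config["deployment_name"])
    elif model_type == "vLLM":
        # locally hosted vLLM instance on a specific port
        return ("vllm", f"http://localhost:{llm_config.get('port', 8000)}")
    raise ValueError(f"Unsupported model type: {model_type}")
```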


method act

act(obs: Any) -> Tuple[str, Dict[str, Any]]

Generates actions based on the observation (obs) from the MedAgentGym environment.

Input:

  • obs (dict): A dictionary containing keys such as info and env_message. Typically includes the current instruction, task goals, or environment feedback.

Output: Returns a tuple

  • action (str): Action identified by the LLM based on observations and conversation history.
  • params (dict): Additional parameters or details associated with the action.

Process Flow

Initialization of Conversation: On first interaction:

  • Extracts and formats permitted actions.
  • Constructs a prompt from instruction, action_definitions, and action_formats.
  • Initializes the conversation history with system and user messages based on the LLM type.
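The first-turn setup above can be sketched as follows. The prompt wording and message roles are assumptions for illustration; the actual text is produced by DynamicPrompt:

```python
def build_initial_messages(instruction: str, permitted_actions: list) -> list:
    """Construct the first-turn conversation history (sketch).

    The system prompt here is a placeholder for the DynamicPrompt output,
    combining the instruction with the formatted permitted actions.
    """
    # extract and format permitted actions into a definition block
    action_definitions = "\n".join(f"- {name}" for name in permitted_actions)
    system_prompt = (
        "You are an agent operating in an EHR environment.\n"
        f"Permitted actions:\n{action_definitions}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": instruction},
    ]
```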

LLM Interaction:

  • Attempts LLM invocation, retrying up to n_retry times upon failure.
  • Logs errors and appends error handling messages to the conversation history.

Response Parsing:

  • Parses response using parse_llm_response.
  • Adds parsed actions and response to conversation history.

Error Handling:

  • Handles exceptions in parsing and LLM response generation gracefully by retrying and appending error messages.

Error Handling

  • Implements retries upon encountering exceptions.
  • Delays retries based on retry_delay specified in the agent configuration.
  • Logs detailed error messages to aid debugging.
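The retry behavior described above can be sketched as a small wrapper (a minimal sketch; the helper name is hypothetical, and the real implementation appends error-handling messages to the conversation history rather than printing):

```python
import time

def call_with_retry(invoke, n_retry: int = 3, retry_delay: float = 10.0):
    """Retry an LLM invocation with a fixed delay between attempts (sketch)."""
    last_error = None
    for attempt in range(n_retry):
        try:
            return invoke()
        except Exception as exc:
            # log the failure and retry after the configured delay
            last_error = exc
            print(f"Attempt {attempt + 1}/{n_retry} failed: {exc}")
            if attempt < n_retry - 1:
                time.sleep(retry_delay)
    raise RuntimeError(f"LLM invocation failed after {n_retry} attempts") from last_error
```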

Usage Example

agent_config = {
    'llm': {
        'model_type': 'Azure',
        'model_name': 'gpt-4.1-mini',
        'temperature': 0.0,
        'max_new_tokens': 8192,
        'deployment_name': 'gpt-4.1-mini',
        'log_probs': False
    },
    'n_retry': 3,
    'retry_delay': 10
}
permitted_actions = ['retrieve', 'update', 'submit']
agent = MedAgent(agent_config, permitted_actions)
obs = {
    'info': {'instruction': 'Retrieve patient data', 'task_goal': 'Get recent medical records'},
    'env_message': 'Patient ID 1234'
}
action, params = agent.act(obs)

This documentation covers the essential methods, configuration options, and error-handling behavior of the MedAgent class for effective integration and use.