Read

read_delta

read_delta(table_uri: str, eager: bool = False) -> PolarsFrame

Reads a Delta table from the specified abfss URI. Authentication with OneLake is handled automatically.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| table_uri | str | The abfss URI of the Delta table to read. | required |
| eager | bool | If True, reads the table eagerly; otherwise, returns a lazy frame. Defaults to False. | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| PolarsFrame | PolarsFrame | The data from the Delta table. |

Example
from msfabricutils.etl import read_delta

workspace_id = "12345678-1234-1234-1234-123456789012"
lakehouse_id = "beefbeef-beef-beef-beef-beefbeefbeef"
table_name = "my-delta-table"
table_uri = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{table_name}"

# eager=True reads the table eagerly
df = read_delta(table_uri, eager=True)
# eager=False (the default) returns a lazy frame
lazy_df = read_delta(table_uri, eager=False)
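
The URI in the example above is assembled by hand with an f-string. As a minimal sketch, the same assembly could be wrapped in a small helper that checks the IDs before any read is attempted. `build_table_uri` and `ONELAKE_HOST` are hypothetical names for illustration and are not part of msfabricutils; the sketch also assumes both IDs are GUIDs, as in the example.

```python
from uuid import UUID

# Host name taken from the example above.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def build_table_uri(workspace_id: str, lakehouse_id: str, table_name: str) -> str:
    """Hypothetical helper: assemble the abfss URI of a lakehouse Delta table."""
    # UUID() raises ValueError on a malformed ID, so typos surface before any I/O.
    UUID(workspace_id)
    UUID(lakehouse_id)
    return f"abfss://{workspace_id}@{ONELAKE_HOST}/{lakehouse_id}/Tables/{table_name}"


table_uri = build_table_uri(
    "12345678-1234-1234-1234-123456789012",
    "beefbeef-beef-beef-beef-beefbeefbeef",
    "my-delta-table",
)
print(table_uri)
```

The resulting string can be passed to `read_delta` in the same way as the hand-built `table_uri`.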

read_parquet

read_parquet(table_uri: str, eager: bool = False) -> PolarsFrame

Reads a Parquet file from the specified abfss URI. Authentication with OneLake is handled automatically.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| table_uri | str | The abfss URI of the Parquet file to read. Supports globbing. | required |
| eager | bool | If True, reads the file eagerly; otherwise, returns a lazy frame. Defaults to False. | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| PolarsFrame | PolarsFrame | The data from the Parquet file. |

Example

Reading a single file

from msfabricutils.etl import read_parquet

workspace_id = "12345678-1234-1234-1234-123456789012"
lakehouse_id = "beefbeef-beef-beef-beef-beefbeefbeef"

file_path = "my-parquet-file.parquet"
folder_uri = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Files/"

df = read_parquet(folder_uri + file_path, eager=True)

Reading all Parquet files in a folder

from msfabricutils.etl import read_parquet

workspace_id = "12345678-1234-1234-1234-123456789012"
lakehouse_id = "beefbeef-beef-beef-beef-beefbeefbeef"

folder_uri = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Files/"
glob_df = read_parquet(folder_uri + "**/*.parquet", eager=True)
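
Because a single-file URI and a glob URI differ only in their last path segments, it can help to inspect a URI before passing it to `read_parquet`. The sketch below splits an abfss URI into its parts using only the standard library. `describe_onelake_uri` is a hypothetical helper, not part of msfabricutils, and the path layout (`/<lakehouse_id>/<Files|Tables>/<relative path>`) is an assumption taken from the examples above.

```python
from urllib.parse import urlsplit


def describe_onelake_uri(uri: str) -> dict:
    """Hypothetical helper: split an abfss URI into the parts used in the examples."""
    parts = urlsplit(uri)
    if parts.scheme != "abfss":
        raise ValueError(f"expected an abfss URI, got scheme {parts.scheme!r}")
    # Assumed layout: /<lakehouse_id>/<Files|Tables>/<relative path>
    lakehouse_id, area, relative_path = parts.path.lstrip("/").split("/", 2)
    return {
        "workspace_id": parts.username,  # the part before "@" in the URI
        "lakehouse_id": lakehouse_id,
        "area": area,
        "relative_path": relative_path,
        # Naive check: treat any "*" in the relative path as a glob pattern.
        "is_glob": "*" in relative_path,
    }


info = describe_onelake_uri(
    "abfss://12345678-1234-1234-1234-123456789012"
    "@onelake.dfs.fabric.microsoft.com"
    "/beefbeef-beef-beef-beef-beefbeefbeef/Files/**/*.parquet"
)
print(info)
```

The `is_glob` flag only detects the `*` wildcard used in the example; how the pattern is actually expanded is up to `read_parquet`.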