Read
read_delta
read_delta(table_uri: str, eager: bool = False) -> PolarsFrame
Reads a Delta table from the specified abfss URI. Authentication with OneLake is handled automatically.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_uri | str | The abfss URI of the Delta table to read. | required |
eager | bool | If True, reads the table eagerly; otherwise, returns a lazy frame. Defaults to False. | False |
Returns:
Name | Type | Description |
---|---|---|
PolarsFrame | PolarsFrame | The data from the Delta table. |
Example
from msfabricutils.etl import read_delta
workspace_id = "12345678-1234-1234-1234-123456789012"
lakehouse_id = "beefbeef-beef-beef-beef-beefbeefbeef"
table_name = "my-delta-table"
table_uri = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{table_name}"
df = read_delta(table_uri, eager=True)
lazy_df = read_delta(table_uri, eager=False)
read_parquet
read_parquet(table_uri: str, eager: bool = False) -> PolarsFrame
Reads a Parquet file from the specified abfss URI. Authentication with OneLake is handled automatically.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_uri | str | The abfss URI of the Parquet file to read. Supports globbing. | required |
eager | bool | If True, reads the file eagerly; otherwise, returns a lazy frame. Defaults to False. | False |
Returns:
Name | Type | Description |
---|---|---|
PolarsFrame | PolarsFrame | The data from the Parquet file. |
Example
Reading a single file
from msfabricutils.etl import read_parquet
workspace_id = "12345678-1234-1234-1234-123456789012"
lakehouse_id = "beefbeef-beef-beef-beef-beefbeefbeef"
file_path = "my-parquet-file.parquet"
folder_uri = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Files/"
df = read_parquet(folder_uri + file_path, eager=True)
Reading all Parquet files in a folder
from msfabricutils.etl import read_parquet
workspace_id = "12345678-1234-1234-1234-123456789012"
lakehouse_id = "beefbeef-beef-beef-beef-beefbeefbeef"
folder_uri = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Files/"
glob_df = read_parquet(folder_uri + "**/*.parquet", eager=True)
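The examples on this page all assemble the abfss URI with the same f-string pattern. The sketch below wraps that pattern in a small helper; `build_onelake_uri` is a hypothetical function written for illustration and is not part of `msfabricutils`. It only formats strings and makes no network calls.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def build_onelake_uri(workspace_id: str, lakehouse_id: str, area: str, path: str = "") -> str:
    """Hypothetical helper: build an abfss URI following the pattern used in the examples.

    `area` is the top-level lakehouse folder, either "Tables" or "Files".
    `path` is the table name, file path, or glob pattern below that folder.
    """
    if area not in ("Tables", "Files"):
        raise ValueError(f"area must be 'Tables' or 'Files', got {area!r}")
    # Strip a leading slash so the result never contains a double slash.
    relative = path.lstrip("/")
    return f"abfss://{workspace_id}@{ONELAKE_HOST}/{lakehouse_id}/{area}/{relative}"


workspace_id = "12345678-1234-1234-1234-123456789012"
lakehouse_id = "beefbeef-beef-beef-beef-beefbeefbeef"

table_uri = build_onelake_uri(workspace_id, lakehouse_id, "Tables", "my-delta-table")
glob_uri = build_onelake_uri(workspace_id, lakehouse_id, "Files", "**/*.parquet")
```

The resulting strings can be passed as `table_uri` to `read_delta` and `read_parquet` in the same way as the hand-built URIs above.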