Main Interface

Displaying Data

There are two easy ways to get going:

  • quickshow(input_data)
    • Pass either a pandas DataFrame or the name or path of a local CSV file. quickshow displays the results in Jupyter Notebook.
  • run_transforms(transforms, display_results='notebook')
    • Runs a stack of transforms, which can include sourcing and sinking data. To pass in an existing DataFrame, see SourceDF.
    • If display_results is set to 'notebook', Datamode displays the UI when it detects that it is running in Jupyter Notebook. To disable the UI explicitly, pass display_results=None.
    • Returns a TransformContext object. You can access the output DataFrame in the member df_current.
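
As a rough illustration of the second option, the sketch below runs a stack without the UI and returns the output DataFrame. It assumes that SourceDF accepts the DataFrame as its first positional argument and that both names are exported by datamode.interface; neither detail is confirmed here, so check the SourceDF reference before relying on it. The helper name run_headless is hypothetical.

```python
def run_headless(df):
    """Run a minimal transform stack on an existing DataFrame with the UI disabled."""
    # Imported lazily so this helper can be defined even where Datamode is not installed.
    from datamode.interface import run_transforms, SourceDF

    # Assumption: SourceDF takes the DataFrame as its first positional argument.
    tcon = run_transforms([SourceDF(df)], display_results=None)

    # run_transforms returns a TransformContext; the output DataFrame is df_current.
    return tcon.df_current
```

Calling run_headless(my_df) from a script would then give you the resulting DataFrame without Datamode trying to render anything.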

Transforming Data

The heart of Datamode is the Transforms stack.

  • Load your data with SourceFile, SourceDF, SourceSql, or any other supported source.
  • Choose the transforms you want to apply to your dataset and pass them to run_transforms.
  • You can then use the DataFrame result (available in tcon.df_current), or write to a table or file with SinkData.

Run this code in Jupyter Notebook:

from datamode.interface import *

tcon = run_transforms([
  SourceFile('https://raw.githubusercontent.com/datamode/datasets/master/movies.csv', sample_ratio=1),
  DropNumericalOutliers('@number'),
  CanonicalizeDate('release_date'),
  SinkFile('output.csv'),
])

# The output DataFrame is tcon.df_current.

The code above drops numerical outliers, canonicalizes the dates in release_date, writes the cleaned data to output.csv, and visualizes your dataset in Jupyter. You can change sample_ratio to load more or less of your dataset.
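
To give a feel for what "fixing dates and outliers" can mean, here is a plain-pandas sketch of the two cleaning steps. It is not Datamode's implementation: the interquartile-range rule, the k=1.5 threshold, the helper names, and the sample data are all assumptions chosen for illustration, and the real transforms may behave differently.

```python
import pandas as pd

def drop_iqr_outliers(df, column, k=1.5):
    """Keep rows whose value in `column` lies within k * IQR of the quartiles.

    Rows with a missing value in `column` are dropped as well.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

def canonicalize_dates(df, column):
    """Parse `column` into datetime values; unparseable entries become NaT."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column], errors="coerce")
    return out

# Hypothetical sample data: one extreme budget and one malformed date.
movies = pd.DataFrame({
    "budget": [10, 12, 11, 13, 12, 5000],
    "release_date": ["2001-05-04", "2003-07-01", "not a date",
                     "1999-12-31", "2010-01-15", "2012-06-30"],
})

cleaned = canonicalize_dates(drop_iqr_outliers(movies, "budget"), "release_date")
print(cleaned)
```

The row with the extreme budget is removed, and the malformed date is kept as a missing value rather than raising an error, which is one common way to handle such entries.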