PSql.Deploy

3.0.0-pre08

en-US/about_PSql_Deploy_Seeds.help.txt

                                about_PSql_Deploy_Seeds

=======================

SHORT DESCRIPTION

A PSql.Deploy seed is a T-SQL script that populates a target database with

data.  The Invoke-SqlSeed cmdlet applies seeds to target databases, with

configurable parallelism to support both large databases and many-database

scenarios.  PSql.Deploy supports SQL Server, Azure SQL Database, and compatible

databases.

LONG DESCRIPTION

:: Source Directory Structure ::

PSql.Deploy expects seeds to have a specific filesystem layout.

src\                    The source directory: a set of migrations and

 │                        seeds for one database design.  The path can vary.

 │

 ├─ Seeds\              Seeds.  Required only if there are any seeds.

 │   │

 │   ├─ TestData\       One seed.  The name can vary.

 │   │   │

 │   │   ├─ _Main.sql   Top-level script for the seed.  It can

 │   │   │                include other files with the :r directive.

 │   │   ├─ FileA.sql   Example file included by _Main.sql.

 │   │   ├─ FileB.sql   Example file included by _Main.sql.

 │   │   └─ ...         More files, subdirectories, etc.

 │   │

 │   └─ ...             More seeds.

 │

 ├─ Migrations\         Migrations.  Required only if there are any migrations.

 │   └─ ...             See about_PSql_Deploy_Migrations for details.

 │

 └─ ...                 PSql.Deploy does not care about other files or

                          directories present in the source directory.

Given an arbitrary source directory, PSql.Deploy expects to find seeds in a

Seeds subdirectory.  Within the Seeds directory, each subdirectory containing a

_Main.sql file is an individual seed.  The name of the subdirectory determines

the name of the seed.  A seed name must be a valid directory name but is

otherwise unrestricted.

The _Main.sql file is the entry point for the seed.  The file is a T-SQL script

with a few extensions to support modularity and parallelism.

:: Seed Script SQLCMD Support ::

PSql.Deploy seed scripts support a limited set of SQLCMD directives.  Typical

seeds use these directives to organize code into multiple files, to separate

SQL batches, and to replace hard-coded values with variables.

    GO

        Ends the current SQL batch and begins a new one.

    $(<name>)

        Replaced by the value of the SQLCMD variable <name>.

    :r <file>

        Replaced by the contents of the specified file.  The path is relative

        to the current PowerShell directory ($PWD or Get-Location), matching

        SQLCMD.EXE behavior.

        To enable inclusion relative to the seed, PSql.Deploy predefines the

        SQLCMD variable 'Path' as the full path of the directory containing the

        _Main.sql file.  Thus the directive

          :r $(Path)\Foo.sql

        includes the file Foo.sql from the same directory as _Main.sql.

        The path may be enclosed within double quotes.  This is mandatory if

        the path contains spaces, tabs, or double quotes.  To include a double

        quote in the path, use two double quotes.  Examples:

          :r "$(Path)\My Script.sql"

          :r "$(Path)/My ""Special"" Script.sql"

        Note that Windows does not support tabs or double quotes in file names.  

    :setvar <name> <value>

        Sets the SQLCMD variable <name> to the spacified <value>.

        The value may be enclosed within double quotes.  This is mandatory if

        the value contains spaces, tabs, or double quotes.  To include a double

        quote in the value, use two double quotes.

          :setvar MyVar MyValue

          :setvar MyVar "My Value"

          :setvar MyVar "My ""Special"" Value"

:: Seed Script Parallelism Directives ::

Seeds typically consist of many statements that could execute in parallel if

the language supported it.  For example, inserting rows into different tables

could occur in parallel, provided that all rows necessary to satisfy foreign

key constraints are already present.  Parallelism can significantly reduce the

time required to seed large databases.

PSql.Deploy recognizes special directives to enable parallelism within seeds.

These directives are 'magic' comments: line comments beginning with '--#'.

The MODULE directive enables the author to split a seed script into 'modules'

that can execute in parallel.  A module is a named chunk of SQL that performs

one unit of seed work, such as populating a particular table with data.  For

example, a seed script might contain the following two modules:

    --# MODULE: Customer

    INSERT dbo.Customer ...

    --# MODULE: Product

    INSERT dbo.Product ...

All lines of SQL code after a MODULE directive until the next MODULE directive

(or the end of the file) are part of that module.  Any lines of SQL code before

the first MODULE directive are implicitly part of a module named 'init'.  Thus

each line of SQL code in a seed script is part of exactly one module.

Module names are author-defined.  They appear in logging and error messages.

A module can contain multiple GO-separated batches.

A module can declare dependencies upon other modules.  PSql.Deploy's

Invoke-SqlSeed cmdlet executes modules in parallel where possible but

guarantees that each module's dependencies are completed before the module

itself executes.

The dependency model is based on 'topics'.  A topic is an author-defined name

that represents some facet of a seed's work that PSql.Deploy should track.

Each module 'provides' one or more topics, meaning that the module contributes

to the work represented by each topic.  A module always provides its own name

as a topic and can use the PROVIDES directive to declare additional provided

topics.  A module can 'require' one or more topics via the REQUIRES directive.

When a module requires a topic, the module will not execute until that topic is

complete.

For example, consider a seed that populates three tables: Customer, Product,

and Order.  The Order table has foreign keys to both Customer and Product.

Customer and Product can be populated in parallel, but Order cannot be

populated until both Customer and Product are complete.  The seed script might

approach this as follows:

    --# MODULE: Customer

    INSERT dbo.Customer ...

    --# MODULE: Product

    INSERT dbo.Product ...

    --# MODULE:   Order

    --# REQUIRES: Customer Product

    INSERT dbo.Order ...

Another approach is to use a topic to represent the concept of 'master data':

    --# MODULE:   Customer

    --# PROVIDES: MasterData

    INSERT dbo.Customer ...

    --# MODULE:   Product

    --# PROVIDES: MasterData

    INSERT dbo.Product ...

    --# MODULE:   Order

    --# REQUIRES: MasterData

    INSERT dbo.Order ...

Invoke-SqlSeed executes seed modules in parallel by creating multiple worker

tasks.  Each worker repeatedly picks the next ready module and executes it on

the worker's connection to the target database.  In rare cases, an author might

want a seed module to execute on every worker.  For example, an author might

want to set connection options (like NOCOUNT) or to create a temporary table.

The WORKER directive enables this behavior.  To perform connection setup:

    --# MODULE: InitWorker

    --# WORKER: all

    SET NOCOUNT ON; -- This happens in every connection

    --# MODULE:   Customer

    --# REQUIRES: InitWorker

    INSERT dbo.Customer ...

Seed directive reference:

    --# MODULE: <name> [<topic> ...]

        Starts a new module with the specified name, optionally declaring extra

        provided topics.  The module name itself is a provided topic.

    --# PROVIDES: <topic> [<topic> ...]

        Indicates that the current module provides the specified topics.

    --# REQUIRES: <topic> [<topic> ...]

        Indicates that the current module requires the specified topics.

    --# WORKER: all|any

        Specifies worker execution mode: 'all' means that a copy of the module

        will run on every worker; 'any' (the default) means that the module

        executes once on any single available worker.

:: See Also ::

- about_PSql_Deploy

- Invoke-SqlSeed

- https://github.com/sharpjs/PSql.Deploy