en-US/about_PSql_Deploy_Seeds.help.txt

about_PSql_Deploy_Seeds
=======================

SHORT DESCRIPTION

A PSql.Deploy seed is a T-SQL script that populates a target database with
data. The Invoke-SqlSeed cmdlet applies seeds to target databases, with
configurable parallelism to support both large databases and many-database
scenarios. PSql.Deploy supports SQL Server, Azure SQL Database, and compatible
databases.


LONG DESCRIPTION

:: Source Directory Structure ::

PSql.Deploy expects seeds to have a specific filesystem layout.

src\ The source directory: a set of migrations and
 │ seeds for one database design. The path can vary.
 │
 ├─ Seeds\ Seeds. Required only if there are any seeds.
 │ │
 │ ├─ TestData\ One seed. The name can vary.
 │ │ │
 │ │ ├─ _Main.sql Top-level script for the seed. It can
 │ │ │ include other files with the :r directive.
 │ │ ├─ FileA.sql Example file included by _Main.sql.
 │ │ ├─ FileB.sql Example file included by _Main.sql.
 │ │ └─ ... More files, subdirectories, etc.
 │ │
 │ └─ ... More seeds.
 │
 ├─ Migrations\ Migrations. Required only if there are any migrations.
 │ └─ ... See about_PSql_Deploy_Migrations for details.
 │
 └─ ... PSql.Deploy does not care about other files or
                          directories present in the source directory.

Given an arbitrary source directory, PSql.Deploy expects to find seeds in a
Seeds subdirectory. Within the Seeds directory, each subdirectory containing a
_Main.sql file is an individual seed. The name of the subdirectory determines
the name of the seed. A seed name must be a valid directory name but is
otherwise unrestricted.

The _Main.sql file is the entry point for the seed. The file is a T-SQL script
with a few extensions to support modularity and parallelism.


:: Seed Script SQLCMD Support ::

PSql.Deploy seed scripts support a limited set of SQLCMD directives. Typical
seeds use these directives to organize code into multiple files, to separate
SQL batches, and to replace hard-coded values with variables.

    GO
        Ends the current SQL batch and begins a new one.

    $(<name>)
        Replaced by the value of the SQLCMD variable <name>.

    :r <file>
        Replaced by the contents of the specified file. The path is relative
        to the current PowerShell directory ($PWD or Get-Location), matching
        SQLCMD.EXE behavior.

        To enable inclusion relative to the seed, PSql.Deploy predefines the
        SQLCMD variable 'Path' as the full path of the directory containing the
        _Main.sql file. Thus the directive

          :r $(Path)\Foo.sql

        includes the file Foo.sql from the same directory as _Main.sql.

        The path may be enclosed within double quotes. This is mandatory if
        the path contains spaces, tabs, or double quotes. To include a double
        quote in the path, use two double quotes. Examples:

          :r "$(Path)\My Script.sql"
          :r "$(Path)/My ""Special"" Script.sql"

        Note that Windows does not support tabs or double quotes in file names.

    :setvar <name> <value>
        Sets the SQLCMD variable <name> to the spacified <value>.

        The value may be enclosed within double quotes. This is mandatory if
        the value contains spaces, tabs, or double quotes. To include a double
        quote in the value, use two double quotes.

          :setvar MyVar MyValue
          :setvar MyVar "My Value"
          :setvar MyVar "My ""Special"" Value"


:: Seed Script Parallelism Directives ::

Seeds typically consist of many statements that could execute in parallel if
the language supported it. For example, inserting rows into different tables
could occur in parallel, provided that all rows necessary to satisfy foreign
key constraints are already present. Parallelism can significantly reduce the
time required to seed large databases.

PSql.Deploy recognizes special directives to enable parallelism within seeds.
These directives are 'magic' comments: line comments beginning with '--#'.

The MODULE directive enables the author to split a seed script into 'modules'
that can execute in parallel. A module is a named chunk of SQL that performs
one unit of seed work, such as populating a particular table with data. For
example, a seed script might contain the following two modules:

    --# MODULE: Customer
    INSERT dbo.Customer ...

    --# MODULE: Product
    INSERT dbo.Product ...

All lines of SQL code after a MODULE directive until the next MODULE directive
(or the end of the file) are part of that module. Any lines of SQL code before
the first MODULE directive are implicitly part of a module named 'init'. Thus
each line of SQL code in a seed script is part of exactly one module.

Module names are author-defined. They appear in logging and error messages.
A module can contain multiple GO-separated batches.

A module can declare dependencies upon other modules. PSql.Deploy's
Invoke-SqlSeed cmdlet executes modules in parallel where possible but
guarantees that each module's dependencies are completed before the module
itself executes.

The dependency model is based on 'topics'. A topic is an author-defined name
that represents some facet of a seed's work that PSql.Deploy should track.
Each module 'provides' one or more topics, meaning that the module contributes
to the work represented by each topic. A module always provides its own name
as a topic and can use the PROVIDES directive to declare additional provided
topics. A module can 'require' one or more topics via the REQUIRES directive.
When a module requires a topic, the module will not execute until that topic is
complete.

For example, consider a seed that populates three tables: Customer, Product,
and Order. The Order table has foreign keys to both Customer and Product.
Customer and Product can be populated in parallel, but Order cannot be
populated until both Customer and Product are complete. The seed script might
approach this as follows:

    --# MODULE: Customer
    INSERT dbo.Customer ...

    --# MODULE: Product
    INSERT dbo.Product ...

    --# MODULE: Order
    --# REQUIRES: Customer Product
    INSERT dbo.Order ...

Another approach is to use a topic to represent the concept of 'master data':

    --# MODULE: Customer
    --# PROVIDES: MasterData
    INSERT dbo.Customer ...

    --# MODULE: Product
    --# PROVIDES: MasterData
    INSERT dbo.Product ...

    --# MODULE: Order
    --# REQUIRES: MasterData
    INSERT dbo.Order ...

Invoke-SqlSeed executes seed modules in parallel by creating multiple worker
tasks. Each worker repeatedly picks the next ready module and executes it on
the worker's connection to the target database. In rare cases, an author might
want a seed module to execute on every worker. For example, an author might
want to set connection options (like NOCOUNT) or to create a temporary table.
The WORKER directive enables this behavior. To perform connection setup:

    --# MODULE: InitWorker
    --# WORKER: all
    SET NOCOUNT ON; -- This happens in every connection

    --# MODULE: Customer
    --# REQUIRES: InitWorker
    INSERT dbo.Customer ...

Seed directive reference:

    --# MODULE: <name> [<topic> ...]
        Starts a new module with the specified name, optionally declaring extra
        provided topics. The module name itself is a provided topic.
    
    --# PROVIDES: <topic> [<topic> ...]
        Indicates that the current module provides the specified topics.
    
    --# REQUIRES: <topic> [<topic> ...]
        Indicates that the current module requires the specified topics.
    
    --# WORKER: all|any
        Specifies worker execution mode: 'all' means that a copy of the module
        will run on every worker; 'any' (the default) means that the module
        executes once on any single available worker.


:: See Also ::

- about_PSql_Deploy
- Invoke-SqlSeed
- https://github.com/sharpjs/PSql.Deploy