Public/FileServer/Expand-VmTarball.ps1

<#
.SYNOPSIS
    Stages a host-side gzipped tarball to a Hyper-V VM and extracts it
    into <Destination> under sudo via a sibling-tempdir swap.
 
.DESCRIPTION
    Single-round-trip primitive that joins the existing FileServer family
    (Add-VmFileServerFile, Invoke-WithVmFileServer) with a remote
    extract-then-swap step. The cmdlet performs the host-side stage, then
    one SSH call that:
 
      1. Creates a sibling tempdir under the destination's parent
         (`<parent>/.expand.XXXXXX`).
      2. Streams the tarball over HTTP and pipes the bytes into
         `sudo tar -xzf -` with the caller-supplied `--strip-components`.
      3. Removes any existing object at <Destination> (file, symlink, or
         directory tree).
      4. Renames the tempdir to <Destination>.
 
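    The mktemp + extract + mv sequence keeps partially extracted content
    out of <Destination>: the tree is populated in the sibling tempdir
    and only renamed into place once extraction and the marker write
    have succeeded. The swap is not strictly atomic: between the
    `rm -rf` in step 3 and the `mv` in step 4 there is a short window in
    which <Destination> is partially deleted or missing.
    The cmdlet does NOT install a trap for crash cleanup - if the remote
    script is killed between mktemp and the `rm -rf`, the tempdir is left
    as a sibling and the next clean run finds <Destination> unchanged;
    if it is killed between the `rm -rf` and the `mv`, <Destination> is
    gone and the next run recreates it. That
    is intentional: making cleanup the caller's problem keeps the
    primitive single-purpose and is what the integration test suite
    verifies.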
 
    Skip-unchanged is gated by a marker file
    (`<Destination>/.infra-hyperv-tarball.sha256`) that records the
    SHA-256 of the source tarball bytes. The digest is computed
    host-side once per call so the VM never has to read the whole
    tarball back. On a match the remote script exits before any
    `curl` / `tar` / `mv`. The marker is itself written into the
    fresh tempdir before the final rename, so the dir-swap leaves
    <Destination> in a state where the next call can short-circuit.
    Pass -NoSkipUnchanged to force a re-extract (the marker is still
    written, so subsequent default calls can short-circuit again).
 
    Path validation is host-side and runs before any staging or SSH:
    <Destination> must be a non-empty absolute path with no `..`
    segments, no NUL byte, and no single quote (the remote script
    embeds it inside a single-quoted bash assignment, matching the rest
    of the module). <TarballPath> must exist on the host so the
    Add-VmFileServerFile call cannot fail late in the flow.
    <StripComponents> is a non-negative integer that flows through to
    `tar --strip-components` verbatim.
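 
    A few illustrative <Destination> values and how the rules above
    treat them (a sketch based on the checks in this function, not an
    exhaustive list):

        '/opt/jdk-21'     accepted
        '/opt/..cache'    accepted ('..cache' is not a '..' segment)
        'opt/jdk-21'      rejected: not absolute
        '/opt/../etc'     rejected: contains a '..' segment
        "/opt/jdk's"      rejected: contains a single quote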
 
.PARAMETER SshClient
    A live Renci.SshNet.SshClient. The caller owns the client's
    lifecycle - this function neither connects nor disposes it.
 
.PARAMETER Server
    The file server handle returned by Start-VmFileServer (or received as
    the script-block argument from Invoke-WithVmFileServer). Forwarded
    verbatim to Add-VmFileServerFile to stage <TarballPath>.
 
.PARAMETER TarballPath
    Absolute path on the Windows host to a gzipped tar archive
    (e.g. `E:\cache\jdk-21.tar.gz`). The file is staged into the live
    server and pulled down over HTTP by the VM.
 
.PARAMETER Destination
    Absolute path on the VM where the extracted tree should live. The
    parent directory is created if missing; any existing object at the
    destination is removed (rm -rf) and a fully extracted sibling
    tempdir is renamed into its place (mv).
 
.PARAMETER StripComponents
    Non-negative integer passed to `tar --strip-components`. Defaults to
    0 (no stripping). Set to 1 to discard a single wrapper directory
    inside the tarball.
 
.PARAMETER NoSkipUnchanged
    Forces the extract path even when the on-VM marker file already
    records the host-computed SHA-256 of the source tarball. Off by
    default - the skip-unchanged branch produces identical observable
    state at lower cost. Use this switch when callers want to be sure
    the destination's mtime advances or when recovering from
    out-of-band tampering inside <Destination>.
 
.EXAMPLE
    Invoke-WithVmFileServer -VmIpAddress '10.10.0.50' -ScriptBlock {
        param($server)
        Expand-VmTarball -SshClient $ssh -Server $server `
            -TarballPath 'E:\cache\jdk-21.tar.gz' `
            -Destination '/opt/jdk-21' -StripComponents 1
    }
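 
.EXAMPLE
    # Sketch: force a re-extract even when the on-VM marker already
    # records the tarball's SHA-256. As in the example above, $ssh is
    # assumed to be an already-connected SshClient, and the IP address
    # and paths are illustrative.
    Invoke-WithVmFileServer -VmIpAddress '10.10.0.50' -ScriptBlock {
        param($server)
        Expand-VmTarball -SshClient $ssh -Server $server `
            -TarballPath 'E:\cache\jdk-21.tar.gz' `
            -Destination '/opt/jdk-21' -StripComponents 1 `
            -NoSkipUnchanged
    }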
 
.NOTES
    On-VM commands run under sudo so the function can write to
    privileged locations regardless of which user the SSH client
    authenticated as. The caller is responsible for ensuring that user
    has password-less sudo.
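 
    Tempdirs left behind by an interrupted run are siblings of
    <Destination> named `.expand.XXXXXX`. A minimal cleanup sketch,
    assuming $ssh is a connected client, `/opt` is the (illustrative)
    parent directory, and no other Expand-VmTarball call is targeting
    that parent at the same moment:

        Invoke-SshClientCommand -SshClient $ssh -Command (
            "sudo find /opt -mindepth 1 -maxdepth 1 -type d " +
            "-name '.expand.*' -exec rm -rf -- {} +")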
#>

function Expand-VmTarball {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [object] $SshClient,

        [Parameter(Mandatory)]
        [PSCustomObject] $Server,

        [Parameter(Mandatory)]
        [string] $TarballPath,

        [Parameter(Mandatory)]
        [string] $Destination,

        [Parameter()]
        [int] $StripComponents = 0,

        [switch] $NoSkipUnchanged
    )

    # Host-side validation. Runs before any staging / SSH so malformed
    # input never reaches the wire and the file server is not asked to
    # stage a file we will refuse to use. The path rules match the rest
    # of the module: absolute, no `..`, no NUL, no single quote (the
    # value embeds into a single-quoted bash assignment in the emitted
    # script).
    if ([string]::IsNullOrEmpty($Destination)) {
        throw "Expand-VmTarball: -Destination must be a non-empty string."
    }
    if (-not $Destination.StartsWith('/')) {
        throw ("Expand-VmTarball: -Destination '$Destination' must be an " +
            "absolute path (start with '/').")
    }
    if ($Destination.Contains([char]0)) {
        throw "Expand-VmTarball: -Destination contains a NUL byte."
    }
    if ($Destination.Contains("'")) {
        throw ("Expand-VmTarball: -Destination '$Destination' contains a " +
            "single quote, which is not allowed.")
    }
    if ($Destination.Split('/') -contains '..') {
        throw ("Expand-VmTarball: -Destination '$Destination' contains a " +
            "'..' segment.")
    }
    if ($StripComponents -lt 0) {
        throw ("Expand-VmTarball: -StripComponents must be non-negative " +
            "(got $StripComponents).")
    }
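    # -PathType Leaf rejects directories up front; otherwise Get-FileHash
    # below would fail later with a less helpful error.
    if (-not (Test-Path -LiteralPath $TarballPath -PathType Leaf)) {
        throw ("Expand-VmTarball: -TarballPath '$TarballPath' does not " +
            "exist on the host or is not a file.")
    }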

    $vmHost = if ($SshClient.PSObject.Properties['ConnectionInfo'] -and $SshClient.ConnectionInfo) {
        $SshClient.ConnectionInfo.Host
    } else { '(unknown)' }

    # Host-side SHA-256 of the tarball bytes. Computed once per call so
    # the VM never has to read the whole archive back; the digest is
    # embedded as a bash literal in both the skip-unchanged check and
    # the marker write. Lower-case hex matches the form that
    # `sha256sum` would print, so future on-VM diagnostics can compare
    # by eye.
    $digest = (Get-FileHash -LiteralPath $TarballPath -Algorithm SHA256).Hash.ToLowerInvariant()

    # Stage the tarball through the live file server. This must come
    # after host-side validation so a malformed Destination does not
    # leave a staged copy behind.
    $url = Add-VmFileServerFile -Server $Server -LocalPath $TarballPath

    # The marker filename starts with a dot so `ls` inside <Destination>
    # does not surface it by default, and the prefix namespaces it to
    # this module so consumers can spot which tool owns it. The marker
    # lives at the top level of <Destination> so the dir-swap moves it
    # atomically with the rest of the tree.
    $markerName = '.infra-hyperv-tarball.sha256'

    # Skip-unchanged pre-check: if the marker file under <Destination>
    # records the same digest as the source tarball, exit before any
    # curl / tar / mv. Suppressed by -NoSkipUnchanged.
    #
    # Both the existence test AND the read happen under sudo because
    # <Destination> normally keeps the root-owned 0700 mode that
    # `sudo mktemp -d` gave the tempdir in the previous extract - a
    # plain `[ -f ]` would silently return false for any non-root
    # caller (no search permission on <Destination>), and
    # skip-unchanged would never fire. Routing the read through
    # `sudo cat 2>/dev/null || true` doubles as the existence test:
    # a missing-or-unreadable marker yields an empty string, which
    # the digest comparison treats as "no match". Command substitution
    # strips the trailing newline from the marker contents, which
    # matches how the marker is written (printf '%s\n').
    $skipBlock = if ($NoSkipUnchanged) { '' } else {
@"
 
marker="`$destination/$markerName"
existing_digest="`$(sudo cat "`$marker" 2>/dev/null || true)"
if [ -n "`$existing_digest" ] && [ "`$existing_digest" = "`$desired_digest" ]; then
    exit 0
fi
"@

    }

    # The mktemp template lives next to <Destination> (same directory,
    # hence same filesystem) so the final `mv` is a single rename
    # rather than a cross-device copy. The leading dot keeps the
    # partial tree out of casual `ls` output while it is being
    # populated. The marker
    # write goes into the fresh tempdir (no observer, no atomicity
    # concern) so the helper's atomic-write tail would be overkill
    # here - the dir-swap itself is what makes the marker land
    # atomically alongside the new tree.
    $script = @"
set -euo pipefail
destination='$Destination'
url='$url'
strip='$StripComponents'
desired_digest='$digest'
marker_name='$markerName'
parent="`$(dirname "`$destination")"
sudo mkdir -p "`$parent"$skipBlock
tmpdir="`$(sudo mktemp -d "`$parent/.expand.XXXXXX")"
curl -fsSL "`$url" | sudo tar -xzf - -C "`$tmpdir" --strip-components="`$strip"
printf '%s\n' "`$desired_digest" | sudo tee "`$tmpdir/`$marker_name" >/dev/null
if [ -e "`$destination" ] || [ -L "`$destination" ]; then
    sudo rm -rf -- "`$destination"
fi
sudo mv "`$tmpdir" "`$destination"
"@


    # Here-strings keep the line endings of the source file, which are
    # typically CRLF on Windows; remote bash would treat the trailing
    # \r as part of each token. Normalise to LF, same as the rest of
    # the module.
    $script = $script -replace "`r`n", "`n"

    $result = Invoke-SshClientCommand -SshClient $SshClient -Command $script

    if ($result.ExitStatus -eq 0) { return }

    throw ("Expand-VmTarball failed (vm: $vmHost, tarball: $TarballPath, " +
        "destination: $Destination, exit $($result.ExitStatus)). " +
        "stdout: $($result.Output) stderr: $($result.Error)")
}