Environment Management
Immutable data science environments
The following cookbook example uses the R and Python environment management features described in the Posit Connect Admin Guide. We will also use Connect’s off-host execution mode for this example to demonstrate how to create an immutable and reproducible data science environment.
The environment management features used in this guide also apply to Connect’s local execution mode; off-host execution is not required to use them.
This example demonstrates how to use the official rstudio/r-session-complete image, which is suitable for running content on Posit Workbench, when deploying content to Posit Connect. By re-using this image, we can ensure that the exact same packages that were used when developing our content in Posit Workbench are used when executing our content on Posit Connect.
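The source code for the content used in this example is in the reticulated-image-classifier directory of the sol-eng/python-examples repository (https://github.com/sol-eng/python-examples).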
Prerequisites
Completing this cookbook example requires the following:
- a Posit Connect installation configured to use off-host execution
- a Posit Connect API key with the Administrator role
- push access to a container registry
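The commands in this example read three shell variables: CONNECT_SERVER, CONNECT_API_KEY, and CONTAINER_REGISTRY. As a minimal sketch, the hypothetical helper below (require_vars is not part of any Posit tooling) reports any of them that are unset or empty before a request is attempted.

```shell
# Hypothetical helper: fail if any named shell variable is unset or empty.
# The body runs in a subshell, so its working variables do not leak out and
# "exit" only ends the subshell, not the calling shell.
require_vars() (
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing required variable: ${name}" >&2
      missing=1
    fi
  done
  exit "${missing}"
)
```

A typical call would be `require_vars CONNECT_SERVER CONNECT_API_KEY CONTAINER_REGISTRY || exit 1` at the top of a script that wraps the commands in this example.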
Create the image
First, we define an image that is used both to develop our content on Posit Workbench and, later, to execute it on Posit Connect. We use the r-session-complete image as the base and install the additional R and Python packages that our content requires.
Dockerfile
FROM ghcr.io/rstudio/r-session-complete:jammy-2023.06.1--cd1a0c5
ARG GIT_SHA="4e4be3f59f0fbcf3ccecc724a00b0da7a4ad6f07"
ARG CRAN_MIRROR="https://p3m.dev/cran/__linux__/jammy/latest"
ARG PYPI_MIRROR="https://p3m.dev/pypi/latest/simple"
# Install the python packages
# This command installs the Python packages defined in requirements.txt,
# which pins the package versions and provides an immutable set of Python dependencies.
RUN pip install --upgrade pip && \
curl -sSfL https://raw.githubusercontent.com/sol-eng/python-examples/${GIT_SHA}/reticulated-image-classifier/requirements.txt \
-o /tmp/requirements.txt && \
pip install --default-timeout=1000 --index-url=${PYPI_MIRROR} -r /tmp/requirements.txt && \
rm /tmp/requirements.txt
# Install the R packages
ENV RENV_PATHS_LIBRARY=renv/library
RUN R -e "install.packages('renv', repos = c(CRAN = '${CRAN_MIRROR}'))" && \
curl -sSfL https://raw.githubusercontent.com/sol-eng/python-examples/${GIT_SHA}/reticulated-image-classifier/renv.lock \
-o /tmp/renv.lock && \
R -e "renv::restore(lockfile='/tmp/renv.lock', repos = c(CRAN = '${CRAN_MIRROR}'))" && \
rm /tmp/renv.lock
Build the image with:
# use a container registry that you have push access to
CONTAINER_REGISTRY="myorg/myrepo"
# build the image
docker build . -t ${CONTAINER_REGISTRY}/image-classifier:jammy
# push it to your registry
docker push ${CONTAINER_REGISTRY}/image-classifier:jammy
Add the execution environment
Next, we use the Connect Server API POST /v1/environments endpoint to create a new execution environment. This execution environment can then be used by content.
The environment is created with matching set to exact, which means it is used only when a piece of content explicitly requests it. Connect never chooses this environment during automatic selection.
Creating an environment via the /v1/environments API endpoint requires the Administrator role.
curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/environments \
--data '{
"title": "Custom image classifier",
"description": "My custom image classifier environment",
"cluster_name": "Kubernetes",
"name": "'${CONTAINER_REGISTRY}'/image-classifier:jammy",
"matching": "exact",
"r": {
"installations": [
{
"version": "4.2.3",
"path": "/opt/R/4.2.3/bin/R"
}
]
},
"python": {
"installations": [
{
"version": "3.9.14",
"path": "/opt/python/3.9.14/bin/python"
}
]
}
}'
Deploying the content
First, create a new content item using the Posit Connect Server API. The request payload specifies initial values for default_image_name, default_r_environment_management, and default_py_environment_management. Setting default_image_name at creation time ensures that Connect uses our custom image the first time the content builds during deployment. Setting both default_r_environment_management and default_py_environment_management to false means that Connect does not attempt to install any R or Python packages during the first build and that, when the content executes, it uses the packages installed on the image instead of looking for packages in the R/Python package cache.
curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/content \
--data '{
"name": "my-image-classifier-app",
"default_image_name": "'${CONTAINER_REGISTRY}'/image-classifier:jammy",
"default_r_environment_management": false,
"default_py_environment_management": false
}'
Make a note of the guid in the server response. We use this as our CONTENT_GUID later when we deploy our application.
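One way to capture the guid without copying it by hand is to pull it out of the response body. The sketch below uses a hypothetical helper, json_string_field, built on sed so that it needs no extra tools. It assumes the field holds a plain string value and is not a general JSON parser. The sample response is a trimmed illustration, not real server output.

```shell
# Hypothetical helper: print the first string value stored under a JSON key.
# This is a sed-based sketch, not a JSON parser; it assumes the value is a
# plain string without escaped quotes.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Illustrative, trimmed response body (an assumption, not captured output).
response='{"guid": "3578a80e-3150-417d-b24f-8c56b9a8beae", "name": "my-image-classifier-app"}'
CONTENT_GUID="$(printf '%s\n' "${response}" | json_string_field guid)"
echo "${CONTENT_GUID}"
```

In practice, response would hold the output of the curl command that creates the content item. If a JSON-aware tool such as jq is available, it is a more robust choice than sed for this step.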
Next, we need to clone the content to our workstation and create a content bundle so that we can publish it to the Connect server.
# clone the repo and check out the pinned commit
git clone https://github.com/sol-eng/python-examples.git
cd python-examples
git checkout -b connect-custom-execution-env 4e4be3f59f0fbcf3ccecc724a00b0da7a4ad6f07
# create the content bundle
tar czvf bundle.tar.gz -C ./reticulated-image-classifier ./
# upload the content bundle to Posit Connect
curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/content/${CONTENT_GUID}/bundles \
--data-binary @"bundle.tar.gz"
Make a note of the id in the server response. We use this as our BUNDLE_ID in the next step.
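Before activating the bundle, it can help to confirm the archive layout. Connect expects manifest.json at the top level of a bundle, which is why the tar command archives the contents of the directory (via -C) rather than the directory itself. The sketch below uses a hypothetical helper, bundle_has_manifest, and assumes the content directory contains a manifest.json file.

```shell
# Hypothetical helper: succeed only if the archive has manifest.json at its root.
# Entries are listed as "./manifest.json" when the bundle is built with
# "tar czvf bundle.tar.gz -C <dir> ./", or as "manifest.json" with other
# invocations, so both spellings are accepted.
bundle_has_manifest() {
  tar tzf "$1" | grep -xF -e './manifest.json' -e 'manifest.json' >/dev/null
}

# Example use:
#   bundle_has_manifest bundle.tar.gz || echo "manifest.json is not at the bundle root" >&2
```

A bundle built from the parent directory, for example with `-C . ./reticulated-image-classifier`, would fail this check because manifest.json would sit one level down.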
Now we can activate the bundle to complete the content deployment.
curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/content/${CONTENT_GUID}/deploy \
--data '{
"bundle_id": "'${BUNDLE_ID}'"
}'
The server logs should indicate that the content requests our custom image and that there is no package installation required for this deployment:
time="2023-09-05T20:38:17.782Z" level=info msg="Bundle created with R version 4.2.3 and Python version 3.9.14 is compatible with environment Kubernetes::myorg/myrepo/image-classifier:jammy with R version 4.2.3 from /opt/R/4.2.3/bin/R and Python version 3.9.14 from /opt/python/3.9.14/bin/python " bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
time="2023-09-05T20:38:17.782Z" level=info msg="Bundle requested no R environment restore; Connect will not perform any R package installation." bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
time="2023-09-05T20:38:17.782Z" level=info msg="Bundle requested no Python environment restore; Connect will not perform any Python package installation." bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
time="2023-09-05T20:38:17.785Z" level=info msg="Launching Shiny application..." bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
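These messages can also be checked mechanically. As a rough sketch, the hypothetical helper below reads log lines on standard input and succeeds only if both "no environment restore" messages shown above are present. How the log is obtained depends on the installation and is left to the caller.

```shell
# Hypothetical helper: read Connect log lines on stdin and succeed only if both
# the R and the Python "no environment restore" messages appear.
# The body runs in a subshell so the "log" variable does not leak out.
deployment_skipped_restore() (
  log="$(cat)"
  printf '%s\n' "${log}" | grep -F 'Bundle requested no R environment restore' >/dev/null &&
    printf '%s\n' "${log}" | grep -F 'Bundle requested no Python environment restore' >/dev/null
)
```

If either message is missing, Connect may have attempted a package restore, which suggests that the environment management settings on the content item are worth rechecking.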
The image classifier application should now be fully published and available through the Posit Connect dashboard.