Skip to content

Data Science

Recommendations

  • Data sets as container images
    • Use a container to store your data set
    • Serve via http protocol - s3, webdav
      • minio, rclone serve, httpd
  • Push to public repo (quay.io, ghcr.io)

TODO

  • Build an example of containerizing a data set
RC_USER=admin
RC_PASS=rclone

cat > rclone.conf <<CONFIG
[local]
type = local
CONFIG

podman run \
  -it \
  --rm \
  -v $(pwd):/data \
  -p 8080:8080 \
  docker.io/rclone/rclone \
    rcd \
    --config /tmp/rclone.conf \
    --rc-web-gui \
    --rc-addr :8080 \
    --rc-user ${RC_USER} \
    --rc-pass ${RC_PASS}