Docker – Knowledge Article

Use a smaller base image, for example debian:jessie. Do not upgrade packages inside the container; rebuild from an updated base image instead.

Docker builds images in layers: every instruction commits a new layer. Docker also keeps a cache so it can reuse layers, so keep the base instructions in your Dockerfiles as common as possible. The cache can also work against you: even when you want an instruction to run again (for example, to pick up updated packages), Docker may reuse the cached layer and nothing is really updated. In that case, build the image with the no-cache option: “docker build --no-cache”.
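
To make the caching point concrete, here is a minimal sketch of how instructions might be ordered. The package names, the app/ directory, and the example/app tag are placeholders I made up for illustration; they are not from any real project.

```dockerfile
# Sketch only: package names, paths, and the image tag are placeholders.
FROM debian:jessie

# Stable, shared instructions go first. These layers are served from the
# cache on later builds as long as the instruction text does not change.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Frequently changing content goes last. Editing files under app/
# invalidates only this layer and the ones after it.
COPY app/ /opt/app/

# To ignore the cache and rebuild every layer:
#   docker build --no-cache -t example/app .
```

Note that the cached RUN layer above is exactly the case where packages will not be refreshed on a rebuild, which is when the --no-cache build is useful.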

Difference between ADD and COPY

  • COPY is generally preferred over ADD.
  • With ADD, the source can be a URL.
  • ADD also extracts local tar archives (plain, or compressed with gzip, bzip2, or xz) into the destination directory.
  • But REMEMBER: if you download a .tar.gz file from a URL using ADD, it won’t be extracted; it is only downloaded. The better way is to use RUN with wget or curl and pipe the output to tar -xzC to unpack it.
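
The differences above can be sketched in one Dockerfile. The URL, file names, and target paths are hypothetical placeholders, so treat this as an illustration rather than something to build as-is.

```dockerfile
# Sketch only: the URL, file names, and paths are placeholders.
FROM debian:jessie

# COPY: plain copy from the build context; nothing is unpacked.
COPY config/app.conf /etc/app/app.conf

# ADD with a local tar archive: unpacked into the destination directory.
ADD vendor/libs.tar.gz /opt/libs/

# ADD with a URL: the archive is downloaded but NOT unpacked.
# ADD https://example.com/tool.tar.gz /opt/tool/

# Download and unpack in a single RUN instruction instead.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/* \
 && mkdir -p /opt/tool \
 && curl -fsSL https://example.com/tool.tar.gz | tar -xzC /opt/tool
```

Doing the download and extraction in one RUN also means the compressed archive never ends up stored in a layer of its own.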


  • In general, RUN is used to install packages and execute commands while the image is being built.
  • ENTRYPOINT makes the container behave like an executable/binary.
  • Always use the array (exec) syntax with CMD and ENTRYPOINT.
  • If a Dockerfile contains multiple CMD instructions, only the last one takes effect.
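
A small sketch of these rules, using /bin/ls only because it is present in the base image. The image name example/ls in the comments is a made-up placeholder.

```dockerfile
# Sketch only: /bin/ls is used purely for illustration.
FROM debian:jessie

# Array (exec) form: the binary is started directly, not through /bin/sh -c.
ENTRYPOINT ["/bin/ls"]

# This CMD is ignored...
CMD ["-a"]
# ...because only the last CMD in the Dockerfile takes effect.
CMD ["-l", "/"]

# docker run example/ls            runs: /bin/ls -l /
# docker run example/ls -a /etc    runs: /bin/ls -a /etc  (arguments replace CMD)
```

With this layout, CMD only supplies default arguments, and anything passed on the docker run command line replaces it while the ENTRYPOINT stays fixed.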

The following part is copied from Michael Crosby (full credit to the author). I am keeping it here for my quick reference.

  • In case you don’t know, ENTRYPOINT makes your dockerized application behave like a binary. You can pass arguments to the ENTRYPOINT during docker run and not worry about it being overwritten (unlike CMD). ENTRYPOINT is even better when used with CMD. Let’s check out my Rethinkdb Dockerfile and see how to use this.
    # Dockerfile for Rethinkdb 
    FROM ubuntu
    MAINTAINER Michael Crosby <>
    RUN echo "deb precise main universe" > /etc/apt/sources.list
    RUN apt-get update
    RUN apt-get upgrade -y
    RUN apt-get install -y python-software-properties
    RUN add-apt-repository ppa:rethinkdb/ppa
    RUN apt-get update
    RUN apt-get install -y rethinkdb
    # Rethinkdb process
    EXPOSE 28015
    # Rethinkdb admin console
    EXPOSE 8080
    # Create the /rethinkdb_data dir structure
    RUN /usr/bin/rethinkdb create
    ENTRYPOINT ["/usr/bin/rethinkdb"]
    CMD ["--help"]

    This is everything that is required to get Rethinkdb dockerized. We have my standard 5 lines at the top to make sure the base image is updated, ports exposed, etc… With the ENTRYPOINT set, we know that whenever this image is run, all arguments passed during docker run will be arguments to the ENTRYPOINT ( /usr/bin/rethinkdb ).

    I also have a default CMD set in the Dockerfile to --help. What this does is, in case no arguments are passed during docker run, rethinkdb’s default help output will be displayed to the user. This is the same functionality that you would expect when interacting with the rethinkdb binary.

    docker run crosbymichael/rethinkdb


    Running 'rethinkdb' will create a new data directory or use an existing one,
      and serve as a RethinkDB cluster node.
    File path options:
      -d [ --directory ] path           specify directory to store data and metadata
      --io-threads n                    how many simultaneous I/O operations can happen
                                        at the same time
    Machine name options:
      -n [ --machine-name ] arg         the name for this machine (as will appear in
                                        the metadata).  If not specified, it will be
                                        randomly chosen from a short list of names.
    Network options:
      --bind {all | addr}               add the address of a local interface to listen
                                        on when accepting connections; loopback
                                        addresses are enabled by default
      --cluster-port port               port for receiving connections from other nodes
      --driver-port port                port for rethinkdb protocol client drivers
      -o [ --port-offset ] offset       all ports used locally will have this value
      -j [ --join ] host:port           host and port of a rethinkdb node to connect to

    Now let’s run the container with the --bind all argument.

    docker run crosbymichael/rethinkdb --bind all


    info: Running rethinkdb 1.7.1-0ubuntu1~precise (GCC 4.6.3)...
    info: Running on Linux 3.2.0-45-virtual x86_64
    info: Loading data from directory /rethinkdb_data
    warn: Could not turn off filesystem caching for database file: "/rethinkdb_data/metadata" (Is the file located on a filesystem that doesn't support direct I/O (e.g. some encrypted or journaled file systems)?) This can cause performance problems.
    warn: Could not turn off filesystem caching for database file: "/rethinkdb_data/auth_metadata" (Is the file located on a filesystem that doesn't support direct I/O (e.g. some encrypted or journaled file systems)?) This can cause performance problems.
    info: Listening for intracluster connections on port 29015
    info: Listening for client driver connections on port 28015
    info: Listening for administrative HTTP connections on port 8080
    info: Listening on addresses:,
    info: Server ready
    info: Someone asked for the nonwhitelisted file /js/handlebars.runtime-1.0.0.beta.6.js, if this should be accessible add it to the whitelist.

    And there it is: a full Rethinkdb instance running, with access to the db and admin console, by interacting with the image the same way you interact with the binary. Very powerful and yet extremely simple. I love simple.








