Prelim AWS Notes
Tue, Nov 22, 2016
1. Create root account
Browse to https://aws.amazon.com/ and click the "Create an AWS Account" button to go through root account setup and activation and to establish billing.
1.1. IAM roles/sub-accounts
Once a root account is established, sub-accounts and roles can be set up and managed within the shared context of the overarching root account. Some activities trigger security warnings if you try to do them in the root account, but the root account does not actually seem to be restricted from doing anything.
1.2. Console Page
Logging in, either as root or as a subordinate IAM role, puts you on the "Console Page". After the first visit, the "Recently Visited" panel will have links to the services/dashboards you have gone to before. The search bar at the top left can also be used to search for services such as "EC2" and "IAM" and browse to their dashboards.
1.3. Security Credentials/access keys (First time setup)
To the top right of the browser window, there are drop-down lists for region selection and user account information. Click the drop down farthest right which should be displaying your login account name and select "Security Credentials".
In the security credentials dashboard, look for the "Access keys" subsection. In here you will create an access key set and download a csv file containing this information to be read into "aws configure" when you launch an instance. Access keys for subordinate accounts are managed similarly as accounts are established.
2. IAM Roles and S3 access permissions
From the search bar at the top left again, search for "IAM" and browse to the IAM Dashboard. Under "IAM resources", the space for "Roles" should be 0 to start. Click the link attached to whatever number is there to go to the Roles dashboard. Here you need to set up an IAM role to allow S3 access so that either the aws cli or julia routines will be able to see the Chirp buckets.
Click "Create Role" and select "AWS service" for Trusted entity type and "EC2" for Use case. Clicking "Next" will take you to the "Add Permissions" page. Search for "S3", or directly for "AmazonS3FullAccess", then select the "AmazonS3FullAccess" policy and click "Next".
Give the new role a name and click "Create role" at bottom of page. You should get sent back to the IAM > Roles page and should see the new role listed and available for selection later to attach to instances that you launch.
3. Launch an instance
You can now use the search bar to look for "EC2" and go to the EC2 dashboard and launch an instance.
3.1. Set region (IMPORTANT)
BUT FIRST: set your region in the top right next to your account name. Instances are tied to regions, so there is no switching of regions once you launch. You can change IAM roles on a running instance but not regions. For Chirp data, select "US West (Oregon) us-west-2" to match the description in the EarthData Chirp Cloud access information.
Now, you can launch an instance by pressing the "Launch Instance" button. Give the instance a name.
3.2. Select OS
I would recommend Amazon Linux: the other images do not seem to have the awscli installed by default, so Amazon Linux saves an installation step.
3.3. Instance Type
The default t2.micro instance type is sufficient for basic testing and demo but does not have enough memory to test heavy file processing.
3.4. Key pair (login)
Click "create new key pair" to generate a key pair for ssh. This gets saved as part of your AWS account info and can be reused on multiple instances, so it does not need to be done for every launch.
3.5. Network setting
default values fine
3.6. Configure storage
default values fine for basic testing
3.7. Advanced details
Here is the first place you can add the S3 access permissions IAM role created above. In the IAM instance profile drop down box, the role defined above should be listed and can be selected. The rest of the details can be left default.
Then, press "Launch instance" to the right. This should take you to the Instances dashboard and show the new instance as starting up.
4. Instance dashboard
In the instance dashboard, you can select the details on the new instance. If instances get created and then stopped instead of terminated, they will be kept in the dashboard and are available to reactivate.
4.1. Actions
To the top right there is a link for "Actions". If you launch an instance without defining the S3 access profile and need to apply it, or you create a new policy that you want to apply to a running instance, it can be done here. Look for "Actions > Security > Modify IAM role".
4.2. Instance state
The instance state link and drop down allows one to start/stop/terminate selected instances.
4.3. Instance ID
Once an instance exists, clicking on its instance ID gives you the instance summary showing details for the instance. This is a good place to check that the IAM role for S3 access is applied, the region is correct, etc.
4.4. Connect to instance
Once an instance is running, the "connect" box at the top of both the instance dashboard and the instance summary becomes clickable. Clicking that brings up a "Connect to instance" window showing options for connecting to the instance. I typically just copy/paste the ssh example at the bottom of the "SSH Client" tab to connect from my local terminal session or strowinteract. The key-pair file produced earlier and referenced in the "-i" option to ssh has to be available in the local directory where you are executing ssh from; otherwise, edit the command to use its full path on your machine.
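The copy/paste step can be wrapped in a small shell helper. This is only a sketch: connect_cmd is a hypothetical name, the key file and host are placeholders, and "ec2-user" is assumed as the login because that is the default user on Amazon Linux images (Ubuntu images use "ubuntu"). It restricts the key file permissions, since ssh refuses private keys that are readable by other users, and prints the command to run.

```shell
# connect_cmd KEY HOST [USER]  (hypothetical helper, not part of any AWS tooling)
# Restricts the key file to owner read-only, then prints the ssh command to run.
connect_cmd() {
    key=$1
    host=$2
    user=${3:-ec2-user}   # assumed default login on Amazon Linux; Ubuntu uses "ubuntu"
    chmod 400 "$key" || return 1
    printf 'ssh -i %s %s@%s\n' "$key" "$user" "$host"
}

# Example use (both arguments are placeholders):
#   connect_cmd /full/path/to/my-key.pem <public DNS shown in the "SSH Client" tab>
```

Passing the full path of the key file avoids having to run ssh from the directory that holds it.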
5. Logged in
5.1. Fresh instance
A fresh instance of any of the Linux OS options comes with the editors vi and nano installed. On Amazon Linux instances, the AWS CLI is installed by default, but it is the earlier version 1. On the other Linux options, the CLI has to be installed. Amazon Linux instances use the "yum" package manager; Ubuntu instances have "apt" and "snap" installed.
Apart from awscli on the amazon OS instances, installs appear to be base linux installs so emacs, gcc, julia, etc have to be installed.
*Note: default awscli installation is version 1 but default documentation online is for version 2. I don't know why Amazon is doing this but it is quite annoying. To get to version 2, one has to uninstall version 1 and then install 2. Updating awscli via the package manager just gets you a newer revision of version 1. Not something to deal with now.
5.1.1. emacs
- Amazon OS
- sudo yum install emacs
- Ubuntu
- sudo snap install emacs --classic
5.1.2. julia
Instructions for julia v1.6.6.
# download julia tarball
wget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.6-linux-x86_64.tar.gz
# untar julia tarball
tar zxvf julia-1.6.6-linux-x86_64.tar.gz
# remove tarball to free up space
rm julia-1.6.6-linux-x86_64.tar.gz
# move julia directory out to /opt
sudo mv julia-1.6.6 /opt/julia
export PATH=$PATH:/opt/julia/bin
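The export above only lasts for the current login session. One possible way to make it stick is to append the line to a shell startup file, sketched below. The helper name add_path_line is hypothetical, and ~/.bashrc is an assumption about which startup file the login shell reads.

```shell
# add_path_line DIR RCFILE  (hypothetical helper)
# Appends an "export PATH" line for DIR to RCFILE unless that exact line is already there.
add_path_line() {
    dir=$1
    rc=$2
    line="export PATH=\$PATH:$dir"
    touch "$rc"
    # -x matches whole lines only, -F treats the line as a fixed string
    grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}

# Example use:
#   add_path_line /opt/julia/bin ~/.bashrc
```

Running it twice leaves a single line in the file, so it is safe to keep in a setup script.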
5.1.3. Populate AWS configuration
scp the csv file of user credentials saved previously over to the running instance.
Since we haven't taken time to update awscli to version 2, we can't just import the csv file directly and have to do a little manual copy/paste of configuration.
The csv credentials file should contain two fields: Access key ID and Secret access key. Run "aws configure" and fill the first two prompts with those fields from the csv file. The next prompt is for "Default region name", which should be set to "us-west-2" to match the region set for the instance and the region of the Chirp S3 store. The last prompt, for the default output format, I usually fill with "text".
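As an alternative to copy/paste, the two values can be pulled out of the csv file with a short helper. This is a sketch under assumptions: read_aws_csv is a hypothetical name, the file is assumed to have one header row followed by one data row with the key ID first and the secret second, and the commented usage assumes the installed version 1 CLI provides "aws configure set".

```shell
# read_aws_csv FILE  (hypothetical helper)
# Prints "ACCESS_KEY_ID SECRET_ACCESS_KEY" from the first data row of FILE,
# assuming a header row followed by comma-separated values in that order.
read_aws_csv() {
    # Drop Windows carriage returns, then print the first two fields of row 2
    tr -d '\r' < "$1" | awk -F',' 'NR == 2 { print $1, $2 }'
}

# Example use (the file name is a placeholder):
#   set -- $(read_aws_csv credentials.csv)
#   aws configure set aws_access_key_id "$1"
#   aws configure set aws_secret_access_key "$2"
#   aws configure set region us-west-2
#   aws configure set output text
```

If the csv has extra leading columns (a user name, for example), the field numbers in the awk call would need adjusting.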
This creates the directory ~/.aws and populates two files within it: config and credentials. The access_key values are in "credentials". The temporary values gotten from EarthData "Get AWS S3 Credentials" will go into the credentials file. For now by hand.
5.1.4. EarthData Temporary credentials
Once those temporary credentials are available, edit the credentials file. It should start with two fields: aws_access_key_id and aws_secret_access_key. Replace the values for those fields with the corresponding fields from the temporary credentials. The additional two fields in the temporary credentials should be put in as values for the keys aws_session_token and aws_expiration.
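The hand edit could be scripted along these lines. This is a rough sketch, not a tested workflow: write_temp_credentials is a hypothetical name, it overwrites the target file with a single [default] profile, and the expiration field is left out to keep it short. The three values are whatever the EarthData "Get AWS S3 Credentials" page returned.

```shell
# write_temp_credentials FILE KEY_ID SECRET TOKEN  (hypothetical helper)
# Overwrites FILE with one [default] profile holding the temporary values.
write_temp_credentials() {
    file=$1
    printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\naws_session_token = %s\n' \
        "$2" "$3" "$4" > "$file"
    # Keep the secrets readable by the owner only
    chmod 600 "$file"
}

# Example use (values are placeholders):
#   write_temp_credentials ~/.aws/credentials "<accessKeyId>" "<secretAccessKey>" "<sessionToken>"
```

Since the file is overwritten, copy the original credentials file somewhere first if the long-lived keys are still needed.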
5.1.5. aws cli "directory" listing
At this point,
aws s3 ls s3://gesdisc-cumulus-prod-protected/CHIRP/SNDR13CHRP1AQCal.2/
should return a list of year directories. I also see this list returned as an indented list with each line prepended with "PRE ". Everything after "PRE " can be copied and pasted onto the S3 URL to see the next level in the hierarchy:
aws s3 ls s3://gesdisc-cumulus-prod-protected/CHIRP/SNDR13CHRP1AQCal.2/2016/
should return a list of days
aws s3 ls s3://gesdisc-cumulus-prod-protected/CHIRP/SNDR13CHRP1AQCal.2/2016/325/
should return a list of netcdf granules
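Instead of copying the text after "PRE " by hand, the listing can be filtered. A minimal sketch, assuming the "PRE <name>/" line layout described above and directory names without spaces; list_prefixes is a hypothetical helper name.

```shell
# list_prefixes  (hypothetical helper)
# Reads "aws s3 ls" output on stdin and prints only the names on "PRE" lines.
list_prefixes() {
    awk '$1 == "PRE" { print $2 }'
}

# Example use:
#   aws s3 ls s3://gesdisc-cumulus-prod-protected/CHIRP/SNDR13CHRP1AQCal.2/ | list_prefixes
```

Lines for ordinary files start with a date rather than "PRE", so they are skipped.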
5.1.6. aws cli to cp files to EC2 local storage
aws s3 cp s3://gesdisc-cumulus-prod-protected/CHIRP/SNDR13CHRP1AQCal.2/ . --recursive --exclude "*" --include "*20161120*.nc"
for example, will copy netcdf files from s3://gesdisc-cumulus-prod-protected/CHIRP/SNDR13CHRP1AQCal.2/2016/325 into the EC2 local directory under a matching directory tree structure ./2016/325/*.nc
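The year/day-of-year directory for a calendar date can be computed rather than looked up, which lets a listing or copy target one day directly. A sketch, assuming GNU date (for the -d option) and that the day directories are three-digit, zero-padded day-of-year values; chirp_day_prefix is a hypothetical helper name.

```shell
# chirp_day_prefix YYYYMMDD  (hypothetical helper; needs GNU date for -d)
# Prints the S3 prefix for that date, using %j for the day of year.
chirp_day_prefix() {
    printf 's3://gesdisc-cumulus-prod-protected/CHIRP/SNDR13CHRP1AQCal.2/%s/\n' \
        "$(date -d "$1" +%Y/%j)"
}

# Example use:
#   aws s3 ls "$(chirp_day_prefix 20161120)"
```

For 20161120 this gives the .../2016/325/ prefix used in the example above.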
There is also the option "aws s3 sync", which seems to do the same thing but does not require the "--recursive" keyword. It seems similar to rsync, actually.