AWS_julia

Mon, Aug 8, 2022 Steven Buczkowski

1 DONE Figure out julia package issues on taki (for local practice and comparison)

Seems there were two issues:

something was broken in the package registry
- SOLVED by clearing and rebuilding the registry (commands entered in the Pkg REPL)
  - registry remove General
  - registry add https://github.com/JuliaRegistries/General.git
julia was picking up taki's local libraries instead of using julia's own copies
- SOLVED by setting LD_LIBRARY_PATH to include the julia lib directory for the currently loaded julia module (should this be done as part of module init? Send note to OIT)
  - export LD_LIBRARY_PATH="/usr/ebuild/software/Julia/1.6.2-linux-x86_64/lib/julia:$LD_LIBRARY_PATH"

Can now run julia with NetCDF and NCDatasets packages on taki/strowinteract

2 DONE Figure out some basics with Julia

Before figuring out how to read S3 stores in julia, need to figure out how to do some basics on filesystems I understand: reading directories, filtering filenames, reading filepaths from a text file.

2.1 reading from a text file

for line in eachline("path-to-file-of-paths")
     ## do some stuff with variable "line"
end
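
The loop above can be fleshed out to collect the paths into a vector for later use. This is a minimal sketch: the helper name read_path_list and the demo file contents are placeholders, not anything that exists on taki.

```julia
# Hypothetical helper: read one path per line, skipping blank lines.
function read_path_list(listfile::AbstractString)
    paths = String[]
    for line in eachline(listfile)
        entry = strip(line)            # drop surrounding whitespace
        isempty(entry) && continue     # ignore blank lines
        push!(paths, String(entry))
    end
    return paths
end

# Demo with a throwaway list file (made-up paths)
listfile = tempname()
write(listfile, "/data/a.nc\n\n/data/b.nc\n")
paths = read_path_list(listfile)
println(length(paths), " paths read")
```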

2.2 reading directory directly with readdir()

readdir(): reads/lists $PWD
readdir("path"): reads/lists contents at "path"

2.2.1 absolute paths

readdir() returns bare names, so the results have to be mapped through abspath() to get absolute paths (abspath resolves against the current working directory)

map(abspath, readdir())
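
Because abspath() resolves against the current working directory, map(abspath, readdir("path")) gives wrong paths when "path" is some other directory. A sketch of one alternative, assuming Julia 1.4 or later (taki has 1.6.2), is the join keyword of readdir. The temporary directory and file names here are only for illustration.

```julia
# Throwaway directory with two empty files
dir = mktempdir()
touch(joinpath(dir, "a.txt"))
touch(joinpath(dir, "b.txt"))

entries = readdir(dir)              # bare names, sorted
full    = readdir(dir; join=true)   # each name prefixed with dir

# The same result built by hand
manual = map(name -> joinpath(dir, name), entries)
println(full == manual)
```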

2.2.2 filtering return file list on contents

readdir() does not filter the values it returns; filtering has to be done by wrapping the readdir() call in a separate filtering function

filter(x -> occursin("text", x), readdir())
# returns only files containing "text" in name

filter(x -> occursin("text", x), map(abspath, readdir()))
# returns files where "text" occurs anywhere in full path

filter(x -> occursin(r"regex", x), readdir())
# returns files matching regular expression "regex"

Other functions and anonymous functions (the "x -> occursin…") can be used in "filter" so this is probably far from the only way to list and filter directories.
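
A few of those other ways, as a rough sketch: a curried predicate (the one-argument form of endswith needs Julia 1.5 or later), a do-block, and a comprehension. The file names are invented for the demo and are not real CHIRP granule names.

```julia
# Throwaway directory with made-up file names
dir = mktempdir()
foreach(n -> touch(joinpath(dir, n)), ["chirp_001.nc", "chirp_002.nc", "notes.txt"])

# endswith(".nc") returns a one-argument function usable as a predicate
nc_files = filter(endswith(".nc"), readdir(dir))

# do-block form, handy when the test takes more than one line
numbered = filter(readdir(dir)) do name
    startswith(name, "chirp_") && occursin(r"_\d{3}\.nc$", name)
end

# comprehension with a condition
txt = [name for name in readdir(dir) if endswith(name, ".txt")]
```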

3 DONE Basic reads of netcdf files with NCDatasets package from normal filesystem

using Pkg
Pkg.add("NCDatasets")
using NCDatasets

ds = NCDataset("path to the netcdf file")

# quick list of variables available in netcdf file
for (varname, var) in ds
  @show (varname, size(var))
end

# lazy load attributes of variable "var" but without loading data
var = ds["varname"] # using "varname" generically, not as variable from previous block

# actually loads the data for var. The second set of brackets can select index ranges and strides ([:,:] here reads all of a 2-D variable)
var = ds["varname"][:,:]
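
To make the lazy handle versus actual read distinction concrete, here is a small self-contained sketch. It assumes NCDatasets is installed, writes its own tiny demo file, and uses made-up names ("temp", "x", "y") rather than anything from a real CHIRP file.

```julia
using NCDatasets

path = tempname() * ".nc"

# Write a tiny 3x4 demo variable; defVar creates the dimensions as needed
NCDataset(path, "c") do ds
    defVar(ds, "temp", reshape(collect(1.0:12.0), 3, 4), ("x", "y"))
end

# The do-block form closes the file automatically when the block ends
subset, dims = NCDataset(path) do ds
    var = ds["temp"]                      # lazy handle, no data read yet
    ncols = size(var, 2)
    (var[1:2, 1:2:ncols], size(var))      # rows 1-2, every other column
end
```

Only the requested slice is read from disk; the rest of the variable is never loaded.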

4 TODO Basic timed read of 100-400 CHIRP files from S3

Make a list of files/buckets and loop over them, accumulating the time each read takes(?)
read in list of CHIRP files via readdir() (?) and loop over this
- this is what we need in longer term for actual work. Will require understanding authentication/authorization key aging and how to re-authorize programmatically (see S3 bullet below)
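
A possible shape for the timing loop, sketched against the local filesystem only. The helper name timed_reads is hypothetical, and the reader is passed in as an argument so that a local read can later be swapped for whatever S3 read ends up being used. The demo reads small random binary files, not CHIRP granules.

```julia
# Hypothetical helper: apply `reader` to every path and time each call.
function timed_reads(reader, paths)
    times = Float64[]
    for path in paths
        t = @elapsed reader(path)   # wall-clock seconds for this one read
        push!(times, t)
    end
    return times
end

# Demo on throwaway files
dir = mktempdir()
paths = [joinpath(dir, "file$i.bin") for i in 1:3]
foreach(p -> write(p, rand(UInt8, 1024)), paths)

times = timed_reads(read, paths)
println("total: ", sum(times), " s, mean: ", sum(times) / length(times), " s")
```

The first call will usually include compilation time, so it may be worth discarding or reporting it separately when comparing runs.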

5 TODO Timed read of rtp related variables from 100-400 CHIRP files

offshoot of code from previous bullet, just add reading some variables into arrays while timing

6 TODO Figure out S3 access and ongoing authentication with AWS.jl and AWSS3.jl

7 TODO (Ongoing) start setting up both AWS and julia on both AWS and UMBC to be more turnkey

customize julia environment and put on github
set up AWS environment template(s) for spinning up nodes

ASL

Atmospheric Spectroscopy Lab UMBC