AWS_julia
Mon, Aug 8, 2022
1 DONE Figure out julia package issues on taki (for local practice and comparison)
Seems there were two issues:
- something screwed up in package registry
  - SOLVED by clearing and rebuilding registry (commands entered at the pkg> prompt)
    - registry remove General
    - registry add https://github.com/JuliaRegistries/General.git
- julia was picking up taki local libraries instead of using julia's copy
  - SOLVED by setting LD_LIBRARY_PATH to include the julia lib directory for the currently loaded julia module (should be done as part of module init(?). Send note to OIT)
    - export LD_LIBRARY_PATH="/usr/ebuild/software/Julia/1.6.2-linux-x86_64/lib/julia:$LD_LIBRARY_PATH"
Can now run julia with NetCDF and NCDatasets packages on taki/strowinteract
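A quick way to check whether the LD_LIBRARY_PATH fix took effect is to ask julia which shared libraries it actually loaded. The sketch below is only a diagnostic idea, not something from the original troubleshooting: `shadowed_libs` and `julia_libdir` are hypothetical helper names, and the `lib/julia` location assumes the layout of the official binary tarballs.

```julia
using Libdl

# Assumed layout of an official Julia binary install: <prefix>/bin/julia and <prefix>/lib/julia
julia_libdir() = normpath(joinpath(Sys.BINDIR, "..", "lib", "julia"))

# Hypothetical helper: loaded libraries that share a file name with a library
# bundled in `libdir` but were loaded from somewhere else.
function shadowed_libs(libdir::AbstractString, loaded = Libdl.dllist())
    bundled = isdir(libdir) ? Set(readdir(libdir)) : Set{String}()
    return filter(loaded) do lib
        basename(lib) in bundled && !startswith(abspath(lib), abspath(libdir))
    end
end

# Print any suspects for the running session (prints nothing if there are none)
foreach(println, shadowed_libs(julia_libdir()))
```

Running this after `using NetCDF` or `using NCDatasets` should show whether any system copy is still winning over julia's bundled one.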
2 DONE Figure out some basic trivialities with Julia
Before figuring out how to read S3 stores in julia, need to figure out how to do some basics on filesystems I understand: reading directories, filtering filenames, reading filepaths from a text file.
2.1 reading from a text file
    for line in eachline("path-to-file-of-paths")
        ## do some stuff with variable "line"
    end
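For the file-list use case, the loop above can be wrapped into a small function that returns the paths as a vector. This is a minimal sketch: `read_paths` is a hypothetical name, and skipping blank lines and "#" comment lines is an assumption about how the list file might be written.

```julia
# Hypothetical helper: read one path per line from a text file,
# skipping blank lines and lines starting with "#".
function read_paths(listfile::AbstractString)
    paths = String[]
    for line in eachline(listfile)
        entry = strip(line)                      # drop surrounding whitespace
        (isempty(entry) || startswith(entry, "#")) && continue
        push!(paths, String(entry))              # strip() returns a SubString
    end
    return paths
end
```

Usage would look like `paths = read_paths("path-to-file-of-paths")`, after which `paths` can be looped over or filtered like any other vector.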
2.2 reading directory directly with readdir()
- readdir()
- reads/lists $PWD
- readdir("path")
- reads/lists contents at "path"
2.2.1 absolute paths
readdir() returns bare file names, so to get absolute paths the results have to be mapped through the abspath() function
- map(abspath, readdir())
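One catch worth writing down: abspath() resolves names against the current working directory, so `map(abspath, readdir("path"))` gives wrong paths whenever "path" is not $PWD. Since julia 1.4, readdir() takes a `join` keyword that prefixes each entry with the directory it was given. A small sketch (`list_abs` is a hypothetical name):

```julia
# Hypothetical helper: absolute paths of the entries in `dir`.
# abspath(dir) makes the directory absolute first; join=true then
# prepends it to every entry returned by readdir().
list_abs(dir::AbstractString = pwd()) = readdir(abspath(dir); join = true)

# Comparison on a scratch directory
scratch = mktempdir()
touch(joinpath(scratch, "a.nc"))

naive = map(abspath, readdir(scratch))   # resolved against pwd(), not `scratch`
good  = list_abs(scratch)                # resolved against `scratch`

@show naive
@show good
```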
2.2.2 filtering return file list on contents
readdir() does not filter returned values; this has to be done by wrapping readdir() in an external filtering function
    filter(x -> occursin("text", x), readdir())                # returns only files containing "text" in name
    filter(x -> occursin("text", x), map(abspath, readdir()))  # returns files where "text" occurs anywhere in full path
    filter(x -> occursin(r"regex", x), readdir())              # returns files matching regular expression "regex"
Other functions and anonymous functions (the "x -> occursin…") can be used in "filter" so this is probably far from the only way to list and filter directories.
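One variation that may be handy later: return full paths but match only against the file name, so that directory names cannot cause false hits. A sketch under that assumption (`list_matching` is a hypothetical name; `pattern` can be a plain string or a regex, as with occursin above):

```julia
# Hypothetical helper: full paths of entries in `dir` whose file name
# (not the directory part) matches `pattern`.
function list_matching(dir::AbstractString, pattern)
    return filter(p -> occursin(pattern, basename(p)), readdir(dir; join = true))
end

# Same idea using a suffix test instead of a pattern
nc_files(dir::AbstractString) = filter(p -> endswith(p, ".nc"), readdir(dir; join = true))
```

For example, `list_matching("path", r"^chirp_.*\.nc$")` would keep only names that start with "chirp_" and end in ".nc" (the "chirp_" prefix is made up for illustration).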
3 DONE Basic reads of netcdf files with NCDatasets package from normal filesystem
    using Pkg
    Pkg.add("NCDatasets")
    using NCDatasets

    ds = NCDataset("path to the netcdf file")

    # quick list of variables available in netcdf file
    for (varname, var) in ds
        @show (varname, size(var))
    end

    # lazy load attributes of variable "var" but without loading data
    var = ds["varname"]  # using "varname" generically, not as variable from previous block

    # actually loads data for var. Second set of brackets can set chunk boundaries and strides
    var = ds["varname"][:,:]
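The snippet above leaves the dataset open. A sketch of the same read using the do-block form of NCDataset, which closes the file when the block exits, and reading only a strided subset through the second set of brackets. Assumptions: NCDatasets is already installed, the variable is 2-D, and `read_subset` is a hypothetical helper name.

```julia
using NCDatasets

# Hypothetical helper: read part of a 2-D variable and close the file afterwards.
# `rows` and `cols` accept anything valid as an index: Colon(), 1:10, 1:2:end, ...
function read_subset(path, varname; rows = Colon(), cols = Colon())
    NCDataset(path) do ds
        ds[varname][rows, cols]   # only the requested slab is read from disk
    end
end
```

For example, `read_subset("path to the netcdf file", "varname"; rows = 1:10, cols = 1:2:20)` would load the first 10 rows of every second column out of the first 20.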
4 TODO Basic timed read of 100-400 CHIRP files from S3
- Make list of files/buckets and loop over them to read, accumulating the time taken to do so(?)
- read in list of CHIRP files via readdir() (?) and loop over this
- this is what we need in the longer term for actual work. Will require understanding authentication/authorization key aging and how to re-authorize programmatically (see S3 bullet below)
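A possible shape for the timing loop, independent of where the files live. This is a sketch only: `timed_reads` is a hypothetical name, and `reader` stands in for whatever function ends up doing the actual read (local file, NCDataset, or an S3 fetch once that is figured out).

```julia
# Hypothetical helper: call `reader` on each path and record the elapsed
# wall-clock seconds per file. `reader` comes first so a do-block can be used.
function timed_reads(reader, paths)
    times = Float64[]
    for p in paths
        t = @elapsed reader(p)   # seconds spent in this one read
        push!(times, t)
    end
    return times
end
```

Usage could be `times = timed_reads(read, paths)` followed by `sum(times)` for the aggregate. The first call also includes julia's compilation time, so it is probably worth doing one throwaway read before timing or dropping the first entry.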
5 TODO Timed read of rtp related variables from 100-400 CHIRP files
- offshoot of code from previous bullet, just add reading some variables into arrays while timing
6 TODO Figure out S3 access and ongoing authentication with AWS.jl and AWSS3.jl
7 TODO (Ongoing) start setting up both AWS and julia on both AWS and UMBC to be more turnkey
- customize julia environment and put on github
- set up AWS environment template(s) for spinning up nodes