
2nd Party

Open pdxjohnny opened this issue 2 years ago • 5 comments

Alice is Here! It’s the 2nd Party and everyone is invited 💃🥳.

  • What to expect
    • Alice ready for contribution
    • We'll be rebasing this branch into main once the CI for 2ndparty plugin support passes (see closed PR with ADR, only closed because of branch rewrite to remove images, will be reopened asap).
      • ETA: 2022-11-30
    • We'll then rewrite history again, splitting the plugins out into their respective 2ndparty maintenance locations (the dffml or builtree org, or possibly another option, or staying with 2ndparty within the intel org; we’ll see how it goes pending governance review).
      • ETA: 2023-11-30
    • Finally, we’ll flip the switch to our web5 world, where git is only used as a proxy for commit data encoded via DIDs. We will then have herstory; from then on everything will be Alice. Alice will be the methodology by which we interpret those nodes (DIDs in the web5 case). Alice will also exist as the entity whose execution is based on the same methodology used for definition of the graph.
      • ETA: 2024-11-30
  • Code
    • Alice
      • https://github.com/intel/dffml/tree/alice/entities/alice
  • Documentation
  • Tagged RFCs
    • RFCv1.1: https://github.com/intel/dffml/commit/69df6036c25f61c31af21b1db9b7f14327147a9e
    • RFCv1: https://github.com/intel/dffml/tree/291cfbe5153414932afe446aa4f6c2e298069914/docs/tutorials/rolling_alice
      • Began by exploring how we should write clean dataflow docs in https://github.com/intel/dffml/issues/1279
      • Converted to discussion in https://github.com/intel/dffml/discussions/1369
      • Issue converted to discussion converted to files within https://github.com/intel/dffml/blob/alice/docs/arch/alice/discussion/
      • Pulled out existing ADRs and tutorials in their current states into
      • Cross linked tutorials with their usage examples within README within alice entity directory https://github.com/intel/dffml/tree/alice/entities/alice
  • TODO (extra todos: https://github.com/intel/dffml/pull/1401#issuecomment-1168023959)
    • [x] Clean up tutorial docs that currently exist
      • [x] Find home for them in tree
        • https://github.com/intel/dffml/blob/alice/docs/tutorials/rolling_alice/
      • [x] Tentative chapter name for Question and Answering model
        • Volume 1: Coach Alice: Chapter 5: Question Everything
          • https://github.com/programmer290399/IT-710-Project-Video-QnA-System
    • [ ] DataFlow.export should include $schema as should all .export() methods.
      • [ ] Later for operations the schema is the schema for the associated manifest.
    • [x] Split overlays into separate file locations
      • [x] Update Alice contributing docs with new paths instead of AliceGitRepo import from alice.cli
    • [ ] Docs build with alice branch if working
    • [ ] Run auto formatter on every commit in alice branch
    • [ ] Cloud development environment options
      • Public
        • [ ] GitPod
          • https://gitpod.io/#github.com/intel/dffml/tree/alice
          • TODO
            • mv dffml/operations/python.py operations/innersource/dffml_operations_innersource/python_ast.py
            • Add Alice CONTRIBUTING setup to the .gitpod.yml auto start setup
              • code tutorial.ipynb when done
      • Self-Hosted
        • [ ] Coder
          • https://coder.com/docs/coder-oss/latest/install
    • [x] Alice contributing documentation
      • https://github.com/intel/dffml/blob/alice/entities/alice/CONTRIBUTING.rst
        • [x] How to extend recommended community standards command with overlays
          • Basic tutorial where we grab the name from a configuration file
        • [ ] Show me a security overlay.
          • Write section of our open source guide tutorial where we implement the SECURITY.md overlay
          • Later go back and write how we implemented the base flow and the initial set of overlays, and the readme overlay.
            • We can prototype the use of commit messages as docs and commit the whole file when we move it with docs for that overlay, rST in commit message. Later explore log of file to changelog in rST to sphinx docs.
            • Link up with herstory to ipynb creation and shell command saving. Auto generate commit messages (docs) based on herstorical shell commands run (or, if in vscode, debug buttons or run buttons executed) with output. Diff system context herstory state with link in chain at last clean tree. Run timeline resolution if dirty tree for a set of commits (multiple git add runs). First we automate writing the docs, then we automate reading.
        • [ ] How to write new commands
        • [ ] Non CLI interfaces
    • [ ] Commenting in issue while debugging, this is an overlay to herstory collection
    • [ ] Get tbDEX up and running for backing storage
      • [ ] Write an operation that inserts data into tbDEX format, either via API or flat file duplication of formatting via libraries like the Python peerdid library.
    • [ ] Use @programmer290399's QA model to implement alice ask which queries all our docs, logs, notes, issues, etc.
      • https://programmer290399.github.io/pyqna/usage.html
  • Alice enables granular identification and application of static or dynamic policy.
    • She does this through context aware overlays whose application to the upstream may be dynamic, even in part end user (attacker) flows, which can be executed or synthesized within an appropriate sandbox (optionally adaptive; we do dynamic and static, and we understand time across contexts, so we can come back and re-synthesize in your codebase on trigger)

pdxjohnny commented Jun 24 '22 15:06

Moved to: https://github.com/intel/dffml/blob/3ce85c8f48e702e3ed1e268769c8121377abe324/docs/tutorials/rolling_alice/0001_coach_alice/0000_introduction.md

Volume 1: Coach Alice: Introduction

To time travel, an entity must first accelerate. The entity we now turn our attention to we know well. Her name is Alice, and she's falling down the rabbit hole as we speak. We begin our series somewhere between the tick and the tock. As she nears the bottom of the rabbit hole time slows for a moment, for her, as she enters Wonderland. The pattern which is Alice is a spark in the mind. She's all in your head, after all, everything is all in your head. In a way she is in your head, and you're in her head, because conceptually, the architecture is the same, and the architecture is one of concepts, because it's all, in fact, just in your head.

We will coach Alice and she will coach us. From our point of view, if you can't teach it you don't know it. So it's time to teach Alice how to be an open source contributor, by teaching her how to teach it. In this volume, volume 1, we will build Coach Alice, our open source developer coach. It's developer boot camp for Alice this volume as her boots make contact with the ground at the bottom of the rabbit hole.


Misc notes

  • If we can teach Alice how to operate based on intent, and how to have her intentions always be good, where good is defined by the community's strategic principles and values, and we validate the hell out of her, then we will step through the looking glass into a community of the future where we can only trust ourselves. In that trust in ourselves we will find trust in others, in measured, yet meaningful ways.
  • Where we can work optimally in a hybrid workplace environment. Allowing us to reconnect with the physical world. To embrace the world that exists.
  • What this means in reality is that Alice will be communicating for us, we will begin to think of her as a messenger relaying a message
  • The most open, self reliant, confident humans. Ready to take on the world.

pdxjohnny commented Jun 25 '22 17:06

More Detailed TODOs

  • A thought is like a vector. The direction is dependent on context, herstory, as well as the thought itself (the system context: upstream, overlay, orchestrator)
  • How to add data from static file stored anywhere into flow by adding a new operation
  • Alice takes strategy to execution to feedback; she’s the full analysis control feedback loop with strategic plans/principles at the helm. She sits at the intersection of CI/CD, Analytics, and Security. She enables optimal execution of agent flows via context aware overlays to best understand the specifics of their execution environment.
  • Sphinx builder to GitHub renderable markdown; this way docs can just live in GitHub without a deploy phase.
    • We can build our new web5 git trees to be based on provenance, we can say the next link in the chain must be a successful docs build or the failure.
  • We are interested in exploiting the use of context switching for learning. We hypothesize in the docs/arch/alice/discussion files that there are aligned conscious states which Alice will identify across entities, building training options over time for each agent
  • Helper script to use https://github.com/monomonedula/nvelope to generate JSON schema
  • alice shell tutorial follow on where we use the -orchestrator option, for which we have flows which return orchestrators, kind of like the dataflow/system context version of dataset_source(). We can then point to installed overlays: -orchestrator dffml.overlays.alice.shell.in_local_container. We can also use the mechanisms by which we extend Entrypoint.load.
    • In yet another tutorial we will build on the first to do: alice open vscode, which will launch a vscode instance locally or in a cloud development environment depending on context. If you are in a shell on a headless machine, it will pop you to the static vscode hosted site by Microsoft in your browser and you will connect back to it with webrtc based comms (or websocket, or arbitrary worm holes). This allows us to use the systems we already have most efficiently, i.e. extra dev k8s cluster capacity.
    • Go around and have alice try to build the projects, generate workspace template files/repos for people to fork which have the Dockerfile built for the dev env with a kaniko build, which on completion submits a PR to update the template to the latest container sha after validation passes. Run a workflow to call coder templates update once completed
  • Distributed network of metric collectors, of security information we should feed into cve-bin-tool. Maybe start by creating a checker entrypoint for a checker which knows it’s running under a dataflow. Could execute using runpy with the OpImp self in globals. OpImp shared config property of an object which is the dataflow as a class which listens for new vuln info for the checker in the background when instantiated. When a new vuln is detected we could trigger a scan on all previously scanned artifacts by having a strategic plan overlayed on a long running flow which inspects historical contexts which executed scans where the checker came up as existing within the file scanned. Use this presence within previous scans to query off chain data from historical system contexts, to build next contexts where results of the scan operation are removed, so that running results in the latest vuln info being incorporated into the scan. This is analogous to the dev pull model when a new commit on a branch is released. So a scan rerun is the same as redoing A/B feature testing of commits.
    • https://github.com/intel/dffml/blob/alice/docs/arch/alice/discussion/0023/reply_0022.md
  • Address DORA metrics in Alice and the Health of the Ecosystem: https://cloud.google.com/blog/products/devops-sre/using-the-four-keys-to-measure-your-devops-performance
  • Auto discovery via sourcegraph search of GitHub for entrypoint signatures, instantiate lazy loading OperationImplementationNetworks within overlay apply, add them to the orchestrator as applicable.
  • Streaming development activity == Stream of consciousness == streaming consciousness == thought transport protocol == thought communication protocol == LMWC == Alice == Open Architecture == model of the human mind == integrated risk management (resilience, fault tolerance)
    • Implementations may have dynamic facilitators of communications for distributed data and compute (operations, agents, cells). They may also have synthesizers which take an Open Architecture and produce a static representation. In fact, since the open architecture can be used to represent anything, suppose you take a binary and select the orchestrator equivalent of a NOP (we should just implement a NOP orchestrator which does nothing, meaning the dataflow is not given to be executed in any environment and only to be used as a reference). So you take input, you asynchronously produce output, the same binary, unmodified, that’s what your program says to do. Well then we don’t even need to run your program now do we? We already know, based on analysis of its context stating that via the deployment orchestrator and the metadata embedded within the open architecture document, which says it takes the files, does nothing with it (pass). The parent system context has brought your system context to equilibrium without needing to execute it. We get this for free because we understand context, because the parent system context acts as a loader which can dynamically execute sandboxed code within different trust boundaries (contexts)
  • Add arbitrary project name and metadata in policy.toml
    • policy.toml -> DataFlow -> Overlay
  • Synthesis of DataFlows to GitHub Actions workflows
    • Start by writing an operation which just dumps the output of each templated workflow accessed with importlib.resources
      • Data Flow templating, with native async
    • Workflow ideas
      • [ ] Python build
      • [ ] Python test
      • [ ] Sphinx build
  • please contribute
    • recommended community standards
      • [x] README.md
      • [x] SECURITY.md
      • [x] CODE_OF_CONDUCT.md
    • threats
      • [ ] THREATS.md
    • documentation
      • [ ] sphinx docs
        • [ ] Basic docs sourced from dffml.git/dffml/skel/common/docs
        • [ ] autodocs with python modules, can look at dffml.git/scripts/docs_api.py or whatever it's called
diff --git a/dffml/util/testing/consoletest/cli.py b/dffml/util/testing/consoletest/cli.py
index 0f8294155..dd9e057c8 100644
--- a/dffml/util/testing/consoletest/cli.py
+++ b/dffml/util/testing/consoletest/cli.py
@@ -44,7 +44,7 @@ async def main(argv: List[str]) -> None:
     nodes = []

     for node in parse_nodes(args.infile.read()):
-        if not node.options.get("test", False):
+        if args.filter is not None and not node.options.get(args.filter, False):
             continue
         if node.directive == "code-block":
             nodes.append(
diff --git a/entities/alice/README.rst b/entities/alice/README.rst
index aca0dbc87..53465db6f 100644
--- a/entities/alice/README.rst
+++ b/entities/alice/README.rst
@@ -4,6 +4,18 @@ Alice
 Install
 *******

+Install latest known working version
+
+.. code-block:: console
+
+    $ python -m pip install \
+        "https://github.com/intel/dffml/archive/42ed3da715f1c89b4c31d705cf7f7738f17c9306.zip#egg=dffml" \
+        "https://github.com/intel/dffml/archive/42ed3da715f1c89b4c31d705cf7f7738f17c9306.zip#egg=dffml-feature-git&subdirectory=feature/git" \
+        "https://github.com/intel/dffml/archive/42ed3da715f1c89b4c31d705cf7f7738f17c9306.zip#egg=shouldi&subdirectory=examples/shouldi" \
+        "https://github.com/intel/dffml/archive/42ed3da715f1c89b4c31d705cf7f7738f17c9306.zip#egg=dffml-config-yaml&subdirectory=configloader/yaml" \
+        "https://github.com/intel/dffml/archive/42ed3da715f1c89b4c31d705cf7f7738f17c9306.zip#egg=dffml-operations-innersource&subdirectory=operations/innersource" \
+        "https://github.com/intel/dffml/archive/42ed3da715f1c89b4c31d705cf7f7738f17c9306.zip#egg=alice&subdirectory=entities/alice"
+
 Install for development

 .. code-block:: console
diff --git a/setup.py b/setup.py
index 47c595547..8157381f4 100644
--- a/setup.py
+++ b/setup.py
@@ -75,6 +75,7 @@ setup(
     # Temporary until we split consoletest into it's own package
     install_requires=[],
     extras_require={
+        "consoletest-jsonpath-filter": ["jsonpath-python"],
         "dev": DEV_REQUIRES,
         **plugins.PACKAGE_NAMES_BY_PLUGIN_INSTALLABLE,
     },



More detailed future work

  • Collaboratively define the first version of .tools/open_architecture/policy.yml with CVE Bin Tool community
    • Working with IETF/SCITT WG on this now as well, need to cross with CVE Bin Tool once policy method choice there is settled, if anyone wants to pursue this in the meantime go for it.
    • Based off their triage work, work within their codebase, write manifest there, link to from here.
    • Leverage Open Policy Agent
    • Arjan:
      • Allow for definition of per branch policies (this will be like the robots.txt of Alice)
      • Helps us decide how to handle vulnerability fixing pull requests.
        • Examples
          • If we are using openssl_client and CVEs keep coming in for OpenSSL, note in the policy file that openssl_server CVEs are on the next_regular_release cadence. Do not nag about them. openssl_client CVEs however nag us on X interval (daily, hourly, call me until I update).
  • alice how can I help? (alternate with issue assignment overlay: alice please assign me gh issues or alice please assign @team gh issues)
    • Life is not a miracle it’s a recipe with integrated dynamic risk management. The purpose of life is to sustain life, to create more life. Life is meaningless, we give it meaning. Meaning is intent. It’s what we do and intend to do from the tick to the tock.
      • upstream: What’s in our heads, our heads themselves. The brain, the open architecture, the methodology. “[It’s all in your head.] But so is everything” [Fiona Apple]
      • overlay: the prophecy, the plan we collaboratively build, the manifest destiny, the optimal security config for code, the optimal happiness of developers, those context aware policies that are the spirit of the law (which get translated into the letter on demand in a reproducible and auditable fashion)
        • All one needs to do to contribute to Alice is to think about how one can become the entity one wants to be. Refine that rolling plan, roll with it, roll with the self, roll with Alice, flow roll. We’re competing, but as a team, to make each other better. We are better when our training partners are better and Alice is a training partner, a learning partner, for us all. She is a shared reality, a shared view, a telescope without the lenses. She is the entity and the architecture.
          • We are exploiting the current lack of a shared reality to perform the consensus shift mind control attack (grep discussion for more info). Our objective in the execution of this technique is to patch the mind of an entity. We exploit so as to introduce a TCB upon which patches to mental security can be applied. The infrastructure assist levels up the entities to the restored state, the state at which a set is at critical velocity. The point at which they cannot be influenced by further mind control attacks, that entity is now fully in control of their own mind. Because the infra is controlled by no one and everyone all at once (grep: taking the lock) this means our flow roll for that set is optimal, we can switch in and out of intensive engagements, to support, to enabling, seamlessly in the aggregate. This policy also incorporates teamwork in a hostile workplace environment principles.
      • orchestrator: alice (implementation, ourselves)
        • The prophecy MUST be fulfilled (rfc2119)
          • Mutually Assured Victory
          • Take our reality, snapshot all of us, that is our current pointer to system context, whatever “now” is. For that system context to exist there are an infinite number of similar system contexts which do not exist. There are an infinite number of other options for “now”. Each one of those, while they do not exist within “now”, still exists to the top level system context, the parent to our “now”; this generalization is recursive, turtles all the way up and down. There is only one valid system context within our universe at any given time. That valid system context is determined by time, the lock, the shared resource which the single electron for our locality takes as it executes each operation in a cycle; that time in cycle and that electron is also relative, because the architecture scales up and down, it’s the same architecture the whole way through, sometimes the electron (the agent) crosses locality boundaries, sometimes it is shared, these show up as trust boundaries for the teamwork in a hostile workplace environment tutorials, asset sharing across trust boundaries and risk management and response to hostile actions.
          • The moral of the story is. If I trust you I trust you. You’re getting root privs, and this is distributed compute. We will determine the optimal strategic principles using ethics, humanity, the giants whose shoulders we stand on, and each other as our guides in Volume 5
    • Go around and look at the profile of the developer based on the herstorical quality of their code (the feature extraction which we did off of the union of the repo collector flow caches overlayed with an output flow)
    • Estimate the coverage which the issue will touch based on herstorically similar issues based in part on similarity to coverage which was touched when the previous issue was fixed.
    • Go around and each estimate how long the task would take (tS, tM); the time estimate should cover taking the feature from POC to prod, docs, tests, everything. Always assume a scaled amount of time required for communication when working on issues and research; have stages for issues where we have meta issues and then track exploration and research as a timeblock itself.
    • Display time estimates for developer/agent to complete TODOs (or GitHub or Jira or other issues) as list.
    • Optional overlays for assignment of issues automatically or select top 10 matched with happiness (guiding metric, learning rate and balance with teaching rate, executed distributed remembering thoughts, and other things roll into this)
  • https://github.com/taesungp/contrastive-unpaired-translation
    • https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
    • https://github.com/lucidrains/stylegan2-pytorch
    • https://github.com/tensorflow/gan
    • encode to images for cross domain conceptual mapping
    • TODO instrument gan framework to abide by dataflow decode rules for software DNA translation
  • IoA
    • Potentially play with https://en.wikichip.org/wiki/intel/loihi_2, see discussion dump for more details, possibly cross with quantum encoding
  • https://sites.google.com/pdx.edu/map-the-system/resources-to-make-a-systems-map
    • In relationship to our field mapping. This goes somewhere in chapter 5 when Alice learns what dev activities she should engage in by looking for projects where knitting them together or bugfixes have large impact via field gap analysis. See also Wardley maps comments
  • https://lists.spdx.org/g/Spdx-tech/message/4656
    • SPDX DAG

pdxjohnny commented Jun 27 '22 23:06

2022-06-29 14:00 UTC -7

  • For multiple styles of right hand overlay paths, override Entrypoint.load as Overlay.load, make it not do i.load() without first checking if alternate methods are available.
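
A hedged, illustrative sketch of that idea (not the current DFFML implementation; the alternate method name here is made up):

import pkg_resources

from dffml.util.entrypoint import Entrypoint


class Overlay(Entrypoint):
    @classmethod
    def load(cls, loading=None):
        loaded = []
        for i in pkg_resources.iter_entry_points(cls.ENTRYPOINT):
            if loading is not None and i.name != loading:
                continue
            # Check for an alternate loader (e.g. a right hand side overlay
            # path style) before falling back to the default i.load().
            loader = getattr(i, "load_overlay", i.load)
            loaded.append(loader())
        return loaded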

2022-06-30 13:00 UTC -7

  • https://github.com/intel/dffml/issues/1403

Triggerable Workflow for Alice Please Contribute

  • Metadata
    • Date: 2022-07-06 10:00 UTC -7
  • OpenSSF
    • Mike from the Identifying Security Threats OpenSSF working group was not there; he has an Alpha Omega project as well (the 1 and the 0 if you convert to the number format which we use ;).
    • https://docs.google.com/document/d/1AfI0S6VjBCO0ZkULCYZGHuzzW8TPqO3zYxRjzmKvUB4/edit#
    • https://github.com/ossf/security-insights-spec/blob/master/security-insights-schema-1.0.0.yaml
    • https://docs.google.com/document/d/1Hqks2J0wVqS_YFUQeIyjkLneLfo3_9A-pbU-7DZpGwM/edit
    • https://github.com/ossf/wg-identifying-security-threats
  • Added hosted job to main so it can be triggered off alice branch (GitHub won't run branch workflows unless they are also in the default branch)
    • https://github.com/intel/dffml/commit/fd401e426ebb478a62fe0720b0dcfef59e6a102e
  • DIDs and VCs
    • https://identity.foundation/credential-manifest/
    • https://github.com/decentralized-identity/credential-manifest
  • Added first CI job for alice please contribute recommended community standards
    • https://github.com/intel/dffml/actions/workflows/alice_please_contribute_recommended_community_standards.yml
    • @aliceoa will use her account for the job
    • We created this Triggerable Workflow for Alice Please Contribute
      • We saw Alice create a Meta Issue and an Issue for the README file.
      • She was not successful in creating a PR, we aren't sure why, so we move to thinking about getting the cached dataflow as an output, from there we can diagram and re-run the CI locally (or anywhere else).
  • TODO
    • Modify please contribute flows to create fork if not exists using usage snippet
      • https://github.com/intel/dffml/discussions/1382#discussioncomment-2762256
    • Coach Alice: Volume 1: Chapter 2: Part: 31: Database Update Overlay for Should I Contribute?

Failure to Launch tbDEX Stack

  • Metadata
    • Date: 2022-07-07 08:00 UTC -7
  • web5 will be our vuln sharing data lake for cve-bin-tool scans, uploaded and cached when run via dataflow execution, for correlation by us or Alice
  • https://frankhinek.com/getting-started-with-tbds-ssi-service/
    • Found via https://twitter.com/frankhinek/status/1541585711740092417
    • Needed docker-compose right off the bat
    • Switched to https://github.com/containers/podman-compose
    • Set docker.io as registry to pull FROM by default when no domain given in a container name via sudo vim /etc/containers/registries.conf
      • unqualified-search-registries = ["docker.io"]
    • Attempted podman-compose up
      • Blue screen of death on image pull, WSL seems to cause OOM on either the download or the write to disk, there were disk issues in WSL1, probably disk.
  • Misc.
    • Twitter is currently a good way to monitor the state of the art in various domains / fields.

Cleaning Up the Docs Build

  • Metadata
    • Date: 2022-07-07 10:00 UTC -7
  • Why is the gh-pages deploy failing?
    • Can't find images
    • Rewrote URLs to mirror of old repo with images
    • Rebased main into Alice branch
    • Main was working: https://intel.github.io/dffml/main/
    • Want to also build docs for alice branch at https://intel.github.io/dffml/alice/
      • https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#example-setting-a-value
        • core.warning | warning
        • echo '::set-output name=SELECTED_COLOR::green'
        • https://github.com/actions/toolkit/blob/dd046652c32b45c0207b37ee0bc0ebf43a28b257/docs/commands.md
        • echo "::warning::My warning message"
$ git log -p -- dffml/version.py \
                 | grep \+VERSION \
                 | grep -v rc \
                 | sed -e 's/.* = "//g' -e 's/"//g' \
                 | head -n 1
0.4.0
$ git log -p -- dffml/version.py                  | grep \+VERSION                  | grep -v rc
+VERSION = "0.4.0"
+VERSION = "0.3.7"
+VERSION = "0.3.6"
+VERSION = "0.3.5"
+VERSION = "0.3.4"
+VERSION = "0.3.3"
+VERSION = "0.3.2"
+VERSION = "0.3.1"
+VERSION = "0.3.0"
+VERSION = "0.2.1"
+VERSION = "0.2.0"
+VERSION = '0.2.0'
+VERSION = '0.1.2'
+VERSION = '0.1.1'
+VERSION = '0.1.0'
  • Will need to update latest_release to look through git tags.
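
A minimal sketch of looking through git tags instead of grepping dffml/version.py (helper name assumed; assumes release tags are version strings and that release candidate tags contain "rc"):

import subprocess


def latest_release_from_tags(repo_path: str = ".") -> str:
    # List tags newest first by version and skip release candidates.
    tags = subprocess.check_output(
        ["git", "-C", repo_path, "tag", "--sort=-v:refname"], text=True
    ).splitlines()
    for tag in tags:
        if "rc" not in tag:
            return tag.lstrip("v")
    raise ValueError("no non release candidate tags found")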

Refactor Meta Issue Creation to Accept Dynamic Inputs

  • Metadata
    • Date: 2022-07-07 12:00 UTC -7
  • meta_issue_body() in entities/alice/alice/please/contribute/recommended_community_standards/alice/operations/github/issue.py should be an OperationImplementationContext and take inputs dynamically.
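
A hedged sketch of that direction (wiring to the Operation definition and overlays is omitted, and the input handling is illustrative rather than the current code):

from typing import Any, Dict

from dffml.df.base import OperationImplementationContext


class MetaIssueBodyContext(OperationImplementationContext):
    async def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Build the meta issue body from whichever inputs were passed in,
        # rather than from a fixed set of keyword arguments.
        sections = [
            f"- [ ] {name}: {value}" for name, value in sorted(inputs.items())
        ]
        return {"body": "\n".join(sections)}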

Debugging Meta Issues as Output Operations

  • Metadata
    • Date: 2022-07-07 14:30 UTC -7
  • Definition.spec type loading information is limited to builtin types currently
    • We need to move to plugin/config dict style serialization leveraging traverse_config_set(), recently changed in 1843437b41884b7794a38f77ea75ac92de460aa3 which is for future patches where we'll create an output operation which serializes the graph of inputs within the InputNetwork to a dictionary.
      • This is what we can then import into the web5 (https://frankhinek.com/getting-started-with-tbds-ssi-service/) space for our data lake.
    • https://intel.github.io/dffml/main/contributing/codebase.html?highlight=config+dict#config
  • The unified config methodology (plugin, config, schema) is the manifest; the upstream, the overlay, and the orchestrator are the Open Architecture.
  • How do we come up with alternate Open Architectures (threat model migrations based on intent)? We follow the pattern:
    • Enumerate everything we know
    • Determine which are assets and which are processes
    • Using tuned brute force, prune the trees based on strategic principles (top level strategic plans or conscious or conceptual layers)
  • All we're doing is providing a user/developer friendly experience to follow the pattern
  • architype
    • Is a mental construct to help us understand when data is aligned to a particular set (a system context, perhaps in part or completely overlaid) of strategic principles.

Verification of Successful Alice Should I Contribute Job

  • Metadata
    • Date: 2022-07-08 20:00 UTC -7
  • https://www.investopedia.com/terms/o/overlay.asp
    • What Is Overlay? Overlay refers to a management style that harmonizes an investor's separately managed accounts. Overlay management uses software to track an investor's combined position from separate accounts. The overlay system analyzes any portfolio adjustments to ensure the overall portfolio remains in balance and to prevent any inefficient transactions from occurring. Overlay portfolio management makes sure the investor’s strategies are implemented and coordinated successfully.

    • Remember we don't care about money (in fact remember we're going to get rid of money ;)
      • We care about risk management, this is applicable to our context, which is understood in part via overlays
  • Alice Should I Contribute? CI run:
    • https://github.com/intel/dffml/runs/7259361498?check_suite_focus=true
    • We're hoping to run a mass scan of open source projects over the weekend using this
      • The explanation for why we need the high memory VMs is that we want to download repos to a ramdisk to scan.
        • Scanning open source projects in collaboration with the OpenSSF Identifying Security Threats Working Group
          • References:
            • https://docs.google.com/document/d/1AfI0S6VjBCO0ZkULCYZGHuzzW8TPqO3zYxRjzmKvUB4/edit#heading=h.mfw2bj5svu9u
            • https://github.com/intel/dffml/pull/1401#issuecomment-1170489046
              • Section: Verification of Successful Alice Should I Contribute Job
    • Success, checking output...

We needed to get the artifact uploaded with the collector results to inspect that the output of results.json made sense.

$ gh run list --workflow alice_shouldi_contribute.yml
completed success alice: ci: shouldi: contribute: Remove errant chdir to tempdir  Alice Should I Contribute?  alice workflow_dispatch 2639160823  1m48s 3h
completed success alice: ci: shouldi: contribute: Upload collector outputs as artifacts Alice Should I Contribute?  alice workflow_dispatch 2638950785  56m14s  4h
completed success alice: ci: shouldi: contribute: Basic job Alice Please Contribute Recommended Community Standards alice workflow_dispatch 2638890594  1m15s 4h
$ gh run list --workflow alice_shouldi_contribute.yml | awk '{print $(NF-2)}'
2639160823
2638950785
2638890594
$ gh run list --workflow alice_shouldi_contribute.yml | awk '{print $(NF-2)}' | head -n 1
2639160823

We figured out how to use the GitHub CLI's builtin jq style output selection to build a JSON object where the keys are the artifact names and the values are the URLs.

References:

  • https://devdocs.io/jq/index#ObjectConstruction:{}
$ gh api   -H "Accept: application/vnd.github+json"   /repos/intel/dffml/actions/runs/$(gh run list --workflow alice_shouldi_contribute.yml | awk '{print $(NF-2)}' | head -n 1)/artifacts --jq '.artifacts[] | {(.name): .archive_download_url}'
{"collector_output":"https://api.github.com/repos/intel/dffml/actions/artifacts/293370454/zip"}

Here we select only the URL of the archive. There is only one artifact, so we only get one zip file in the output.

$ gh api   -H "Accept: application/vnd.github+json"   /repos/intel/dffml/actions/runs/$(gh run list --workflow alice_shouldi_contribute.yml | awk '{print $(NF-2)}' | head -n 1)/artifacts --jq '.artifacts[] | .archive_download_url'
https://api.github.com/repos/intel/dffml/actions/artifacts/293370454/zip

Confirm it is a zip file by looking at the bytes with xxd

$ gh api   -H "Accept: */*" $(gh api   -H "Accept: application/vnd.github+json"   /repos/intel/dffml/actions/runs/$(gh run list --workflow alice_shouldi_contribute.yml | awk '{print $(NF-2)}' | head -n 1)/artifacts --jq '.artifacts[] | .archive_download_url' | sed -e 's/https:\/\/api.github.com//g') | xxd
00000000: 504b 0304 1400 0800 0800 25be e854 0000  PK........%..T..
00000010: 0000 0000 0000 0000 0000 0a00 0000 7265  ..............re
00000020: 706f 732e 6a73 6f6e cd53 5dab db30 0cfd  pos.json.S]..0..
00000030: 2b25 cfed aad8 f247 fa76 5fef f39e 564a  +%.....G.v_...VJ
00000040: 906d a5c9 d626 2571 584b e97f 9f1d 36d8  .m...&%qXK....6.
00000050: 18bd dc3d 6ccc 200b eb1c 593e 46ba 1773  ...=l. ...Y>F..s
00000060: 1fe9 78e4 50ec 56f7 a28d f132 edb6 db63  ..x.P.V....2...c
00000070: 17db d97d f0c3 797b 09d7 cf43 dbf7 b76d  ...}..y{...C...m
00000080: 0623 4f71 617e e15b f2ef 4c58 af8a 8629  .#Oqa~.[..LX...)
00000090: ce23 4f4b f2c8 27a6 89eb af29 abeb eb0b  .#OK..'....)....
000000a0: 8fdd 901f b06f e834 f17a 15c7 39ed df0f  .....o.4.z..9...
000000b0: bfba 37a0 c51d 522d 9a63 3b8c f5a9 ebb9  ..7...R-.c;.....
000000c0: f643 1298 afbe 3fd6 a9f2 6b7a d9ea a50f  .C....?...kz....
000000d0: 3c4e dca7 b034 f824 9ec3 3fec 3758 953f  <N...4.$..?.7X.?
000000e0: c38b e5c2 49fe b98b f5d4 52d6 b92f 00ad  ....I.....R../..
000000f0: 2623 83a7 408d 7352 a317 dab0 6215 a0d4  &#[email protected]...
00000100: 2548 d056 a8aa ca1f f427 5caf 6440 cb8c  %H.V.....'\.d@..
00000110: 06ad 0343 257a cb3a 4803 440e 981c d920  ...C%z.:H.D....
00000120: 524a e62a 2d41 341e 146a acc8 2b6c 82f5  RJ.*-A4..j..+l..
00000130: 28bc 5589 a81b aca4 741a 1bf7 37b9 649d  (.U.....t...7.d.
00000140: 4228 856c 2482 d724 1481 f3d2 28e6 0a50  B(.l$..$....(..P
00000150: 1963 2450 c9fe bfe0 1e12 3974 7e69 9ae7  .c$P......9t~i..
00000160: 7df6 afdd 2135 5971 a229 d6f3 2550 5ce6  }...!5Yq.)..%P\.
00000170: b510 20c4 06cc 06ec 4721 7758 edca f253  .. .....G!wX...S
00000180: d6ca d738 529e b447 5adf 0050 4b07 08de  ...8R..GZ..PK...
00000190: 90d9 ba63 0100 00e3 0300 0050 4b01 0214  ...c.......PK...
000001a0: 0014 0008 0008 0025 bee8 54de 90d9 ba63  .......%..T....c
000001b0: 0100 00e3 0300 000a 0000 0000 0000 0000  ................
000001c0: 0000 0000 0000 0000 0072 6570 6f73 2e6a  .........repos.j
000001d0: 736f 6e50 4b05 0600 0000 0001 0001 0038  sonPK..........8
000001e0: 0000 009b 0100 0000 00                   .........

Make an authenticated query to the GitHub API asking for the resource. Pipe the output to a zip file to save it to disk.

TODO Operation to download all GitHub run artifacts.

$ gh api   -H "Accept: */*" $(gh api   -H "Accept: application/vnd.github+json"   /repos/intel/dffml/actions/runs/$(gh run list --workflow alice_shouldi_contribute.yml | awk '{print $(NF-2)}' | head -n 1)/artifacts --jq '.artifacts[] | .archive_download_url' | sed -e 's/https:\/\/api.github.com//g') > collector_output.zip
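
The TODO above could eventually become a small operation; here is a hedged sketch wrapping the gh CLI (the operation name and parameters are made up for illustration):

import pathlib

import dffml


@dffml.op
async def download_run_artifacts(repo: str, run_id: str, target_dir: str) -> str:
    # Download every artifact for a workflow run into target_dir using gh.
    pathlib.Path(target_dir).mkdir(parents=True, exist_ok=True)
    await dffml.run_command(
        ["gh", "run", "download", run_id, "--repo", repo, "--dir", target_dir]
    )
    return target_dir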

Extract the zipfile to a directory

$ python -m zipfile -e collector_output.zip collector_output/

Look at the contents of the extracted directory to confirm all the files we think should be there are there.

TODO Verification via cryptographic or other trust mechanisms

$ ls -lAF collector_output/
total 4
-rw-r--r-- 1 pdxjohnny pdxjohnny 995 Jul  8 20:09 repos.json

Then we ask Python to use the json.tool helper to pretty print the file.

$ python -m json.tool < collector_output/repos.json
{
    "untagged": {
        "https://github.com/pdxjohnny/httptest": {
            "key": "https://github.com/pdxjohnny/httptest",
            "features": {
                "release_within_period": [
                    false,
                    true,
                    false,
                    false,
                    false,
                    true,
                    false,
                    false,
                    false,
                    false
                ],
                "author_line_count": [
                    {},
                    {
                        "John Andersen": 374
                    },
                    {
                        "John Andersen": 37
                    },
                    {},
                    {},
                    {
                        "John Andersen": 51
                    },
                    {},
                    {},
                    {},
                    {}
                ],
                "commit_shas": [
                    "0486a73dcadafbb364c267e5e5d0161030682599",
                    "0486a73dcadafbb364c267e5e5d0161030682599",
                    "c53d48ee4748b07a14c8e6d370aab0eaba8d2103",
                    "56302fc054649ac54fd8c42c850ea6f4933b64fb",
                    "56302fc054649ac54fd8c42c850ea6f4933b64fb",
                    "56302fc054649ac54fd8c42c850ea6f4933b64fb",
                    "a8b540123f340c6a25a0bc375ee904577730a1ec",
                    "a8b540123f340c6a25a0bc375ee904577730a1ec",
                    "a8b540123f340c6a25a0bc375ee904577730a1ec",
                    "a8b540123f340c6a25a0bc375ee904577730a1ec"
                ],
                "dict": [
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false,
                    false
                ]
            },
            "last_updated": "2022-07-08T23:49:11Z",
            "extra": {}
        }
    }
}

Then we wrote this little mini tutorial by dumping our shell herstory and adding some explanation.

$ herstory | tail -n 50 > /tmp/recent-herstory
$ vim /tmp/recent-herstory

Then we copy paste and upload to somewhere our colleagues and ourselves will have access to in the future when they want to know more about our development process for this patch. This is similar to an extended commit message. In fact, we'll be playing with linkages to commits of data generated during development for query later by Alice and others.

Manual Spin Up of Digital Ocean VM

  • Metadata
    • Date: 2022-07-08 21:30 UTC -7
  • https://frankhinek.com/getting-started-with-tbds-ssi-service/
    • Spinning up VM manually for docker-compose need

nahdig is the term we use for blocklist; this means suspect TCB. We are going to download a bunch of untrusted code on these machines, so we need to keep the keys separate to avoid confusion and help with audit.

We will add a comment to our key whose email domain we are currently playing with as a concept where we explore scoping identities to system contexts (alice.shouldi.contribute as the base flow / system context in this example).

$ ssh-keygen -f ~/.ssh/nahdig -b 4096 -C '[email protected]' 
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/pdxjohnny/.ssh/nahdig
Your public key has been saved in /home/pdxjohnny/.ssh/nahdig.pub
The key fingerprint is:
SHA256:PsTjWi5ZTr3KCd2ZYTTT/Xmajkj9QutkFbysOogrWwg pdxjohnny@DESKTOP-3LLKECP
The key's randomart image is:
+---[RSA 4096]----+
|                 |
|           . o   |
|          + . +  |
|       . . o . +.|
|  E     S.o   +.o|
|   . . =o+.=.o o.|
|    . o**.=o=.o  |
|    ..+*o+o=o+   |
|    .ooo=.ooo.o  |
+----[SHA256]-----+
$ cat ~/.ssh/nahdig.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDX1xvaybJQLrUxytn+AR+F3dDeAxFDMr0dyDt6zGs45x8VsA3TDrneZZ7ggzN63Uzbk+CAuBRDPGms6FgPswliU6xgp8X073Pcn2va7JxbkIz0LQCxdzfAoOMKIIiI7SmYSD4IDFqrEHnN+I6j4el+IFLaGTibCAR0+zK4wKPX97NE27EPUL/DYkT2eBAF/onLZAQm3tRznTYUSyWaXlWHQUD6y/3QtvhH3WVIUKRV8b6POwoiVa6GeMjM5jVCBRB+nfrhjNBBp7Ro7MzuNn+z8D6puyV1GFxWtSm953UYFa5UcahhiwFRWXLfJmVjwEZDm0//hMnw1TcmapBR99dwrBftz+YFF5rpTyWvxbnl5G/wn2DQ/9RFR6SeD3GImYRhVSFkuNZkQCiaj2+bT+ngnFPEA5ed4nijFnIgvAzPz9kk7uojjW3SfEdhED0mhwwBlLNOr7pGu9+X2xZQIlFttuJaOjd+GYBWypchd7rWdURHoqR+07pXyyBAmNjy6CKwSWv9ydaIlWseCOTzKxjy3Wk81MoaH/RhBXdRFqS1mP12TuahMzTvvVuSfQQJKCO05sIrzSEykxg1u6HEZXDyeKoVwN9V1/tq3QGa4tE/WmMNaLukef9ws3Drt1D7HWTF7u/N/zjtfiyEXRAMkixqywHfCrrxXKGPR7uvueLUkQ== [email protected]

TODO(security) Use a VM startup script via cloud-init or otherwise to exfil the SSH daemon's public key on sshd start (maybe via systemd unit files, aka start after sshd is up); this way we can set StrictHostKeyChecking=yes and provide a UserKnownHostsFile with the server key in it.
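
A hedged sketch of the client side of that TODO, assuming the VM's host public key has already been retrieved out of band (the publish mechanism is the TODO above); the helper name and path handling are illustrative:

import pathlib


def add_known_host(known_hosts: str, host: str, host_public_key: str) -> None:
    # host_public_key is the contents of the VM's /etc/ssh/ssh_host_*_key.pub.
    # Appending "<host> <key>" lines yields a UserKnownHostsFile usable with
    # StrictHostKeyChecking=yes.
    path = pathlib.Path(known_hosts).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as stream:
        stream.write(f"{host} {host_public_key.strip()}\n")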

We ssh into the new VM.

$ ssh -i ~/.ssh/nahdig -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o PasswordAuthentication=no [email protected]

This MUST happen after spin up so that clients SSH'ing in are sure they are not being Entity In The Middle'd (EITM).

POC Launch of tbDEX Stack

As root, add a non-root user with root/sudo privileges and their own home directory (-m).

# useradd -m -s $(which bash) pdxjohnny
# usermod -aG sudo pdxjohnny

Allow the user to use sudo without a password. Note that there is both a \t (tab) and a space in the line.

# sed -i 's/\%sudo\tALL=(ALL:ALL) ALL/\%sudo\tALL=(ALL:ALL) NOPASSWD:ALL/g' /etc/sudoers

Update the VM

# apt-get update && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y

Install tools

# DEBIAN_FRONTEND=noninteractive apt-get install -y tmux bash-completion vim git python3 python3-pip

Install GitHub CLI for auth by adding custom GitHub package repo to OS.

References:

  • https://cli.github.com
  • https://github.com/cli/cli/blob/trunk/docs/install_linux.md
# curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg
# echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null
# apt-get update
# apt-get install -y gh

Install Docker

References:

  • https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
# apt-get update
# apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
# mkdir -p /etc/apt/keyrings
# curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
# apt-get update
# apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

Verify docker is running by querying systemd.

# systemctl status docker
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2022-07-09 05:45:29 UTC; 23s ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 27732 (dockerd)
      Tasks: 13
     Memory: 34.3M
     CGroup: /system.slice/docker.service
             └─27732 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Jul 09 05:45:29 alice-shouldi-contribute-0 dockerd[27732]: time="2022-07-09T05:45:29.342805567Z" level=warning msg="Your kernel does not support CPU realtime scheduler"
Jul 09 05:45:29 alice-shouldi-contribute-0 dockerd[27732]: time="2022-07-09T05:45:29.342882429Z" level=warning msg="Your kernel does not support cgroup blkio weight"
Jul 09 05:45:29 alice-shouldi-contribute-0 dockerd[27732]: time="2022-07-09T05:45:29.342940895Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Jul 09 05:45:29 alice-shouldi-contribute-0 dockerd[27732]: time="2022-07-09T05:45:29.343190438Z" level=info msg="Loading containers: start."
Jul 09 05:45:29 alice-shouldi-contribute-0 dockerd[27732]: time="2022-07-09T05:45:29.459917630Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jul 09 05:45:29 alice-shouldi-contribute-0 dockerd[27732]: time="2022-07-09T05:45:29.518683858Z" level=info msg="Loading containers: done."
Jul 09 05:45:29 alice-shouldi-contribute-0 dockerd[27732]: time="2022-07-09T05:45:29.542989855Z" level=info msg="Docker daemon" commit=a89b842 graphdriver(s)=overlay2 version=20.10.17
Jul 09 05:45:29 alice-shouldi-contribute-0 dockerd[27732]: time="2022-07-09T05:45:29.543173681Z" level=info msg="Daemon has completed initialization"
Jul 09 05:45:29 alice-shouldi-contribute-0 systemd[1]: Started Docker Application Container Engine.
Jul 09 05:45:29 alice-shouldi-contribute-0 dockerd[27732]: time="2022-07-09T05:45:29.575936377Z" level=info msg="API listen on /run/docker.sock"

Add the non-root user to the docker group

# usermod -aG docker pdxjohnny

Now leave the root session.

Configure git by copying over creds to new user, run the following on your local machine.

$ (cd ~ && tar -c .gitconfig .config/gh | ssh -i ~/.ssh/nahdig -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o PasswordAuthentication=no [email protected] tar -xv)

Then log into the VM via ssh as the new user.

$ ssh -i ~/.ssh/nahdig -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o PasswordAuthentication=no [email protected]

Install dotfiles.

References:

  • https://github.com/pdxjohnny/pdxjohnny.github.io/blob/dev/content/posts/dev-environment.md#new-dev-box-bring-up
$ git config --global user.name "John Andersen"
$ git config --global user.email [email protected]
$ git clone https://github.com/pdxjohnny/dotfiles ~/.dotfiles
$ cd ~/.dotfiles
$ ./install.sh
$ echo 'source "${HOME}/.pdxjohnnyrc"' | tee -a ~/.bashrc
$ dotfiles_branch=$(hostname)-$(date "+%4Y-%m-%d-%H-%M")
$ git checkout -b $dotfiles_branch
$ sed -i "s/Dot Files/Dot Files: $dotfiles_branch/g" README.md
$ git commit -sam "Initial auto-tailor for $(hostname)"
$ git push --set-upstream origin $dotfiles_branch

Close out the SSH session, then SSH back into the host while recording via asciinema, with recordings stored locally. Pass -t to ssh to allocate a pty for full interactive functionality when running a command over ssh (tmux after the hostname).

$ python -m asciinema rec --idle-time-limit 0.5 --title "$(date +%4Y-%m-%d-%H-%M-%ss)" --command "ssh -t -i ~/.ssh/nahdig -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o PasswordAuthentication=no [email protected] tmux" >(xz --stdout - > "$HOME/asciinema/rec-$(hostname)-$(date +%4Y-%m-%d-%H-%M-%ss).json.xz")

Update Python packaging core packages

$ python3 -m pip install -U pip setuptools wheel

Install docker-compose

$ python3 -m pip install docker-compose

Add the pip installed scripts to the PATH

References:

  • https://stackoverflow.com/a/62167797

Find the location of the installed script directory and replace the homedir literal with the HOME variable via sed. Add the PATH addition to the .bashrc file (or .bash_profile on some distros). Restart the shell to pick up the changes and allow us to run docker-compose.

$ python_bin=$(python3 -c 'import os,sysconfig;print(sysconfig.get_path("scripts",f"{os.name}_user"))' | sed -e "s#${HOME}#\${HOME}#g")
$ echo "export PATH=\"\$PATH:$python_bin\"" | tee -a ~/.bashrc
$ exec bash

We should now see the directory containing docker-compose at the end of our PATH.

$ echo $PATH
/home/pdxjohnny/.local/bin:/home/pdxjohnny/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/pdxjohnny/.bin:/home/pdxjohnny/.bin:/home/pdxjohnny/.local/bin:/home/pdxjohnny/.local/bin
  • Successful run of tutorial: https://frankhinek.com/getting-started-with-tbds-ssi-service/
  • https://github.com/TBD54566975/ssi-service
    • The Self Sovereign Identity Service (SSIS) facilitates all things relating to DIDs and Verifiable Credentials -- in a box! The service is a part of a larger Decentralized Web Platform architecture which you can learn more about in our collaboration repo. The SSI Service is a RESTful web service that wraps the ssi-sdk. The core functionality of the SSIS includes, but is not limited to: interacting with the standards around Verifiable Credentials, Credential Revocations, requesting Credentials, exchanging Credentials, data schemas for Credentials and other verifiable data, messaging using Decentralized Web Nodes, and usage of Decentralized Identifiers. Using these core standards, the SSIS enables robust functionality to facilitate all verifiable interactions such as creating, signing, issuing, curating, requesting, revoking, exchanging, validating, verifying credentials in varying degrees of complexity.

SSI Service high level architecture from their docs

Collector Result Storage

  • Metadata
    • Date: 2022-07-08 11:35 UTC -7
  • Decentralized Web Node (DWN) messaging not complete yet
    • Will loop back when it is
    • For now we will upload results to a DigitalOcean "space" which is publicly readable
  • Created https://results-alice-shouldi-contribute.sfo3.digitaloceanspaces.com/
    • Note: Do not use . in the name of the space! There is a bug with DigitalOcean where it accepts a name with . characters but that ends up with an endpoint with an invalid certificate and the upload APIs don't work either.
  • https://docs.digitalocean.com/reference/api/spaces-api/
    • Trying to figure out how to upload files
    • Will write a source after we know how, then run collector flow (alice shouldi contribute)
    • https://docs.digitalocean.com/reference/api/create-personal-access-token/
      • Create an access token

Export the token as a variable within the server tmux shell session.

$ export DIGITALOCEAN_ACCESS_TOKEN=asdjfojdf9j82efknm9dsfjsdf

Install Python library for interfacing with DigitalOcean

References:

  • https://docs.digitalocean.com/reference/libraries/
  • https://www.digitalocean.com/community/tools/python-digitalocean
  • https://github.com/koalalorenzo/python-digitalocean
  • https://github.com/intel/dffml/pull/1244/files
    • We used this in the automated DO infra scripts which we will use later
$ pip install -U python-digitalocean
  • Unclear how to use spaces API...
    • https://docs.digitalocean.com/reference/api/api-reference/
    • https://docs.digitalocean.com/reference/api/spaces-api/
    • https://docs.digitalocean.com/products/spaces/reference/s3-sdk-examples/

Install Python dependency to interact with Spaces API.

$ pip install boto3

Go to https://cloud.digitalocean.com/account/api/tokens. Create the Space key with the name reflecting the scope pdxjohnny.results.contribute.shouldi.alice.nahdig.com. Export the key and secret as variables so we can use them later when we write the source and run the collector.

DO NOT use . in DigitalOcean names for Spaces or tokens / space key names. It seems fine at first but breaks under the hood when you go to use it.

$ export SPACES_KEY=sdfjfjasdofj0iew
$ export SPACES_SECRET=3j41ioj239012j3k12j3k12jlkj2

Write a Python script to attempt to query the space contents.

upload_static_file_contents_to_space.py

import os
import boto3

session = boto3.session.Session()
client = session.client(
    "s3",
    region_name="sfo3",
    endpoint_url="https://sfo3.digitaloceanspaces.com",
    aws_access_key_id=os.getenv("SPACES_KEY"),
    aws_secret_access_key=os.getenv("SPACES_SECRET"),
)

response = client.list_buckets()
spaces = [space["Name"] for space in response["Buckets"]]
print("Spaces List: %s" % spaces)

# Step 3: Call the put_object command and specify the file to upload.
client.put_object(
    Bucket="results-alice-shouldi-contribute",  # The path to the directory you want to upload the object to, starting with your Space name.
    Key="collector.txt",  # Object key, referenced whenever you want to access this file later.
    Body=b"{SDflkasdj}",  # The object's contents.
    ACL="public-read",  # Defines Access-control List (ACL) permissions, such as private or public.
)

Run the upload.

$ python3 upload_static_file_contents_to_space.py
Spaces List: ['results-alice-shouldi-contribute']

Boto3 Source

  • Metadata
    • Date: 2022-07-09 00:20 UTC -7
  • Convert existing upload POC to asyncio
    • Find asyncio boto3 calls
    • It's a separate library
    • https://aioboto3.readthedocs.io/en/latest/installation.html

upload_static_file_contents_to_space_asyncio.py

import os
import asyncio

import aioboto3


async def main():
    session = aioboto3.Session()
    async with session.client(
        "s3",
        region_name="sfo3",
        endpoint_url="https://sfo3.digitaloceanspaces.com",
        aws_access_key_id=os.getenv("SPACES_KEY"),
        aws_secret_access_key=os.getenv("SPACES_SECRET"),
    ) as client:
        # Grab the list of buckets
        response = await client.list_buckets()
        spaces = [space["Name"] for space in response["Buckets"]]
        print("Spaces List: %s" % spaces)
        # Call the put_object command and specify the file to upload.
        await client.put_object(
            Bucket="results-alice-shouldi-contribute",  # The path to the directory you want to upload the object to, starting with your Space name.
            Key="collector.txt",  # Object key, referenced whenever you want to access this file later.
            Body=b"{SDflkasdj}",  # The object's contents.
            ACL="public-read",  # Defines Access-control List (ACL) permissions, such as private or public.
        )

if __name__ == "__main__":
    asyncio.run(main())

See if it works, it does!

$ python3 upload_static_file_contents_to_space_asyncio.py
Spaces List: ['results-alice-shouldi-contribute']

We now begin creating a source based off MemorySource to save record data as a JSON file.

aioboto3_dffml_source.py

"""
Source for storing and retrieving data from S3
"""
import os
import json
import string
import asyncio
from typing import Dict, List, AsyncIterator

import aioboto3

from dffml import (
    config,
    field,
    Record,
    BaseSourceContext,
    BaseSource,
    entrypoint,
    export,
)


class Boto3SourceContext(BaseSourceContext):
    async def update(self, record):
        await self.parent.client.put_object(
            Bucket=self.parent.config.bucket,
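            # Flatten the record key (e.g. a repo URL) into a bucket friendly
            # object name by keeping only ASCII lowercase characters.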
            Key="".join(
                [
                    character
                    for character in record.key.lower()
                    if character in string.ascii_lowercase
                ]
            )
            + ".json",
            Body=json.dumps(export(record)),
            ACL=self.parent.config.acl,
        )

    async def records(self) -> AsyncIterator[Record]:
        pass

    async def record(self, key: str) -> Record:
        return Record(key)


@config
class Boto3SourceConfig:
    """
    References:
    - https://aioboto3.readthedocs.io/en/latest/usage.html
    """
    region_name: str
    endpoint_url: str
    aws_access_key_id: str
    aws_secret_access_key: str
    bucket: str
    acl: str = field(
        "Permissions level required for others to access. Options: private|public-read",
        default="private",
    )


@entrypoint("boto3")
class Boto3Source(BaseSource):
    """
    Uploads a record to S3 style storage
    """

    CONFIG = Boto3SourceConfig
    CONTEXT = Boto3SourceContext

    async def __aenter__(self) -> "Boto3Source":
        await super().__aenter__()
        self.session = aioboto3.Session()
        self.client = await self.session.client(
            "s3",
            region_name=self.config.region_name,
            endpoint_url=self.config.endpoint_url,
            aws_access_key_id=self.config.aws_access_key_id,
            aws_secret_access_key=self.config.aws_secret_access_key,
        ).__aenter__()
        return self

    async def __aexit__(self, _exc_type, _exc_value, _traceback) -> None:
        await self.client.__aexit__(None, None, None)
        self.client = None
        self.session = None


import dffml.noasync

dffml.noasync.save(
    Boto3Source(
        bucket="results-alice-shouldi-contribute",
        region_name="sfo3",
        endpoint_url="https://sfo3.digitaloceanspaces.com",
        aws_access_key_id=os.getenv("SPACES_KEY"),
        aws_secret_access_key=os.getenv("SPACES_SECRET"),
        acl="public-read",
    ),
    Record(
        key="https://github.com/pdxjohnny/httptest",
        features={
            "hello": "world",
        },
    ),
)
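
To make the source loadable by the name given to @entrypoint("boto3") rather than by importing the module, the package metadata would register it under DFFML's source entrypoint group. A hedged sketch, assuming the group name dffml.source and that the module above ships as aioboto3_dffml_source:

# setup.py (sketch)
from setuptools import setup

setup(
    name="aioboto3-dffml-source",
    py_modules=["aioboto3_dffml_source"],
    install_requires=["dffml", "aioboto3"],
    entry_points={
        "dffml.source": [
            "boto3 = aioboto3_dffml_source:Boto3Source",
        ],
    },
)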
  • Realized we could do this as an operation
    • https://intel.github.io/dffml/main/examples/shouldi.html
    • overlays/alice/shouldi/contribute/upload_collector_output_to_bucket.py

overlays/alice/shouldi/contribute/upload_collector_output_to_bucket.py

import os
import json
import string
import asyncio
import tempfile
import contextlib
from typing import NewType

import aioboto3
import aiobotocore.client

import dffml
import dffml_feature_git.feature.definitions


AioBoto3Client = NewType("AioBoto3Client", aiobotocore.client.AioBaseClient)
AioBoto3RegionName = NewType("AioBoto3RegionName", str)
AioBoto3EndpointUrl = NewType("AioBoto3EndpointUrl", str)
AioBoto3AWSKeyId = NewType("AioBoto3AWSKeyId", str)
AioBoto3AWSAccessKey = NewType("AioBoto3AWSAccessKey", str)
AioBoto3AWSACL = NewType("AioBoto3AWSACL", str)
AioBoto3Bucket = NewType("AioBoto3Bucket", str)


MINIOServerShouldStart = NewType("MINIOServerShouldStart", bool)


@contextlib.asynccontextmanager
async def minio_server(
    should_start: MINIOServerShouldStart,
) -> AioBoto3EndpointUrl:
    # Bail out if not wanted; effectively auto start if wanted. With the
    # current overlay mechanisms, inclusion of this operation within an
    # overlay happens at load time (in dffml_operations_innersource.cli and
    # alice.cli for shouldi and please contribute), which results in the
    # operation getting combined with the rest prior to the first call to
    # DataFlow.auto_flow.
    if not should_start:
        # Yield nothing so the async context manager still completes cleanly
        # instead of raising "generator didn't yield".
        yield None
        return
    with tempfile.TemporaryDirectory() as tempdir:
        # TODO Audit does this kill the container successfully aka clean it up
        # TODO We have no logger, can we pull from stack if we are in
        # MemoryOrchestrator?
        async for event, result in dffml.run_command_events(
            [
                "docker",
                "run",
                "quay.io/minio/minio",
                "server",
                "/data",
                "--console-address",
                ":9001",
            ],
            events=[
                dffml.Subprocess.STDOUT_READLINE,
                dffml.Subprocess.STDERR_READLINE,
            ],
        ):
            if (
                event is dffml.Subprocess.STDOUT_READLINE
                and result.startswith("API:")
            ):
                # API: http://172.17.0.2:9000  http://127.0.0.1:9000
                yield result.split()[1]


# **TODO** We have parsers for numpy style docstrings to config classes which
# can give us what was previously the help field() argument.
@contextlib.asynccontextmanager
async def bucket_client_connect(
    endpoint_url: AioBoto3EndpointUrl,
    region_name: AioBoto3RegionName = None,
    aws_access_key_id: AioBoto3AWSKeyId = None,
    aws_secret_access_key: AioBoto3AWSAccessKey = None,
    acl: AioBoto3AWSACL = "private",
) -> AioBoto3Client:
    """
    Connect to an S3 bucket.

    References:
    - https://aioboto3.readthedocs.io/en/latest/usage.html

    This is the short description.

    This is the longer description.

    Parameters
    ----------
    acl : str
        Permissions level required for others to access. Options: private|public-read

    Returns
    -------
    str_super_cool_arg : AioBoto3Client
        The aiobotocore.client.AioBaseClient object

    Examples
    --------

    >>> async with connect_bucket_client(
    ...     region_name: str,
    ...     endpoint_url: str,
    ...     aws_access_key_id: str,
    ...     aws_secret_access_key: str,
    ...     acl: str = "private",
    ...     ,
    ... ) as client:
    ...
    """
    session = aioboto3.Session()
    async with session.client(
        "s3",
        region_name=region_name,
        endpoint_url=endpoint_url,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
    ) as client:
        # Grab the list of buckets
        response = await client.list_buckets()
        buckets = [bucket["Name"] for bucket in response["Buckets"]]
        print("Buckets List: %s" % buckets)
        # Client initialization complete
        yield client


"""
# The old style runs into the issue of how to provide the config server URL
# dynamically, so we experimented with this operation based approach with
# objects as inputs.
@dffml.op(
    inputs={"results": dffml.group_by_output},
    stage=dffml.Stage.OUTPUT,
    imp_enter={
        "client": (lambda self: aiohttp.ClientSession(trust_env=True))
    },
)
"""
async def upload_to_bucket(
    client: AioBoto3Client,
    bucket: AioBoto3Bucket,
    repo_url: dffml_feature_git.feature.definitions.URL,
    results: dffml.group_by_output,
    acl: AioBoto3AWSACL = "private",
) -> None:
    await client.put_object(
        Bucket=bucket,
        # TODO(security) Ensure we don't have collisions
        # with two different repo URLs generating the same
        # filename, pretty sure the below code has that
        # as an active issue!!! (see the hashing sketch after this block)
        Key="".join(
            [
                character
                for character in repo_url.lower()
                if character in string.ascii_lowercase
            ]
        )
        + ".json",
        Body=json.dumps(dffml.export(results)),
        ACL=acl,
    )
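
One way to address the TODO(security) note above is to derive the object key from a hash of the full repo URL instead of stripping characters, so two distinct URLs can never map to the same filename. A minimal sketch; the helper name is hypothetical:

import hashlib


def bucket_key_for_repo_url(repo_url: str) -> str:
    # Hash the full URL so distinct repos always map to distinct object keys
    return hashlib.sha256(repo_url.lower().encode()).hexdigest() + ".json"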

Playing With Async Context Managers as Data Flows

  • Metadata
    • Date: 2022-07-09 12:48 UTC -7
  • We will take our last example and attempt to modify @op so that when it sees something decorated with @contextlib.asynccontextmanager it creates two operations: one that runs the decorated function up to its yield, and a cleanup operation that resumes from the yield. We need to end up with two operations, one that is the cleanup and one whose stage matches its inputs within the auto flow (a minimal sketch of the split follows this list).
    • This means that DataFlow also needs to be modified to unpack lists of operations wherever it would usually take a single operation instance. This should apply to *args in DataFlow.__init__ as well as to the operations and implementations keyword arguments.
      • May need to check DataFlow._fromdict() and DataFlow.export() as well.
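
A minimal sketch of how something decorated with @contextlib.asynccontextmanager might be split into a setup operation and a cleanup operation; this is not the current @op API, and the helper name split_async_context_manager is hypothetical. In the real implementation @op would presumably return both operations, which is why DataFlow would need to accept lists of operations as noted above.

import contextlib


def split_async_context_manager(acm_func):
    """Return (setup, cleanup) coroutine functions sharing entered contexts."""
    live_contexts = {}

    async def setup(*args, **kwargs):
        # Enter the async context manager; whatever it yields becomes the
        # setup operation's output.
        context_manager = acm_func(*args, **kwargs)
        value = await context_manager.__aenter__()
        live_contexts[id(value)] = context_manager
        return value

    async def cleanup(value):
        # Resume the generator past its yield so its teardown code runs,
        # mirroring what a cleanup stage operation would do.
        context_manager = live_contexts.pop(id(value))
        await context_manager.__aexit__(None, None, None)

    return setup, cleanup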

pdxjohnny commented Jun 29 '22 21:06

Moved to https://github.com/intel/dffml/discussions/1406

pdxjohnny commented Jul 18 '22 15:07

Refactoring and Thinking About Locking of Repos for Contributions

  • Metadata
    • Date: 2022-07-27 13:00 UTC -7

pdxjohnny commented Jul 27 '22 21:07