
As an interested "org" I want to see examples of code bases before and after

Open mjy opened this issue 3 years ago • 8 comments

I understand, and completely concur with, the need to develop human resources (i.e. community) around a code base. However, I want to balance that cost against the cost of making our code-base better. In other words, engaging with third-party organizations has a cost. How do I reconcile this cost with the benefits that the organization will receive in the short, mid, and long term?

Perhaps one concrete example to encourage those on the "developer" side to engage would be to show simple diffs of code repositories, before and after. This visualization will "speak" to developers, and encourage them to look further into aspects of community building they might not have considered. It will also act as a teaching resource.

I'd propose, therefore, that the sooner I can visualize the difference in a code-base before it was stewarded by this org and after the org felt it made an impact, in a way that illustrates all the good bits you reference (documentation, styles, CI, etc.), across a complete project, the more likely I am to be convinced it's worth spending energy looking further into the things this org promotes.

mjy avatar Dec 01 '21 03:12 mjy

Thanks for your contribution! That is indeed an interesting idea on how to visualize the effects of codebase stewardship. It might not be a simple task, though, to attribute all the diffs that result from stewardship.

Let's say a community, as a result of stewardship, formalizes an engineering guideline. It might be easy to diff the actual documentation, and possibly even spot some commits that clean up code and align it with the guideline. But the real value will be in the future, when commits and PRs from all contributors are submitted following the guidelines from the start, saving everybody time in reviewing the code. And that would be very hard to create a diff for, as you can't know for sure how a contribution would have looked without the guidelines. The same goes for many of the requirements in the Standard for Public Code, as it is meant to make future collaboration easier.

That being said, it would be very compelling to make case studies on codebases that become compliant with the Standard for Public Code through our stewardship. As no codebases have been certified yet, we are too early, but let's revisit this later.

Ainali avatar Dec 01 '21 08:12 Ainali

And that would be very hard to create a diff for, as you can't know for sure how a contribution would have looked without the guidelines.

In the interests of wanting to see this org succeed, a minor followup. To an outsider this feels a little like protecting the org from real scrutiny. It is certainly true for any one project and any one change, but imagine tens of thousands of code-bases: don't we enter a realm where enough samples demonstrate clear differences? Think statistics, and long term. One project won't show a pattern, as you note, but if you're successful then as the number of projects grows there should be a significant signal. You will want to be able to show (detect/recognize) the emergence (or absence) of this signal ("diff"!) in any one project, if anything in part to protect and scale the resources available to the org (e.g. "we tried, we really tried, but according to the code <points at "diff"> they won't listen, so we're not stewarding them any more").

A long-winded way of saying +1 for concentrating resources on this:

... it would be very compelling to make case studies on codebases that become compliant with the Standard for Public Code through our stewardship

mjy avatar Dec 01 '21 12:12 mjy

Yes, if we imagine tens of thousands of codebases, it would be easier to see some effect. But then again, for such a measure to make sense, we would also need to compare against a control group of codebases we did not steward, to see what impact our stewardship had beyond that of a generally maturing codebase.

Hard numbers and diffs are of course not the only way to capture the impact. We could also collect stories from the communities we work with. People from a new organization might trust their peers more than they trust us, especially if we provide ways to contact them bilaterally.

Ainali avatar Dec 01 '21 12:12 Ainali

+1 for stories. I think of diffs, or pointers to code, as a way of telling a story that resonates with the part of the community that sustains code: those who write code.

The scenario here is markedly different from other charity-type/non-profit groups: everything is out in the open by definition, not hidden behind an org facade of hype-speak. This means that, coming from an org perspective, I personally wanted to cut straight to the chase: who are they working with, and what does their code look like? I don't have to trust that you're feeding starving children; I can go see the full bellies myself (and better yet, assess how those bellies got full over time). I know enough to do the diff myself, and I know enough not to assume the stewards are making a difference; others could use shepherding to get to this point, and of course we could use shepherding to "get us to the next level."

Even very basic metadata that reflects the intersection of this org and that code-base, presented in a friendly way, would help tell the story I'm interested in hearing, given my level of experience:

  • We started stewardship at this SHA/checksum commit
  • This is the checksum (commit) range over which the codebase's community met with us for a documentation workshop
  • This is the number of commits before and after we engaged (a rough sketch of computing this follows the list)
  • This is the SHA where they introduced the CI pattern that we helped illustrate here
  • These are exemplary PRs that occurred after we engaged
  • etc.
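
For what it's worth, the "commits before and after we engaged" number is the kind of thing a tiny script can pull from a clone. A minimal sketch, assuming the stewardship-start SHA is known; the repository path and SHA below are placeholders:

import subprocess

def count_commits(repo_path, rev_range):
    # Count commits in a git revision range via `git rev-list --count`.
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--count", rev_range],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

# Placeholders: a local clone and a hypothetical, abbreviated commit where stewardship began.
repo = "./taxonworks"
first_steward_sha = "AFF881"

before = count_commits(repo, first_steward_sha)            # commits up to and including that commit
after = count_commits(repo, f"{first_steward_sha}..HEAD")  # commits made since then

print(f"{before} commits before we engaged, {after} commits after")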

Other metadata is already part of what GitHub encourages throughout, and in theory you could use CI to draw these data into your profiles/stories (a small sketch follows this list):

  • Here is the CONTRIBUTORS.md
  • Here is the development branch; here is the main branch
  • Here are the milestones
  • Here is the CODE_OF_CONDUCT.md
  • etc.
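
Some of this is indeed one API call away when the code-base happens to live on GitHub. A minimal sketch against the REST API; the repository is only an example, and anything beyond light anonymous use would need an access token:

import json
import urllib.request

# The repository is only an example; any public repository works the same way.
API = "https://api.github.com/repos/SpeciesFileGroup/taxonworks"

def fetch(path=""):
    # Unauthenticated GET against the GitHub REST API (rate-limited for anonymous use).
    req = urllib.request.Request(API + path, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

repo = fetch()
print("default branch:", repo["default_branch"])

for milestone in fetch("/milestones?state=all"):
    print("milestone:", milestone["title"], "-", milestone["state"])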

It strikes me that every single aspect of the Standard could (should?) have a metadata profile with it that points to code?
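
To make that concrete, a rough sketch of what one such profile could look like as plain data; the criterion names are paraphrased from the Standard, and every pointer below is a placeholder:

# Hypothetical shape: each criterion of the Standard maps to evidence in the code repository.
metadata_profile = {
    "source_repository": "https://github.com/SpeciesFileGroup/taxonworks.git",
    "first_sha_of_steward_interaction": None,  # filled in once stewardship begins
    "criteria": {
        "code_in_the_open": {
            "met": True,
            "evidence": ["https://github.com/SpeciesFileGroup/taxonworks"],
        },
        "make_contributing_easy": {
            "met": False,
            "evidence": [],  # e.g. a CONTRIBUTING.md, once it exists
        },
        "use_continuous_integration": {
            "met": True,
            "evidence": [".github/workflows/ci.yml"],  # placeholder path
        },
    },
}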

mjy avatar Dec 01 '21 15:12 mjy

Thank you! Those bullet point lists are useful and aligned with our current thoughts; we have just not made user-friendly material for this yet.

However, I noticed that your first and last comments mention some sort of 'after' state. When we engage with a codebase, it is with the intention to do so for the rest of its life cycle. That means that, from our perspective, 'after' will be when there is no longer a community around to maintain the codebase and it is effectively retired.

It strikes me that every single aspect of the Standard could (should?) have a metadata profile with it that points to code?

I find this interesting, but I am not totally sure what you mean. Is it something along the lines of how we track the progress for a codebase right now, for example, like here in Signalen?

Ainali avatar Dec 01 '21 15:12 Ainali

I find this interesting, but I am not totally sure what you mean. Is it something along the lines of how we track the progress for a codebase right now, for example, like here in Signalen?

Yes, I think so; frankly, I'm not sure either. Ideally you want to track many things auto-magically, using a CI pipeline (according to the principles you are promoting ;)). GitHub has "Insights", specifically a community section, but also other data built in, e.g. https://github.com/SpeciesFileGroup/taxonworks/community. These data may be accessible via their API; if not, a CI engine that replicates these checks is the idea.
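
The community data shown on that page does appear to be exposed through the REST API's community profile endpoint. A minimal, unauthenticated (and therefore rate-limited) sketch:

import json
import urllib.request

# Community profile metrics for the example repository, via the GitHub REST API.
url = "https://api.github.com/repos/SpeciesFileGroup/taxonworks/community/profile"
req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})

with urllib.request.urlopen(req) as resp:
    profile = json.load(resp)

print("community health:", profile["health_percentage"], "%")
for name, entry in (profile.get("files") or {}).items():
    # Each entry is null when the corresponding file is missing from the repository.
    print(name, "->", "present" if entry else "missing")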

So, imagine I have a template repository. It contains a YAML (or other simple text) file with a few prompts.

I clone it for my org and fill out the first prompt in config.yml:

---
source_repository:  https://github.com/SpeciesFileGroup/taxonworks.git

I commit. This triggers a CI process (thanks to the template repo) that does a bunch of things:

  • It scaffolds my report page, like https://github.com/Amsterdam/signals/blob/master/docs/topics/signalen-and-standard-for-public-code.md
  • Where possible, it starts automatically assigning check-marks back into the config, the same concept as in GitHub community profiles. As your AI/CI intelligence gets better, more and more of the check-boxes you reference can be auto-filled (a rough sketch of this step follows the list).
  • It updates the config file itself with things it has learned, so that the next time I edit it manually, I can be prompted for things it couldn't figure out. Non-standard place for LICENSE.md? Link to it here -> license: ..., first_sha_of_steward_interaction: AFF881, etc. As often as possible, line items in your metadata profile link to actual files in the code repo, to exemplify "compliance". Human prompts are added intelligently, a little at a time, so that I'm not overwhelmed, i.e. the human editor is exposed to the questions that would let the AI/CI maximize its inference capabilities and its ability to fill out the metadata itself.
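
A minimal sketch of what that "assign check-marks back into the config" step could look like, assuming a PyYAML-based script run by CI against a checkout of the source_repository; the file names, keys, and paths are illustrative assumptions, not an existing tool:

from pathlib import Path

import yaml  # PyYAML

CONFIG = Path("config.yml")
CHECKOUT = Path("checkout")  # where CI cloned the source_repository

# Illustrative checks only: file presence is the easy, machine-checkable slice.
CHECKS = {
    "license": ["LICENSE", "LICENSE.md"],
    "code_of_conduct": ["CODE_OF_CONDUCT.md"],
    "contributing_guide": ["CONTRIBUTING.md"],
}

config = yaml.safe_load(CONFIG.read_text()) or {}

for key, candidates in CHECKS.items():
    found = next((p for p in candidates if (CHECKOUT / p).is_file()), None)
    # A path acts as the check-mark; None is left behind as a prompt for the human editor.
    config[key] = str(found) if found else None

CONFIG.write_text(yaml.safe_dump(config, sort_keys=False))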

mjy avatar Dec 01 '21 19:12 mjy

I guess the details are in the "auto-magically". Since the Standard for Public Code tries not to prescribe any particular technology or platform, it is tough to create something that covers everything that is out there. Of course, making a tool that just covers a few platforms may help our own work long-term, but it is still quite an investment, so we'll need to see a few more codebases making a commitment to the standard before it saves time for us. Another reason we haven't prioritized it yet is that most of the requirements need a steward-in-the-loop anyway. But as more communities get interested in adhering to the standard, automation will of course be a good help. I think identifying the requirements that could be automated today on some platform would be a good start and set us on the right path.

Ainali avatar Dec 02 '21 07:12 Ainali

Another reason we haven't prioritized it yet is that most of the requirements need a steward-in-the-loop anyway.

Needing a human to evaluate the criteria is something that we continue to re-examine over time, e.g.:

  • How should we certify compliance?
    • https://github.com/publiccodenet/publiccode.net/issues/269
    • https://blog.publiccode.net/community%20call/2021/02/11/notes-from-community-call-4-february-2021.html
  • Add section on relation to coreinfrastructure best-practice-badge
    • https://github.com/publiccodenet/standard/issues/200

One idea was to create "levels": maybe "bronze" could be the things which can be checked by machine, "silver" the most important items whether machine-checkable or not, and "gold" everything. While that idea had some initial traction, it didn't address the real underlying need: helping organizations collaborate while procuring development/deployment services which result in reliable systems.

Regardless, on the main topic of the thread: while we can't do a rigorous study that would stand up to much academic examination, reviewing the impact of our stewardship and publishing what we learn will both help us improve and help others think about stewardship as well.

ericherman avatar Dec 02 '21 08:12 ericherman

There are some good ideas here about automation, but I think that, in the future, they would be better placed in the Implementation guide or in the docs of the Standard itself. I will close this issue as we are not planning any updates to this repository.

Ainali avatar Mar 05 '24 10:03 Ainali