cti-python-stix2
Instance fields should be mutable
At the moment I cannot change the values of fields after a STIX2 object instance has been created:
In [2]: m = stix2.Malware(name='test malware', labels=['foo', 'bar'])
In [3]: m.external_references = ['a']
---------------------------------------------------------------------------
ImmutableError Traceback (most recent call last)
<ipython-input-3-f8b616bebdc7> in <module>
----> 1 m.external_references = ['a']
~/.p/lib/python3.7/site-packages/stix2/base.py in __setattr__(self, name, value)
200 def __setattr__(self, name, value):
201 if not name.startswith("_"):
--> 202 raise ImmutableError(self.__class__, name)
203 super(_STIXBase, self).__setattr__(name, value)
204
ImmutableError: Cannot modify 'external_references' property in 'Malware' after creation.
I can only create instances after I have gathered all the necessary info. This limitation is very inconvenient because:
- in a complex packaging pipeline all information for the object is rarely available right away
- in order to optimise performance, I might choose to fetch data from DB and set specific fields on multiple STIX2 instances at the same time, after instances were created.
Proposed solution: STIX2 class instances should have mutable fields.
@traut - this is due to the requirements in the specification. It has to do with the way STIX describes versioning - see (https://docs.google.com/document/d/1ShNq4c3e1CkfANmD9O--mdZ5H0O_GLnjN28a_yrEaco/edit#heading=h.rye5q2hkacu).
However, I have always thought that this was clumsy for the API, for some of the reasons you describe. There should be some way to create an object "in memory", and then "finalize" it, which means you have created the new version which is now immutable.
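The "in memory, then finalize" idea could look roughly like the sketch below. It uses only the standard library; MalwareDraft, FinalMalware and finalize() are hypothetical names invented for illustration and are not part of stix2.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import List, Tuple


@dataclass(frozen=True)
class FinalMalware:
    """Hypothetical immutable result; not a stix2 class."""
    name: str
    labels: Tuple[str, ...]
    external_references: Tuple[str, ...] = ()


@dataclass
class MalwareDraft:
    """Hypothetical mutable draft; fields can be set in any order."""
    name: str = ""
    labels: List[str] = field(default_factory=list)
    external_references: List[str] = field(default_factory=list)

    def finalize(self) -> FinalMalware:
        # Validate once, at the point where the draft becomes a real object.
        if not self.name:
            raise ValueError("name is required")
        if not self.labels:
            raise ValueError("labels is required")
        return FinalMalware(
            self.name, tuple(self.labels), tuple(self.external_references)
        )


draft = MalwareDraft(name="test malware", labels=["foo", "bar"])
draft.external_references = ["a"]  # allowed: still a draft
malware = draft.finalize()         # immutable from here on
```

Assigning to malware.name after finalize() raises FrozenInstanceError, so the draft stays freely editable while the finalized version cannot change.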
My comment from another ticket stands here as well: I see no reason to enforce the STIX2 spec on python objects, since enforcing it during serialisation would be enough. Until my objects are serialised, I think of them as non-finalised data that will be finalised (and here your comment about versioning applies) during serialisation.
@traut, this was an implementation decision - to treat python objects as fully realized STIX objects - which adheres fully to the specification. Remember, the json is just the serialization language and btw, STIX does NOT have to be serialized using JSON - see section 9.1, item 7, in part 1 of the spec.
However, as I said before, the API could support the use case of non-finalized data.
this was an implementation decision - to treat python objects as fully realized STIX objects
that's unfortunate and limits how I can use the library. At the moment I have converted most of my code into code that works with dictionaries. Dictionaries can be freely modified and are just a bit messier. I'm going to use the stix2 lib only at the last step -- essentially reducing the library to a serialisation tool. And for serialisation, I do not need stix2; I can use simple Marshmallow schemas.
Maybe it is time to re-evaluate that implementation decision.
Remember, the json is just the serialization language and btw, STIX does NOT have to be serialized using JSON - see section 9.1, item 7, in part 1 of the spec.
of course, but a serialisation phase is still required -- STIX2 is not python objects. If you enforce STIX2 on a python object level
There is some related discussion in issue 177. That's about observables, but they're just another kind of immutable object. As described there, I have come to think of dicts as the "builder API" of stix2 objects. Just keep 'em as dicts until they're finalized. If at that point all properties are set, then yeah, you can JSON serialize directly from that and never need the stix2 library.
The comment about empty lists/values is a smaller deal, and I think it's something we could implement. If a list-valued property is assigned an empty list, just drop the property (error if it's required, etc); if a non-list property is assigned None, behave similarly. Only possible complaint might be that people think their properties are being ignored and don't understand why, because they aren't familiar enough with the spec.
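A minimal sketch of that rule, applied to a plain dict before it is handed to a constructor. drop_empty and its required parameter are hypothetical names used for illustration; this is not existing stix2 behaviour.

```python
def drop_empty(props, required=()):
    """Return a copy of props without empty-list or None values.

    Raises ValueError if a property that would be dropped is required.
    """
    cleaned = {}
    for key, value in props.items():
        is_empty = value is None or (isinstance(value, list) and not value)
        if is_empty:
            if key in required:
                raise ValueError(f"required property {key!r} is empty")
            continue  # omit the optional property entirely
        cleaned[key] = value
    return cleaned


props = {
    "name": "test malware",
    "labels": ["foo"],
    "external_references": [],  # empty list: dropped
    "description": None,        # None: dropped
}
cleaned = drop_empty(props, required=("name", "labels"))
```

Here cleaned keeps only name and labels. The silent drop is exactly the behaviour that might surprise users who are unfamiliar with the spec.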
With this API we wanted to make it hard for people to create invalid STIX content. Changing attributes on an object without bumping the modified property violates the spec. It was thought to be simpler to check all requirements when the object is created, rather than any time a property value is modified or when the object is "exported" (whether serialized or saved to a data store or whatever). If we had some sort of finalize() function, some users will likely forget to use it. Checking requirements when the object is created avoids worrying about the object being in an inconsistent or invalid state.
Like you say, you can build up a dictionary and pass it as kwargs to the python-stix2 class constructor, which in effect "finalizes" them. You could also use new_version() to change/add/remove properties after the object is created, if you don't mind that modified gets updated as well.
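The two options above can be modelled with the standard library alone. The sketch below is not stix2's implementation: make_object stands in for a constructor call, and this new_version is a simplified stand-in for the library method of the same name (removal via None is an assumption of the model).

```python
from datetime import datetime, timezone
from types import MappingProxyType


def make_object(**props):
    """Stand-in for a constructor call: check once, then freeze."""
    if "name" not in props:
        raise ValueError("name is required")
    now = datetime.now(timezone.utc)
    props.setdefault("created", now)
    props.setdefault("modified", props["created"])
    return MappingProxyType(dict(props))  # read-only view of a private copy


def new_version(obj, **changes):
    """Return a new frozen object with changes applied and modified bumped."""
    props = dict(obj)
    for key, value in changes.items():
        if value is None:
            props.pop(key, None)  # in this model, None removes a property
        else:
            props[key] = value
    # Use an explicit modified value if given, otherwise the current time.
    props["modified"] = changes.get("modified") or datetime.now(timezone.utc)
    return MappingProxyType(props)


pending = {"name": "test malware"}
pending["labels"] = ["foo", "bar"]  # the dict stays freely editable
v1 = make_object(**pending)         # "finalized" at construction
v2 = new_version(v1, external_references=["a"])
```

v1 is left untouched and v2 carries the extra property plus a newer modified timestamp, which is the trade-off mentioned above.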
Aside from setting/updating properties, which specific features of the library would you like to use on intermediate, "unfinalized" objects?
With this API we wanted to make it hard for people to create invalid STIX content.
is there a reason, apart from educating people on STIX2 spec, to have spec's limitations for python objects instead of enforcing them in serialisation step?
Changing attributes on an object without bumping the modified property violates the spec
this is valid for STIX content but python objects are not STIX content - serialisation of these objects is.
With this API we wanted to make it hard for people to create invalid STIX content.
is there a reason, apart from educating people on STIX2 spec, to have spec's limitations for python objects instead of enforcing them in serialisation step?
I think it is probably also about the human tendency to make mistakes, not just about education. Anyway, automatic enforcement at JSON serialization time is fine if you're working with JSON, but not useful if you aren't (I think that was @rpiazza's point). Obviously if an error message is produced, it would have equal educational value whether it happened at object construction time, or at JSON serialization time. I guess the idea is that error messages would be less likely (and content errors correspondingly more likely) if the library relied on users working with JSON (or remembering to call a finalize() method) to trigger them.
Changing attributes on an object without bumping the modified property violates the spec
this is valid for STIX content but python objects are not STIX content - serialisation of these objects is.
If we want to get really technical, some spec requirements are on the JSON serialization, but not all are. For example, a required property is required whether you're serializing in JSON, XML, YAML, or something else. "STIX content" could more properly be defined as anything which satisfies the abstract graph model defined in the spec. The spec fundamentally describes a data model which is independent of any particular serialization.
So, it makes sense to trigger validation of the JSON-specific spec requirements during JSON serialization. Other requirements could be checked at JSON serialization time too, but that validation should be available to all serializations, and even when you're not serializing at all. How do we ensure that these latter requirements are satisfied?
But I totally understand that people will come to the stix2 library expecting it to help them incrementally build the nodes and edges of STIX graph models (and serialize them as JSON), and be puzzled and disappointed that it doesn't.
is there a reason, apart from educating people on STIX2 spec, to have spec's limitations for python objects instead of enforcing them in serialisation step?
In addition to what @chisholm explained, it's simpler to do the checks once (at creation time) instead of every time it's serialized or exported. It also makes the code easier to reason about and avoids potential holes (if someone exports the object in a way we didn't consider).
This particular issue has just cropped up for me. I understand the desire to stick to the spec; however, there are a number of cases where having immutable objects adds significant processing overhead. Take the following case:
We want to be able to share the same data with multiple partners; however, each partner has their own identity for our org. The typical process would be to build the bundle without adding the created_by_ref, then update the field just prior to building the specific partner's bundle.
With the objects being immutable, we have to create all new objects for each partner.
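One way to limit that overhead, following the "dicts as builder" approach discussed earlier in the thread, is to keep the shared data as plain property dicts and stamp created_by_ref only when a partner's bundle is assembled. The sketch below assumes that approach; stamp_for_partner and the identity IDs are placeholders for illustration. Objects still have to be constructed once per partner, but the shared properties are gathered only once.

```python
def stamp_for_partner(base_objects, identity_id):
    """Return per-partner copies of the property dicts with created_by_ref set."""
    # Shallow copies: nested lists are shared, so treat them as read-only.
    return [{**props, "created_by_ref": identity_id} for props in base_objects]


# Shared data, gathered once, without created_by_ref.
base_objects = [
    {"type": "malware", "name": "test malware", "labels": ["foo"]},
    {"type": "indicator", "name": "test indicator", "labels": ["bar"]},
]

# Placeholder identity IDs; real ones would be the partner-specific identities.
partner_identities = {
    "partner-a": "identity--00000000-0000-4000-8000-000000000001",
    "partner-b": "identity--00000000-0000-4000-8000-000000000002",
}

per_partner = {
    partner: stamp_for_partner(base_objects, identity_id)
    for partner, identity_id in partner_identities.items()
}
```

Each list in per_partner can then be passed to the library's constructors as the final step for that partner.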