parse5
parse5 copied to clipboard
Types of things like `Document`, etc, seem not exported in v7
I use the API of p5 to get certain nodes. I then transform those nodes. For that, I need their types. In v7, it seems that they are not exported, compared to @types/parse5
.
Am I missing something? If not, is there a good reason they’re not exported? If not, can they be added?
I seem to be able to access types with e.g. DefaultTreeAdapterMap['documentFragment']
, though things like attributes and location are behind something like import('parse5/dist/common/token.js').Attribute
, I wonder if there’s an easier way to expose everything?
The current set of exports is guided by what's needed by the other parse5 packages. They always use generic adapters, with the default adapter being the, well, default. I'm down to export the adapter types outside of the map.
For attributes & locations: These transcend tree adapters & have other uses. There is a Token
export that makes this a bit more accessible:
import { Token } from 'parse5';
const attrs: Token.Attribute[] = []
const loc: Token.Location = { ... }
I think it’s fair to say that other folks will want to walk and change the tree, so hence I’m for exposing them from a more visible place (and as part of semver).
I’m also finding that generating some nodes can be tough:
lib/index.js:63:5 - error TS2322: Type '"#document"' is not assignable to type 'NodeType.Document'.
63 nodeName: '#document',
~~~~~~~~
node_modules/parse5/dist/tree-adapters/default.d.ts:13:5
13 nodeName: NodeType.Document;
~~~~~~~~
The expected type comes from property 'nodeName' which is declared here on type 'Document'
lib/index.js:64:5 - error TS2322: Type '"quirks" | "no-quirks"' is not assignable to type 'DOCUMENT_MODE'.
Type '"quirks"' is not assignable to type 'DOCUMENT_MODE'.
64 mode: (node.data || {}).quirksMode ? 'quirks' : 'no-quirks',
~~~~
node_modules/parse5/dist/tree-adapters/default.d.ts:18:5
18 mode: DOCUMENT_MODE;
~~~~
The expected type comes from property 'mode' which is declared here on type 'Document'
As far as I’m aware, such things should be fine?
Oh, everything is TypeScript enums now. Could it be considered to use strings again, or a way to access those enums outside of TypeScript?
TypeScript enums are exported as JS objects. Eg. NodeType.Document
will also work in plain JS.
As far as I am aware everything is closed down in v7 and they are not exposed in the export map?
True, the enums currently aren't exported. One workaround for now is to call the adapter methods for creating nodes, but (as said before) I'm not opposed to exporting everything that's needed.
i do agree the enums can slow people down at times. its very easy to end up with a string rather than a string literal type, which is what wooorm saw in his example.
personally i'd avoid enums for the node types since they're a very common thing to want to construct on the fly.
node types should be imported from the adapter you use i think, btw. though we should probably export all the default ones from somewhere easy
@fb55 @wooorm the stuff mentioned above hit me today when trying to migrate lit to using the new tools package (WIP).
e.g.
const node: Document = {
nodeName: '#document',
mode: 'no-quirks',
childNodes: []
};
this will error because we must use the two enums (DOCUMENT_MODE
and NodeType
).
im not sure if those enums are exported top-level, but even if they are this seems a little painful especially in the case of well known string unions like mode
.
if we typed it as a string union instead, but still had a type for it, that'd be much easier IMO (e.g. export type DocumentMode = 'no-quirks'|'quirks'|'etc etc';
)
Not opposed to making this a string union 👍
The linked PR only exported error codes - not sure why it closed this issue? Seems totally unrelated
@SimenB it was github trying to be smart, the issue mentioned fix
so it auto closed this
though i do wonder if its worth us making a more specific issue, if there still is one (since this covered a few things).
you can import the types from the default tree adapter afaik (dist/tree-adapters/default.js
). doubt its in the package exports but it'll work for types at least. then pull the tree adapter itself from the index (it is exported in the root).
what is there that isn't exported that you want to get hold of?
The default tree adapter is in the package exports, no need for a deep import.
I am not sure what this issue is about specifically, it seems like several of the biggest concerns were dealt with already. Having some more specific issues replace this would be much appreciated.
This was a bit of a catch all issue for me, right after I tried to upgrade, about the things I was missing. Some has been dealt with already!
I finished upgrading now and the primary points I ran into:
- Enums are hard to use, particularly for
NS
and such -
State
is not exposed as far as I am aware, because it’s an enum, I now have to pass e.g.,72
, which will probably break if ever another state was added - A lot is marked as private, while a lot of other fields are public, and there doesn’t seem to be a clear reason behind each choice — so I have a ton of
// @ts-expect-error
s. E.g.,tokenizer.state
is public,tokenizer.returnState
is private. Of course, I understand that it’s dangerous to mingle with both these fields, but I hope that both are also not likely to be renamed in a patch release or so. So perhaps more fields can drop theirprivate
s? - A lot’s been shuffled around and it’s not really clear what APIs/types are covered by semver. I think it’s good to document which types are in the “contract”, while the rest could arbitrarily break because they‘re shuffled around more/no longer exposed?
Maybe @SimenB has more?
Enums are hard to use, particularly for
NS
and such
I can see that. We can transition more of these to string types as there is a need.
State is not exposed as far as I am aware, because it’s an enum, I now have to pass e.g.,
72
, which will probably break if ever another state was added
We do export a TokenizerMode
object that should include all states an external party would like to switch to. This set of states were static props on the tokenizer before; I am not certain if other states were exported previously.
Eg. the parser feedback simulator uses TokenizerMode
to update states:
https://github.com/inikulin/parse5/blob/1a505fa66e4158500610e1e39be795cdb95a390c/packages/parse5-sax-parser/lib/parser-feedback-simulator.ts#L100-L102
A lot is marked as private, while a lot of other fields are public, and there doesn’t seem to be a clear reason behind each choice — so I have a ton of
// @ts-expect-errors
. E.g.,tokenizer.state
is public,tokenizer.returnState
is private. Of course, I understand that it’s dangerous to mingle with both these fields, but I hope that both are also not likely to be renamed in a patch release or so. So perhaps more fields can drop theirprivate
s?
My rule of thumb was that any props that aren't used by other modules should be private. Eg. returnState
is only used for entity parsing, and none of the other packages in this repo hook into that behaviour.
More fields can drop the private
; feel free to open a PR or otherwise suggest which properties should be changed.
A lot’s been shuffled around and it’s not really clear what APIs/types are covered by semver. I think it’s good to document which types are in the “contract”, while the rest could arbitrarily break because they‘re shuffled around more/no longer exposed?
In my mind, everything that is exported and not explicitly marked private will only change on semver major updates. This is already necessitated by parse5's packages relying on them, as a not-semver major change would break them. Marking private props that are used elsewhere as public will help keep other packages working as well.
Hope that makes sense!