parse5 icon indicating copy to clipboard operation
parse5 copied to clipboard

Types of things like `Document`, etc, seem not exported in v7

Open wooorm opened this issue 2 years ago • 10 comments

I use the API of p5 to get certain nodes. I then transform those nodes. For that, I need their types. In v7, it seems that they are not exported, compared to @types/parse5. Am I missing something? If not, is there a good reason they’re not exported? If not, can they be added?

wooorm avatar May 25 '22 19:05 wooorm

I seem to be able to access types with e.g. DefaultTreeAdapterMap['documentFragment'], though things like attributes and location are behind something like import('parse5/dist/common/token.js').Attribute, I wonder if there’s an easier way to expose everything?

wooorm avatar May 25 '22 19:05 wooorm

The current set of exports is guided by what's needed by the other parse5 packages. They always use generic adapters, with the default adapter being the, well, default. I'm down to export the adapter types outside of the map.

For attributes & locations: These transcend tree adapters & have other uses. There is a Token export that makes this a bit more accessible:

import { Token } from 'parse5';

const attrs: Token.Attribute[] = []
const loc: Token.Location = { ... }

fb55 avatar May 25 '22 20:05 fb55

I think it’s fair to say that other folks will want to walk and change the tree, so hence I’m for exposing them from a more visible place (and as part of semver).

I’m also finding that generating some nodes can be tough:

lib/index.js:63:5 - error TS2322: Type '"#document"' is not assignable to type 'NodeType.Document'.

63     nodeName: '#document',
       ~~~~~~~~

  node_modules/parse5/dist/tree-adapters/default.d.ts:13:5
    13     nodeName: NodeType.Document;
           ~~~~~~~~
    The expected type comes from property 'nodeName' which is declared here on type 'Document'

lib/index.js:64:5 - error TS2322: Type '"quirks" | "no-quirks"' is not assignable to type 'DOCUMENT_MODE'.
  Type '"quirks"' is not assignable to type 'DOCUMENT_MODE'.

64     mode: (node.data || {}).quirksMode ? 'quirks' : 'no-quirks',
       ~~~~

  node_modules/parse5/dist/tree-adapters/default.d.ts:18:5
    18     mode: DOCUMENT_MODE;
           ~~~~
    The expected type comes from property 'mode' which is declared here on type 'Document'

As far as I’m aware, such things should be fine?

wooorm avatar May 27 '22 17:05 wooorm

Oh, everything is TypeScript enums now. Could it be considered to use strings again, or a way to access those enums outside of TypeScript?

wooorm avatar May 28 '22 13:05 wooorm

TypeScript enums are exported as JS objects. Eg. NodeType.Document will also work in plain JS.

fb55 avatar May 28 '22 15:05 fb55

As far as I am aware everything is closed down in v7 and they are not exposed in the export map?

wooorm avatar May 28 '22 15:05 wooorm

True, the enums currently aren't exported. One workaround for now is to call the adapter methods for creating nodes, but (as said before) I'm not opposed to exporting everything that's needed.

fb55 avatar May 28 '22 15:05 fb55

i do agree the enums can slow people down at times. its very easy to end up with a string rather than a string literal type, which is what wooorm saw in his example.

personally i'd avoid enums for the node types since they're a very common thing to want to construct on the fly.

node types should be imported from the adapter you use i think, btw. though we should probably export all the default ones from somewhere easy

43081j avatar Jun 03 '22 13:06 43081j

@fb55 @wooorm the stuff mentioned above hit me today when trying to migrate lit to using the new tools package (WIP).

e.g.

const node: Document = {
  nodeName: '#document',
  mode: 'no-quirks',
  childNodes: []
};

this will error because we must use the two enums (DOCUMENT_MODE and NodeType).

im not sure if those enums are exported top-level, but even if they are this seems a little painful especially in the case of well known string unions like mode.

if we typed it as a string union instead, but still had a type for it, that'd be much easier IMO (e.g. export type DocumentMode = 'no-quirks'|'quirks'|'etc etc';)

43081j avatar Jul 15 '22 20:07 43081j

Not opposed to making this a string union 👍

fb55 avatar Jul 22 '22 20:07 fb55

The linked PR only exported error codes - not sure why it closed this issue? Seems totally unrelated

SimenB avatar Oct 21 '22 09:10 SimenB

@SimenB it was github trying to be smart, the issue mentioned fix so it auto closed this

though i do wonder if its worth us making a more specific issue, if there still is one (since this covered a few things).

you can import the types from the default tree adapter afaik (dist/tree-adapters/default.js). doubt its in the package exports but it'll work for types at least. then pull the tree adapter itself from the index (it is exported in the root).

what is there that isn't exported that you want to get hold of?

43081j avatar Oct 21 '22 09:10 43081j

The default tree adapter is in the package exports, no need for a deep import.

I am not sure what this issue is about specifically, it seems like several of the biggest concerns were dealt with already. Having some more specific issues replace this would be much appreciated.

fb55 avatar Oct 21 '22 09:10 fb55

This was a bit of a catch all issue for me, right after I tried to upgrade, about the things I was missing. Some has been dealt with already!

I finished upgrading now and the primary points I ran into:

  1. Enums are hard to use, particularly for NS and such
  2. State is not exposed as far as I am aware, because it’s an enum, I now have to pass e.g., 72, which will probably break if ever another state was added
  3. A lot is marked as private, while a lot of other fields are public, and there doesn’t seem to be a clear reason behind each choice — so I have a ton of // @ts-expect-errors. E.g., tokenizer.state is public, tokenizer.returnState is private. Of course, I understand that it’s dangerous to mingle with both these fields, but I hope that both are also not likely to be renamed in a patch release or so. So perhaps more fields can drop their privates?
  4. A lot’s been shuffled around and it’s not really clear what APIs/types are covered by semver. I think it’s good to document which types are in the “contract”, while the rest could arbitrarily break because they‘re shuffled around more/no longer exposed?

Maybe @SimenB has more?

wooorm avatar Oct 27 '22 13:10 wooorm

Enums are hard to use, particularly for NS and such

I can see that. We can transition more of these to string types as there is a need.

State is not exposed as far as I am aware, because it’s an enum, I now have to pass e.g., 72, which will probably break if ever another state was added

We do export a TokenizerMode object that should include all states an external party would like to switch to. This set of states were static props on the tokenizer before; I am not certain if other states were exported previously.

Eg. the parser feedback simulator uses TokenizerMode to update states:

https://github.com/inikulin/parse5/blob/1a505fa66e4158500610e1e39be795cdb95a390c/packages/parse5-sax-parser/lib/parser-feedback-simulator.ts#L100-L102

A lot is marked as private, while a lot of other fields are public, and there doesn’t seem to be a clear reason behind each choice — so I have a ton of // @ts-expect-errors. E.g., tokenizer.state is public, tokenizer.returnState is private. Of course, I understand that it’s dangerous to mingle with both these fields, but I hope that both are also not likely to be renamed in a patch release or so. So perhaps more fields can drop their privates?

My rule of thumb was that any props that aren't used by other modules should be private. Eg. returnState is only used for entity parsing, and none of the other packages in this repo hook into that behaviour.

More fields can drop the private; feel free to open a PR or otherwise suggest which properties should be changed.

A lot’s been shuffled around and it’s not really clear what APIs/types are covered by semver. I think it’s good to document which types are in the “contract”, while the rest could arbitrarily break because they‘re shuffled around more/no longer exposed?

In my mind, everything that is exported and not explicitly marked private will only change on semver major updates. This is already necessitated by parse5's packages relying on them, as a not-semver major change would break them. Marking private props that are used elsewhere as public will help keep other packages working as well.

Hope that makes sense!

fb55 avatar Oct 28 '22 09:10 fb55