`Component.from_text` not capturing all parts of text

Open Zabamund opened this issue 4 years ago • 1 comments

This method on Component works in some cases but not in others. Here is an example:

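from striplog import Component, Lexicon

# `lexicon` was not defined in the original snippet; the default lexicon is assumed here.
lexicon = Lexicon.default()

sample0 = Component.from_text('Grey fine sandstone.', lexicon)
sample1 = Component.from_text('Light blue marl with interbedded shale with good shows', lexicon)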

sample0 yields: image

while sample1 yields: image

Zabamund avatar May 07 '21 13:05 Zabamund

It just comes down to the lexicon. The text is parsed in a very naive way, and it's up to the user to compile an appropriate lexicon for their task.

That said, I think the default splitter 'with' should prevent components from getting mixed like this, so that part is a bug.
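To illustrate the expected behaviour, here is a minimal sketch of splitter-based parsing. It is not striplog's implementation: `TOY_LEXICON`, `SPLITTERS`, `split_description`, `parse_part` and `parse_description` are all hypothetical names, and the lexicon is a toy one that happens to include 'marl'.

```python
import re

# Hypothetical toy lexicon, NOT striplog's default lexicon.
TOY_LEXICON = {
    "lithology": ["sandstone", "shale", "marl"],
    "colour": ["grey", "light blue", "blue"],
    "grainsize": ["fine", "medium", "coarse"],
}

# Words that separate one component from the next (assumed, for illustration).
SPLITTERS = ["with"]


def split_description(text, splitters=SPLITTERS):
    """Split a description into parts at each splitter word."""
    pattern = r"\b(?:" + "|".join(re.escape(s) for s in splitters) + r")\b"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    # Trim whitespace and trailing full stops, and drop empty parts.
    return [p.strip(" .") for p in parts if p.strip(" .")]


def parse_part(part, lexicon=TOY_LEXICON):
    """Return the first matching term for each attribute found in one part."""
    found = {}
    for attribute, terms in lexicon.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", part, flags=re.IGNORECASE):
                found[attribute] = term
                break
    return found


def parse_description(text):
    """Parse each part separately so attributes cannot leak between parts."""
    return [parse_part(part) for part in split_description(text)]


print(parse_description("Light blue marl with interbedded shale with good shows"))
# [{'lithology': 'marl', 'colour': 'light blue'}, {'lithology': 'shale'}, {}]
```

Because each part is matched on its own, 'shale' cannot end up in the same component as 'light blue'. The last part comes back empty because nothing in 'good shows' is in the toy lexicon.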

The other thing here is that 'marl' is not in the default lexicon, but 'mrl' is (as an abbreviation). If we compile a more comprehensive list for the 'lithology' part of the default lexicon, adding 'marl' is trivial. So that could be an enhancement.
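The effect of a missing term can be shown with a small, self-contained sketch. The pattern lists and `find_lithology` below are hypothetical and do not reflect the actual contents or API of striplog's lexicon; they only show why a list holding just the abbreviation misses the full word.

```python
import re


def find_lithology(text, patterns):
    """Return the first whole-word match from `patterns`, or None."""
    for pattern in patterns:
        match = re.search(r"\b(?:" + pattern + r")\b", text, flags=re.IGNORECASE)
        if match:
            return match.group(0).lower()
    return None


# Hypothetical lithology lists: abbreviations only, then with full names added.
abbreviations_only = ["sst", "sh", "mrl"]
extended = abbreviations_only + ["sandstone", "shale", "marl"]

text = "Light blue marl"
print(find_lithology(text, abbreviations_only))  # None
print(find_lithology(text, extended))            # marl
```

With the abbreviation-only list the lithology is silently dropped, which matches the symptom in the report. Once the full word is in the list, it is found.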

kwinkunks avatar May 07 '21 16:05 kwinkunks