jscpd

Pre-process lines being tokenized.

Open gk-patel opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe. I have a few use cases where I wish I could pre-process a line before it gets tokenized. Here are the use cases:

  1. Sometimes someone misses the ";" at the end of a line when copy-pasting, and the duplicate goes undetected.
  2. Sometimes an extra space is inserted, and the duplicate goes undetected. In such cases, the pre-processing function could run a regex to remove these known differences.

Another use case is to exclude certain lines from processing altogether. I know this can be done with comments, but that is not dynamic, and we would need to write these comments everywhere.

For example, consider the "import" statements in JS code. They are often similar between files, and that is perfectly fine. But CPD detects them as copy-paste, which fails the build. So this "pre-processing" function should be able to return "null" for a line, and that line would then be ignored. For example, if (line.includes("import")) { return null }. This is a simple use case, but use cases could be more complex.

Describe the solution you'd like In the "detectClones" function, I want to pass an argument "linePreProcessor", which is a function that returns either a new line (string) to be used for tokenizing, or "null", which means the line is to be ignored during processing.
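A minimal sketch of what such a hook could look like. It is hypothetical: "LinePreProcessor", "ignoreImportsAndNormalize", and "preprocessSource" are illustrative names, not part of jscpd, and here the hook is applied to the source text before it would be handed to any detector.

```typescript
// Hypothetical hook shape from the request: return a new line, or null to drop it.
// None of these names exist in jscpd; they are illustrative only.
type LinePreProcessor = (line: string) => string | null;

const ignoreImportsAndNormalize: LinePreProcessor = (line) => {
  // Drop import statements entirely.
  if (/^\s*import\b/.test(line)) return null;
  // Collapse runs of whitespace and strip one trailing ";" so that
  // near-identical lines compare equal.
  return line.replace(/\s+/g, " ").trim().replace(/;$/, "");
};

// Apply the hook to every line and discard the lines it rejected.
function preprocessSource(source: string, pre: LinePreProcessor): string {
  return source
    .split("\n")
    .map((line) => pre(line))
    .filter((line): line is string => line !== null)
    .join("\n");
}
```

Dropping lines shifts the line numbers of everything after them, so a real implementation would probably need to map reported positions back to the original file, or replace ignored lines with empty ones instead.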

Describe alternatives you've considered None.

Additional context If you would like help, I am happy to contribute. Just point me to the right place and I can open a PR.

gk-patel avatar Aug 31 '21 13:08 gk-patel

Thank you for the issue. You can try to cover some of these cases with mode:

https://github.com/kucherenko/jscpd/blob/master/packages/core/src/mode.ts

https://github.com/kucherenko/jscpd/blob/4b569e694a399d8888c6d614b0d2c1f9a0506ea7/packages/core/src/interfaces/options.interface.ts#L13

Also, note that the tokenization process does not work line by line.
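As a rough illustration of the mode idea, here is a sketch of a per-token predicate that skips statement-ending semicolons. It assumes a mode is a function that receives one token and returns false to exclude it; the "Token" type below is a simplified stand-in for the real IToken, and the "punctuation" type name is an assumption about how the tokenizer labels ";".

```typescript
// Simplified stand-in for a token; the real IToken in jscpd has more fields.
interface Token {
  type: string;
  value: string;
}

// Assumed shape of a mode: return false to exclude the token from detection.
type Mode = (token: Token) => boolean;

// Exclude ";" tokens so a missing semicolon does not hide a duplicate.
// The "punctuation" type name is an assumption, not confirmed by the thread.
const ignoreSemicolons: Mode = (token) =>
  !(token.type === "punctuation" && token.value === ";");

// Helper for trying a mode out on a plain token array.
function applyMode(tokens: Token[], mode: Mode): Token[] {
  return tokens.filter((token) => mode(token));
}
```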

kucherenko avatar Aug 31 '21 14:08 kucherenko

Thank you for your quick reply. I checked out what you suggested. Unfortunately, it was only partially helpful. What I am looking for is a way to invalidate certain IToken objects (return false from the mode function) based on their neighbouring tokens. So, if the mode function could receive the previous N and the next N tokens, that would do the trick (where N could be a config param). Would this be possible?
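To make the request concrete, here is a sketch of a mode that also sees its neighbours. It is hypothetical: "ContextMode" and "filterWithContext" do not exist in jscpd, and "Token" is a simplified stand-in for IToken.

```typescript
// Simplified stand-in for a token; the real IToken in jscpd has more fields.
interface Token {
  type: string;
  value: string;
}

// Hypothetical extended mode: receives up to n tokens on each side.
type ContextMode = (token: Token, before: Token[], after: Token[]) => boolean;

// Run the mode over every token with a window of n neighbours per side.
function filterWithContext(tokens: Token[], n: number, mode: ContextMode): Token[] {
  return tokens.filter((token, i) =>
    mode(
      token,
      tokens.slice(Math.max(0, i - n), i),
      tokens.slice(i + 1, i + 1 + n),
    ),
  );
}

// Example: drop a string token when the token right before it is the "from" keyword,
// which removes module paths from import statements.
const dropImportPaths: ContextMode = (token, before) => {
  const prev = before[before.length - 1];
  return !(token.type === "string" && prev !== undefined && prev.value === "from");
};
```

With N = 1 this only looks at the immediate neighbour; a larger window would allow rules that span a whole statement, at the cost of more work per token.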

gk-patel avatar Sep 03 '21 14:09 gk-patel