text icon indicating copy to clipboard operation
text copied to clipboard

Markdown files from external editors lose formatting on save

Open inthreedee opened this issue 5 years ago • 122 comments

Summary Markdown files that are created in external markdown and text apps will have their formatting completely overridden and/or lost when edited through the Nextcloud web interface. Newlines, tabs, and spaces are thrown out everywhere in the document--not just on the edited line--and lists will be changed to asterisks instead of dashes. See screenshots below.

Background I use external text editors for my markdown files and then sync them to my Nexctloud. I primarly use Atom for its markdown support but will also use basic text editors like Gedit on Linux or Notepad on Windows. That's the great thing about markdown files, after all, they're just plain text. When I edit these files through my Nextcloud web interface, the formatting of the entire file is destroyed.

Related: https://github.com/nextcloud/text/issues/390

To Reproduce Steps to reproduce the behavior:

  1. Create a markdown file in an external editor such as Atom/Gedit or a basic text editor like Notepad
  2. Add some spaces, tabs, newlines, lists, etc.
  3. Open the markdown file in the Nextcloud web editor and notice the formatting differences
  4. Make a minor edit to the file in the Nextcloud web interface
  5. View the file again in your external editor and notice the formatted has been destroyed.

Expected behavior I expect the Text app to not completely destroy any formatting created by external apps. If any special formatting is unsupported, it should be left alone for maximum compatibility with external apps. If any formatting changes must be made based on the current edits, those changes should apply ONLY to the current edits and not the entire file.

Screenshots

How it looks in Gedit under Linux: Screenshot from 2020-01-19 17-02-41

How it looks when opened in my Nextcloud web interface: Screenshot from 2020-01-22 17-04-36

Make an edit through the web interface and now the file's formatting has been destroyed when viewed in Gedit: Screenshot from 2020-01-19 17-12-49

Client details:

  • OS: Arch Linux
  • Browser: Firefox
  • Version: 72.0.1
  • Device: desktop
Server details

Text app version: 1.1.1

Operating system: Ubuntu 18.04.3

Web server: Apache

Database: Mysql 5.7.29

PHP version: 7.3.13

Nextcloud version: 17.0.2

inthreedee avatar Jan 19 '20 22:01 inthreedee

Second that!

EDIT: Just found another extremely problematic behaviour: Automatic escaping of footnotes. Also not specified in CommonMark, of course, but it should not simply escape the square brackets, because it fully breaks support in those third-party editors that support those (besides, having links defined in-text and then the link at the bottom is in fact common, which uses exactly the same format)

image

nathanlesage avatar Feb 06 '20 16:02 nathanlesage

We use pico cms Nextcloud-Integration and need to be able to edit those .md files via Nextcloud. However, this bug effects the ability to edit pico files, f.e. _meta.md looses all it's formatting and since it has a yaml header, it destroys parts of the website. This is especially annoying, since there is no possibility to use the markdown editor or any other editor anymore in NC 18...

zerknorscht avatar Mar 05 '20 19:03 zerknorscht

May I suggest that, while WYSIWYG in Markdown is cool to have, a plain code view would be the easiest solution for the meantime until the mode works for everyone, so that we can work with the files easily without fearing to lose any data?

As the plain view was already implemented, it shouldn't prove too difficult to re-enable that and simply offer people a switch between the two!

nathanlesage avatar Mar 05 '20 22:03 nathanlesage

A plain code view for the Text App would be wonderfull, as it would render the pico integration in NC 18 usable again

zerknorscht avatar Mar 06 '20 09:03 zerknorscht

somewhat like this would be great.... how can i throw bugs on a bounty for this? (<- noob) neweditor_mockup

zerknorscht avatar Mar 06 '20 09:03 zerknorscht

I've been having this issue as well. It disheartens me that we are now at NC19 and this still hasn't been addressed. The pico CMS is huge as it breaks the YAML code on the file.

JoshuaPettus avatar Jun 22 '20 20:06 JoshuaPettus

This seems to be a pretty prominent and relevant issue. Since this (and duplicates) have been in a limbo state for some time now, I'll ping some people (maybe you also know other people to ping who could help out or are responsible for this application).

@juliushaertl @jospoortvliet @rullzer

claell avatar Mar 13 '21 10:03 claell

To quote @juliushaertl from https://github.com/nextcloud/notes/pull/652#issuecomment-751993173:

I feat that we might never be able to fully preserve the original formatting. The main reason is that the WYSIWYG editor does not preserve the original format but converts the markdown document to a different document format (a json tree with different node types like h1/li and annotation types like italic/bold) and then on save converts this json format back to markdown. Taking lists as an example we just don't know when saving if the original markdown file was written with * or - as list indicators. This was decided to be an acceptable downside of the approach since we mainly aimed for a rich content editor here rather than a markdown editor. However we should of course preserve as much as possible so the open enhancement requests about frontmatter or tables for example should definitely be implemented at some point.

szaimen avatar Mar 13 '21 11:03 szaimen

Either there have to be improvements to the currently used system (like @juliushaertl suggested in the comment) or the system has to be changed (not sure whether it is possible to create a WYSIWYG editor without the conversion to a different document format in between, but the nextcloud notes editor looks like it does just that). The current state is not suitable for productive usage, I basically agree with https://github.com/nextcloud/notes/pull/652#issuecomment-753694466.

claell avatar Mar 13 '21 11:03 claell

To very briefly summarize that comment linked by @claell above:

The tradeoffs in the current implementation are reasonable for a rich content editor in general. They are not reasonable for one that is set up as the default editor of (and then destroyer of) markdown files. Two options:

  1. Make changes so the text app behaves like users expect a markdown editor to behave. Preserve all markdown formatting properly.
  2. Stop using the .md file extension by default. Instead, use an extension specifically for this "Nextcloud rich text format".

As much as I would prefer the first option, it seems like the app was designed with different goals in mind. In my opinion, the only mistake was choosing to use the .md extension to create files that are incompatible with every other markdown editor. Because of that, I'm more in favor of option two at the moment.

inthreedee avatar Mar 13 '21 14:03 inthreedee

@inthreedee Agreed. I mean, it would certainly be possible to facilitate all that WYSIWYG-stuff on Markdown files (and in earlier iterations, Nextcloud Text actually did a pretty good job at giving you both code and a preview), but in the end I'm also very much in favour of just having the plain markup …

nathanlesage avatar Mar 13 '21 17:03 nathanlesage

I'd be ok with option two if it is made sure there is another editor for Markdown files shipped by default, so one can still edit them online on existing Nextcloud instances.

However, if that editor is plain text, there will likely arise the request for a WYSIWYG markdown editor, which brings us back again to the original problem of possible having two document formats simultaneously for the same document. But maybe that problem is only related to the collaborative editing feature and can be achieved without problems in other editors.

I am still not convinced that option one is not feasible, and @juliushaertl already mentioned that improvements can be made. The question would be whether those improvements are enough and how much work they require (i.e. how soon they would ship).

claell avatar Mar 14 '21 14:03 claell

There is a proper markdown editor already available: https://apps.nextcloud.com/apps/files_markdown Note that it requires the Plain Text Editor app to be installed and Text should be disabled to avoid conflicts.

It doesn't integrate into the iOS mobile client and I assume it's the same for Android. If you disable Text, markdown files become read-only on mobile.

My current solution is to accept the reality that Text may remain incompatible with actual markdown files due to design choices made early on. I disabled Text and use the Markdown Editor app online. On my phone, I purchased a third party markdown editor that doesn't misbehave like Text does. It connects to my Nextcloud files through the iOS files integration. I believe similar things are possible on Android.

inthreedee avatar Mar 14 '21 15:03 inthreedee

I know of those existing editors. I was mostly talking about an official solution, not manual workarounds. Maybe those apps can be used for that official solution, however there is likely some work needed to be done here, since as you say at least the app you linked needs Text to be disabled, doesn't work nicely on mobile etc.

For the issue, I am experiencing, I require an official solution, since I don't host the Nextcloud instance where this is relevant myself and don't want to ask the admins to perform the steps for a manual solution to this.

claell avatar Mar 15 '21 08:03 claell

However, if that editor is plain text, there will likely arise the request for a WYSIWYG markdown editor,

In my opinion this objection should not lead to paralysis on resolving the main issue. I think if according to option 2. above, the Text app gets its own extension and is no longer the default app for .md, then there would simply be no app for collaborative editing of .md files. So what? If one needs collab editing of rich text, use Text instead. That situation would still be much better than what we have now, where Text breaks user data by default.

nilsbecker avatar Jun 30 '21 14:06 nilsbecker

Just found this bug while deciding to use my Nextcloud instance as an office on the go, and to do some coding in *.md files. On a lighter note, I noticed that if you want to embed Youtube videos which is a feature showed on the Nextcloud site as something that you can do, that it has the caveat of only be displaying properly while using the Plain Text Editor app. Not in Text or in Notes.

Is there an update on this? Since I find myself in the unfortunate situation of using and liking the Collectives app, which requires the Text app to function. But if I want to work on code with other people, or even by myself, I run the risk of a file being incorrectly opened by mistake with the wrong app, once, and having the code be broken. The best practice would be to simply disable the Text app altogether, however, if I do that, then Collectives fails to work.

I mean, I could use the versioning app and then go back on past versions of the file if the breaking does happen, but it seems like a weird sort of go-around to fix a problem that should not even exist in the first place.

It seems the reason this cannot be fixed at the software level is because the application Text is based off does these changes internally so, it would require a lot of resources to change that, if not impossible without a major rewrite. So, I too agree that perhaps Text should have its own extension. Assuming Nextcloud wants to use the Text app as a Google Docs analogue like someone else stated elsewhere, then the fix with the fewer moving parts would be the alternative extension route.

Since it seems counterproductive to have an app that breaks code internally as part of its processing of them be used to open *.md files, by default. Given how *.md files are used so widely in coding.

daffydock avatar Oct 26 '21 15:10 daffydock

Ping @karlitschek in the hope to get some good solution for this problem. I'd think that having nextcloud work reliably is an important point for many customers, and this is one issue where nextcloud doesn't manage to deliver reliability.

claell avatar Oct 27 '21 10:10 claell

@claell as a customer I suggest you file an issue with our support team, if it's in scope they will handle it according to the SLA. If you're not a customer, you're welcome to contribute of course - if you think it can be fixed.

From what I understand, Text parses the markdown to the internal format and exports it again, see also the link to Julius' comment above. Sadly, this is fundamentally lossy. Now as long as the other markdown editor also works with CommonMark, even if they reformat things a bit - I don't really see much of an issue. Other markdown apps might do the same, as long as they follow CommonMark it should not be a problem. If you use a markdown editor that has issues with CommonMark, that's perhaps something to address with that editor? Either way, that the look of a file gets changed by Text - as long as the rendered result is correct, it's a bit annoying (I feel the pain too sometimes) but it is only that - occasionally bothersome in a corner case.

Of course, if data gets lost instead of merely reformatted, that's annoying and should ideally not happen. But we developed text for use as a collaborative note taking app - not really for more advanced use, to be honest, and it works well for that use case. Changing the extension would inconvenience the majority of users for those few that might have problems - kind of like nuking a whole city because there's one person you don't like. I'd rather not go that far.

I think we should split this issue up:

  • [ ] a plain text view switch (still with wrap, as the current .txt view doesn't wrap which is... nearly unusable)
  • [ ] for each specific issue where Text causes data loss or serious formatting problems, create an issue to address it

The loss of formatting (spacing etc) itself is simply a won't fix (or can't fix rather) I'm afraid, but a plain text view and data/formatting problems can be addressed.

jospoortvliet avatar Oct 27 '21 12:10 jospoortvliet

IMO @jospoortvliet's conclusion is false. This is not just some random inconvenience, it's a breaking issue and a violation of the CommonMark specs.

CommonMark was specifically designed with existing Markdown editors in mind. Markdown's history is very important here: Markdown never was standardized. Even CommonMark is no widely accepted standard, they are trying (IMO pretty well), but CommonMark simply isn't widely accepted. There's a huge number of Markdown parsers out there and the only common denominator was some descriptive document from John Gruber and Gruber's original Markdown.pl parser for a very long time. His description of Markdown is very ambiguous, partially even contradicts itself and even the reference parser sometimes violates the rules. As a result, Markdown parsers behave more or less differently - especially in cases which aren't really covered by Gruber's document or in cases in which Markdown.pl produces impractical results. In the recent past some people started the CommonMark project to resolve these uncertainties. The project is indeed very promising and a lot Markdown parsers already said that they will try to improve compliance to the CommonMark specs. But still, they all must keep their existing user base in mind and thus it's unlikely that CommonMark will ever be the only standard. Just as a reminder, GitHub's Markdown editor I'm typing in right now is no CommonMark editor...

Thus CommonMark editors might indeed misinterpret contents, but they must not remove contents. Reformatting is removing contents, as Markdown is a language heavily relying on formatting. Pretending that CommonMark is the standard and all other editors doing things differently are at fault is just wrong, this is an issue with Text. I've never seen any other editor doing something like this - and it's not just some random corner case.

However, @jospoortvliet is right, we can't just switch file extensions and tell the majority of users to rename their files manually. I totally agree that a plain text view switch is very important and at least reducing the incompatibilities is a good thing, too. However, I'm not seeing this to happen in the near future. But there's a quite simple short-term solution: Allow admins to unregister Text as default editor for .md files. The only other solution right now is to remove Text.

PhrozenByte avatar Oct 27 '21 14:10 PhrozenByte

@jospoortvliet I was using this as a user in a corporate Nextcloud installation, however, I don't think I can get their IT to file a ticket with the nextcloud support team.

Still, I think this is relevant and a basic requirement that customers just take as granted (not something they want to even have to file tickets about). This is why I tried to raise some awareness.

As a prominent example, tables are "destroyed": https://github.com/nextcloud/text/issues/1524#issuecomment-798113205

I'll post this here again, so it is not forgotten:

Unfortunately this problem is even worse. When editing the file online with the editor, the line breaks are removed from the file.

Setup:

  • Create a Markdown file with tables on your Desktop (Windows if that matters) and synchronize it to the Cloud.
  • View the file online (one can now observe that line breaks for the table are not displayed correctly)
  • Edit the file online (doesn't matter where, I changed something many lines above the table)
  • The file gets synchronized back to the Client
  • Now the file on the client also lacks line breaks for the table

Example:

| Heading | Test | Heading | Test |
| --- | :--- | --- | --- |
| Content | | | |
| Content | | | |
| Content | Test | Content | Test |

gets turned into

| Heading | Test | Heading | Test | | --- | :--- | --- | --- | | Content | | | | | Content | | | | | Content | Test | Content | Test |

Now, one could argue that tables don't seem to be supported by CommonMark, either. But then I'd say at least don't touch them. Instead the formatting is changed and newlines get removed which breaks them for other editors.

claell avatar Oct 27 '21 15:10 claell

The line breaks things annoys me deeply too, it'd be great if that could be avoided - but that might be fixable in itself. I still think that needs a specific issue for it.

Allowing to set file open priorities etc would be a useful feature in any case, as well.

jospoortvliet avatar Oct 27 '21 17:10 jospoortvliet

Quoting @PhrozenByte for emphasis:

Thus CommonMark editors might indeed misinterpret contents, but they must not remove contents. Reformatting is removing contents

To summarize: In this thread, we've collectively demonstrated examples of how Nextcloud Text's behavior is destructive to the original content of the file: Tables, spaces, indents, line breaks, footnotes, and headers. This makes the resulting text files incompatible with other markdown editors and it sounds like most of these cannot be fixed based on how Nextcloud Text was designed.

Now as long as the other markdown editor also works with CommonMark, even if they reformat things a bit - I don't really see much of an issue. Other markdown apps might do the same, as long as they follow CommonMark it should not be a problem. If you use a markdown editor that has issues with CommonMark, that's perhaps something to address with that editor?

This reasoning makes the faulty assumption that Nextcloud Text is the "owner" of the files being edited and it's therefore the responsibility of other apps to be compatible with its formatting. The problem with this is that Nextcloud Text is now the default editor of any .md files we have already created, store in Nextcloud Files for portability, and are using in other editors.

Basically, we are unable to both use Nextcloud Text for its intended purpose (collaborative text editor) and preserve and edit all of our existing .md files. We are having to choose between one or the other.

Everyone here is saying, essentially, "keep your filthy mitts off my .md files!"

Changing the extension would inconvenience the majority of users for those few that might have problems - kind of like nuking a whole city because there's one person you don't like.

I believe the idea was that Nextcloud Text would begin using a different file extension going forward, but still also open .md files by default. Regular users would see no change in behavior. If an admin then installs any other editor from the app store that wants to handle .md files, Nextcloud Text would stop opening .md files and allow that editor to become the default editor of .md files. This still seems like the ideal solution to me as there is a "proper" markdown editor in the app store that handles the situation very well.

The suggestion of adding a non-destructive plain text view only works if we can be guaranteed that every one of our existing .md files is only ever opened in plain text mode so their formatting is not destroyed. This may be a reasonable alternative solution, but I don't see how that guarantee can be met under the current design.

inthreedee avatar Oct 27 '21 17:10 inthreedee

@jospoortvliet There have been separate issues for single aspects before (I think) but this issue here was seen as the best place to subsume all of them. So is there really a need for separate issues for formatting, or can we just use this one and fix them?

Maybe one safe solution for now would be to prevent Nextcloud Text from opening existing .md files by default (until the "situation" is resolved) and have an opt-in for admins who want this (knowing the current problems)?

claell avatar Oct 27 '21 18:10 claell

Maybe one safe solution for now would be to prevent Nextcloud Text from opening existing .md files by default (until the "situation" is resolved) and have an opt-in for admins who want this (knowing the current problems)?

I would greatly appreciate an option to prevent markdown files from opening within this demonic "text" editor. Is there a way to configure that?

sandrock avatar Mar 03 '22 14:03 sandrock

I would greatly appreciate an option to prevent markdown files from opening within this demonic "text" editor. Is there a way to configure that?

See https://github.com/icewind1991/files_markdown The page talks about the options you have.

timkgh avatar Mar 03 '22 14:03 timkgh

I think it's theoretically possible to fix this issue - or at least some aspects of it.

Markdown files are processed by text like this:

  1. turned into html by markdown-it
  2. loaded into tiptap and turned into tiptaps internal json format by according to parseHTML specs
  3. rendered to html for display in the browser according to renderHTML functions
  4. rendered to markdown for serialization according to toMarkdown functions / prosemirror-markdown.

So what we'd need to do is preserve the information about the original markdown across these steps:

  1. add data attributes to all the html tags that describe the initial markdown
  2. parse the data attributes and turn them into attributes on the tiptap nodes
  3. Either render the attributes to preserve formatting during copy and paste or not to keep html simple
  4. take the attributes into account in the toMarkdown functions.

I see two possible venues:

Attributes per style

Let's take a heading for example. It can take these forms:

# H1 heading

H1 heading
=========

So for this we would need to store which style is chosen, and how many equation signs there are under the heading. So something like this:

<h1 data-md-style="underline", data-md-length="9">H1 heading</h1>

Then we'd turn this into similarly named attributes in tiptap and render the same markdown when serializing it again.

Now the problem is that markdown is way too flexible. Let's take an <hr>... this should be simple, right. Except these are all valid:

 ****
   ***
 * * * * *
 * ** *** **** ***** **** *** ** *
----
____

So just encoding the char and length is not enough... You can have prefix whitespace and postfix and any number of interruptions. Basically this brings us to option 2.

Markdown in attributes

Instead of specifying some properties of the markdown in attributes we could store the entire markdown in the html and then the tiptap attributes:

<h1 data-md="H1 heading\n=========">H1 heading</h1>

Two problems arise though:

  • Elements are nested. So imagine a table or a nested list with images, headings etc. The topmost element would contain the entire markdown source as an attribute
  • Updates need to be applied.

Imagine i have the heading above but change the content:

<h1 data-md="H1 heading\n=========">Other heading</h1>

Now if we serialize that... how can we tell the data-md is outdated and cannot be used anymore?

Basically we now have a markdown cache an need to make sure to invalidate it whenever elements change. This seems doable but cumbersome. On the plus side we'd have a cache for the toMarkdown function that we call on every edit.

Conclusion

I think we will need to strike a balance here. In order of priority:

  1. Changes to markdown that break common mark markup - i.e. change its meaning are bugs and need to be fixed.
  2. Changes to markdown that break commonly used extensions - say tables, foodnotes etc. should be prevented
  3. Commonly used and helpful formatting such as aligning tables should be preserved if the content is not touched.
  4. We should document what markdown is preserved when using text.
  5. We might be able to preserve it all - but most likely only if that has other benefits that outweight the cost of added code complexity.

Maybe we could even make use of markdownlint or so for 4.

max-nextcloud avatar Mar 14 '22 12:03 max-nextcloud

@max-nextcloud that sounds like a very good idea!

But before inventing own tagging etc. I'd first very thoroughly investigate how pandoc does that as pandoc supports seamless conversion from HTML to markdown (and vice versa, of course) as well as to/from JSON while preserving everything from CommonMark as well as many other information.

Writing filters for pandoc is a breeze just because the intermediate JSON format contains really all the information in a very understandable and simple enough form.

dumblob avatar Mar 14 '22 13:03 dumblob

@dumblob Did i understand you correctly that with pandoc you can do markdown -> html -> markdown that preserves the markdown encoding? I'd be very impressed but also kind of surprised by that.

$ cat > test.md
- - - - - -
$ pandoc test.md -o test.html
$ cat test.html 
<hr />
$ pandoc test.html -o out.md
$ cat out.md 
------------------------------------------------------------------------

max-nextcloud avatar Mar 14 '22 14:03 max-nextcloud

Some years ago I've done that. I've used some parameters (because pandoc is huge - and by default it does only subset of what it can do) and don't remember whether I used some filters in the end to augment the given outputs & inputs. ~But yeah, md -> html -> md shall be possible with pandoc.~

~Your example sounds like it works, right?~

Edit: never mind, with HTML it doesn't preserve line breaks nor other things. With JSON (which 1:1 represents pandoc's intermediary format) it does preserve certain parts of user formatting, but e.g. spaces are not preserved.

dumblob avatar Mar 14 '22 15:03 dumblob

The example shows the same kind of problem that text has unfortunately, that the HTML representation is not able to keep the exact source markdown representation as the markdown code for the horizontal line is different before and after.

That discussion aside, I think that the steps and priority order from above make sense:

I think we will need to strike a balance here. In order of priority:

  1. Changes to markdown that break common mark markup - i.e. change its meaning are bugs and need to be fixed.
  2. Changes to markdown that break commonly used extensions - say tables, foodnotes etc. should be prevented
  3. Commonly used and helpful formatting such as aligning tables should be preserved if the content is not touched.
  4. We should document what markdown is preserved when using text.
  5. We might be able to preserve it all - but most likely only if that has other benefits that outweight the cost of added code complexity.

Maybe we could even make use of markdownlint or so for 4.

Maybe we can get the individual issues for 1-4 into separate tickets for easier tracking and keep this one for the focus on 5?

juliusknorr avatar Mar 16 '22 13:03 juliusknorr