eclipse.platform icon indicating copy to clipboard operation
eclipse.platform copied to clipboard

File corruption when moving file accross projects that have different charsets

Open Wittmaxi opened this issue 2 years ago • 2 comments

Current Behaviour

When moving a file across two projects with different charsets, a file with no specified charset gets corrupted.

Reproducing*

Version: 2023-09 (4.29)
Build id: I20230711-0440
  1. Create two projects
  2. Set the default encodings of the projects, set them to different charsets
  3. Create a file in one project
  4. Move the file from one project to the other
  5. See the File's implied charset not match the old implied charset

Expected Behaviour

I propose this behaviour:

  • When moving a file without explicitly set charset into a project where the new implicit charset would conflict with the current implicit charset, the old charset is set to an explicit charset in the file
  • When moving a file with explicitly set charset into a project which has the same default charset as the explicitly set charset, unset the charset preference

-- This issue mirrors this issue. What do you think is the expected behaviour here? Do you agree with my proposal?

Wittmaxi avatar Jul 24 '23 09:07 Wittmaxi

If the charsets conflict it should be double checked first if the file content really requires a charset change. For example pure ascii files could be copied into utf8 or windows-1252 without any other change.

if it's really conflicting the user should be asked a) convert file (should show error if conversion is not successful because of unsupported used character b) set encoding for target file c) set encoding of target's project d) ignore (default - old behaviour)

jukzi avatar Jul 25 '23 14:07 jukzi

Thank you for the input @jukzi. This issue has surpassed the scope of what I imagined it to be be - I will not fix this issue.

The proposed scenario

I can think of a few user-journeys where the proposed behaviour could/should be refined or atleast overthought.

  • The user moves is working on two projects with similar functionality at the same time. 30 files from Project "My Calculatorproject" to "My TodoListPlannerproject". The user is annoyed at clicking on "I want to move the charset" 30 times - obvious solution being to include a dialogue that asks for all files at once
  • The user is completely new to developement. The user has downloaded a few example projects and moves files between them. Suddenly they are confronted with a message about a charset. They don't know what a charset is and don't know what to set it to.

Having considered these, I agree with your proposal to implement a "double check in with the user". IMHO, just silently keeping track of the charset in the background is not harmful either.

Entry Point for anybody picking this up in the future

I have already implemented a unit-test that checks for this. However, as there are a few unknowns in expected behaviour (for example: if we move a file from "ascii" into "utf-8", we don't necessarily expect the charset to be sticky)

My test was implemented as shown down-below, in this file

/**
* https://bugs.eclipse.org/bugs/show_bug.cgi?id=380114
*
* When moving files between projects that have different charsets, we want the
* file to keep it's old charset.
*/
@Test
public void _testBug380114() throws CoreException {
  IWorkspace workspace = getWorkspace();
  final IProject pBase = workspace.getRoot().getProject("projectbase");
  final IProject pDest = workspace.getRoot().getProject("projectdest");
  try {
	  ensureExistsInWorkspace(pBase, false);
	  ensureExistsInWorkspace(pDest, false);
  
	  pBase.setDefaultCharset("FOOMW", getMonitor());
	  pDest.setDefaultCharset("BARMW", getMonitor());
	  assertEquals("Base Project's default Charset was set", pBase.getDefaultCharset(false), "FOOMW");
	  assertEquals("Destination Project's default Charset was set", pDest.getDefaultCharset(false), "BARMW");
  
	  final IFile file = pBase.getFile("file.txt");
	  ensureExistsInWorkspace(file, true);
	  assertEquals("File in Base Project implicitly inherits parent's charset", file.getCharset(true), "FOOMW");
  
	  final IFile file2 = pBase.getFile("file2.txt");
	  ensureExistsInWorkspace(file2, true);
	  file2.setCharset("A", getMonitor());
	  final IFile file3 = pBase.getFile("file3.txt");
	  ensureExistsInWorkspace(file3, true);
	  file3.setCharset("B", getMonitor());
	  final IFile file4 = pBase.getFile("file4.txt");
	  ensureExistsInWorkspace(file4, true);
	  file4.setCharset("C", getMonitor());
	  final IFile file5 = pBase.getFile("file5.txt");
	  ensureExistsInWorkspace(file5, true);
	  file5.setCharset("D", getMonitor());
  
	  file.move(pDest.getFullPath().append("file.txt"), IResource.NONE, getMonitor());
  
	  IFile movedFile = pDest.getFile("file.txt");
	  assertExistsInWorkspace(movedFile, false);
	  assertEquals("Moved File kept it's old encoding", movedFile.getCharset(true), "FOOMW");
  } finally {
	  ensureDoesNotExistInWorkspace(pBase);
	  ensureDoesNotExistInWorkspace(pDest);
  }
}

Wittmaxi avatar Aug 10 '23 09:08 Wittmaxi