annotate design.txt @ 659:a5cf64f2e7e4 smartgit-4.6

Poor performance when reading/collecting branch information. Respect new cache location for recent mercurial revisions. Use different algorithm to build branch cache
author Artem Tikhomirov <tikhomirov.artem@gmail.com>
date Fri, 05 Jul 2013 20:42:45 +0200
parents 31a89587eb04
children
rev   line source
1
a3576694a4d1 Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff changeset
1 FileStructureWalker (pass HgFile, HgFolder to callable; which can ask for VCS data from any file)
a3576694a4d1 Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff changeset
2 External uses: user browses files, selects one and asks for its history
a3576694a4d1 Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff changeset
3 Params: tip/revision;
a3576694a4d1 Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff changeset
4 Implementation: manifest
a3576694a4d1 Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff changeset
5
a3576694a4d1 Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff changeset
6 Log --rev
a3576694a4d1 Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff changeset
7 Log <file>
2
08db726a0fb7 Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 1
diff changeset
8 HgDataFile.history() or Changelog.history(file)?
08db726a0fb7 Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 1
diff changeset
9
08db726a0fb7 Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 1
diff changeset
10
08db726a0fb7 Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 1
diff changeset
11 Changelog.all() to return list with placeholder, not-parsed elements (i.e. read only compressedLen field and skip to next record), so that
08db726a0fb7 Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 1
diff changeset
12 total number of elements in the list is correct
1
a3576694a4d1 Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff changeset
13
a3576694a4d1 Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff changeset
14 hg cat
2
08db726a0fb7 Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 1
diff changeset
15 Implementation: logic to find file by name in the repository is the same with Log and other commands
08db726a0fb7 Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 1
diff changeset
16
08db726a0fb7 Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 1
diff changeset
17
08db726a0fb7 Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 1
diff changeset
18 Revlog
4
aa1912c70b36 Fix offset issue for inline revlogs. Commandline processing.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 2
diff changeset
19 What happens when big entry is added to a file - when it detects it can't longer fit into .i and needs .d? Inline flag and .i format changes?
aa1912c70b36 Fix offset issue for inline revlogs. Commandline processing.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 2
diff changeset
20
22
603806cd2dc6 Status of local working dir against non-tip base revision
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 21
diff changeset
21 What's hg natural way to see nodeids of specific files (i.e. when I do 'hg --debug manifest -r 11' and see nodeid of some file, and
603806cd2dc6 Status of local working dir against non-tip base revision
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 21
diff changeset
22 then would like to see what changeset this file came from)?
4
aa1912c70b36 Fix offset issue for inline revlogs. Commandline processing.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 2
diff changeset
23
aa1912c70b36 Fix offset issue for inline revlogs. Commandline processing.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 2
diff changeset
24 ----------
6
5abe5af181bd Ant script to build commands and run sample
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 5
diff changeset
25 + support patch from baseRev + few deltas (although done in a way patches are applied one by one instead of accumulated)
4
aa1912c70b36 Fix offset issue for inline revlogs. Commandline processing.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 2
diff changeset
26 + command-line samples (-R, filenames) (Log & Cat) to show on any repo
6
5abe5af181bd Ant script to build commands and run sample
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 5
diff changeset
27 +buildfile + run samples
9
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
28 *input stream impl + lifecycle. Step forward with FileChannel and ByteBuffer, although questionable accomplishment (looks bit complicated, cumbersome)
14
442dc6ee647b Show correct time
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 11
diff changeset
29 + dirstate.mtime
43
1b26247d7367 Calculate result length of the patch operarion, when unknown
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 41
diff changeset
30 +calculate sha1 digest for file to see I can deal with nodeid. +Do this correctly (smaller nodeid - first)
18
02ee376bee79 status operation against current working directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 17
diff changeset
31 *.hgignored processing
25
da8ccbfae64d Reflect Nodeid's array is exactly 20
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 22
diff changeset
32 +Nodeid to keep 20 bytes always, Revlog.Inspector to get nodeid array of meaningful data exact size (nor heading 00 bytes, nor 12 extra bytes from the spec)
26
71a9ba42cee8 Memory-mapped files for bigger files. Defect reading number of bytes greater than size of the buffer fixed
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 25
diff changeset
33 +DataAccess - implement memory mapped files,
49
26e3eeaa3962 branch and user filtering for log operation
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 43
diff changeset
34 +Changeset to get index (local revision number)
60
613c936d74e4 Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 55
diff changeset
35 +RevisionWalker (on manifest) and WorkingCopyWalker (io.File) talking to ? and/or dirstate (StatusCollector and WCSC)
613c936d74e4 Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 55
diff changeset
36 +RevlogStream - Inflater. Perhaps, InflaterStream instead? branch:wrap-data-access
613c936d74e4 Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 55
diff changeset
37 +repo.status - use same collector class twice, difference as external code. add external walker that keeps collected maps and use it in Log operation to give files+,files-
78
c25c5c348d1b Skip metadata in the beginning of a file content. Parse metadata, recognize copies/renames
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 64
diff changeset
38 + strip \1\n metadata out from RevlogStream
84
08754fce5778 updated design questions
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 78
diff changeset
39 + hash/digest long names for fncache
169
8c8e3f372fa1 Towards initial clone: refactor HgBundle to provide slightly higher-level structure of the bundle
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 136
diff changeset
40 +Strip off metadata from beg of the stream - DataAccess (with rebase/moveBaseOffset(int)) would be handy
8c8e3f372fa1 Towards initial clone: refactor HgBundle to provide slightly higher-level structure of the bundle
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 136
diff changeset
41 + hg status, compare revision and local file with kw expansion and eol extension
8c8e3f372fa1 Towards initial clone: refactor HgBundle to provide slightly higher-level structure of the bundle
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 136
diff changeset
42
8c8e3f372fa1 Towards initial clone: refactor HgBundle to provide slightly higher-level structure of the bundle
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 136
diff changeset
43 write code to convert inlined revlog to .i and .d
60
613c936d74e4 Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 55
diff changeset
44
613c936d74e4 Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 55
diff changeset
45 delta merge
613c936d74e4 Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 55
diff changeset
46 DataAccess - collect debug info (buffer misses, file size/total read operations) to find out better strategy to buffer size detection. Compare performance.
396
0ae53c32ecef Straighten out exceptions thrown when file access failed - three is too much
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 367
diff changeset
47 RevlogStream - inflater buffer (and other buffers) size may be too small for repositories out there (i.e. inflater buffer of 512 bytes for 200k revision)
41
858d1b2458cb Check integrity for bundle changelog. Sort nodeids when calculating hash
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 33
diff changeset
48
169
8c8e3f372fa1 Towards initial clone: refactor HgBundle to provide slightly higher-level structure of the bundle
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 136
diff changeset
49
128
44b97930570c Introduced ChangelogHelper to look up changesets files were modified in
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 93
diff changeset
50 Parameterize StatusCollector to produce copy only when needed. And HgDataFile.metadata perhaps should be moved to cacheable place?
44b97930570c Introduced ChangelogHelper to look up changesets files were modified in
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 93
diff changeset
51
18
02ee376bee79 status operation against current working directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 17
diff changeset
52 Status operation from GUI - guess, usually on a file/subfolder, hence API should allow for starting path (unlike cmdline, seems useless to implement include/exclide patterns - GUI users hardly enter them, ever)
60
613c936d74e4 Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 55
diff changeset
53 -> recently introduced FileWalker may perhaps help solving this (if starts walking from selected folder) for status op against WorkingDir?
9
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
54
84
08754fce5778 updated design questions
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 78
diff changeset
55 ? Can I use fncache (names from it - perhaps, would help for Mac issues Alex mentioned)
08754fce5778 updated design questions
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 78
diff changeset
56 ? Does fncache lists both .i and .d (iow, is it true hashed <long name>.d is different from hashed <long name>.i)
08754fce5778 updated design questions
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 78
diff changeset
57
15
865bf07f381f Basic hgignore handling
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 14
diff changeset
58 ??? encodings of fncache, .hgignore, dirstate
16
254078595653 Print manifest nodeid
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 15
diff changeset
59 ??? http://mercurial.selenic.com/wiki/Manifest says "Multiple changesets may refer to the same manifest revision". To me, each changeset
254078595653 Print manifest nodeid
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 15
diff changeset
60 changes repository, hence manifest should update nodeids of the files it lists, effectively creating new manifest revision.
15
865bf07f381f Basic hgignore handling
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 14
diff changeset
61
64
19e9e220bf68 Convenient commands constitute hi-level API. org.tmatesoft namespace, GPL2 statement
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 61
diff changeset
62 ? subrepos in log, status (-S) and manifest commands
19e9e220bf68 Convenient commands constitute hi-level API. org.tmatesoft namespace, GPL2 statement
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 61
diff changeset
63
197
3a7696fb457c Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 169
diff changeset
64 ? when p1 == -1, and p2 != -1, does HgStatusCollector.change() give correct result?
93
d55d4eedfc57 Switch to Path instead of String in filenames returned by various status operations
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 84
diff changeset
65
64
19e9e220bf68 Convenient commands constitute hi-level API. org.tmatesoft namespace, GPL2 statement
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 61
diff changeset
66 Commands to get CommandContext where they may share various caches (e.g. StatusCollector)
93
d55d4eedfc57 Switch to Path instead of String in filenames returned by various status operations
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 84
diff changeset
67 Perhaps, abstract classes for all Inspectors (i.e. StatusCollector.Inspector) for users to use as base classes to protect from change?
64
19e9e220bf68 Convenient commands constitute hi-level API. org.tmatesoft namespace, GPL2 statement
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 61
diff changeset
68
205
ffc5f6d59f7e HgLogCommand.Handler is used in few places, pull up to top-level class, HgChangesetHandler
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 202
diff changeset
69 -cancellation and progress support
ffc5f6d59f7e HgLogCommand.Handler is used in few places, pull up to top-level class, HgChangesetHandler
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 202
diff changeset
70 -timestamp check for revlog to recognize external changes
209
9ce3b26798c4 Few branches (distinct BranchChains from distinct heads) may end up with same nodes. Building BC structure fixed to reuse chain elements
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 205
diff changeset
71 -HgDate or any other better access to time info
9ce3b26798c4 Few branches (distinct BranchChains from distinct heads) may end up with same nodes. Building BC structure fixed to reuse chain elements
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 205
diff changeset
72 -(low) RepositoryComparator#calculateMissingBranches may query branches for the same head more than once
9ce3b26798c4 Few branches (distinct BranchChains from distinct heads) may end up with same nodes. Building BC structure fixed to reuse chain elements
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 205
diff changeset
73 (when there are few heads that end up with common nodes). e.g hg4j revision 7 against remote hg4j revision 206
205
ffc5f6d59f7e HgLogCommand.Handler is used in few places, pull up to top-level class, HgChangesetHandler
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 202
diff changeset
74
9
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
75 >>>> Effective file read/data access
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
76 ReadOperation, Revlog does: repo.getFileSystem().run(this.file, new ReadOperation(), long start=0, long end = -1)
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
77 ReadOperation gets buffer (of whatever size, as decided by FS impl), parses it and then reports if needs more data.
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
78 This helps to ensure streams are closed after reading, allows caching (if the same file (or LRU) is read few times in sequence)
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
79 and allows buffer management (i.e. reuse. Single buffer for all reads).
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
80 Scheduling multiple operations (in future, to deal with writes - single queue for FS operations - no locks?)
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
81
60
613c936d74e4 Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 55
diff changeset
82 WRITE: Need to register instances that cache files (e.g. dirstate or .hgignore) to FS notifier, so that cache may get cleared if the file changes (i.e. WriteOperation touches it).
613c936d74e4 Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 55
diff changeset
83
9
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
84 File access:
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
85 * NIO and mapped files - should be fast. Although seems to give less control on mem usage.
21
e929cecae4e1 Refactor to move revlog content to base class
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 20
diff changeset
86 * Regular InputStreams and chunked stream on top - allocate List<byte[]>, each (but last) chunk of fixed size (depending on initial file size)
9
d6d2a630f4a6 Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 6
diff changeset
87
129
645829962785 core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 128
diff changeset
88
645829962785 core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 128
diff changeset
89 * API
645829962785 core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 128
diff changeset
90 + rename in .core Cset -> HgChangeset,
645829962785 core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 128
diff changeset
91 + rename in .repo Changeset to HgChangelog.Changeset, Changeset.Inspector -> HgChangelog.Inspector
645829962785 core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 128
diff changeset
92 - CommandContext
645829962785 core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 128
diff changeset
93 - Data access - not bytes, but ByteChannel
645829962785 core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 128
diff changeset
94 - HgRepository constants (TIP, BAD, WC) to HgRevisions enum
131
aa1629f36482 Renamed .core classes to start with Hg prefix
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 129
diff changeset
95 - RevisionMap to replace TreeMap<Integer, ?>
aa1629f36482 Renamed .core classes to start with Hg prefix
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 129
diff changeset
96 + .core.* rename to Hg*
aa1629f36482 Renamed .core classes to start with Hg prefix
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 129
diff changeset
97 + RepositoryTreeWalker to ManifestCommand to match other command classes
aa1629f36482 Renamed .core classes to start with Hg prefix
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 129
diff changeset
98
aa1629f36482 Renamed .core classes to start with Hg prefix
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 129
diff changeset
99 * defects
136
947bf231acbb Strip off comments in config file
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 131
diff changeset
100 + ConfigFile to strip comments from values (#)
129
645829962785 core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 128
diff changeset
101
26
71a9ba42cee8 Memory-mapped files for bigger files. Defect reading number of bytes greater than size of the buffer fixed
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 25
diff changeset
102 <<<<<
197
3a7696fb457c Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 169
diff changeset
103 Performance.
3a7696fb457c Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 169
diff changeset
104 after pooling/caching in HgStatusCollector and HgChangeset
3a7696fb457c Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 169
diff changeset
105 hg log --debug -r 0:5000 and same via Log/HgLogCommand: approx. 220 seconds vs 279 seconds. Mem. cons. 20 vs 80 mb.
3a7696fb457c Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 169
diff changeset
106 after further changes in HgStatusCollector (to read ahead 5 elements, 50 max cache, fixed bug with -1) - hg4j dumps 5000 in
3a7696fb457c Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 169
diff changeset
107 93 seconds, memory consumption about 50-56 Mb
3a7696fb457c Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 169
diff changeset
108
198
33a7d76f067b Performance optimization: reduce memory to keep revlog cached info
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 197
diff changeset
109 IndexEntry(int offset, int baseRevision) got replaced with int[] arrays (offsets - optional)
33a7d76f067b Performance optimization: reduce memory to keep revlog cached info
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 197
diff changeset
110 for 69338 revisions from cpython repo 1109408 bytes reduced to 277368 bytes with the new int[] version.
33a7d76f067b Performance optimization: reduce memory to keep revlog cached info
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 197
diff changeset
111 I.e. total for changelog+manifest is 1,5 Mb+ gain
33a7d76f067b Performance optimization: reduce memory to keep revlog cached info
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 197
diff changeset
112
200
114c9fe7b643 Performance optimization: reduce memory ParentWalker hogs
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 198
diff changeset
113 ParentWalker got arrays (Nodeid[] and int[]) instead of HashMap/LinkedHashSet. This change saves, per revision:
114c9fe7b643 Performance optimization: reduce memory ParentWalker hogs
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 198
diff changeset
114 was: LinkedHashSet$Entry:32 + HashMap$Entry:24 + HashMap.entries[]:4 (in fact, up to 8, given entries size is power of 2, and 69000+
114c9fe7b643 Performance optimization: reduce memory ParentWalker hogs
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 198
diff changeset
115 elements in cpython test repo resulted in entries[131072].
114c9fe7b643 Performance optimization: reduce memory ParentWalker hogs
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 198
diff changeset
116 total: (2 HashMaps) 32+(24+4)*2 = 88 bytes
114c9fe7b643 Performance optimization: reduce memory ParentWalker hogs
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 198
diff changeset
117 now: Nodeid[]:4 , int[]:4 bytes per entry. arrays of exact revlog size
114c9fe7b643 Performance optimization: reduce memory ParentWalker hogs
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 198
diff changeset
118 total: (4 Nodeid[], 1 int[]) 4*4 + 4 = 20 bytes
114c9fe7b643 Performance optimization: reduce memory ParentWalker hogs
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 198
diff changeset
119 for cpython test repo with 69338 revisions, 1 387 224 instead of 4 931 512 bytes. Mem usage (TaskManager) ~50 Mb when 10000 revs read
114c9fe7b643 Performance optimization: reduce memory ParentWalker hogs
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 198
diff changeset
120
197
3a7696fb457c Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 169
diff changeset
121 <<<<<
26
71a9ba42cee8 Memory-mapped files for bigger files. Defect reading number of bytes greater than size of the buffer fixed
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 25
diff changeset
122
71a9ba42cee8 Memory-mapped files for bigger files. Defect reading number of bytes greater than size of the buffer fixed
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 25
diff changeset
123 Tests:
61
fac8e7fcc8b0 Simple test framework - capable of parsing Hg cmdline output to compare with Java result
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 60
diff changeset
124 DataAccess - readBytes(length > memBufferSize, length*2 > memBufferSize) - to check impl is capable to read huge chunks of data, regardless of own buffer size
fac8e7fcc8b0 Simple test framework - capable of parsing Hg cmdline output to compare with Java result
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 60
diff changeset
125
129
645829962785 core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 128
diff changeset
126 ExecHelper('cmd', OutputParser()).run(). StatusOutputParser, LogOutputParser extends OutputParser. construct java result similar to that of cmd, compare results
645829962785 core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 128
diff changeset
127
202
706bcc7cfee4 Basic test for HgIncomingCommand. Fix RepositoryComparator for cases when whole repository is unknown. Respect freshly initialized (empty) repositories in general.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 200
diff changeset
128 Need better MethodRule than ErrorCollector for tests run as java app (to print not only MultipleFailureException, but distinct errors)
706bcc7cfee4 Basic test for HgIncomingCommand. Fix RepositoryComparator for cases when whole repository is unknown. Respect freshly initialized (empty) repositories in general.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 200
diff changeset
129 Also consider using ExternalResource and TemporaryFolder rules.
367
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
130
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
131
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
132 =================
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
133 Naming:
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
134 nodeid: revision
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
135 int: revisionIndex (alternatives: revisionNumber, localRevisionNumber)
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
136 BUT, if class name bears Revision, may use 'index' and 'nodeid'
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
137 NOT nodeid because although fileNodeid and changesetNodeid are ok (less to my likening than fileRevision, however), it's not clear how
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
138 to name integer counterpart, just 'index' is unclear, need to denote nodeid and index are related. 'nodeidIndex' would be odd.
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
139 Unfortunately, Revision would be a nice name for a class <int, Nodeid>. As long as I don't want to keep methods to access int/nodeid separately
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
140 and not to stick to Revision struct only (to avoid massive instances of Revision<int,Nodeid> when only one is sufficient), I'll need to name
2fadf8695f8a Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 209
diff changeset
141 these separate methods anyway. Present opinion is that I don't need the object right now (will have to live with RevisionObject or RevisionDescriptor
427
31a89587eb04 FIXMEs: consistent names, throws for commands and their handlers. Use of checked exceptions in hi-level api
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 396
diff changeset
142 once change my mind)
31a89587eb04 FIXMEs: consistent names, throws for commands and their handlers. Use of checked exceptions in hi-level api
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 396
diff changeset
143
31a89587eb04 FIXMEs: consistent names, throws for commands and their handlers. Use of checked exceptions in hi-level api
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 396
diff changeset
144 Handlers (HgStatusHandler, HgManifestHandler, HgChangesetHandler, HgChangesetTreeHandler)
31a89587eb04 FIXMEs: consistent names, throws for commands and their handlers. Use of checked exceptions in hi-level api
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 396
diff changeset
145 methods DO NOT throw CancelledException. cancellation is separate from processing logic. handlers can implements CancelSupport to become a source of cancellation, if necessary
31a89587eb04 FIXMEs: consistent names, throws for commands and their handlers. Use of checked exceptions in hi-level api
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 396
diff changeset
146 methods DO throw HgCallbackTargetException to propagate own errors/exceptions
31a89587eb04 FIXMEs: consistent names, throws for commands and their handlers. Use of checked exceptions in hi-level api
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 396
diff changeset
147 methods are supposed to silently pass HgRuntimeExceptions (although callback implementers may decide to wrap them into HgCallbackTargetException)
31a89587eb04 FIXMEs: consistent names, throws for commands and their handlers. Use of checked exceptions in hi-level api
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents: 396
diff changeset
148 descriptive names for the methods, whenever possible (not bare #next)