annotate design.txt @ 385:6150555eb41d
HgInvalidRevisionException for svn imported repositories (changeset 0 references nullid manifest)
author | Artem Tikhomirov <tikhomirov.artem@gmail.com> |
---|---|
date | Mon, 13 Feb 2012 14:19:36 +0100 |
parents | 2fadf8695f8a |
children | 0ae53c32ecef |
rev | line source |
---|---|
1
a3576694a4d1
Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff
changeset
|
1 FileStructureWalker (pass HgFile, HgFolder to callable; which can ask for VCS data from any file) |
|
2 External uses: user browses files, selects one and asks for its history |
|
3 Params: tip/revision; |
|
4 Implementation: manifest |
|
5 |
|
6 Log --rev |
|
7 Log <file> |
2
08db726a0fb7
Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
1
diff
changeset
|
8 HgDataFile.history() or Changelog.history(file)? |
|
9 |
|
10 |
|
11 Changelog.all() to return list with placeholder, not-parsed elements (i.e. read only compressedLen field and skip to next record), so that |
|
12 total number of elements in the list is correct |
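A minimal sketch of the "read only compressedLen and skip" idea, assuming an inline RevlogNG file where each 64-byte index record is immediately followed by its revision data and the compressed length is the big-endian int at offset 8 of the record. The class and method names are hypothetical, not hg4j API.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not hg4j API): counts records of an inline RevlogNG index without parsing them. */
class InlineRevlogCounter {
    private static final int INDEX_RECORD_SIZE = 64;    // fixed size of a RevlogNG index record
    private static final int COMPRESSED_LEN_OFFSET = 8; // 6 bytes offset + 2 bytes flags come first

    /** Walks the records, reading only the compressed length of each one. */
    static int countRevisions(ByteBuffer inlineIndex) {
        int count = 0;
        int pos = 0;
        while (pos + INDEX_RECORD_SIZE <= inlineIndex.limit()) {
            // absolute read; ByteBuffer is big-endian by default, as is the revlog format
            int compressedLen = inlineIndex.getInt(pos + COMPRESSED_LEN_OFFSET);
            // in an inline revlog the revision data follows its index record
            pos += INDEX_RECORD_SIZE + compressedLen;
            count++;
        }
        return count;
    }
}
```

The resulting count could size a list of not-yet-parsed placeholders, so that the list length is correct before any entry is decoded.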
1
a3576694a4d1
Repository detection from local/specified directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
diff
changeset
|
13 |
|
14 hg cat |
2
08db726a0fb7
Shaping out low-level Hg structures
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
1
diff
changeset
|
15 Implementation: the logic to find a file by name in the repository is the same as for Log and other commands |
|
16 |
|
17 |
|
18 Revlog |
4
aa1912c70b36
Fix offset issue for inline revlogs. Commandline processing.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
2
diff
changeset
|
19 What happens when a big entry is added to a file, i.e. when the revlog detects it can no longer fit into .i and needs .d? Do the inline flag and the .i format change? |
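For reference while answering the question above, a small sketch of how the inline flag can be read. It assumes the RevlogNG layout in which the first four bytes of the .i file carry the format version in the low 16 bits and the inline-data flag in bit 16. `RevlogHeader` is a hypothetical name, not an hg4j class.

```java
/** Hypothetical helper (not hg4j API): decodes the first four bytes of a revlog index file. */
class RevlogHeader {
    static final int INLINE_DATA_FLAG = 1 << 16; // set when revision data is stored inside .i

    final int version;
    final boolean inline;

    RevlogHeader(int firstFourBytes) {
        version = firstFourBytes & 0xFFFF;                 // low 16 bits: format version
        inline = (firstFourBytes & INLINE_DATA_FLAG) != 0; // bit 16: inline data
    }
}
```

A revlog that has been split into .i and .d would be expected to report `inline == false` here.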
|
20 |
22
603806cd2dc6
Status of local working dir against non-tip base revision
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
21
diff
changeset
|
21 What's the natural hg way to see nodeids of specific files (i.e. when I do 'hg --debug manifest -r 11' and see the nodeid of some file, and |
|
22 then would like to see what changeset this file came from)? |
4
aa1912c70b36
Fix offset issue for inline revlogs. Commandline processing.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
2
diff
changeset
|
23 |
|
24 ---------- |
6
5abe5af181bd
Ant script to build commands and run sample
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
5
diff
changeset
|
25 + support patch from baseRev + few deltas (although done in a way that applies patches one by one instead of accumulating them) |
4
aa1912c70b36
Fix offset issue for inline revlogs. Commandline processing.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
2
diff
changeset
|
26 + command-line samples (-R, filenames) (Log & Cat) to show on any repo |
6
5abe5af181bd
Ant script to build commands and run sample
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
5
diff
changeset
|
27 +buildfile + run samples |
9
d6d2a630f4a6
Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
6
diff
changeset
|
28 *input stream impl + lifecycle. Step forward with FileChannel and ByteBuffer, although questionable accomplishment (looks a bit complicated, cumbersome) |
14
442dc6ee647b
Show correct time
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
11
diff
changeset
|
29 + dirstate.mtime |
43
1b26247d7367
Calculate result length of the patch operarion, when unknown
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
41
diff
changeset
|
30 +calculate sha1 digest for file to see I can deal with nodeid. +Do this correctly (smaller nodeid - first) |
18
02ee376bee79
status operation against current working directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
17
diff
changeset
|
31 *.hgignore processing |
25
da8ccbfae64d
Reflect Nodeid's array is exactly 20
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
22
diff
changeset
|
32 +Nodeid to keep 20 bytes always, Revlog.Inspector to get nodeid array of meaningful data exact size (neither leading 00 bytes nor the 12 extra bytes from the spec) |
26
71a9ba42cee8
Memory-mapped files for bigger files. Defect reading number of bytes greater than size of the buffer fixed
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
25
diff
changeset
|
33 +DataAccess - implement memory mapped files, |
49
26e3eeaa3962
branch and user filtering for log operation
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
43
diff
changeset
|
34 +Changeset to get index (local revision number) |
60
613c936d74e4
Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
55
diff
changeset
|
35 +RevisionWalker (on manifest) and WorkingCopyWalker (io.File) talking to ? and/or dirstate (StatusCollector and WCSC) |
|
36 +RevlogStream - Inflater. Perhaps, InflaterStream instead? branch:wrap-data-access |
|
37 +repo.status - use same collector class twice, difference as external code. add external walker that keeps collected maps and use it in Log operation to give files+,files- |
78
c25c5c348d1b
Skip metadata in the beginning of a file content. Parse metadata, recognize copies/renames
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
64
diff
changeset
|
38 + strip \1\n metadata out from RevlogStream |
84
08754fce5778
updated design questions
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
78
diff
changeset
|
39 + hash/digest long names for fncache |
169
8c8e3f372fa1
Towards initial clone: refactor HgBundle to provide slightly higher-level structure of the bundle
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
136
diff
changeset
|
40 +Strip off metadata from beg of the stream - DataAccess (with rebase/moveBaseOffset(int)) would be handy |
|
41 + hg status, compare revision and local file with kw expansion and eol extension |
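The sha1/nodeid item in the list above ("smaller nodeid - first") can be sketched as follows. This assumes the usual Mercurial rule that a nodeid is the SHA-1 of the two parent nodeids, smaller one first in unsigned byte order, followed by the revision text. `NodeidDigest` and its methods are hypothetical names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not hg4j API): computes a 20-byte nodeid from parents and content. */
class NodeidDigest {
    static byte[] nodeid(byte[] p1, byte[] p2, byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // the smaller parent nodeid (unsigned, lexicographic) is hashed first
            if (compareUnsigned(p1, p2) <= 0) {
                sha1.update(p1);
                sha1.update(p2);
            } else {
                sha1.update(p2);
                sha1.update(p1);
            }
            return sha1.digest(content); // feeds the content, then returns the 20-byte digest
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is expected on every Java platform", e);
        }
    }

    /** Lexicographic comparison treating bytes as unsigned values. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length && i < b.length; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

Because of the ordering step, swapping p1 and p2 yields the same nodeid, which is handy as a quick sanity check.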
|
42 |
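For the two metadata items in the list above (strip \1\n metadata from the beginning of the stream), a minimal sketch. It assumes the metadata block opens with the two bytes \1\n and is closed by the next \1\n pair; `MetadataStripper` is a hypothetical name and works on a plain byte array rather than on DataAccess.

```java
import java.util.Arrays;

/** Hypothetical helper (not hg4j API): drops the leading \1\n ... \1\n metadata block. */
class MetadataStripper {
    /** Returns the content without metadata, or the same array when no complete block is present. */
    static byte[] stripMetadata(byte[] data) {
        if (data.length < 4 || data[0] != 1 || data[1] != '\n') {
            return data; // does not start with the metadata marker
        }
        for (int i = 2; i + 1 < data.length; i++) {
            if (data[i] == 1 && data[i + 1] == '\n') {
                return Arrays.copyOfRange(data, i + 2, data.length);
            }
        }
        return data; // opening marker without a closing one: leave untouched
    }
}
```

A DataAccess with a movable base offset, as the list suggests, would avoid the array copy made here.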
|
43 write code to convert inlined revlog to .i and .d |
60
613c936d74e4
Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
55
diff
changeset
|
44 |
|
45 delta merge |
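As a baseline for a future delta merge, a sketch of the one-by-one patch application. It assumes the Mercurial patch layout of repeated hunks (start, end, length as big-endian ints, then `length` bytes of replacement data) and that each delta is already decompressed. `PatchChain` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical helper (not hg4j API): applies revlog deltas to a base text one at a time. */
class PatchChain {
    /** Applies a single patch: each hunk replaces base[start, end) with its data. */
    static byte[] apply(byte[] base, ByteBuffer patch) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(base.length);
        int last = 0;
        while (patch.remaining() >= 12) {
            int start = patch.getInt();
            int end = patch.getInt();
            int len = patch.getInt();
            out.write(base, last, start - last); // unchanged bytes before the hunk
            byte[] data = new byte[len];
            patch.get(data);
            out.write(data, 0, len);             // replacement bytes
            last = end;
        }
        out.write(base, last, base.length - last); // unchanged tail
        return out.toByteArray();
    }

    /** Applies the deltas in order, materializing every intermediate text. */
    static byte[] applyAll(byte[] base, List<ByteBuffer> deltas) {
        byte[] text = base;
        for (ByteBuffer delta : deltas) {
            text = apply(text, delta.duplicate()); // duplicate() keeps the caller's position intact
        }
        return text;
    }
}
```

A merged delta would skip the intermediate texts that `applyAll` allocates, which is where the gain of this item would come from.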
|
46 DataAccess - collect debug info (buffer misses, file size/total read operations) to find out better strategy to buffer size detection. Compare performance. |
41
858d1b2458cb
Check integrity for bundle changelog. Sort nodeids when calculating hash
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
33
diff
changeset
|
47 |
169
8c8e3f372fa1
Towards initial clone: refactor HgBundle to provide slightly higher-level structure of the bundle
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
136
diff
changeset
|
48 |
128
44b97930570c
Introduced ChangelogHelper to look up changesets files were modified in
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
93
diff
changeset
|
49 Parameterize StatusCollector to produce copy only when needed. And HgDataFile.metadata perhaps should be moved to cacheable place? |
|
50 |
18
02ee376bee79
status operation against current working directory
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
17
diff
changeset
|
51 Status operation from GUI - guess, usually on a file/subfolder, hence API should allow for starting path (unlike cmdline, seems useless to implement include/exclude patterns - GUI users hardly enter them, ever) |
60
613c936d74e4
Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
55
diff
changeset
|
52 -> recently introduced FileWalker may perhaps help solving this (if starts walking from selected folder) for status op against WorkingDir? |
9
d6d2a630f4a6
Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
6
diff
changeset
|
53 |
84
08754fce5778
updated design questions
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
78
diff
changeset
|
54 ? Can I use fncache (names from it - perhaps, would help for Mac issues Alex mentioned) |
|
55 ? Does fncache list both .i and .d (iow, is it true hashed <long name>.d is different from hashed <long name>.i) |
|
56 |
15
865bf07f381f
Basic hgignore handling
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
14
diff
changeset
|
57 ??? encodings of fncache, .hgignore, dirstate |
16
254078595653
Print manifest nodeid
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
15
diff
changeset
|
58 ??? http://mercurial.selenic.com/wiki/Manifest says "Multiple changesets may refer to the same manifest revision". To me, each changeset |
|
59 changes repository, hence manifest should update nodeids of the files it lists, effectively creating new manifest revision. |
15
865bf07f381f
Basic hgignore handling
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
14
diff
changeset
|
60 |
64
19e9e220bf68
Convenient commands constitute hi-level API. org.tmatesoft namespace, GPL2 statement
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
61
diff
changeset
|
61 ? subrepos in log, status (-S) and manifest commands |
|
62 |
197
3a7696fb457c
Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
169
diff
changeset
|
63 ? when p1 == -1, and p2 != -1, does HgStatusCollector.change() give correct result? |
93
d55d4eedfc57
Switch to Path instead of String in filenames returned by various status operations
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
84
diff
changeset
|
64 |
64
19e9e220bf68
Convenient commands constitute hi-level API. org.tmatesoft namespace, GPL2 statement
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
61
diff
changeset
|
65 Commands to get CommandContext where they may share various caches (e.g. StatusCollector) |
93
d55d4eedfc57
Switch to Path instead of String in filenames returned by various status operations
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
84
diff
changeset
|
66 Perhaps, abstract classes for all Inspectors (i.e. StatusCollector.Inspector) for users to use as base classes to protect from change? |
64
19e9e220bf68
Convenient commands constitute hi-level API. org.tmatesoft namespace, GPL2 statement
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
61
diff
changeset
|
67 |
205
ffc5f6d59f7e
HgLogCommand.Handler is used in few places, pull up to top-level class, HgChangesetHandler
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
202
diff
changeset
|
68 -cancellation and progress support |
|
69 -timestamp check for revlog to recognize external changes |
209
9ce3b26798c4
Few branches (distinct BranchChains from distinct heads) may end up with same nodes. Building BC structure fixed to reuse chain elements
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
205
diff
changeset
|
70 -HgDate or any other better access to time info |
|
71 -(low) RepositoryComparator#calculateMissingBranches may query branches for the same head more than once |
|
72 (when there are a few heads that end up with common nodes), e.g. hg4j revision 7 against remote hg4j revision 206 |
205
ffc5f6d59f7e
HgLogCommand.Handler is used in few places, pull up to top-level class, HgChangesetHandler
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
202
diff
changeset
|
73 |
9
d6d2a630f4a6
Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
6
diff
changeset
|
74 >>>> Effective file read/data access |
|
75 ReadOperation, Revlog does: repo.getFileSystem().run(this.file, new ReadOperation(), long start=0, long end = -1) |
|
76 ReadOperation gets buffer (of whatever size, as decided by FS impl), parses it and then reports if needs more data. |
|
77 This helps to ensure streams are closed after reading, allows caching (if the same file (or LRU) is read a few times in sequence) |
|
78 and allows buffer management (i.e. reuse. Single buffer for all reads). |
|
79 Scheduling multiple operations (in future, to deal with writes - single queue for FS operations - no locks?) |
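The ReadOperation idea above could look roughly like this. All names (`ReadOperation.consume`, `RepoFileSystem`) and the 8192-byte buffer are assumptions made for illustration. The point is that the FS side owns the channel lifecycle and the single reusable buffer, while the operation only parses and says whether it needs more data.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical callback: parses a chunk and reports whether it needs more data. */
interface ReadOperation {
    boolean consume(ByteBuffer chunk); // true = keep reading, false = done
}

/** Hypothetical FS facade: opens and closes the file, reuses one buffer for all reads. */
class RepoFileSystem {
    private final ByteBuffer shared = ByteBuffer.allocate(8192); // size is the FS impl's decision

    /** Feeds bytes [start, end) of the file to op; end = -1 means "till the end of file". */
    void run(Path file, ReadOperation op, long start, long end) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long limit = end < 0 ? ch.size() : Math.min(end, ch.size());
            long pos = start;
            boolean more = true;
            while (more && pos < limit) {
                shared.clear();
                shared.limit((int) Math.min(shared.capacity(), limit - pos));
                int read = ch.read(shared, pos); // positional read, channel position untouched
                if (read <= 0) {
                    break;
                }
                pos += read;
                shared.flip();
                more = op.consume(shared);
            }
        } // channel is closed here even if the operation throws
    }
}
```

With a single shared buffer this sketch is not thread-safe, which is consistent with the single-queue idea for FS operations.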
|
80 |
60
613c936d74e4
Log operation to output mode detailed (added, removed) files
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
55
diff
changeset
|
81 WRITE: Need to register instances that cache files (e.g. dirstate or .hgignore) to FS notifier, so that cache may get cleared if the file changes (i.e. WriteOperation touches it). |
|
82 |
9
d6d2a630f4a6
Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
6
diff
changeset
|
83 File access: |
|
84 * NIO and mapped files - should be fast. Although seems to give less control on mem usage. |
21
e929cecae4e1
Refactor to move revlog content to base class
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
20
diff
changeset
|
85 * Regular InputStreams and chunked stream on top - allocate List<byte[]>, each (but last) chunk of fixed size (depending on initial file size) |
9
d6d2a630f4a6
Access to underlaying file data wrapped into own Access object, implemented with FileChannel and ByteBuffer
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
6
diff
changeset
|
86 |
129
645829962785
core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
128
diff
changeset
|
87 |
|
88 * API |
|
89 + rename in .core Cset -> HgChangeset, |
|
90 + rename in .repo Changeset to HgChangelog.Changeset, Changeset.Inspector -> HgChangelog.Inspector |
|
91 - CommandContext |
|
92 - Data access - not bytes, but ByteChannel |
|
93 - HgRepository constants (TIP, BAD, WC) to HgRevisions enum |
131
aa1629f36482
Renamed .core classes to start with Hg prefix
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
129
diff
changeset
|
94 - RevisionMap to replace TreeMap<Integer, ?> |
|
95 + .core.* rename to Hg* |
|
96 + RepositoryTreeWalker to ManifestCommand to match other command classes |
|
97 |
|
98 * defects |
136
947bf231acbb
Strip off comments in config file
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
131
diff
changeset
|
99 + ConfigFile to strip comments from values (#) |
129
645829962785
core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
128
diff
changeset
|
100 |
26
71a9ba42cee8
Memory-mapped files for bigger files. Defect reading number of bytes greater than size of the buffer fixed
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
25
diff
changeset
|
101 <<<<< |
197
3a7696fb457c
Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
169
diff
changeset
|
102 Performance. |
|
103 after pooling/caching in HgStatusCollector and HgChangeset |
|
104 hg log --debug -r 0:5000 and same via Log/HgLogCommand: approx. 220 seconds vs 279 seconds. Mem. cons. 20 vs 80 mb. |
|
105 after further changes in HgStatusCollector (to read ahead 5 elements, 50 max cache, fixed bug with -1) - hg4j dumps 5000 in |
|
106 93 seconds, memory consumption about 50-56 Mb |
|
107 |
198
33a7d76f067b
Performance optimization: reduce memory to keep revlog cached info
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
197
diff
changeset
|
108 IndexEntry(int offset, int baseRevision) got replaced with int[] arrays (offsets - optional) |
|
109 for 69338 revisions from cpython repo 1109408 bytes reduced to 277368 bytes with the new int[] version. |
|
110 I.e. the total gain for changelog+manifest is 1.5 MB+ |
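The figures above can be reproduced with simple arithmetic. The per-item sizes below are assumptions chosen because they match the reported totals (16 bytes per IndexEntry object, 4 bytes per int plus a 16-byte array header), not values taken from the hg4j sources.

```java
/** Hypothetical estimate (not hg4j API) of the memory needed to cache one int per revision. */
class IndexMemoryEstimate {
    static final int BYTES_PER_INDEX_ENTRY = 16; // assumption: object header + two int fields
    static final int BYTES_PER_INT = 4;
    static final int ARRAY_HEADER = 16;          // assumption: array header incl. alignment

    /** One IndexEntry object per revision. */
    static long asObjects(int revisions) {
        return (long) revisions * BYTES_PER_INDEX_ENTRY;
    }

    /** A single int[] with one element per revision. */
    static long asIntArray(int revisions) {
        return (long) revisions * BYTES_PER_INT + ARRAY_HEADER;
    }
}
```

For 69338 revisions this gives 1109408 and 277368 bytes, i.e. about 0.8 MB saved per revlog, which agrees with the 1.5 MB+ total for changelog plus manifest.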
|
111 |
200
114c9fe7b643
Performance optimization: reduce memory ParentWalker hogs
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
198
diff
changeset
|
112 ParentWalker got arrays (Nodeid[] and int[]) instead of HashMap/LinkedHashSet. This change saves, per revision: |
|
113 was: LinkedHashSet$Entry:32 + HashMap$Entry:24 + HashMap.entries[]:4 (in fact, up to 8, given entries size is power of 2, and 69000+ |
|
114 elements in cpython test repo resulted in entries[131072]). |
|
115 total: (2 HashMaps) 32+(24+4)*2 = 88 bytes |
|
116 now: Nodeid[]:4 , int[]:4 bytes per entry. arrays of exact revlog size |
|
117 total: (4 Nodeid[], 1 int[]) 4*4 + 4 = 20 bytes |
|
118 for cpython test repo with 69338 revisions, 1 387 224 instead of 4 931 512 bytes. Mem usage (TaskManager) ~50 Mb when 10000 revs read |
|
119 |
197
3a7696fb457c
Investigate optimization options to allow fast processing of huge repositories. Fix defect in StatusCollector that lead to wrong result comparing first revision to empty repo (-1 to 0), due to same TIP constant value
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
169
diff
changeset
|
120 <<<<< |
26
71a9ba42cee8
Memory-mapped files for bigger files. Defect reading number of bytes greater than size of the buffer fixed
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
25
diff
changeset
|
121 |
|
122 Tests: |
61
fac8e7fcc8b0
Simple test framework - capable of parsing Hg cmdline output to compare with Java result
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
60
diff
changeset
|
123 DataAccess - readBytes(length > memBufferSize, length*2 > memBufferSize) - to check the impl is capable of reading huge chunks of data, regardless of own buffer size |
|
124 |
129
645829962785
core.Cset renamed to HgChangeset; repo.Changeset moved into HgChangelog
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
128
diff
changeset
|
125 ExecHelper('cmd', OutputParser()).run(). StatusOutputParser, LogOutputParser extends OutputParser. construct java result similar to that of cmd, compare results |
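A rough sketch of the ExecHelper/OutputParser pair named above. The method signatures, the UTF-8 decoding of the command output and the use of `readAllBytes` (Java 9+) are assumptions. Only the M and A status codes are parsed, to keep it short.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract: turns raw command-line output into something comparable. */
interface OutputParser {
    void parse(CharSequence output);
}

/** Hypothetical parser for 'hg status' lines of the form "<code> <path>". */
class StatusOutputParser implements OutputParser {
    final List<String> modified = new ArrayList<>();
    final List<String> added = new ArrayList<>();

    public void parse(CharSequence output) {
        for (String line : output.toString().split("\\r?\\n")) {
            if (line.length() < 3) {
                continue; // too short to hold a code, a space and a path
            }
            String file = line.substring(2);
            switch (line.charAt(0)) {
                case 'M': modified.add(file); break;
                case 'A': added.add(file); break;
                default: break; // other codes are ignored in this sketch
            }
        }
    }
}

/** Hypothetical runner: executes the command and hands its output to the parser. */
class ExecHelper {
    private final List<String> cmd;
    private final OutputParser parser;

    ExecHelper(List<String> cmd, OutputParser parser) {
        this.cmd = cmd;
        this.parser = parser;
    }

    void run() throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        byte[] out = p.getInputStream().readAllBytes(); // drain output before waiting
        p.waitFor();
        parser.parse(new String(out, StandardCharsets.UTF_8));
    }
}
```

A test would then build the same lists from the Java status operation and compare them with `modified` and `added`.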
|
126 |
202
706bcc7cfee4
Basic test for HgIncomingCommand. Fix RepositoryComparator for cases when whole repository is unknown. Respect freshly initialized (empty) repositories in general.
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
200
diff
changeset
|
127 Need better MethodRule than ErrorCollector for tests run as java app (to print not only MultipleFailureException, but distinct errors) |
|
128 Also consider using ExternalResource and TemporaryFolder rules. |
367
2fadf8695f8a
Use 'revision index' instead of the vague 'local revision number' concept in the API
Artem Tikhomirov <tikhomirov.artem@gmail.com>
parents:
209
diff
changeset
|
129 |
|
130 |
|
131 ================= |
|
132 Naming: |
|
133 nodeid: revision |
|
134 int: revisionIndex (alternatives: revisionNumber, localRevisionNumber) |
|
135 BUT, if class name bears Revision, may use 'index' and 'nodeid' |
|
136 NOT nodeid because although fileNodeid and changesetNodeid are ok (less to my liking than fileRevision, however), it's not clear how |
|
137 to name integer counterpart, just 'index' is unclear, need to denote nodeid and index are related. 'nodeidIndex' would be odd. |
|
138 Unfortunately, Revision would be a nice name for a class <int, Nodeid>. As long as I don't want to keep methods to access int/nodeid separately |
|
139 and not to stick to Revision struct only (to avoid massive instances of Revision<int,Nodeid> when only one is sufficient), I'll need to name |
|
140 these separate methods anyway. Present opinion is that I don't need the object right now (will have to live with RevisionObject or RevisionDescriptor |
|
141 once I change my mind) |