If you want to know more about the rationale for the changes and planned future improvements, start by reading from this post until the end of the thread for elaborations of various aspects of the update.
If you think you found a bug, post the EXACT QUERY you were using, not a vague description of it.
2022-11-25 - Minor Fix
- Fixed an issue where some more characters in uploader usernames were not properly searchable.
2022-11-21 - Improvements
- Added some significant optimizations for a frequently used search strategy for when multiple name+tag/comment search terms are used and at least one of the name+tag terms has less than 10000 hits. (For some cases this will reduce processing time by >90%).
- The search query parser will now handle various cases where repeated or redundant search qualifiers are used, such as weak:tag:foo or tag:tag:tag:bar.
2022-11-18 - Fixes
- The publish date adjustment for galleries created with the old uploaders (predating October 2021) has been completed. This should fix the remaining quirkiness with gallery sort placement as well as with the seek/jump mechanism. Note that these galleries are now considered "published" when the gallery was created rather than when it was actually published, though in most cases this would only shift the date by a few minutes to a few hours.
2022-11-17 - Minor Fixes
- When searching for comments, if the search term was too short after being stripped of non-indexable characters, the term was silently ignored. It now properly fails the search with an error message instead.
- Fixed tags hidden under My Tags not being displayed with search results when filters are disabled.
2022-11-16 - Deployment + Fixes
- This update is now fully deployed.
- Fixed an issue with how some dynamic stats were generated that only manifested under high load.
2022-11-15 - Minor Fixes
- Fixed a bug in favorite searching where, depending on internal state and order of operation, title-only searches could break when multiple terms were used.
- The wording of "default filters" was changed to "custom filters" to make it clearer that it is referring to your personalized/customized tag, uploader and language filters, rather than some global default filter.
2022-11-13 - Minor Fixes
- Fixed some more search issues with uploader usernames with leading or trailing underscores as well as multiple consecutive spaces/underscores.
- We now avoid using the /uploader/ shorthand URLs for uploader usernames containing forward slashes since the resulting URLs are broken.
2022-11-11 - Minor Additions/Tweaks
- When searching for tags (or titles+tags) where there is just one tag match and you have that tag filtered, the system will now specifically ignore that filter. If you actually want the tag filtered, you can use the title: qualifier.
- The search engine will now stop looking for more results for a page if more than 1000 galleries have been filtered. (This is mostly relevant in edge cases where you are intentionally searching for things you heavily filtered.)
- Fixed search warnings not being displayed for favorite searches.
- Added a setting to remove the "Your default filters removed XX galleries from this page" message.
- Added a new qualifier "weak:" to search for weak tags. This replaces the "Search Low-Power Tags" checkbox. Using weak: in front of a keyword works the same as using tag: except it will search weak tags (<10 power) instead of active (10+) ones.
This change allows for some additional flexibility, since you can now search for various combinations of weak tags and active tags - for example, all galleries with an active parody tag from a particular series, and weak character tags from said series.
Weak tags cannot be used for exclusions or searched in favorites. Additionally, if you are using OR searches, either all or none of the OR terms must use the weak: qualifier.
It is not possible to search for both active and weak instances of the same tag at the same time, or mix normal and weak OR terms in general, since they use different indexes. These are not artificial limitations. The weak tag search is there to aid in tagging and cleanup in order to either get rid of them or make them into active tags, not to get "more results" in casual browsing.
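The all-or-none rule for weak: OR terms can be sketched as a small check. This is only an illustration: the function name or_terms_are_consistent is hypothetical, and it assumes OR terms are written as ~weak:keyword, which these notes do not spell out.

```python
def or_terms_are_consistent(or_terms):
    """Return True if either all or none of the OR terms use weak:."""
    # Drop the leading ~ OR operator before looking at the qualifier.
    flags = [term.lstrip("~").startswith("weak:") for term in or_terms]
    # Weak and active tags live in different indexes, so mixing is rejected.
    return all(flags) or not any(flags)
```

A query mixing ~weak:foo with a plain ~bar would be rejected by this check, while two weak terms or two normal terms would pass.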
2022-11-07 - Bugfixes
- Corrected an issue with tag/name searching in uploader results.
- Corrected glitchy behavior with the new jump/seek selector on the favorite page, as well as an issue with the favorite checkbox selector positioning.
- Corrected seek/jump offsets not being kept if you switched display mode (minimal/compact/etc) right after using it.
- Corrected an issue where some characters weren't properly stripped for name index lookups.
- Corrected an issue where, when encountering terms that were long enough to search but that contained characters that are not valid in tags, it would still attempt to parse it as a tag except with those characters stripped, but if there were less than 3 stripped characters, it would then fail the term as being too short. Terms with characters that cannot be used in tags are now instead parsed as title-only unless a different qualifier is used.
2022-11-06 - Minor Addition
- Incorporated a clickable jump/seek selector based on a suggested code addition from FabulousCupcake.
Note that the date selector uses the built-in browser one, and as such it will use your browser's locale for the date format. (This is automatically translated to the site's date format by your browser.)
2022-11-05 - Update
New Feature: Seek/Jump Navigation
You can now do arbitrary jumps (number of days/weeks/months/years) backwards and forwards in search results, as well as arbitrary seeks to a specific date in the search results, by clicking the new Jump/Seek button in the navigation bar and entering a number or date in the box that appears.
Entering a number will make it jump backwards or forwards by the specified number of days, aligned to the start or end of each day. Adding w, m or y to the number will make it jump by that number of weeks, months or years instead. When jumping forwards (Jump >), the jump is based off the posted time of the oldest (bottom-most) gallery on the current page. When jumping backwards (< Jump), the jump is based off the posted time of the newest (topmost) gallery on the current page.
Entering a date in the YYYY-MM-DD format will make it seek to that date in the search result (inclusive). Note that the semantics of < Seek and Seek > are somewhat different from those of < Next/Jump and Next/Jump > - specifically, which button you use determines whether it uses the date as the starting point or the ending point.
You can also use the YYYY-MM shorthand date. In this case, it will start from the first day in the month when going backwards and the last day in the month when going forward. (In other words, in either case it will include that entire month.)
If you only enter a number (not followed by d w m or y) and it is between 2007 and 2099, it will be interpreted as a year. In this case, it will seek to the last day of the year when going forwards and the first day of the year when going backwards.
With the YYYY-MM-DD and YYYY-MM formats, the two first Ys can be left out - in other words, 22-11-05 will be interpreted as 2022-11-05.
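The input rules above can be summarized in a short sketch. It is not the site's actual parser: the function names, the tuple return values and the accepted digit counts are assumptions made for illustration, and the bare-year case is assumed to mirror the month rule (first day when going backwards, last day when going forwards).

```python
import calendar
import re
from datetime import date


def parse_jump_seek(text):
    """Classify a Jump/Seek box entry as a jump or a seek."""
    text = text.strip()
    m = re.fullmatch(r"(\d+)([dwmy]?)", text)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        if not unit and 2007 <= n <= 2099:
            return ("seek", n, None, None)   # bare number read as a year
        return ("jump", n, unit or "d")      # days unless w, m or y is given
    m = re.fullmatch(r"(\d{2}|\d{4})-(\d{1,2})(?:-(\d{1,2}))?", text)
    if m:
        year = int(m.group(1))
        if len(m.group(1)) == 2:
            year += 2000                     # 22-11-05 means 2022-11-05
        day = int(m.group(3)) if m.group(3) else None
        return ("seek", year, int(m.group(2)), day)
    raise ValueError("not a jump or seek value: " + text)


def resolve_seek_date(year, month, day, forwards):
    """Expand a year or year-month shorthand so the whole period is included."""
    if month is None:
        return date(year, 12, 31) if forwards else date(year, 1, 1)
    if day is None:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, last_day if forwards else 1)
    return date(year, month, day)
```

For example, parse_jump_seek("2w") yields a two-week jump, while parse_jump_seek("2022-11") yields a seek that resolve_seek_date expands to 2022-11-30 when going forwards and 2022-11-01 when going backwards.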
Seeks and Jumps to galleries posted before October 2021 or so will be wonky until I run a script to make some fixes to the publish timestamps to match the behavior of newer galleries. This correction will happen shortly after the update is fully deployed.
Bugfixes
- Corrected an issue where galleries were no longer displayed under favorites if they are unavailable.
- Corrected an issue where, when using the /tag/ URLs (such as when clicking tags from the gallery page), it would keep adding additional quotes if you clicked the navigation links.
- Corrected some issues with uploader usernames with underscores and spaces. Note that for syntax and visual ambiguity reasons, underscores and spaces are now considered equivalent in uploader username searches.
- Corrected excluded categories still appearing on the Popular Pane. (They are still supposed to appear with file, gid and favorite searches.)
- Corrected a potential issue where the file/gid searches weren't including expunged galleries even though they were supposed to.
- Corrected an issue with dashes/hyphens in name searches where they weren't properly stripped for the index lookup.
- Corrected an issue where if you were using advanced search and *only* picked a minimum rating, the navigation wouldn't include it, so it would reset between pages.
2022-11-01 - Original Post
This update is a complete rewrite of the gallery search engine, meaning that the usage and behavior of searches has changed in a number of more or less significant ways.
The most significant and visible fundamental change is that the internal segmenting of search results is now done by gallery ID (GID) ranges rather than "pages". While this means jumping to an arbitrary "page" in the result is no longer supported, this is arguably an improvement since you can now jump to an arbitrary GID instead. This also means each page of results will be fixed on the same set of galleries even if it is refreshed after new galleries are added. The page navigation has been reworked to reflect this.
This also fundamentally fixes a long-standing issue where going backwards in the results via the page navigation (as opposed to the browser back button) would often include results from the following page if you were using any form of filtering.
Overall, these changes allow for massive performance improvements (three orders of magnitude in some common cases) as well as significant new functionality (keep reading), and there are no longer any limits to how large a search result can be. Search terms that were previously capped to 100,000 results (like say "big breasts" which is tagged on 350K+ galleries) can now be browsed in their entirety.
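To illustrate why GID-range segmenting keeps pages stable, here is a minimal sketch of the general technique (often called keyset pagination). It is not the site's implementation; the function name page_below and the per_page parameter are hypothetical, and the results are assumed to be a plain list of GIDs shown newest first.

```python
def page_below(gids, before_gid=None, per_page=25):
    """Return up to per_page GIDs lower than before_gid, highest first."""
    ordered = sorted(gids, reverse=True)
    if before_gid is not None:
        # The page is defined by a GID boundary, not by a page number.
        ordered = [g for g in ordered if g < before_gid]
    return ordered[:per_page]


gids = [100, 90, 80, 70, 60, 50]
second_page = page_below(gids, before_gid=90, per_page=2)

# A newly added gallery gets a higher GID, so the same request
# still returns the same galleries.
same_page = page_below([110] + gids, before_gid=90, per_page=2)
```

With page numbers, the new gallery would have pushed every later result down by one; with a GID boundary, second_page and same_page contain the same galleries.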
OR Tag Searching
OR searching is now supported for tags. (Probably the most requested feature of all time.)
To use OR tag searching, prefix the keyword with ~
Example: ~yuri ~"females only" ~f:sole_female$
Specifically, if you have at least two keywords with the OR operator, the search will return all galleries that contain at least one of the tags in question. Using the OR operator will imply the tag: qualifier. If you use it with any other qualifier that isn't a tag namespace, the OR operator is ignored and the keyword will run as a standard AND search.
Using OR searching will "consume" one of the allowed inclusion search terms. If you only specify one OR term, it will be treated as an AND tag-only term. There are no specific limits to how many OR terms you can specify, though it will still be practically limited by the search string length cap. It will additionally bail if the overall OR search is matching more than 1000 tags internally, so consider using exact tags to allow for more terms.
Wildcards cannot be used for OR terms.
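The matching semantics described above can be sketched as follows. This is a simplified model rather than the real engine: the function name gallery_matches is hypothetical, terms are compared as exact tags only, and the 1000-tag internal cap is not modeled.

```python
def gallery_matches(gallery_tags, and_terms, or_terms):
    """Return True if a gallery satisfies the AND terms and the OR group."""
    tags = set(gallery_tags)
    and_terms = list(and_terms)
    or_terms = list(or_terms)
    if len(or_terms) == 1:
        # A single OR term is treated as an AND tag-only term.
        and_terms += or_terms
        or_terms = []
    if not all(term in tags for term in and_terms):
        return False
    # With two or more OR terms, at least one of them must be present.
    return not or_terms or any(term in tags for term in or_terms)
```

For example, a gallery tagged only "group" does not match the OR group ~yuri ~"females only", while a gallery tagged "yuri" does.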
Exclude-Only Searching
You can now do exclude-only searches. (Probably the other most requested feature of all time.)
Example: -yaoi -m:footjob -"glory hole" -sole_male$ -title:"novel ai" -comment:pixiv -uploader:BigDickDave69
You can use up to 10 comment+favnote exclusion terms and 10 tag (or hybrid tag+name) exclusion terms in a search.
The gid, uploader, uploaduid and title qualifiers are not specifically limited for exclusions, though they will still be practically limited by the search string length cap.
Tag Watching
The time cutoff for the tag watching page has been significantly increased:
- For non-donators, the cutoff was increased from one week to at least one month. The exact cutoff depends on internal segmenting, the rate new galleries are added, and the total index count for your watched tags. It will generally be somewhere between one and six months.
- For donators (gold star+), there are no longer any cutoffs. In other words, you can browse and search watched tags back to the launch of the site if you want. Note however that searching for terms that have few matches in your watched tags may produce fewer than expected results per page.
UI => Search Syntax Changes
The "Search Gallery Name", "Search Gallery Tags" and "Search Gallery Description" checkboxes as well as the corresponding search checkboxes on the Favorite page have all been removed; this functionality is now part of the search syntax instead.
By default, each search term will be interpreted as a hybrid tag+title search, and will match the gallery name (both English/romaji and Japanese) as well as the gallery tags.
To only match gallery names, prefix the term with the title: qualifier
* Example: title:keyword -title:"string of keywords"
To only match gallery tags, prefix the term with a tag namespace, or tag: for all namespaces, or use the exact tag operator $, or use the OR operator ~
* Example: f:"big breasts" tag:group -futanari$ ~twintails
To search uploader gallery comments, prefix the term with the comment: qualifier
* Example: comment:"insightful uploader musings" -comment:"less insightful ones"
Favorite searches only: To search favorite notes, prefix the term with the favnote: qualifier
* Example: favnote:"this is my favorite gallery" -favnote:"on the citadel"
Note that this means combined tag+name+comment/favnote search terms are no longer supported.
Search Parsing Changes
- When doing unquoted searches with unqualified short and/or non-indexable words (a, an, ai, to, the, and, so, on, and so on), as well as some common adjectives (small, big, huge, gigantic), they will now be automatically appended or combined with the following priority:
* If there is a non-qualified search term immediately following the short word, it will be combined with that one.
For example, searching for "a dick in a box" without quotes will be searched as "a dick" "in a box". Everyone's new favorite "ai generated" without quotes will be searched as if it had quotes.
* If there is a non-qualified search term immediately preceding the short word, it will be combined with that one.
For example, searching for "novel ai" without quotes will be searched as if it had quotes.
* If there are only short words, they will be combined into one quoted word if there is more than one.
For example, searching for "ex on the ox" without quotes will be searched as if it had quotes.
* If there is just one short word, or the short words are between qualified search terms, it will be searched as an exact tag. A warning is printed in this case.
For example, searching for "9s artist:a 2b" without quotes will be searched as "tag:9s$" "artist:a$" "tag:2b$"
To combine short words with a different priority, use quotes or underscores. ("word1 word2 word3" and word1_word2_word3 are equivalent.)
Note that there is a single two-character word "3d" that was specifically whitelisted for title searches, but it is not an indexable word for comment searches so it cannot be used for that.
- Support for single-character wildcarding was dropped, and the * wildcard can now only be used at the end of keywords. Title, comment and favnote searches are implicitly wildcarded for indexing reasons, so adding a wildcard will only affect tag searching.
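The short-word combining priority described in this list can be sketched roughly as below. This is an approximation under stated assumptions, not the actual parser: "short" is assumed to mean fewer than 3 characters or one of the listed words, the word list is only a sample, quoted input and the "3d" whitelist are not handled, and a run of several short words between qualified terms is assumed to be combined like any other run. All names are hypothetical.

```python
# Assumption: sample of the longer words that are treated like short words.
COMMON_WORDS = {"the", "and", "small", "big", "huge", "gigantic"}


def is_short(word):
    return len(word) < 3 or word in COMMON_WORDS


def combine_short_words(query):
    """Group the terms of an unquoted query into the terms actually searched."""
    groups = []    # each entry: (list of words, is_qualified)
    pending = []   # short words waiting for a term to attach to

    def flush():
        if not pending:
            return
        if groups and not groups[-1][1]:
            # Attach to the preceding non-qualified term.
            groups[-1][0].extend(pending)
        elif len(pending) > 1:
            # Several short words become one quoted term.
            groups.append((list(pending), False))
        else:
            # A lone short word is searched as an exact tag.
            groups.append((["tag:" + pending[0] + "$"], True))
        pending.clear()

    for token in query.split():
        if ":" in token:
            flush()
            qualifier, _, value = token.rpartition(":")
            if is_short(value) and not value.endswith("$"):
                token = qualifier + ":" + value + "$"
            groups.append(([token], True))
        elif is_short(token):
            pending.append(token)
        else:
            # Short words combine with the term that follows them first.
            groups.append((pending + [token], False))
            pending.clear()
    flush()
    return ['"' + " ".join(w) + '"' if len(w) > 1 else w[0] for w, _ in groups]
```

Under these assumptions, combine_short_words("9s artist:a 2b") gives tag:9s$, artist:a$ and tag:2b$, and combine_short_words("novel ai") gives the single quoted term "novel ai", matching the examples in the list.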
Search Term Limits
Exclusions and inclusions now have separate limits. A query can have up to 5 name+tag inclusion terms, 10 name+tag exclusion terms, and 10 comment+favnote inclusion+exclusion terms.
For both inclusions and exclusions, uploader:, uploadid: and gid: terms aren't specifically limited, but would still be limited by the max length of the search string (200 chars).
For exclusions, title: terms are also not limited.
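As a rough illustration of how these limits partition a query, the sketch below counts already-split terms per bucket. The function names and bucket labels are hypothetical, both spellings of the uploader ID qualifier used in these notes are accepted, and the rule that an OR group consumes a single inclusion term is not modeled.

```python
LIMITS = {"include": 5, "exclude": 10, "text": 10}
UNLIMITED = ("uploader:", "uploaduid:", "uploadid:", "gid:")
TEXT = ("comment:", "favnote:")
MAX_QUERY_LENGTH = 200


def count_terms(terms):
    """Count terms per limit bucket; unlimited qualifiers are skipped."""
    counts = {"include": 0, "exclude": 0, "text": 0}
    for term in terms:
        excluded = term.startswith("-")
        body = term[1:] if excluded else term
        if body.startswith(TEXT):
            counts["text"] += 1        # inclusions and exclusions share this
        elif body.startswith(UNLIMITED):
            continue                   # only bounded by the query length
        elif excluded and body.startswith("title:"):
            continue                   # title: exclusions are not limited
        elif excluded:
            counts["exclude"] += 1
        else:
            counts["include"] += 1
    return counts


def within_limits(terms):
    counts = count_terms(terms)
    fits = len(" ".join(terms)) <= MAX_QUERY_LENGTH
    return fits and all(counts[k] <= LIMITS[k] for k in LIMITS)
```

For instance, a query with six plain keywords fails the check because it exceeds the 5 name+tag inclusion terms, while any number of gid: terms passes as long as the query stays within 200 characters.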
GID Searching
You can now use the gid: search qualifier to search (publicly visible) galleries by Gallery ID. If you search a GID that has been replaced, it will list the current gallery instead.
Inclusion gid: terms cannot be combined with keyword searches or used in watch mode. This does not apply to exclusion terms. If used for exclusion, it will not exclude any galleries that replaced the provided GID.
You can specify multiple gid: terms in the same query for an implicit OR search.
This search mode will show both normal and expunged galleries. Default tag, language and uploader filters are automatically disabled for these searches.
Result Counting
For performance reasons, the search engine will no longer count the exact number of results in large result sets; instead result counts will usually be approximated based on various metrics. It will say "about" if the count is an estimate.
For complex multi-term searches with large result sets, it may not have enough information to give a reasonable estimate. In these cases, rather than showing a potentially wildly inaccurate one, it will just show "many". This only affects the count readout; navigation for these search results works the same as for smaller ones.
Smaller result sets (i.e. those that fit on one page) should return the exact count in all cases. Filtered galleries are included in this count, to match the behavior for estimates.
The page range filter, exclusion search terms and default language/uploader/tag filters will not generally be reflected in approximate result count estimates.
If you use the category, rating or torrent filters, it will use precomputed adjustment factors to correct the estimate. For some searches this estimate may be fairly inaccurate, say if you search for terms that are mostly applicable for specific categories then unselect other categories.
Result counts are not displayed in favorite searches or on the popular page. In the former case, it would only be able to display one for small result sets, and in the latter, it's all one page of results anyway. You can however still see the total for each favorite category.
Tag Search Behavior
- Tag searching now defaults to matching on word boundaries to reduce unwanted matches. In other words, searching for "tag:mana" will still match all tags that have "mana" as one of the words (like "secret of mana" [=> seiken densetsu] or "mana inuyama"), but it does not match "manabe", "manatsu", "manami" and so on. Searching for "tag:mana*" will restore the previous behavior.
- If there are too many tag matches for a term, it will now automatically rerun the term as an exact search instead of erroring out.
- Selecting "Search Low-Power Tags" will now only search low-power tags. This mode will also not do hybrid title/tag searches, so if a term is left unqualified (i.e. "big breasts") it will only search the tag. You can still search titles by using the title: qualifier.
- The "Search Downvoted Tags" option was removed.
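The word-boundary matching described in this list can be approximated with a regular expression. This is a sketch only: the function name tag_matches is hypothetical, the term is the keyword after any tag: qualifier, and it assumes that a "word" is delimited by any non-alphanumeric character and that the trailing * simply drops the end-of-word requirement.

```python
import re


def tag_matches(term, tag):
    """Return True if a tag search term matches the given tag."""
    if term.endswith("$"):
        # Exact tag operator: the whole tag must match.
        return tag == term[:-1]
    if term.endswith("*"):
        # Trailing wildcard: must start a word, may continue past the term.
        pattern = r"(?<!\w)" + re.escape(term[:-1])
    else:
        # Default: the term must line up with word boundaries on both sides.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, tag) is not None
```

With this model, "mana" matches "secret of mana" and "mana inuyama" but not "manabe", while "mana*" matches all three.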
Comment Search Behavior
Uploader comments and favorite notes are now searched using the comment: and favnote: qualifiers. favnote: is only available in favorite searches.
The way comments are indexed has been fundamentally changed, and there will be some subtle differences between normal text searches and favorite + exclusion-only text searches, since the former will usually use indexes while the latter do not.
Most notably, some otherwise-searchable common words (like "this" and "with") are not comment-searchable when the index is used but will be searchable when it is not. Also, when the index is used, words starting with these short words will not be matched unless you search for that exactly (like "with" and "withhold").
Furthermore, when the index is used it will only find word matches that start with the string, but when it's not it will also find matches that have the string as part of a word.
The index is only used for normal inclusion comment searches, but even for those it may not be used for some words and searches depending on various internal factors and thresholds, so you should not rely on this behavior.
Other Changes
- Various issues and limitations with favorite searches have been resolved. Searches in favorites should now behave the same as normal searches except for the noted comment/favnote search behavior.
- Exclusion searches for titles, tags (except for exact tags), comments and favnotes will now match any part of a word; i.e. -"laughter" will exclude "slaughter".
- Indexes are now generally updated immediately when the underlying data changes, which should reduce the delay until changes are reflected in searches. (Due to caching, there can still be some delay.)
- Whenever a gallery title has a mixed string of unicode and latin characters without any spaces or other breakable characters, like romaji漢字moreromaji, it would previously only be searchable with terms starting with "rom...", "漢字..." and "字mo..". It is now also searchable for "mor...".
- The "Your default filters removed..." message is now more consistent and specifically counts all galleries filtered by your default uploader, tag and language search filter settings. (When using both filters and exclusions and a gallery would have been removed by both, it is counted as an exclusion.)
- Selecting "Search Expunged Galleries" will now only search expunged galleries in normal searches. (File searches, GID searches and favorite searches will always display both normal and expunged galleries.)
- File searches can no longer be combined with keyword searches or other filters. This search mode will show both normal and expunged galleries. Default tag, language and uploader filters are now automatically disabled for these searches.
- Excessively narrow page range filters (min > 1000, max < 10, min/max > 0.8, min-max < 20) are no longer allowed.
- The max number of results per page is now 100. Paging Enlargement III was removed and will be refunded Soon™.
Known Issues/Quirks/Complaints/Workingasintendedisms
- You may sometimes see galleries appear out-of-order when going from one page to the next - in other words, going by the posted date, you would have expected the gallery to be on another page. This mostly applies to older galleries that predated the latest uploader update. This is because, prior to said update, a gallery could have been assigned a GID long before it was actually posted. This might eventually be addressed after a future redesign of the gallery metadata tables by renumbering galleries that are significantly out of order.
- If you are browsing from the end of a search result (backwards browsing mode) all the way to the start, the "last" page in the result (the one with the oldest results) will have a full page of results and the "first" page in the result (with the most recent ones) will have the remainder. This is working as intended.
- If you go backwards in a search result and get to the "first" page (with the most recent results), the "<< First" link will be lit up to flip back to the first page in forwards browsing mode even if there are no further pages and "< Prev" is disabled. This is working as intended.
- If you search for several AND inclusion tag terms (or hybrid title+tag terms), where every term has many results (~10K+) and some have a lot of results (~100K+), and there is a low degree of overlap between the tags, you may see fewer than expected results per page. You can usually use exact tags to avoid this.
- In general, "results per page" should be considered a target rather than a guarantee. For example, as an internal optimization, if a result page is at least 95% full after a search cycle, it may return with a couple of results "missing" instead of starting another search cycle (which can be expensive). This does not mean it's withholding results from you, you'll find them on the next page.
- "But $tool/$script needs the ability to access arbitrary pages in search results and/or accurate search result counts" is out of scope/wontfix. Update it to use the new gid-based navigation. And no, the old search engine was not "working just fine the way it was", it was failing on an ever-increasing number of searches due to running out of RAM when building results and badly needed a fundamental redesign to cope with the ever-increasing size of the index.
This is likely the most complicated update in the site's history, so there will probably be bugs and other subtle behavioral changes. Please don't hesitate to ask whether something is intentional if it's not noted in these patch notes.